April 2020

# Surrounded by Elligators: Implementing Crypto With Nothing to Compare to

When I started working on Monocypher three and a half years ago, I thought it would be easy. All I wanted to do was provide the cryptographic software needed to communicate securely. Authenticated encryption, key exchange, hashing, signatures, done.

How *naive*.

I quickly discovered that secure communications are about much more
than just hiding the contents. Who you’re talking with, how much, at
what time, *the fact you’re using encryption at all*… often
matter as much as the contents themselves. For instance, how
confidential would a patient/doctor relationship be if we knew who was
contacting whom, when, and how often?

One particular goal worth having is making the entire encrypted
communication indistinguishable from random. This makes it easier to *deny*, and easier to *hide*. This can have a number of
applications:

- Is this random looking disk sector really random, or does it contain encrypted data?
- Is this audio file just noisy, or does the noise hide encrypted data?
- Is this internet user really uploading lots of cute cat pictures, or do the pictures actually are a clever HTTP tunnel to a VPN beyond our Great Firewall?

You get the idea. Fully random data can hinder mass surveillance,
helps get past censorship, can hide the very *existence* of
encryption, and overall just plain reduces metadata leaks. One brilliant
example is the Padded Uniform Random Blob, which minimises
metadata leaks by making sure the data looks fully random if you don’t
have the key. They hide the contents of the message, its exact size, the
number of recipients, even the crypto system used for the
encryption.

There’s just one problem: **public keys don’t look
random**.

On the face of it, well *duh*. Public keys are like
identifiers, *of course* they will be recognised when you send
them across the network. But that can be worked around. Modern protocols use *ephemeral
keys* (temporary random key pairs) to achieve all kinds of security
properties (forward
secrecy, identity hiding, Key Compromise Impersonation resistance…);
and to encrypt your long term keys, which will then really look
random.

Still, **ephemeral keys don’t look random**.

Depending on the cryptographic system used, public keys have various recognisable patterns, that can easily be spotted with the right tools (deep packet inspection, forensic analysis…). For instance, the X25519 public keys used in Monocypher uses satisfy equations that random numbers verify only some of the time. Looking at a a number of those keys would quickly tell whoever is spying on us not only that encryption is going on, but that we’re using X25519 keys. Easy mitigations can reduce those patterns, but they can’t completely hide them.

Elligator can.

Clearly, Elligator is an important tool that could help Monocypher’s
goal: enabling secure communications. I had to add it. But it’s also a
fairly specialised tool, most useful in situations where people not only
need to hide information, they need to *hide the fact they’re hiding
it*. Think dissidents in an oppressive regime, possibly under
targeted surveillance. People who potentially rely on cryptography not
to get dragged away and summarily executed.

The stakes are *high*, so I had to do it right.

## First contact

When I first learned about Elligator, I sought the reference implementation so I could get a feel of what was going on. There were none, though. Even SUPERCOP limited itself to a Curve448goldilocks instantiation, there was nothing for Curve25519. Oh well, at least there’s no harm in looking at the paper for now.

My first read through was brutal. I hardly understood anything, and I had to set it aside for a while. Later, I was referred to Adam Langley’s blog post, which proved much more readable, but still not obvious to me. Then, after days of banging my head around that problem, it finally clicked, and I could read the damn paper.

Once I got over the first mental block, the paper proved quite readable. The problem was that it presents the mappings in the form of theorems and proofs, instead of just saying “here’s the formula”, and stating the theorems separately. That, and an intimidating use of Greek letters.

Elligator comes in two versions, Elligator 1 and Elligator 2. The one you want is Elligator 2, which works on all Montgomery curves over a prime field, including Curve25519 (extention fields aren’t supported). For the rest of this article, I’ll use the parameters of Curve25519 as an example. First, we need to define a few constants and functions:

```
p = 2^255 - 19 -- curve constant
A = 486662 -- curve constant
B = 1 -- curve constant
n = 2 -- non-square
chi(x) = x^((p-1)/2)
abs(x) = if x < (p-1)/2 then x else -x
sqrt(-1) = 2^((p-1)/4)
sqrt(x) = if x^((p+3)/4) == x
then abs(x^((p+3)/8) )
else abs(x^((p+3)/8) × sqrt(-1))
```

The forward map takes a random number `r`

(in the prime
field), and gives you a curve point `(u, v)`

:

```
w = -A / (1 + n × r²)
e = chi(w³ + A×w² + B×w)
u = e×w - (1-e)×(A/2)
v = -e × sqrt(u³ + A×u² + B×u)
```

The *reverse* map takes a point `(u, v)`

, and
yields a random looking representative `r`

. Note the
divisions and the square root: because of them, the reverse mapping only
works when `u`

!= `A`

and
`-n × u × (u+A)`

is a square modulo p.

```
if v ≤ (p-1) / 2
then r = sqrt(-u / (n×(u+A)))
else r = sqrt(-(u+A) / (n×u ))
```

Dense, but surprisingly simple. Once I finally got familiar with all this, I’ve learned a few neat things:

The only possible values for

`chi(x)`

are 0, 1, and -1, and the result tells us whether`x`

has a square root modulo p.The non-square

`n`

can be any number that doesn’t have a square root in GF(p). No number, multiplied by itself, will equal n (modulo p). The paper suggests 2, as does the Hash to Curve RFC draft, but any non-square works. The Ristretto group for instance chose`sqrt(-1)`

instead. I personally went the way of the paper and RFC.

Once I had the maths down, I started to write my own “reference”
implementation (first in SAGE, then I translated it in
regular Python3). The goal was to have a working prototype I could later
translate to C. *(Note: when I was working on optimising EdDSA 18
months ago, I worked directly in C, and it was brutal. In hindsight, I
was being stupid: prototyping makes these things much easier.)*

I’m pretty proud of this Python script. I even encourage prospective implementers to use it as a reference. I don’t know of anything better anyway.

## Looking for references

I had at some point what I thought was a fairly reliable Python
prototype. But if I was to be serious about correctness and security, I
really needed to seek out references to compare to. Implementations,
test vectors… I searched. I asked.
And I gradually realised that *I was on my own*.

- The original authors didn’t provide a reference implementation. They do have verification scripts, but those are just about checking the validity of their theorems. They’re nothing like a reference implementation, and cannot be used as such.
- The Hash to curve RFC draft only describes the forward map, from a
representative to a a point on the curve. But even there, it was using a
256-bit encoding hat differed from the Elligator paper.
*(Note: I eventually managed to salvage some test vectors from the RFC, but most required pre-processing.)* - Libsodium seems to be doing the same as the RFC (forward map only, 256 bit encoding). It also maps to Edwards25519, while I was aiming to stick with Curve25519 (in Montgomery space).
- Adam Langley’s implementation was flawed, and he could no longer maintain it.
- Libelligator was based on the above, suffered the same flaw, and I couldn’t even file an issue about it (the repository is now read only).
- The Ristretto Group uses a different non-square constant than the
RFC and the Elligator paper do (turns out that
`sqrt(-1)`

makes optimised code a bit more elegant, and a tiny bit more efficient). Plus, they map to the Ristretto group of Curve25519, not the raw curve itself.

There I was, implementing my own crypto, with nothing to compare to.
No reference implementation, no test vectors, nothing. For use cases
that are best justified when the stakes are frighteningly high. With the
searing memory that things have once gone very wrong, and the
firm resolution that it must *never*. *Happen*.
*Again*.

Did I mention I had to do it right?

*Edit, June 2021: Actually, there was something: Kleshni’s
implementation. I wasn’t aware of it at the time, and learned of it
only later, when Nadim Kobeissi told me about it as part of Monocypher’s
audit.*

## Fending for myself

I wasn’t *completely* alone. Andrew Moon for instance volunteered
to share a very neat performance trick based on the inverse square
root that helped me simplify not only Elligator, but EdDSA point
decompression as well. It also confirmed I was on the right track.
Still, I alone was ultimately responsible for any mistake.

Fortunately, working on Monocypher for over 3 years taught me a few
tricks. I’m a much better tester than I used to be, and I now know that
mathematical understanding isn’t enough, I need *confidence*. Any
seed of doubt must be sought out, investigated, and solved.

First, I used *Property based testing*. The idea is to check
whether the thing you’re testing behaves as expected. In this case, I
could verify that the opposite of a representative mapped to the same
point as the original one, or that the reverse map always gave me
numbers below `(p-1)/2`

.

Second, I started to *make and compare multiple
implementations*. Comparing independent implementations is
important, and it is still helpful even if you write all of them. So I
implemented scalar multiplication in two different ways (in Montgomery
space and in Edwards space), and I maintained two implementations of the
mappings: the simple one from the paper, and a faster one, whose code is
very different.

Third, I *played with other primitives*. With X25519 of course
(the whole point of sending public keys is to perform key exchange after
all), but other experiments are possible. In particular, the forward
mapping of Elligator is a critical building block for Password Authenticated Key Exchange. PAKE itself
often relies on Oblivious Pseudo-Random Functions, which can be build
upon *scalar inversion*. It works this way:

- Let
`P`

be some random point on the curve. - Let
`r`

be some random 256-bit number. - Let
`R`

be the scalar multiplication of`P`

and`r`

. - Let
`Q`

be the scalar multiplication of`R`

and`1/r`

(modulo the order of the curve). `P`

and`Q`

should be equal.

Reality however disagreed: When `P`

was a regular X25519
public key, generated as the product of the base point and some clamped,
random scalar, the above always worked. When `P`

was actually
a point generated by the Elligator forward map, the above
*sometimes* worked.

That’s my seed of doubt right there: why the heck wasn’t scalar inversion working every time?

## Dodging a bullet

What I was missing soon became clear. For security reasons, X25519
does not work on the whole curve. It works on the *prime-order
subgroup*. About an eighth of the curve. Elligator however maps
points on the *whole* curve. Worse, points on the prime-order
subgroup are trivially distinguishable from the others.

I couldn’t just map X25519 public keys. That would leak 3 bits of
evidence, even with Elligator. I needed to generate ephemeral keys over
the whole curve, and I needed a clean way to perform key exchange with
them. *Somehow.* I had some ideas about how to make it work, but
at that point, it became clear I was messing with maths I didn’t fully
understand.

That wouldn’t do, so I asked for help. This time, Elligator co-author Mike Hamburg himself kindly came to my rescue. In parallel, I was developing a keener understanding of Curve25519’s group structure than I ever thought I’d have. I finally became comfortable with the maths I was playing with, and designed the final solution:

- Generate a regular X25519 key pair.
- Add a low-order point to the public key.
- Randomly negate the result.
- Map that to an Elligator representative. If we cannot, go back to 1.

Steps 1 to 3 make sure we cover the whole curve (Mike Hamburg taught me how this works), thus solving our problem. For key exchange, we don’t have to do anything: this is compatible with X25519 out of the box. Reason being, X25519 is designed in such a way that it effectively ignores the low order component.

The implementation details however are a bit finicky. Arbitrary addition for Curve25519 in Montgomery space is feasible but it’s not implemented in Monocypher (or most X25519 libraries for that matter). There’s also the need to make sure the random selection covers all 8 low-order points.

I had the maths down cold, but I still couldn’t fully trust myself with the naive approach. Again, I wrote two very different implementations, verified they behaved the same, and made sure the keys they generated were equivalent to regular X25519 public keys.

The first implementation performs the scalar multiplication in Edwards space, adds the low order point there, then converts to Montgomery. This is the more complicated approach, but also the fastest, thanks to the pre-computed combs used in fixed base scalar multiplication. The speed is especially interesting for ephemeral keys, since (i) we generate new ones all the time, and (ii) we need twice as many because the reverse map of Elligator fails half the time on average.

The second implementation performs the scalar multiplication in Montgomery space, just like X25519 public key generation. Except this time, we use a point of order 8×ℓ instead of the usual base point of order ℓ. Properly massaging the scalar makes sure the result is equivalent to adding the low order point at the end.

## Conclusion

This Elligator business was more subtle than I expected. The lack of a reference implementation from the original authors didn’t help (I nevertheless deeply thank the one that so kindly helped me). I also sympathise with the authors of the flawed first attempts, for I almost made the same mistakes.

Still, it’s been *seven years* now. Seven years, and
Monocypher, a side project made by an amateur with no formal training,
is the most credible implementation of Elligator 2 over raw Curve25519 I
know of *Edit, June 2021: Again, I didn’t know about Kleshni’s
implementation yet.* Likewise, My own Python
scripts are the closest thing to reference implementation I know
of.

This feels a bit awkward to be honest.

*Discuss on /r/programming,
/r/crypto,
Lobsters,
or Hacker
News.*