Surrounded by Elligators: Implementing Crypto With Nothing to Compare to
When I started working on Monocypher three and a half years ago, I thought it would be easy. All I wanted to do was provide the cryptographic software needed to communicate securely. Authenticated encryption, key exchange, hashing, signatures, done.
I quickly discovered that secure communications are about much more than just hiding the contents. Who you're talking with, how much, at what time, the fact you're using encryption at all… often matter as much as the contents themselves. For instance, how confidential would a patient/doctor relationship be if we knew who was contacting whom, when, and how often?
- Is this random looking disk sector really random, or does it contain encrypted data?
- Is this audio file just noisy, or does the noise hide encrypted data?
- Is this internet user really uploading lots of cute cat pictures, or do the pictures actually are a clever HTTP tunnel to a VPN beyond our Great Firewall?
You get the idea. Fully random data can hinder mass surveillance, helps get past censorship, can hide the very existence of encryption, and overall just plain reduces metadata leaks. One brilliant example is the Padded Uniform Random Blob, which minimises metadata leaks by making sure the data looks fully random if you don't have the key. They hide the contents of the message, its exact size, the number of recipients, even the crypto system used for the encryption.
There's just one problem: public keys don't look random.
On the face of it, well duh. Public keys are like identifiers, of course they will be recognised when you send them across the network. But that can be worked around. Modern protocols use ephemeral keys (temporary random key pairs) to achieve all kinds of security properties (forward secrecy, identity hiding, Key Compromise Impersonation resistance…); and to encrypt your long term keys, which will then really look random.
Still, ephemeral keys don't look random.
Depending on the cryptographic system used, public keys have various recognisable patterns, that can easily be spotted with the right tools (deep packet inspection, forensic analysis…). For instance, the X25519 public keys used in Monocypher uses satisfy equations that random numbers verify only some of the time. Looking at a a number of those keys would quickly tell whoever is spying on us not only that encryption is going on, but that we're using X25519 keys. Easy mitigations can reduce those patterns, but they can't completely hide them.
Clearly, Elligator is an important tool that could help Monocypher's goal: enabling secure communications. I had to add it. But it's also a fairly specialised tool, most useful in situations where people not only need to hide information, they need to hide the fact they're hiding it. Think dissidents in an oppressive regime, possibly under targeted surveillance. People who potentially rely on cryptography not to get dragged away and summarily executed.
The stakes are high, so I had to do it right.
When I first learned about Elligator, I sought the reference implementation so I could get a feel of what was going on. There were none, though. Even SUPERCOP limited itself to a Curve448goldilocks instantiation, there was nothing for Curve25519. Oh well, at least there's no harm in looking at the paper for now.
My first read through was brutal. I hardly understood anything, and I had to set it aside for a while. Later, I was referred to Adam Langley's blog post, which proved much more readable, but still not obvious to me. Then, after days of banging my head around that problem, it finally clicked, and I could read the damn paper.
Once I got over the first mental block, the paper proved quite readable. The problem was that it presents the mappings in the form of theorems and proofs, instead of just saying "here's the formula", and stating the theorems separately. That, and an intimidating use of Greek letters.
Elligator comes in two versions, Elligator 1 and Elligator 2. The one you want is Elligator 2, which works on all Montgomery curves over a prime field, including Curve25519 (extention fields aren't supported). For the rest of this article, I'll use the parameters of Curve25519 as an example. First, we need to define a few constants and functions:
p = 2^255 - 19 -- curve constant A = 486662 -- curve constant B = 1 -- curve constant n = 2 -- non-square chi(x) = x^((p-1)/2) abs(x) = if x < (p-1)/2 then x else -x sqrt(-1) = 2^((p-1)/4) sqrt(x) = if x^((p+3)/4) == x then abs(x^((p+3)/8) ) else abs(x^((p+3)/8) × sqrt(-1))
The forward map takes a random number
r (in the prime field), and
gives you a curve point
w = -A / (1 + n × r²) e = chi(w³ + A×w² + B×w) u = e×w - (1-e)×(A/2) v = -e × sqrt(u³ + A×u² + B×u)
The reverse map takes a point
(u, v), and yields a random looking
r. Note the divisions and the square root: because of
them, the reverse mapping only works when
-n × u ×
(u+A) is a square modulo p.
if v ≤ (p-1) / 2 then r = sqrt(-u / (n×(u+A))) else r = sqrt(-(u+A) / (n×u ))
Dense, but surprisingly simple. Once I finally got familiar with all this, I've learned a few neat things:
The only possible values for
chi(x)are 0, 1, and -1, and the result tells us whether
xhas a square root modulo p.
ncan be any number that doesn't have a square root in GF(p). No number, multiplied by itself, will equal n (modulo p). The paper suggests 2, as does the Hash to Curve RFC draft, but any non-square works. The Ristretto group for instance chose
sqrt(-1)instead. I personally went the way of the paper and RFC.
Once I had the maths down, I started to write my own "reference" implementation (first in SAGE, then I translated it in regular Python3). The goal was to have a working prototype I could later translate to C. (Note: when I was working on optimising EdDSA 18 months ago, I worked directly in C, and it was brutal. In hindsight, I was being stupid: prototyping makes these things much easier.)
I'm pretty proud of this Python script. I even encourage prospective implementers to use it as a reference. I don't know of anything better anyway.
Looking for references
I had at some point what I thought was a fairly reliable Python prototype. But if I was to be serious about correctness and security, I really needed to seek out references to compare to. Implementations, test vectors… I searched. I asked. And I gradually realised that I was on my own.
- The original authors didn't provide a reference implementation. They do have verification scripts, but those are just about checking the validity of their theorems. They're nothing like a reference implementation, and cannot be used as such.
- The Hash to curve RFC draft only describes the forward map, from a representative to a a point on the curve. But even there, it was using a 256-bit encoding hat differed from the Elligator paper. (Note: I eventually managed to salvage some test vectors from the RFC, but most required pre-processing.)
- Libsodium seems to be doing the same as the RFC (forward map only, 256 bit encoding). It also maps to Edwards25519, while I was aiming to stick with Curve25519 (in Montgomery space).
- Adam Langley's implementation was flawed, and he could no longer maintain it.
- Libelligator was based on the above, suffered the same flaw, and I couldn't even file an issue about it (the repository is now read only).
- The Ristretto Group uses a different non-square constant than the RFC
and the Elligator paper do (turns out that
sqrt(-1)makes optimised code a bit more elegant, and a tiny bit more efficient). Plus, they map to the Ristretto group of Curve25519, not the raw curve itself.
There I was, implementing my own crypto, with nothing to compare to. No reference implementation, no test vectors, nothing. For use cases that are best justified when the stakes are frighteningly high. With the searing memory that things have once gone very wrong, and the firm resolution that it must never. Happen. Again.
Did I mention I had to do it right?
Edit, June 2021: Actually, there was something: Kleshni's implementation. I wasn't aware of it at the time, and learned of it only later, when Nadim Kobeissi told me about it as part of Monocypher's audit.
Fending for myself
I wasn't completely alone. Andrew Moon for instance volunteered to share a very neat performance trick based on the inverse square root that helped me simplify not only Elligator, but EdDSA point decompression as well. It also confirmed I was on the right track. Still, I alone was ultimately responsible for any mistake.
Fortunately, working on Monocypher for over 3 years taught me a few tricks. I'm a much better tester than I used to be, and I now know that mathematical understanding isn't enough, I need confidence. Any seed of doubt must be sought out, investigated, and solved.
First, I used Property based testing. The idea is to check whether
the thing you're testing behaves as expected. In this case, I could
verify that the opposite of a representative mapped to the same point as
the original one, or that the reverse map always gave me numbers below
Second, I started to make and compare multiple implementations. Comparing independent implementations is important, and it is still helpful even if you write all of them. So I implemented scalar multiplication in two different ways (in Montgomery space and in Edwards space), and I maintained two implementations of the mappings: the simple one from the paper, and a faster one, whose code is very different.
Third, I played with other primitives. With X25519 of course (the whole point of sending public keys is to perform key exchange after all), but other experiments are possible. In particular, the forward mapping of Elligator is a critical building block for Password Authenticated Key Exchange. PAKE itself often relies on Oblivious Pseudo-Random Functions, which can be build upon scalar inversion. It works this way:
Pbe some random point on the curve.
rbe some random 256-bit number.
Rbe the scalar multiplication of
Qbe the scalar multiplication of
1/r(modulo the order of the curve).
Qshould be equal.
Reality however disagreed: When
P was a regular X25519 public key,
generated as the product of the base point and some clamped, random
scalar, the above always worked. When
P was actually a point
generated by the Elligator forward map, the above sometimes worked.
That's my seed of doubt right there: why the heck wasn't scalar inversion working every time?
Dodging a bullet
What I was missing soon became clear. For security reasons, X25519 does not work on the whole curve. It works on the prime-order subgroup. About an eighth of the curve. Elligator however maps points on the whole curve. Worse, points on the prime-order subgroup are trivially distinguishable from the others.
I couldn't just map X25519 public keys. That would leak 3 bits of evidence, even with Elligator. I needed to generate ephemeral keys over the whole curve, and I needed a clean way to perform key exchange with them. Somehow. I had some ideas about how to make it work, but at that point, it became clear I was messing with maths I didn't fully understand.
That wouldn't do, so I asked for help. This time, Elligator co-author Mike Hamburg himself kindly came to my rescue. In parallel, I was developing a keener understanding of Curve25519's group structure than I ever thought I'd have. I finally became comfortable with the maths I was playing with, and designed the final solution:
- Generate a regular X25519 key pair.
- Add a low-order point to the public key.
- Randomly negate the result.
- Map that to an Elligator representative. If we cannot, go back to 1.
Steps 1 to 3 make sure we cover the whole curve (Mike Hamburg taught me how this works), thus solving our problem. For key exchange, we don't have to do anything: this is compatible with X25519 out of the box. Reason being, X25519 is designed in such a way that it effectively ignores the low order component.
The implementation details however are a bit finicky. Arbitrary addition for Curve25519 in Montgomery space is feasible but it's not implemented in Monocypher (or most X25519 libraries for that matter). There's also the need to make sure the random selection covers all 8 low-order points.
I had the maths down cold, but I still couldn't fully trust myself with the naive approach. Again, I wrote two very different implementations, verified they behaved the same, and made sure the keys they generated were equivalent to regular X25519 public keys.
The first implementation performs the scalar multiplication in Edwards space, adds the low order point there, then converts to Montgomery. This is the more complicated approach, but also the fastest, thanks to the pre-computed combs used in fixed base scalar multiplication. The speed is especially interesting for ephemeral keys, since (i) we generate new ones all the time, and (ii) we need twice as many because the reverse map of Elligator fails half the time on average.
The second implementation performs the scalar multiplication in Montgomery space, just like X25519 public key generation. Except this time, we use a point of order 8×ℓ instead of the usual base point of order ℓ. Properly massaging the scalar makes sure the result is equivalent to adding the low order point at the end.
This Elligator business was more subtle than I expected. The lack of a reference implementation from the original authors didn't help (I nevertheless deeply thank the one that so kindly helped me). I also sympathise with the authors of the flawed first attempts, for I almost made the same mistakes.
Still, it's been seven years now. Seven years, and Monocypher, a side project made by an amateur with no formal training, is the most credible implementation of Elligator 2 over raw Curve25519 I know of Edit, June 2021: Again, I didn't know about Kleshni's implementation yet. Likewise, My own Python scripts are the closest thing to reference implementation I know of.
This feels a bit awkward to be honest.