@ Loup's Impossible? Like that would stop me.

April 2020

Cofactor Explained: Clearing Elliptic Curves’ dirty little secret

Much of public key cryptography uses the notion of prime-order groups. We first relied on the difficulty of the Discrete Logarithm Problem. Problem was, Index Calculus makes DLP less difficult than it first seems. So we used longer and longer keys – up to 4096 bits (512 bytes) in 2020 – to keep up with increasingly efficient attacks.

Elliptic curves solved that. A well chosen, safe curve can only be broken by brute force. In practice, elliptic curve keys can be as small as 32 bytes.

On the other hand, elliptic curves were not exactly fast, and the maths involved many edge cases and subtle death traps. Most of those problems were addressed by Edwards curves, which have a complete addition law with no edge cases, and Montgomery curves, with a simple and fast scalar multiplication method.

Those last curves however did introduced a tiny little problem: their order is not prime.

(Before we dive in, be advised: this is a dense article. Don’t hesitate to take the time you need to digest what you’ve just read and develop an intuitive understanding. Prior experience with elliptic curve scalar multiplication helps too.)

Prime-order groups primer

First things first: what’s so great about prime-order groups? What’s a group anyway? What does “order” even mean?

A group is the combination of a set of elements “G”, and an operation “+”. The operation follows what we call the group laws.

Basically what you’d expect from good old addition.

The order of a group is simply the number of elements in that group.

To give you an example, let’s take G = [0..15], and define + as binary exclusive or. All laws above can be checked. It’s order is 16 (there are 16 elements). Note some weird properties. For instance, each element is its own inverse (a xor a is zero).

More interesting are cyclic groups, which have a generator: an element that repeatedly added to itself can walk through the entire group (and back to itself, so it can repeat the cycle all over again).

Cyclic groups are all isomorphic to the group of non-negative integers modulo the same order. Let’s take for instance [0..9], with addition modulo 10. The number 1 is a generator of the group:

1 = 1
2 = 1+1
3 = 1+1+1
4 = 1+1+1+1
5 = 1+1+1+1+1
6 = 1+1+1+1+1+1
7 = 1+1+1+1+1+1+1
8 = 1+1+1+1+1+1+1+1
9 = 1+1+1+1+1+1+1+1+1
0 = 1+1+1+1+1+1+1+1+1+1
1 = 1+1+1+1+1+1+1+1+1+1+1  (next cycle starts)
2 = 1+1+1+1+1+1+1+1+1+1+1+1
etc.

Not all numbers are generators of the entire group. 5 for instance can generate only 2 elements: 5, and itself.

5 = 5
0 = 5+5
5 = 5+5+5 (next cycle starts)
0 = 5+5+5+5
etc.

Note: we also use the word “order” to speak of how many elements are generated by a given element. In the group [0..9], 5 “has order 2”, because it can generate 2 elements. 1, 3, 7, and 9 have order 10. 2, 4, 6, and 8 have order 5. 0 has order 1.

Finally, prime-order groups are groups with a prime number of elements. They are all cyclic. What’s great about them is their uniform structure: every element (except zero) can generate the whole group. Take for instance the group [0..10] (which has order 11). Every element except 0 is a generator:

(Note: from now on, I will use the notation A.4 to denote A+A+A+A. This is called “scalar multiplication” (in this example, the group element is A and the scalar is 4). Since addition is associative, various tricks can speed up this scalar multiplication. I use a dot instead of “×” so we don’t confuse it with ordinary multiplication, and to remind that the group element on the left of the dot is not necessarily a number.)

1.1  =  1    2.1  =  2    3.1  =  3    4.1  =  4
1.2  =  2    2.2  =  4    3.2  =  6    4.2  =  8
1.3  =  3    2.3  =  6    3.3  =  9    4.3  =  1
1.4  =  4    2.4  =  8    3.4  =  1    4.4  =  5
1.5  =  5    2.5  = 10    3.5  =  4    4.5  =  9
1.6  =  6    2.6  =  1    3.6  =  7    4.6  =  2   etc.
1.7  =  7    2.7  =  3    3.7  = 10    4.7  =  6
1.8  =  8    2.8  =  5    3.8  =  2    4.8  = 10
1.9  =  9    2.9  =  7    3.9  =  5    4.9  =  3
1.10 = 10    2.10 =  9    3.10 =  8    4.10 =  7
1.11 =  0    2.11 =  0    3.11 =  0    4.11 =  0

You get the idea.

In practice, we can’t distinguish group elements from each other: apart from zero, they all have the same properties. That’s why discrete logarithm is so difficult: there is no structure to latch on to, so an attacker would mostly have to resort to brute force.

(Okay, I lied. Natural numbers do have some structure to latch on to, which is why RSA keys need to be so damn huge. Elliptic curves – besides treacherous exceptions – don’t have a known exploitable structure.)

The cofactor

So what we want is a group of order P, where P is a big honking prime. Unfortunately, the simplest and most efficient elliptic curves out there – those that can be expressed in Montgomery and Edwards form – don’t give us that. Instead, they have order P×H, where P is suitably large, and H is a small number (often 4 or 8): the cofactor.

Let’s illustrate this with the cyclic group [0..43], of order 44. How much structure can we latch on to?

 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
36 37 38 39
40 41 42 43

Since 44 is not prime, not all elements will have order 44. For instance:

You get the idea. When we go over all numbers, we notice that the order of each element is not arbitrary. It is either 1, 2, 4, 11, 22, or 44. Note that 44 = 11 × 2 × 2. We can see were the various orders come from: 1, 2, 2×2, 11, 11×2, 11×2×2.

The order of an element is easy to test: just multiply by the order to test, see if it yields zero. For instance (remember A.4 is a short hand for A+A+A+A, and we’re working modulo 44 – the order of the group):

 8 . 11 =  0  -- prime order
24 . 11 =  0  -- prime order
25 . 11 = 33  -- not prime order
11 .  4 =  0  -- low order
33 .  4 =  0  -- low order
25 .  4 = 12  -- not low order

(“Low order” means orders below 11: 1, 2, 4. Not much lower than 11, but if we replace 11 by a large prime P, the term “low order” makes more sense.)

Understandably, there are only few elements of low order: 0, 11, 22, and 33. 0 has order 1, 22 has order 2, 11 and 33 have order 4. Like the column on the left, they form a proper subgroup. It’s easier to see by swapping and rotating the columns a bit:

 0 11 22 33
 4 15 26 37
 8 19 30 41
12 23 34  1
16 27 38  5
20 31 42  9
24 35  2 13
28 39  6 17
32 43 10 21
36  3 14 25
40  7 18 29

The low order subgroup is shown on the first line. And now we can finally see the structure of this group: a narrow rectangle, with 11 lines and 4 columns, where each element in this rectangle is the sum of an element of prime order, and an element of low order. For instance:

30 =  8 + 22 = 4.2 + 11.2
35 = 24 + 11 = 4.6 + 11.1
 1 = 12 + 33 = 4.3 + 11.3

You just have to look left and up to know which elements to sum. To do this algebraically, you need to multiply by the right scalar. You need to clear the (co)factor.

Let’s first look left. How do we clear the cofactor? We start by an element that can be expressed thus:

E = 4.a + 11.b

What we want to find is the scalar s such that

E.s = 4.a

Note that

E.s = (4.a + 11.b).s
E.s = 4.a.s + 11.b.s
E.s = 4.(a×s) + 11.(b×s)
E.s = 4.(a×1) + 11.(b×0)

Recall that 4 has order 11, and 11 has order 4. So s must follow these equations:

s = 1 mod 11  -- preserve a
s = 0 mod  4  -- absorb b

There’s only one such scalar between 0 and 44: 12. So, multiplying by 12 clears the cofactor, and preserves the prime factor. For instance:

13.12 = 24
27.12 = 16
42.12 = 20

Now we know how to look left. To look up, we follow the same reasoning, except this time, our scalar s must follow these equations:

s = 0 mod 11  -- absorb a
s = 1 mod  4  -- preserve b

That’s 33. For instance:

13.33 = 33
27.33 = 11
42.33 = 22

Now we can look up as well.

(Note: This “looking up” and “looking left” terminology isn’t established mathematical terminology, but rather a hopefully helpful illustration. Do not ask established mathematicians or cryptographers about “looking up” and “looking left” without expecting them to be at least somewhat confused.)

Torsion safe representatives

We now have an easy way to project elements in the prime-order subgroup. Just look left by multiplying the element by the appropriate scalar (in the case of our [0..43] group, that’s 12). That lets us treat each line of the rectangle as an equivalence class, with one canonical representative: the leftmost element – the one that is on the prime-order subgroup. This effectively gets us what we want: a prime-order group.

Let’s say we have a scalar s and an element E, which are not guaranteed to be on the prime-order subgroup. We want to know in which line their scalar multiplication will land, and we want to represent that line by its leftmost element. To do so, we just need to perform the scalar multiplication, then look left. For instance:

s      =  7  -- some random scalar
E      = 31  -- some random point
E.s    = 41
result = (E.s).12 = 8

Or we could first project E to the left, then perform the scalar multiplication. It will stay on the left column, and give us the same result:

E      = 31
E . 12 = 20
result = (E.12).s = 8

The problem with this approach is that it is slow. We’re performing two scalar multiplications instead of just one. That kind of defeats the purpose of choosing a fast curve to begin with. We need something better.

Let us look at our result one more time:

result = (E.s).12 = 8
result = (E.12).s = 8

It would seem the order in which we do the scalar multiplications does not matter. Indeed, the associativity of group addition means we can rely on the following:

(E.s).t = E.(s×t)
        = E.(t×s)
        = (E.t).s

Now you can see that we can avoid performing two scalar multiplications, and multiply the two scalars instead. To go back to our example:

s      =  7       -- our random scalar
E      = 31       -- our random point
result = E.(7×12) -- scalarmult and look left
result = E.(84)
result = E.(40)   -- because 84 % 44 = 40
result = 8

Remember we are working with a cyclic group of order 44: adding an element to itself 84 times is like adding it to itself 44 times (the result is zero), and again 40 times. So better reduce the scalar modulo the group order so we can have a cheaper scalar multiplication.

Let’s recap:

s        =  7  -- our random scalar
E        = 31  -- our random point
E.s      = 41
E.(s×12) =  8

41 and 8 are on the same line, and 8 is in the prime-order subgroup. Multiplying by s×12 instead of s preserved the main factor, and cleared the cofactor. Because of this, we call s×12 the torsion safe representative of s.

Now computing (s×12) % 44 may be simple, but it doesn’t look very cheap. Thankfully, we’re not out of performance tricks:

(s×12) % 44 = (s×(11+1)) % 44
            = (s×(11+1)) % 44
            = (s + (s   × 11)) % 44
            =  s + (s   × 11) % 44  -- we assume s < 11
            =  s + (s%4 × 11)       -- we can remove %44

We only need to add s and and a multiple of 11 (0, 11, 2×11, or 3×11). The result is guaranteed to be under the group order (44). The total cost is:

Compared to the scalar multiplication, that’s practically free.

Decoupling main factor and cofactor

We just found the torsion safe representative of a scalar, that after scalar multiplication preserves the line, and only chooses the left column. In some cases, we may want to reverse roles: preserve the column, and and only chose the top line. In other words, we want to do the equivalent of performing the scalar multiplication, then looking up.

The reasoning is the same as for torsion safe representatives, only flipped on its head. Instead of multiplying our scalar by 12 (which is a multiple of the low order, equals 1 modulo the prime order), we want to multiply it by 33: a multiple of the prime order, which equals 1 modulo the low order.

Again, modular multiplication by 33 is not cheap, so we repeat our performance trick:

(s×33) % 44 = (s×11×3) % (11×4)
            = ((s×3) % 4) × 11

In this case, we just have to compute (s×3) % 4 and multiply the prime order by that small number. The total cost is just the multiplication of the order by a very small number.

Now we can do the decoupling:

s      =  7  -- some random scalar
E      = 31  -- some random point
E.s    = 41  -- normal result

E.(s×12) =  8
E.(s×33) = 33
E.s      = 8 + 33 = 41 -- decoupled result

Cofactor and discrete logarithm

Time to start applying our knowledge.

Let’s say I have elements A and B, and a scalar s such that B = A.s. The discrete logarithm problem is about finding s when you already know A and B.

If we’re working in a prime-order group with a sufficiently big prime, that problem is intractable. But we’re not working with a prime-order group. We have a cofactor to deal with. Let’s go back to our group of order 44:

 0 11 22 33
 4 15 26 37
 8 19 30 41
12 23 34  1
16 27 38  5
20 31 42  9
24 35  2 13
28 39  6 17
32 43 10 21
36  3 14 25
40  7 18 29

We can easily recover the line and the column of A and B, so let’s do that. Take for instance A = 27 and B = 15. Now let’s find s.

A = 27
A = 16 + 11
A = 4.4 + 11.1

B = 15
B = 4 + 11
B = 4.1 + 11.1

B = A . s
4.1 + 11.1 = (4.4 + 11.1) . s
4.1 + 11.1 =  (4.4).s + (11.1).s

4 .1 = (4 .4).s
11.1 = (11.1).s

Now look at that last line. Since 11 has only order 4, there are only 4 possible solutions, which are easy to brute force. We can try them all and easily see that:

11 . 1  = (11.1).1  (mod 44)
s mod 4 = 1

The other equation however is more of a problem. There are 11 possible solutions, and trying them all is more expensive:

4 . 1    ≠ (4.4).1
4 . 1    ≠ (4.4).2
4 . 1    = (4.4).3
s mod 11 = 3

Now that we know both moduli, we can deduce that s = 25:

A . s = B
27.25 = 15

What we just did here is reducing the discrete logarithm into two, easier discrete logarithms. Such divide and conquer is the reason why we want prime-order groups, where the difficulty of the discrete logarithm is the square root of the prime order (square root because there are cleverer brute force methods than just trying all possibilities). Here however, the difficulty wasn’t the square root of 44. It was sqrt(11) + sqrt(2) + sqrt(2), which is significantly lower.

The elliptic curves we are interested in however have much bigger orders. Curve25519 for instance has order 8 × L where L is 2²⁵² + something (and the cofactor is 8). So the difficulty of solving discrete logarithm for Curve25519 is sqrt(2)×3 + sqrt(2²⁵²), or approximately 2¹²⁶. Still lower than sqrt(8×L) (about 2¹²⁷), but not low enough to be worrying: discrete logarithm is still intractable.

Cofactor and X25519

Elliptic curves can have a small cofactor, and still guarantee we can’t solve discrete logarithm. There’s still a problem however: the attacker can still solve the easy half of discrete logarithm, and deduce the value of s, modulo the cofactor. In the case of Curve25519, that means 3 bits of information, that could be read, or even controlled by the attacker.

That’s not ideal, so DJB rolled two tricks up his sleeve:

  1. The chosen base point of the curve has order L, not 8×L. Multiplying that point by our secret scalar, can then only generate points on the prime-order subgroup (the leftmost column). The three bits of information the attacker could have had are just absorbed by that base point.

  2. The scalar is clamped. Bit 255 is cleared and bit 254 is set to prevent timing leaks in poor implementations, and put a lower bound on standard attacks. More importantly, bits 0, 1, and 2 are cleared, to make sure the scalar is a multiple of 8. This guarantees that the low order component of the point, if any, will be absorbed by the scalar multiplication, such that the resulting point will be on the prime-order subgroup.

The second trick is especially important: it guarantees that the scalar multiplication between your secret key and an attacker controlled point on the curve can only yield two kinds of results:

(I’m ignoring what happens if the point you’re multiplying is not on the curve. Failing to account for that has broken systems in the past. X25519 however only transmits the x-coordinate of the point, so the worst you can have is a point on the “twist”. Since the twist of Curve25519 also has a big prime order (2²⁵³ minus something) and a small cofactor (4), the results will be similar, and the attacker will learn nothing. Curve25519 is thus “twist secure”.)

Cofactor and Elligator

Elligator was developed to help with censorship circumvention. It encodes points on the curve so that if the point is selected at random, its encoding will be indistinguishable from random. With that tool, the entire cryptographic communication is fully random, including initial ephemeral keys. This enables the construction of fully random formats, and facilitates steganography.

Communication protocols often require ephemeral keys to initiate communications. In practice, they need to generate a random key pair, and send the public half over the network. Public keys however will not necessarily look random, even though the private half was indeed random. Curve25519 for instance have a number of biases:

So, our problem is to generate a key pair, where the public half is a random point on the whole curve, not just the prime-order subgroup. Let’s again illustrate it with the [0..43] group:

 0 11 22 33
 4 15 26 37
 8 19 30 41
12 23 34  1
16 27 38  5
20 31 42  9
24 35  2 13
28 39  6 17
32 43 10 21
36  3 14 25
40  7 18 29

Recall that the prime-order subgroup is the column on the left, and the low order subgroup is the first line. X25519 is designed in such a way that:

For us here, this means our generator is 4, and our private key is a multiple of 4. You can check that multiplying 4 by a multiple of 4 will always yield a multiple of… 4. An element on the left column. (Remember, we’re working modulo 44).

But that’s no good. I want to select a random element on the whole rectangle, not just the left column. If we recall our cofactor lessons, the solution is simple: add a random low order element. The random prime-order element selects the line, and the random low order element selects the column. Adding them together gives us a random element over the whole group.

To add icing on the cake, this method is compatible X25519. Let’s take an example. Let’s first have a regular key exchange:

B   = 4                  -- Generator of the prime-order group
sa  = 20                 -- Alice's private key
sb  = 28                 -- Bob's   private key
SA  = B.sa  = 4.20  = 36 -- Alice's public key
SB  = B.sb  = 4.28  = 24 -- Bob's   public key
ssa = SB.sa = 24.20 = 40 -- Shared secret (computed by Alice)
ssb = SA.sb = 36.28 = 40 -- Shared secret (computed by Bob)

As expected of Diffie–Hellman, Alice and Bob compute the same shared secret. Now, what happens if Alice adds a low order element to properly hide her key? Let’s say she adds 11 (an element of order 4).

LO  = 11                 -- random low order point
HA  = SA+LO = 36+11 =  3 -- Alice's "hidden" key
ssb = HA.sb =  3.28 = 40

Bob still computes the same shared secret! Which by now should not be a big surprise: scalars that are a multiple of the cofactor absorb the low order component, effectively projecting the result back in the prime order subgroup.

Applying this to Curve25519 is straightforward:

There’s a little complication, though. X25519 works on Montgomery curves, which are optimised for an x-coordinate only ladder. That ladder takes advantage of differential addition. Adding a low order point requires arbitrary addition, whose code is neither trivial nor available.

We can work around that problem by starting from Edwards25519 instead:

The main advantage here is speed: Edwards25519 scalar multiplication by the base point often takes advantage of pre-computed tables, making it much faster than the Montgomery ladder in practice. (Note: pre-computed tables don’t really apply to key exchange, which is why X22519 uses the Montgomery ladder instead.)

This has a price however: we now depend on EdDSA code, which is not ideal if we don’t compute signatures as well. Moreover, some libraries, like TweetNaCl avoid pre-computed tables to simplify the code. This makes Edwards scalar multiplication slower than the Montgomery ladder.

Alternatively, there is a way to stay in Montgomery space: change the base point. Let’s try it with the [0..43] group. Instead of using 4 as the base point, we’ll add 11, a point whose order is the same as the cofactor (4). Our “dirty” base point is 15 (4+11). Now let’s multiply that by Alice’s public key:

O  = 11                -- low order point (order 4)
B  = 4                 -- base point      (prime order)
D  = B+LO = 11+4 = 15  -- "dirty" base point
sa = 20                -- Alice's private key
SA = B.sa = 4.20  = 36 -- Alice's public key
HA = D.sa = 15.20 = 36 -- ???

Okay we have a problem: even with the dirty base point, we get the same result. That’s because Alice’s private key is still a multiple of the cofactor, and absorbs the low order component. But we don’t want to absorb it, we want to use it, to select a column at random. Here’s the trick:

Note the parallel with EdDSA: we were adding points, now we add scalars. But the result is the same:

d  = 33               -- random multiple of the prime order
da = sa+d = 20+33 = 9 -- Alice's "dirty" secret key
HA = D.da = 15. 9 = 3 -- Alice's hidden key

Note that we can ensure both methods yield the same results by properly decoupling the main factor and the cofactor.

Now we can apply the method to Curve25519:

That way we no longer need EdDSA code.

(Note: you can look at actual code in the implementation of the crypto_x25519_dirty_*() functions in my Monocypher cryptographic library.)

Cofactor and scalar inversion

Scalar inversion is useful for exponential blinding to implement Oblivious Pseudo-Random Functions (OPRF). (It will make sense soon, promise).

Let’s say Alice wants to connect to Bob’s server, with her password. To maximise security, they use the latest and greatest in authentication technology (augmented PAKE). One important difference between that and your run of the mill password based authentication, is that Alice doesn’t want to transmit her password. And Bob certainly doesn’t want to transmit the salt to anyone but Alice, that would be the moral equivalent of publishing his password database.

Yet we must end up somehow with Alice knowing something she can use as the salt: a blind salt, which must be a function of the password and the salt only:

Blind_salt = f(password, salt)

One way to do this is with exponential blinding. We start by having Alice compute a random point on the curve, which is a function of the password:

P = Hash_to_point(Hash(password))

That will act as a kind of secret, hidden base point. The Hash_to_point() function can use the Elligator mappings. Note that even though P is the base point multiplied by some scalar, we cannot recover the value of that scalar (if we could, it would mean Elligator mappings could be used to solve discrete logarithm). Now Alice computes an ephemeral key pair with that base point:

r = random()
R = P . r

She sends R to Bob. Note that as far as Bob (or any attacker) is concerned, that point R is completely random, and has no correlation with the password, or its hash. The difficulty of discrete logarithm prevents them to recover P from R alone. Now Bob uses R to transmit the blind salt:

S = R . salt

He sends S back to Alice, who then computes the blind salt:

Blind_salt = S . (1/r)  -- assuming r × (1/r) = 1

Let’s go over the whole protocol:

P          = Hash_to_point(Hash(password))
r          = random()
R          = P . r
S          = R . salt
Blind_salt = S . (1/r)
Blind_salt = R . salt . (1/r)
Blind_salt = P .  r . salt . (1/r)
Blind_salt = P . (r × salt × (1/r))
Blind_salt = P .      salt
Blind_salt = Hash_to_point(Hash(password)) . salt

And voila, our blind salt depends solely on the password and salt. You need to know the password to compute it from the salt, and if Mallory tries to connect to Bob, guessing the wrong password will give her a totally different, unexploitable blind salt. Offline dictionary attack is not possible without having hacked the database first.

Now this is all very well, if we are working on a prime-order subgroup. Scalars do have an inverse modulo the order of the curve, hash to point will give us what we want… except nope, our group does not have prime order. We need to deal with the cofactor, somehow.

The first problem with the cofactor comes from Hash_to_point(). When Elligator decodes a representative into a point on the curve, that point is not guaranteed to belong to the prime-order subgroup. There’s the potential to leak up to 3 bits of information about the password (the cofactor of Curve25519 is 8). Fortunately, the point P is not transmitted over the network. Only R is. And this gives us the opportunity to clear the cofactor:

R = P . r

If the random scalar r is a multiple of 8, then R will be guaranteed to be on the prime-order subgroup, and we won’t leak anything. X25519 has us covered: after clamping, r is a multiple of 8.

This guarantee however goes out the window as soon as R is transmitted across the network: Mallory could instead send a bogus key that is not on the prime-order subgroup. (They could also send a point on the twist, but Curve25519 is twist secure, so let’s ignore that.) Again though, X25519 takes care of this:

S = R . salt

Just clamp salt, and we again have a multiple of 8, and S will be guaranteed to be on the prime-order subgroup.

Of course, Alice might receive some malicious S instead, so she can’t assume it’s the correct one. And this time, X25519 does not have us covered:

Blind_salt = S . (1/r)

See, X25519 clamping has a problem: while it clears the cofactor all right, it does not preserve the main factor. Which means clamping neither survives nor preserves algebraic transformations.

The solution is torsion safe representatives:

c = clamp(r)
i = 1/c    -- modulo L
s = i × t  -- t%L == 1  and  t%8 == 0

Where L is the prime order of the curve. For Curve25519, t = 3×L + 1. The performance trick explained in the torsion safe representatives section apply as expected.

(Note: you can look at actual code in the implementation of the crypto_x25519_inverse() function in Monocypher.)

Conclusion

Phew, done. I confess didn’t think I’d need such a long post. But you get the idea: properly dealing with a cofactor these days is delicate. It’s doable, but the cleaner solution these days is to use the Ristretto Group: you get modern curves and a prime order group.

(Discuss on Hacker News, Reddit, Lobsters)