Key Derivation with Chacha20

The NaCl family doesn’t have protocols for key exchange. There’s the crypto_box() API, but it doesn’t provide forward secrecy or key compromise impersonation resistance. There’s Noise, but the specs are fairly unapproachable, complex, and give too many options. So I figured I would design my own protocol framework. Something simple, tailored for my crypto library, Monocypher.

Monocypher uses at least 3 primitives to establish and use a secure channel:

X25519 for key exchanges.
Chacha20 for secrecy.
Poly1305 for authentication.

This is enough to imitate NaCl’s crypto_box(): hash the X25519 shared secret with HChacha20 (NaCl uses HSalsa20, which is essentially the same), and voila we have our session key.

Alas, this does not have satisfying security properties. No forward secrecy, no key compromise impersonation resistance… For these, we need to imitate Noise and Signal, and perform several key exchanges, using temporary key pairs. Then we need to derive actual session keys from the resulting shared secrets. Signal and Noise perform this derivation with HKDF.

Signal’s key derivation looks like this:

SK = KDF(DH1 || DH2 || DH3 || DH4)

DH1  DH2  DH3  DH4
 │    └─┐┌─┘    │
 └─────┐││┌─────┘
     ┌─┴┴┴┴─┐
     │ HKDF │
     └──┬───┘
        │
        SK

Noise uses a more iterative process (I’m oversimplifying):

CK1, SK1 = HKDF(DH1)
CK2, SK2 = HKDF(DH2, CK1)
CK3, SK3 = HKDF(DH3, CK2)
CK4, SK4 = HKDF(DH4, CK3)

  DH1        DH2        DH3        DH4
   │          │          │          │
┌──┴───┐   ┌──┴───┐   ┌──┴───┐   ┌──┴───┐
│ HKDF ├───┤ HKDF ├───┤ HKDF ├───┤ HKDF ├──CK4
└──┬───┘   └──┬───┘   └──┬───┘   └──┬───┘
   │          │          │          │
  SK1        SK2        SK3        SK4
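To make the iterative process concrete, here is a minimal Python sketch of such a chain. It follows the two-output HKDF described in the Noise specification, instantiated here with HMAC-SHA-256. The function names (hkdf2, chain) and the all-zero initial chaining key are assumptions made for illustration; real Noise derives its initial chaining key from the protocol name.

```python
import hashlib
import hmac

def hkdf2(chaining_key: bytes, input_key_material: bytes):
    """Two-output HKDF in the style of the Noise specification (HMAC-SHA-256)."""
    temp_key = hmac.new(chaining_key, input_key_material, hashlib.sha256).digest()
    out1 = hmac.new(temp_key, b"\x01", hashlib.sha256).digest()
    out2 = hmac.new(temp_key, out1 + b"\x02", hashlib.sha256).digest()
    return out1, out2

def chain(shared_secrets, initial_ck: bytes = bytes(32)):
    """Feed each DH output into the chain; collect one session key per step.

    The all-zero initial chaining key is an assumption of this sketch.
    """
    ck = initial_ck
    session_keys = []
    for dh in shared_secrets:
        ck, sk = hkdf2(ck, dh)  # CKn, SKn = HKDF(DHn, CKn-1)
        session_keys.append(sk)
    return ck, session_keys

# Placeholder shared secrets, standing in for real X25519 outputs.
final_ck, keys = chain([b"\x01" * 32, b"\x02" * 32, b"\x03" * 32, b"\x04" * 32])
for i, sk in enumerate(keys, start=1):
    print(f"SK{i} = {sk.hex()}")
```

Each session key depends on every shared secret injected so far, which is the property the rest of this article tries to reproduce without HKDF.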

HKDF is excellent, but it is based on HMAC, which needs a hash function to run. I don’t want a hash, I want to cash in on DJB’s conjecture that HSalsa20 is enough to hash X25519 shared secrets. I want to avoid using a fourth primitive if I can help it.

Problem is, HChacha20 can only hash up to 48 bytes. And it’s not even a real hash, it’s derived from a stream cipher. There’s no compression function, so we have to construct one somehow.

Stumbling in the dark

My first attempt tried to do the compression with XOR. It failed miserably:

CK1 = HChacha20(DH1)
CK2 = HChacha20(DH2) XOR CK1
CK3 = HChacha20(DH3) XOR CK2
CK4 = HChacha20(DH4) XOR CK3

DH1    DH2    DH3    DH4
 │┌───┐ │┌───┐ │┌───┐ │┌───┐
 └┤ H │ └┤ H │ └┤ H │ └┤ H │
  └─┬─┘  └─┬─┘  └─┬─┘  └─┬─┘
    │      │      │      │
    ├─────╴+╶────╴+╶────╴+
    │      │      │      │
   CK1    CK2    CK3    CK4

(The “H” blocks mean HChacha20, and “+” means XOR)

If two key exchanges happen to be the same (either because of user error or attacker interference), they cancel each other out. We need to separate the different outputs somehow. Lucky for us, HChacha20 has 16 more bytes to spare (the nonce and counter). We could use it to our advantage and increment it:

CK1 = HChacha20(DH1, 1)
CK2 = HChacha20(DH2, 2) XOR CK1
CK3 = HChacha20(DH3, 3) XOR CK2
CK4 = HChacha20(DH4, 4) XOR CK3

DH1     DH2     DH3     DH4
 │  1    │  2    │  3    │  4
 │┌─┴─┐  │┌─┴─┐  │┌─┴─┐  │┌─┴─┐
 └┤ H │  └┤ H │  └┤ H │  └┤ H │
  └─┬─┘   └─┬─┘   └─┬─┘   └─┬─┘
    │       │       │       │
    ├──────╴+╶─────╴+╶─────╴+
    │       │       │       │
   CK1     CK2     CK3     CK4
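The difference between these two attempts is easy to reproduce. The Python sketch below uses a stand-in for H (SHA-256 over a counter and the shared secret, which is NOT HChacha20), because the cancellation does not depend on which deterministic function sits in the H box. The names chain_plain and chain_counted are mine.

```python
import hashlib

def H(shared_secret: bytes, counter: int = 0) -> bytes:
    """Stand-in for HChacha20 (an assumption of this sketch, not the real function)."""
    return hashlib.sha256(counter.to_bytes(16, "little") + shared_secret).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def chain_plain(shared_secrets) -> bytes:
    """First attempt: CKn = H(DHn) XOR CKn-1."""
    ck = bytes(32)
    for dh in shared_secrets:
        ck = xor(H(dh), ck)
    return ck

def chain_counted(shared_secrets) -> bytes:
    """Second attempt: CKn = H(DHn, n) XOR CKn-1."""
    ck = bytes(32)
    for n, dh in enumerate(shared_secrets, start=1):
        ck = xor(H(dh, n), ck)
    return ck

same = b"\x42" * 32  # the same shared secret, injected twice
print(chain_plain([same, same]).hex())    # all zeros: the two hashes cancel out
print(chain_counted([same, same]).hex())  # no longer zero
```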

There, no more problem. Unless…

See, DJB’s conjecture did not mention such a hack. Granted, Chacha20 is a stream cipher, and when using it with a uniform random key, the streams produced with different nonces are independent. Except X25519 shared secrets are not random. They’re on the curve, which means at most 252 bits of entropy. There are more effective attacks than brute force, and the security goal of X25519 is only 128 bits anyway.

Note that if I could reasonably assume this independence, then X25519 shared secrets could reasonably be used directly as Chacha20 keys, and even NaCl does not go that far.

So, nope.

I then tried to take inspiration from the sponge construction, and have HChacha20 absorb keys 16 bytes at a time:

CK1 = HChacha20(HChacha20(zero, DH1[0:15]), DH1[16:31])
CK2 = HChacha20(HChacha20(CK1 , DH2[0:15]), DH2[16:31])
CK3 = HChacha20(HChacha20(CK2 , DH3[0:15]), DH3[16:31])
CK4 = HChacha20(HChacha20(CK3 , DH4[0:15]), DH4[16:31])

      DH1           DH2           DH3           DH4
       │             │             │             │
 ┌─────┼─────┐ ┌─────┼─────┐ ┌─────┼─────┐ ┌─────┼─────┐
 │ LSB │ MSB │ │ LSB │ MSB │ │ LSB │ MSB │ │ LSB │ MSB │
 └──┬──┴──┬──┘ └──┬──┴──┬──┘ └──┬──┴──┬──┘ └──┬──┴──┬──┘
    │     │       │     │       │     │       │     │
  ┌─┴─┐ ┌─┴─┐   ┌─┴─┐ ┌─┴─┐   ┌─┴─┐ ┌─┴─┐   ┌─┴─┐ ┌─┴─┐
┌─┤ H ├─┤ H ├─┬─┤ H ├─┤ H ├─┬─┤ H ├─┤ H ├─┬─┤ H ├─┤ H ├─┐
│ └───┘ └───┘ │ └───┘ └───┘ │ └───┘ └───┘ │ └───┘ └───┘ │
0            CK1           CK2           CK3           CK4

This is actually very similar to DJB’s XSalsa20 cascade, where the chaining key would be the key, and the Diffie-Hellman key exchanges would be the extended nonce and counter. We even have a security reduction, so we should be good, right?

Wrong: the zeroth chaining key is not random. It’s zero. DJB’s security reduction does not apply at all. We could still conjecture that CK1 is random as long as DH1 is secure, but I’ve never seen such a conjecture anywhere, and I’m not pulling conjectures out of thin air.

So, nope. Again.

Calming down and stepping back

Clearly, throwing stuff up to see what sticks doesn’t work. I needed to look at the requirements more rigorously, and make sure I understand the security properties of Chacha20 (and HChacha20).

KDF requirements

My protocols have the outer structure of Noise, with the same kind of state machine. Each time there’s a new Diffie-Hellman key exchange, we inject it into the state machine, and get new symmetric keys to work with. My protocol specifically needs an authentication key AK, and an encryption key EK, both of which will be compromised after the first use (because I use XOR and a polynomial hash). Here’s what I need:

CK1, EK1, AK1 = KDF(DH1     )
CK2, EK2, AK2 = KDF(DH2, CK1)
CK3, EK3, AK3 = KDF(DH3, CK2)
CK4, EK4, AK4 = KDF(DH4, CK3)

  DH1       DH2       DH3       DH4
   │         │         │         │
┌──┴──┐   ┌──┴──┐   ┌──┴──┐   ┌──┴──┐
│ KDF ├───┤ KDF ├───┤ KDF ├───┤ KDF ├──CK4
└┬───┬┘   └┬───┬┘   └┬───┬┘   └┬───┬┘
 │   │     │   │     │   │     │   │
AK1 EK1   AK2 EK2   AK3 EK3   AK4 EK4

This KDF needs to have the following security property: If DH(n) is secure (meaning, the involved private keys are suitably random, and unknown to the attacker), then AK(i) and EK(i), for all i such that i≥n, are random, unknown to the attacker, and independent from each other and any other keys.

Especially important is that revealing AK(i) or EK(i), for any i, doesn’t give any further information away.

Properties of X25519

This is your regular Diffie-Hellman key exchange over an elliptic curve. Each party has a key pair, whose private half is a random secret, and the public half is revealed to the world (and any potential attacker). An exchange between two key pairs computes a shared secret from the private half of one key and the public half of the other key.

Diffie-Hellman operations are conjectured to be intractably hard to reverse. One cannot feasibly learn the private key of someone simply by knowing their public key, or even shared secrets.

An attacker who performs a key exchange with you does have some control, though. They know your public key, and they can choose their private key. They could perform several key exchanges and bias the resulting shared secret, or even force the shared secret to be zero (by choosing a low order public key).
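The zero shared secret is easy to reproduce. The sketch below is a plain, non-constant-time Python transcription of the X25519 ladder from RFC 7748, meant for experimentation only (it is not Monocypher's code, and should not be used with real secrets). The private keys are arbitrary test values.

```python
P = 2**255 - 19
A24 = 121665

def x25519(scalar: bytes, u_bytes: bytes) -> bytes:
    """Montgomery ladder as laid out in RFC 7748 (not constant time)."""
    k = bytearray(scalar)
    k[0] &= 248          # clamping: the scalar becomes a multiple of 8
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")

BASE = (9).to_bytes(32, "little")   # standard base point
alice = bytes(range(32))            # arbitrary test private keys
bob = bytes(range(32, 64))

# Honest exchange: both sides compute the same shared secret.
print(x25519(alice, x25519(bob, BASE)) == x25519(bob, x25519(alice, BASE)))  # True

# Malicious "public key" of low order: the shared secret is all zeros,
# whatever Alice's private key is.
print(x25519(alice, bytes(32)).hex())
```

This is why the rest of the design treats a shared secret computed with an attacker as potentially attacker-controlled.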

Properties of Chacha20

Chacha20 is a stream cipher. Its permutation has 3 inputs: a 32-byte key, an 8-byte nonce, and an 8-byte counter. The output is a 64-byte block. The security goal of Chacha20 is that given an independent random key, a counter, and a nonce, the output block is random and independent. Most notably, it is independent from blocks generated with the same key, but with a different counter and nonce. (Counter and nonce put together are like a salt.)

HChacha20 is almost the same as Chacha20: it has the same input key, the same counter and nonce (except those are fused into a single auxiliary “input” for simplicity). The main difference is that it outputs only 32 bytes. Those bytes can be computed from the corresponding Chacha20 output block. The main reason for HChacha20’s existence is that it is slightly cheaper to compute than a full Chacha20 block. Its security properties are the same as Chacha20.
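For readers who want to see the relationship between the two functions, here is a minimal Python sketch written from the public description of ChaCha20 (it is not Monocypher's code). The 16-byte aux argument stands for the fused counter and nonce; all function names are mine. The last lines check that the HChacha20 output can indeed be recovered from the Chacha20 block by undoing the final addition of the input state.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")  # the four ChaCha constants
COLUMNS = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15))
DIAGONALS = ((0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))

def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))

def quarter_round(s, a, b, c, d):
    # Each step adds one word into another, XORs the sum into a third, rotates it.
    for i, j, k, r in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
        s[i] = (s[i] + s[j]) & MASK
        s[k] = rotl(s[k] ^ s[i], r)

def initial_state(key, aux):
    # 4 constant words, 8 key words, 4 words of counter and nonce (aux).
    return SIGMA + struct.unpack("<8I", key) + struct.unpack("<4I", aux)

def permute(state):
    s = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for indices in COLUMNS + DIAGONALS:
            quarter_round(s, *indices)
    return s

def chacha20_block(key, aux):
    """64-byte block: permuted state plus the initial state, word by word."""
    init = initial_state(key, aux)
    s = permute(init)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(s, init)))

def hchacha20(key, aux):
    """32-byte output: first and last rows of the permuted state, no final addition."""
    s = permute(initial_state(key, aux))
    return struct.pack("<8I", *(s[0:4] + s[12:16]))

key, aux = bytes(range(32)), bytes(range(16))
init = initial_state(key, aux)
words = struct.unpack("<16I", chacha20_block(key, aux))
undone = [(w - i) & MASK for w, i in zip(words, init)]
print(struct.pack("<8I", *(undone[0:4] + undone[12:16])) == hchacha20(key, aux))  # True
```

The words being subtracted are the public constants and the auxiliary input, which is why the HChacha20 output can be read off a Chacha20 block without knowing the key.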

There’s an additional conjecture about HChacha20: if we give it as an input an X25519 shared secret whose private keys are secure, then the output is random. This assumption comes from the Curve25519 paper, where DJB uses a Salsa20-based hash, and from NaCl’s use of HSalsa20 underneath its crypto_box() API.

I conjecture that if it works with Salsa20, it works with Chacha20. They are similar enough that this conjecture should raise no eyebrows. This is why I use HChacha20 in Monocypher’s crypto_key_exchange() API.

Note that we cannot use this conjecture to deduce that hashing the same shared secret with a different counter and nonce would yield independent random outputs. If we want independent outputs, we must use different shared secrets, computed with independent private keys.

Putting it back together

The starting point is to get random numbers from secure shared secrets. Getting those intermediate “keys” is the easy part:

IK1 = HChacha20(DH1, 0)
IK2 = HChacha20(DH2, 0)
IK3 = HChacha20(DH3, 0)
IK4 = HChacha20(DH4, 0)

DH1     DH2     DH3     DH4
 │  0    │  0    │  0    │  0
 │┌─┴─┐  │┌─┴─┐  │┌─┴─┐  │┌─┴─┐
 └┤ H │  └┤ H │  └┤ H │  └┤ H │
  └─┬─┘   └─┬─┘   └─┬─┘   └─┬─┘
    │       │       │       │
   IK1     IK2     IK3     IK4

If a shared secret is secure, the corresponding intermediate key will be as good as random. If it is not, that key will be known to the attacker. We must also keep in mind that an active attacker can have some control over the value of an intermediate key in some cases.

Now from those intermediate keys, we can try and generate chaining keys. The following works pretty well:

CK1 = IK1
CK2 = IK2 XOR HChacha20(CK1, 1)
CK3 = IK3 XOR HChacha20(CK2, 1)
CK4 = IK4 XOR HChacha20(CK3, 1)

IK1     IK2     IK3     IK4
 │   1   │   1   │   1   │
 │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐ │
 ├─┤ H ├╴+╶┤ H ├╴+╶┤ H ├╴+
 │ └───┘ │ └───┘ │ └───┘ │
 │       │       │       │
CK1     CK2     CK3     CK4

If IK1 is random, so is its hash. Moreover, the hash will be independent from everything else, and IK2 in particular. So we can XOR them together without fearing a bad interaction (the XOR of two independent random sources is random). Thus, CK2 is random if IK1 or IK2 is random. CK3 is random if IK1 or IK2 or IK3 is random. And so on.
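The claim about XOR can be checked exhaustively on a small scale. The sketch below works on 4-bit values instead of 256-bit keys and computes the exact distribution of X XOR Y for independent X and Y. The distributions are made up for the example: one uniform, one heavily biased, and one fully chosen by an attacker.

```python
from fractions import Fraction
from itertools import product

def xor_distribution(px, py):
    """Exact distribution of X XOR Y, for independent X ~ px and Y ~ py."""
    out = [Fraction(0)] * len(px)
    for x, y in product(range(len(px)), repeat=2):
        out[x ^ y] += px[x] * py[y]
    return out

uniform = [Fraction(1, 16)] * 16
biased = [Fraction(1, 2)] + [Fraction(1, 30)] * 15        # heavily favours 0
chosen = [Fraction(int(v == 5)) for v in range(16)]       # attacker always picks 5

print(xor_distribution(uniform, biased) == uniform)   # True
print(xor_distribution(uniform, chosen) == uniform)   # True
print(xor_distribution(biased, chosen) == uniform)    # False: neither input is uniform
```

One uniform, independent input is enough to make the result uniform, however bad the other input is. Independence is the part that matters, and it is what the hash of the previous chaining key is supposed to provide.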

Note that we changed the auxiliary input. If we used zero everywhere, this could lead to the following:

CK2 = IK2               XOR HChacha20(CK1              , 0)
CK2 = HChacha20(DH2, 0) XOR HChacha20(CK1              , 0)
CK2 = HChacha20(DH2, 0) XOR HChacha20(IK1              , 0)
CK2 = HChacha20(DH2, 0) XOR HChacha20(HChacha20(DH1, 0), 0)

Now what if the attacker has control over DH2, and knows the value of DH1? Then they may attempt to force it to the following value:

DH2 = HChacha20(DH1, 0)

Which would mean:

CK2 = HChacha20(DH2,               0) XOR
      HChacha20(HChacha20(DH1, 0), 0)

CK2 = HChacha20(HChacha20(DH1, 0), 0) XOR
      HChacha20(HChacha20(DH1, 0), 0)

CK2 = BIG FAT ZERO

Oops. If we’re using a different auxiliary input however, things are much better:

CK2 = HChacha20(DH2,               0) XOR
      HChacha20(HChacha20(DH1, 0), 1)

Simple control over DH2 won’t make the two terms of this XOR equal. The attacker must find a value such that:

HChacha20(Attacker_value, 0) = HChacha20(known_value, 1)
HChacha20(Attacker_value, 0) = arbitrary_value

This basically requires a preimage attack on HChacha20, and such an attack would mean the ability to break Chacha20 itself. We’re thus safe. (Note that we weren’t that unsafe to begin with: the control over an X25519 shared secret is far from perfect. It’s just simpler and safer to work under the assumption that it is.)
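Here is the cancellation and its fix played out in Python, assuming the worst case where the attacker fully controls DH2. The HChacha20 code is a sketch written from the public description of ChaCha20, not Monocypher's code, and encoding the auxiliary inputs 0 and 1 as 16-byte little-endian integers is my assumption.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")
ROUND = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
         (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))

def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))

def hchacha20(key: bytes, aux: bytes) -> bytes:
    s = list(SIGMA + struct.unpack("<8I", key) + struct.unpack("<4I", aux))
    for _ in range(10):  # 10 double rounds = 20 rounds
        for a, b, c, d in ROUND:
            for i, j, k, r in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
                s[i] = (s[i] + s[j]) & MASK
                s[k] = rotl(s[k] ^ s[i], r)
    return struct.pack("<8I", *(s[0:4] + s[12:16]))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

ZERO = bytes(16)
ONE = (1).to_bytes(16, "little")  # assumed encoding of the auxiliary input "1"

def ck2(dh1: bytes, dh2: bytes, aux: bytes) -> bytes:
    """CK2 = IK2 XOR HChacha20(CK1, aux), with CK1 = IK1."""
    ik1 = hchacha20(dh1, ZERO)
    ik2 = hchacha20(dh2, ZERO)
    return xor(ik2, hchacha20(ik1, aux))

dh1 = bytes(range(32))               # known to the attacker
forced_dh2 = hchacha20(dh1, ZERO)    # the value the attacker forces DH2 to

print(ck2(dh1, forced_dh2, ZERO).hex())  # all zeros: BIG FAT ZERO
print(ck2(dh1, forced_dh2, ONE).hex())   # no cancellation
```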

Now there’s still a problem: protocol separation. I am not designing a single protocol, I’m designing a protocol framework, with at least 3 different protocols. This is not a huge problem, but sometimes, the same shared secret could be involved in two different protocols. This can happen with exchanges between two long term keys, or because some clever fool tried to save computation during a protocol negotiation, and ended up reusing an ephemeral key.

So instead of using 1 as auxiliary input, we could use a protocol specific constant, like its name. We have 16 bytes to do it, this should be enough:

p   = protocol_name
CK1 = IK1
CK2 = IK2 XOR HChacha20(CK1, p)
CK3 = IK3 XOR HChacha20(CK2, p)
CK4 = IK4 XOR HChacha20(CK3, p)

IK1     IK2     IK3     IK4
 │   p   │   p   │   p   │
 │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐ │
 ├─┤ H ├╴+╶┤ H ├╴+╶┤ H ├╴+
 │ └───┘ │ └───┘ │ └───┘ │
 │       │       │       │
CK1     CK2     CK3     CK4

That’s not quite enough, though: CK1 is not affected by the protocol name. We need to tweak things a little:

p   = protocol_name
CK1 = HChacha20(IK1        , p)
CK2 = HChacha20(IK2 XOR CK1, p)
CK3 = HChacha20(IK3 XOR CK2, p)
CK4 = HChacha20(IK4 XOR CK3, p)

IK1       IK2       IK3       IK4
 │   p     │   p     │   p     │   p
 │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐
 └─┤ H ├─┬╴+╶┤ H ├─┬╴+╶┤ H ├─┬╴+╶┤ H ├─┐
   └───┘ │   └───┘ │   └───┘ │   └───┘ │
         │         │         │         │
        CK1       CK2       CK3       CK4

Now we’re all set.

The complete key derivation scheme

Chaining keys are nice, but they’re just a stepping stone towards the keys I’m actually interested in: the authentication keys and the encryption keys. For those, we only need a single Chacha20 invocation (which unlike HChacha20 outputs 64 bytes, so we’re saving a bit of computation there). Care must be taken to use a different counter or nonce than we used for the chaining key’s HChacha20 invocations, though, or the keys won’t be truly independent. So:

AK1, EK1 = Chacha20(CK1, 1)
AK2, EK2 = Chacha20(CK2, 1)
AK3, EK3 = Chacha20(CK3, 1)
AK4, EK4 = Chacha20(CK4, 1)

CK1     CK2     CK3     CK4
 │   1   │   1   │   1   │   1
 │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐
 └─┤ C │ └─┤ C │ └─┤ C │ └─┤ C │
   └─┬─┘   └─┬─┘   └─┬─┘   └─┬─┘
     │       │       │       │
   ┌─┴─┐   ┌─┴─┐   ┌─┴─┐   ┌─┴─┐
  AK1  │  AK2  │  AK3  │  AK4  │
      EK1     EK2     EK3     EK4

And now the whole shebang:

p        = protocol_name
CK1      = HChacha20(HChacha20(DH1, 0)        , p)
CK2      = HChacha20(HChacha20(DH2, 0) XOR CK1, p)
CK3      = HChacha20(HChacha20(DH3, 0) XOR CK2, p)
CK4      = HChacha20(HChacha20(DH4, 0) XOR CK3, p)
AK1, EK1 = Chacha20(CK1, 1)
AK2, EK2 = Chacha20(CK2, 1)
AK3, EK3 = Chacha20(CK3, 1)
AK4, EK4 = Chacha20(CK4, 1)

DH1       DH2       DH3       DH4
 │  0      │  0      │  0      │  0
 │┌─┴─┐    │┌─┴─┐    │┌─┴─┐    │┌─┴─┐
 └┤ H │    └┤ H │    └┤ H │    └┤ H │
  └─┬─┘     └─┬─┘     └─┬─┘     └─┬─┘
    │   p     │   p     │   p     │   p
    │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐
    └─┤ H ├─┬╴+╶┤ H ├─┬╴+╶┤ H ├─┬╴+╶┤ H ├─┐
      └───┘ │   └───┘ │   └───┘ │   └───┘ │
            │   1     │   1     │   1     │   1
            │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐   │ ┌─┴─┐
            └─┤ C │   └─┤ C │   └─┤ C │   └─┤ C │
              └─┬─┘     └─┬─┘     └─┬─┘     └─┬─┘
                │         │         │         │
              ┌─┴─┐     ┌─┴─┐     ┌─┴─┐     ┌─┴─┐
             AK1  │    AK2  │    AK3  │    AK4  │
                 EK1       EK2       EK3       EK4
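For completeness, here is the whole scheme as a short Python sketch. It is an illustration of the formulas above, not Monocypher's code. Several details are my assumptions: the auxiliary inputs 0 and 1 are encoded as 16-byte little-endian integers, the protocol name is ASCII padded with zero bytes to 16 bytes, AK is the first half of the Chacha20 block and EK the second half, and the names (derive, aux_from_name) are made up. The shared secrets are placeholders.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")
ROUND = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
         (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))
ZERO = bytes(16)
ONE = (1).to_bytes(16, "little")  # assumed encoding of the auxiliary input "1"

def permute(init):
    s = list(init)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for a, b, c, d in ROUND:
            for i, j, k, r in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
                s[i] = (s[i] + s[j]) & MASK
                s[k] = (((s[k] ^ s[i]) << r) & MASK) | ((s[k] ^ s[i]) >> (32 - r))
    return s

def state(key, aux):
    return SIGMA + struct.unpack("<8I", key) + struct.unpack("<4I", aux)

def hchacha20(key, aux):
    s = permute(state(key, aux))
    return struct.pack("<8I", *(s[0:4] + s[12:16]))

def chacha20_block(key, aux):
    init = state(key, aux)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(permute(init), init)))

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def aux_from_name(name: str) -> bytes:
    raw = name.encode("ascii")
    if len(raw) > 16:
        raise ValueError("protocol name must fit in 16 bytes")
    return raw.ljust(16, b"\0")  # zero padding is an assumption of this sketch

def derive(shared_secrets, protocol_name: str):
    """Return one (CK, AK, EK) triple per shared secret."""
    p = aux_from_name(protocol_name)
    ck = bytes(32)  # XOR with zero leaves IK1 unchanged, so CK1 = HChacha20(IK1, p)
    keys = []
    for dh in shared_secrets:
        ik = hchacha20(dh, ZERO)            # intermediate key
        ck = hchacha20(xor(ik, ck), p)      # chaining key
        block = chacha20_block(ck, ONE)     # 64 bytes: AK then EK (assumed order)
        keys.append((ck, block[:32], block[32:]))
    return keys

# Placeholder shared secrets, standing in for real X25519 outputs.
for n, (ck, ak, ek) in enumerate(derive([bytes([i]) * 32 for i in range(1, 5)],
                                        "example-proto"), start=1):
    print(f"AK{n} = {ak.hex()}  EK{n} = {ek.hex()}")
```

Note that the protocol name must differ from the auxiliary input used for the final Chacha20 call, otherwise the chaining keys and the AK/EK pairs would no longer come from separate inputs; with a printable ASCII name and the encoding assumed here, that cannot happen.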