# XCKDF: X25519 Chacha20 Key Derivation Function

We describe here a way to derive keys from X25519 shared secrets, using nothing but the Chacha20 stream cipher (and its related hash HChacha20).

This is a cascading scheme, intended to be used with up to 4 levels (we could go beyond, but the protocols I intend to use it for don't go beyond 4 key exchanges).

```
DH1 DH2 DH3 DH4
┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
0╶┤ H │ 0╶┤ H │ 0╶┤ H │ 0╶┤ H │
└─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘
│ ┌────╴+ ┌────╴+ ┌────╴+
┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐
p╶┤ H │ │ p╶┤ H │ │ p╶┤ H │ │ p╶┤ H │
└─┬─┘ │ └─┬─┘ │ └─┬─┘ │ └─┬─┘
┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐ │ ┌─┴─┐
1╶┤ C ├─┘ 1╶┤ C ├─┘ 1╶┤ C ├─┘ 1╶┤ C │
└┬┬┬┘ └┬┬┬┘ └┬┬┬┘ └┬┬┬┘
││└╴AK1 ││└╴AK2 ││└╴AK3 ││└╴AK4
│└╴EK1 │└╴EK2 │└╴EK3 │└╴EK4
└╴PK1 └╴PK2 └╴PK3 └╴PK4
```

**DH1**, **DH2**, **DH3**, and **DH4**, are X25519 shared secrets. "+"
means XOR, and **p** is an arbitrary constant (typically the protocol
name, used for domain separation) The boxes mean the following:

```
key
┌─┴─┐
input╶┤ H │ out = HChacha20(key, input)
└─┬─┘
out
key
┌─┴─┐ out1 = Chacha20(key, nonce)[ 0: 31]
nonce╶┤ C ├╴out1 out2 = Chacha20(key, nonce)[32: 63]
└┬┬┬┘ out3 = Chacha20(key, nonce)[64: 95]
││└╴out2 out4 = Chacha20(key, nonce)[96:127]
│└╴out3
└╴out4
```

Keys span 32 bytes. The HChacha20 input spans 16 bytes. All outputs span 32 bytes. Chacha20 technically outputs a stream, we just divide the first 128 bytes of that stream among the four outputs.

## Security Goals

If

**DH1**is secure, then:**AK1**,**AK2**,**AK3**, and**AK4**are independent random strings.**EK1**,**EK2**,**EK3**, and**EK4**are independent random strings.**PK1**,**PK2**,**PK3**, and**PK4**are independent random strings.

If

**DH2**is secure, then:**AK2**,**AK3**, and**AK4**are independent random strings.**EK2**,**EK3**, and**EK4**are independent random strings.**PK2**,**PK3**, and**PK4**are independent random strings.

If

**DH3**is secure, then:**AK3**and**AK4**are independent random strings.**EK3**and**EK4**are independent random strings.**PK3**and**PK4**are independent random strings.

If

**DH4**is secure, then:**AK4**is an independent random string.**EK4**is an independent random string.**PK4**is an independent random string.

Being a "random string" means being indistinguishable from uniform random string in a feasible amount of computation. Being "secure" for a DH shared secret means the private keys involved in the key exchange are independent random strings, unknown to anyone but their rightful owner.

## Analysis of a single stage

A single stage of the cascade looks like this:

```
DH (1) hash_s = HChacha20(DH, 0)
┌─┴─┐ (2) x = CK_prev XOR hash_s
0╶┤ H │ (3) hash_x = HChacha20(x, p)
└─┬─┘ (4) CK_next = Chacha20(hash_x, 1)[ 0: 31]
CK_prev╶─╴+ (5) AK = Chacha20(hash_x, 1)[32: 63]
┌─┴─┐ (6) EK = Chacha20(hash_x, 1)[64: 95]
p╶┤ H │ (7) PK = Chacha20(hash_x, 1)[96:127]
└─┬─┘
┌─┴─┐
1╶┤ C ├╴CK_next
└┬┬┬┘
││└╴AK
│└╴EK
└╴PK
```

The security goals of a single step are:

If

**CK_prev**is an independent uniform random string, then**CK_next**,**AK**,**EK**, and**PK**are independent, uniform random strings.

If

**DH**is secure, then**CK_next**,**AK**,**EK**, and**PK**are independent, uniform random strings.

### Assumptions

**Assumption 1:**Chacha20 is a PRF: given a random string and nonce, it outputs an independent random stream.*(This is the security goal of Chacha20. It holds as long as Chacha20 is secure.)***Assumption 2:**HChacha20 is a PRF: given a random string and an input, it outputs an independent random string.*(This is the security goal of HChacha20. There's a security reduction proving that it holds as long as Chacha20 is secure.)***Assumption 3:**The HChacha20 of a secure X25519 shared secret and input zero is random.*(This comes from a security conjecture made by Daniel Bernstein in his Curve25519 paper, and applied in NaCl and Libsodium. The only difference is that he used HSalsa20, and we use HChacha20.)***Theorem 4:**the XOR of two independent inputs, at least one of which is random, is random.*(We consider this theorem "obvious", and won't prove it here.)*

### Preliminary lemmas

**Lemma 5:**If**DH**is secure, then**hash_s**is random.*(From equation (1) and assumption 3.)***Lemma 6:**If**CK_prev**and**hash_s**are independent, and at least one of them is random, then**x**is random.*(From equation (2) and theorem 4)***Lemma 7:**If**x**is random, then**hash_x**is an independent random string. Note that**hash_x**also depends on**p**. Changing**p**would produce a*different*independent random string.*(From equation (3) and assumption 2)***Lemma 8:**If**hash_x**is random, then**CK_next**,**AK**,**EK**, and**PK**are independent random strings.*(From equations (4) to (7), and assumption 1.)*

### Induction lemmas

**Lemma 9:**If both**CK_prev**and**DH**are known, then**CK_next**is known*(Obvious, from equations 1 to 7.)***Lemma 10:**If**CK_prev**is known and**DH**is secure, then**CK_next**,**AK**,**EK**, and**PK**are independent random strings.*(From lemma 5 to 8.)***Lemma 11:**If**CK_prev**is an independent random string, then**CK_next**,**AK**,**EK**, and**PK**are independent random strings.*(From lemma 5 to 8.)*

## Analysis of the whole chain

Notes: **CK_prev(0)** is zero. **CK_next(n) = CK_prev(n+1)** for all
**n > 0**.

**Step zero:**If**DH(i)**is known, for**0 < i ≤ n**, then**CK_prev(n+1)**is known.**Initialisation step:**If**CK_prev(n)**is known and**DH(n)**is secure, then**AK(n)**,**EK(n)**,**PK(n)**, and**CK_prev(n+1)**are independent random strings.*(From lemma 10.)***Induction step:**If**CK_prev(n)**is an independent random string, then**AK(n)**,**EK(n)**,**PK(n)**, and**CK_prev(n+1)**are independent random strings.*(From lemma 11.)***Conclusion:**- If
**DH(1)**is secure, then**AK(n)**,**EK(n)**, and**PK(n)**are independent random strings, for all**n > 0**. - If
**DH(1)**is known, let**i**be the smallest integer such that**DH(i)**is known, and**DH(i+1)**is secure. Then**AK(n)**,**EK(n)**, and**PK(n)**, for all**n > i**.

- If

QED.

## Design rationale

This key derivation is intended to be used in protocols that already rely on Chacha20 for encryption. Using it instead of an actual hash saves a primitive, allowing implementations to be simpler and smaller.

All output keys are, when secure, independent random strings. This independence is important because:

The

**AK**keys are intended to be used with Poly1305, a one time authenticator that partially reveals the key. Thus, revealing an**AK**key must not leak any other secret.The

**EK**keys are intended to be used as one time pads (a 32 byte key is enough to encrypt 32 bytes with XOR), and are thus revealed to the attacker if they happen to know the plaintext. Thus, revealing an**EK**key must not leak any other secret.The

**PK**keys are intended to be used as payload keys for reduced round trip messages, or session keys (for the last one). Keeping them independent from everything else removes all restrictions on nonce or counter usage (This is in contrast to the Noise protocol framework, which reserves the maximum value for its own use.)

The intermediate HChacha20 call with **p** has been added to provide
domain separation across protocols, even if users end up reusing keys
between protocols. This can happen if users perform an exchanges between
their long term keys, or if they (misguidedly) reuse an ephemeral key
for several protocols to save key exchanges during a protocol
negotiation. Users who don't need such domain separation could arguably
remove it, and save a tiny bit of computation in the process. I
personally think the flexibility is worth the cost.

The call to Chacha20 has a nonce of 1, to avoid collision with the
HChacha20 of the DH key exchanges, even if the attacker controls one an
knows the other. For instance, an attacker knowing the value of DH1 and
having control over **DH2** could have **DH2** be
*HChacha20(HChacha20(DH1, 0), p)*, Causing half of **CK_prev(2)** to be
correlated with **HChacha20(DH2)**. Changing the nonce basically forces
the attacker to perform a preimage attack on HChacha20. *(This is a
minor concern: the attacker can only has limited control over the the
value of a DH key exchange to begin with, and having the opportunity to
exercise that control means they know the relevant keys anyway, so they
don't need to control anything.)*

## Limitations

This scheme only mixes key exchanges, not the transcript. This has two limitations:

- An active attacker could add a low order point to a public key, without disrupting the exchange. (The public keys are malleable.)
- The authentication properties of earlier payloads do not strengthen as the exchange goes on. An unauthenticated payload for instance will forever stay unauthenticated.

Moreover, if we use the encryption keys directly as one time pads, (and the authentication keys directly with a one time authenticator), then we are limited to Noise patterns that encrypt at most one key between two key exchanges. There are ways around that, but they're a bit more complex.

All those limitations easily go away if we are willing to use a general purpose hash instead. (But then this would mean depending on another primitive, precisely what XCKDF aims to avoid.)