September 2023

Monocypher 4: The Clean Break

Since I released Monocypher in July 2017, I broke the API four times.

Version 1.1.0 in February 2018.
Version 2.0.0 barely 8 days later.
Version 3.0.0 in June 2020.
Version 4.0.0 in February 2023.

The first three breaks were fairly minimal: fixing constant time comparison in 1.1.0, adopting more standard authenticated encryption in 2.0.0, dropping unsafe streaming encryption and improving support for options in 3.0.0.

With version 4 however, I broke everything:

I changed many function names to be more consistent and memorable.
I changed the BLAKE2b API to better support common use cases.
I changed the Argon2 API to support all variants (i, d, id).
I changed the high-level signature API to make it safer.
I tweaked signature verification for future compatibility.
I completely overhauled the low-level signature API.

The enabler? Money.

Why

Over time Monocypher’s shortcomings grew more apparent. Three in particular stood out: Argon2id was missing, RFC 8439 compatibility was incomplete, and most importantly the signature API was unsafe.

That last one stung to be honest. Though it looks neat and clean at a first glance, separating the private and public half of the key lets users specify the wrong public key, which can leak the private key. EdDSA authors likely knew this, and their API avoids the mistake by concatenating the private seed and public key in a single “private key”. Unfortunately, I and many other library writers missed this, and went instead for the prettier and more error prone API.

After spending time thinking it over, it was decided that the only way to remove the foot-gun was to break compatibility. Which I was hesitant to do because doing so provides absolutely no value to those who use the signature API correctly. Then on the last day of September 2023, Nima Ghassemi Nejad from Kufatec gave me the spark I needed.

They were asking me about Ed25519ph. I said they could pay me to implement it for them, but instead advised they keep their money and just hash the message before signing the hash.

They wanted to pay me.

Okay, time to go to work then. Starting from Monocypher was easy enough. Just delete everything I don’t need, keep all Ed25519 tests, modify the API so it could accommodate Ed25519ph, implement and test Ed25519ph, done. As I predicted, it took me less than a day. Kufatec got the code and the right to use it as their own, my employer got the sweet money, and I got a day’s worth of paid vacation.

The spark

The difference between Ed25519 and Ed25519ph is tiny. They both have the exact same structure:

a || seed = SHA512(secret_key)
a         =  2^254 + (a % 2^254) - (a % 8)
r         = SHA512(domain || seed || message)    % prime_order
R         = scalarmult(r, Base_point)
h         = SHA512(domain || R || pk || message) % prime_order
s         = (h * a + r)                          % prime_order
signature = R || s

The only differences between the two are:

The domain string is empty for Ed25519, and "SigEd25519 no Ed25519 collisions\1" for Ed25519ph.
Ed25519ph doesn’t operate on the message directly, but on its SHA-512 hash instead.

This gave me an easy way to test Ed25519ph for my customer: I could have an API that takes the domain parameter as argument, use it to implement both Ed25519 and Ed25519ph, and be confident that if all Ed25519 tests pass, then my Ed25519ph implementation is almost certainly correct (I did verify a couple test vectors just in case).

So it was quite frustrating Monocypher couldn’t support it out of the box. I had a fancy custom hash streaming API with a vtable for crying out loud! And yet this crowning achievement wasn’t enough to insert that little domain parameter.

Ed25519ph may be a small niche, but someone did pay to get it. Maybe I could tweak my low-level interface to accommodate it? Wasn’t a good idea though, it would just make my gnarly API even more complex. Looking at the code I just shipped however, something jumped at me: The high-level algorithm for EdDSA is simple, and requires few core operations:

Trim a scalar.
Reduce a number modulo the prime order of the curve.
A × B + C modulo the prime order of the curve.
Fixed point scalar multiplication.
Check the EdDSA equation.

We still need the hash function of course, but that one is already provided. I ended up designing what I consider to be the ultimate low-level EdDSA API:

void crypto_eddsa_trim_scalar(uint8_t out[32], const uint8_t in[32]);
void crypto_eddsa_reduce(uint8_t reduced[32], const uint8_t expanded[64]);
void crypto_eddsa_mul_add(uint8_t r[32],
                          const uint8_t a[32],
                          const uint8_t b[32],
                          const uint8_t c[32]);
void crypto_eddsa_scalarbase(uint8_t point[32], const uint8_t scalar[32]);
int crypto_eddsa_check_equation(const uint8_t signature[64],
                                const uint8_t public_key[32],
                                const uint8_t h_ram[32]);

Those 5 functions can implement everything: we can use any hash we want, we can generate the random nonce deterministically or at random, we can make a streaming API, we can make Ed25519ph… the only thing we can’t change is the curve itself.

Now I had a worthy justification to break the API: more functionality with fewer, simpler functions. Simplifying signature code across the board. And as a thank you to Kufatec, implementing Ed25519ph.

This opened the floodgates.

Break All the Things!

Though it had been a while since the last major release, I was painfully cognisant of the fact that I was spinning up the fourth major revision in less than 6 years. This time I figured I’d get serious and aim for perfection. This new API better last.

Documentation

The first to go was the illusion that Monocypher is a high-level library. It’s not. Some of it is, but important high-level constructions are out of reach of the simple tricks NaCl originally tried. crypto_box() was an enticing idea, but failed to provide modern properties like forward secrecy.

Moreover, the only high-level construction in Monocypher that’s not also a primitive is authenticated encryption, which enjoys overwhelming consensus on how to do it: just conform to or imitate RFC 8439. Other high-level protocols (authenticated key exchange, PAKE…), don’t enjoy that consensus, and I really don’t want to pick the wrong one.

So, no more hiding the scary stuff. I’d rather make Monocypher’s full capabilities more discoverable.

Names

Function names still had some lingering inconsistencies, so I settled for a naming scheme and applied it everywhere:

crypto_<primitive>[_<function>]()

This took care of most function names, leaving only a couple edge cases:

Conversion between Curve25519 and Edwards25519 don’t clearly belong to either X25519 or EdDSA. I split them up as crypto_x25519_to_eddsa() and crypto_eddsa_to_x25519().
HMAC-SHA-512 could have been named in various ways. I settled on crypto_sha512_hmac_*, and documented it in the SHA-512 man page. That way that man page matches BLAKE2b functionality.
For EdDSA-BLAKE2b and Ed25519, I favoured conciseness over accuracy, and prefixed the signature functions with crypto_eddsa_*, and crypto_ed25519_* respectively.

Streaming authenticated encryption

For a long time I wanted to add a streaming API, that would be safe, simple, efficient, flexible, and while we’re at it compatible with the direct interface.

First I took a look at libsodium’s design. The API is fairly straightforward, with init_push() and push() on the encryption side, and init_pull() and pull() on the decryption side, and a symmetric ratchet implemented by a rekey() function. There’s also an additional tag argument, transmitted over the wire and used to mark the end of the stream and prevent malicious truncations.

The underlying algorithm however is a bit wasteful: the tag requires computing an entire ChaCha20 block! There are other choices I don’t quite agree with, but this one was my deal breaker. There’s got to be a better way.

While working on my own design, I remembered a property of RFC8439 I could use: some of the encrypted stream is never used. Here’s how it works:

auth_key   = ChaCha20(auth_key_size, key, nonce, ctr = 0)
ciphertext = ChaCha20(text_size    , key, nonce, ctr = 1)
mac        = auth(auth_key, ad, cipher_text)

The details of the auth() function aren’t important here.

Note how we changed the counter from 0 to 1 so we wouldn’t reuse the stream for the authentication key and the ciphertext (which is important because Poly1305 basically leaks the authentication key). What’s less obvious is that a ChaCha20 block is 64 bytes, while the authentication key is only 32. We are sitting on 32 unused bytes.

And you know what those bytes could be used for? Rekeying. It’s basically free, so we can afford to rekey every time. Not only does this mean we can remove the rekey() function, we no longer have to change the nonce and counter.

That last one is kind of a big deal, because if we want to support all ChaCha20 variants, we have to limit our counter to 32 bits. This means rekeying every time this counter overflows — with the streaming API it inevitably will. If however rekeying is free, we can do it all the time and never worry about the rest.

And the best part? Depending how I initialise the first block, the first iteration can be compatible with my direct interface, or even RFC 8439!

There’s just one little wrinkle: my API doesn’t automatically transmit a tag over the network to securely end the transmission. This is less safe than libsodium, but also gives users more choice:

They could use different additional data for the last chunk.
They could send the size of the stream beforehand and encode the remaining size in the additional data of each chunk.
They could do what libsodium does, and prefix all messages with a one-byte tag to mark the last chunk.

Full Argon2 compliance

Users were asking for Argon2d and Argon2id for a long time. I have always resisted including them in the name of simplicity, but I have to admit both have their uses. And even though I’m still not sure whether I agree with it, the RFC does recommend Argon2id by default.

Now that I was breaking the API, there was an opportunity to add them without adding too many functions (the old API would have amounted to a total of 6 functions), and hopefully not add too much code. Surprisingly, there was enough room for simplification that my code ended up shorter.

The API however needed serious rework. A fully general Argon2 function needs 15 arguments:

The output hash and its size.
A pointer to the (user allocated) work area.
The algorithm.
The hardness parameters (number of blocks, passes, and lanes).
The password and its size.
The salt and its size.
The key and its size.
The additional data and its size.

My solution involved putting 12 of those arguments in 3 different structures, thus emulating named arguments:

typedef struct {
    uint32_t algorithm;
    uint32_t nb_blocks;
    uint32_t nb_passes;
    uint32_t nb_lanes;
} crypto_argon2_config;

typedef struct {
    const uint8_t *pass;
    const uint8_t *salt;
    uint32_t       pass_size;
    uint32_t       salt_size;
} crypto_argon2_inputs;

typedef struct {
    const uint8_t *key;
    const uint8_t *ad;
    uint32_t       key_size;
    uint32_t       ad_size;
} crypto_argon2_extras;

void crypto_argon2(uint8_t *hash, uint32_t hash_size, void *work_area,
                   crypto_argon2_config config,
                   crypto_argon2_inputs inputs,
                   crypto_argon2_extras extras);

extern const crypto_argon2_extras crypto_argon2_no_extras;

I reckon this new API is very verbose, but it is also very explicit, and I think easier to read.

Tweaking signature verification

Before version 4, Monocypher’s signature verification didn’t quite comply with the RFC. I used to assume this didn’t affect security, but in one case it does: in some protocols, mixing implementations that accept a signature while the others reject it can wreck some havoc.

No problem, let’s comply with the RFC, right? Nope.

Some points on the curve have a non-canonical encoding (between 2²⁵⁵-19 and 2²⁵⁵-1). Should the non-canonical encoding be accepted or rejected?
Public keys can have a low order. Do we accept or reject them?
There are 2 ways to verify the equation, one of which eliminates the cofactor (batch equation), and the other doesn’t (strict equation). Should we use the batch or the strict equation?

Unfortunately the RFC fails us pretty hard here:

It says non-canonical encodings should be rejected. Okay.
It says low-order public keys should be rejected. Why not.
It lets us choose the equation. Crap!

Even if I stick to the RFC there’s no way to guarantee compatibility. After careful consideration I chose to be maximally permissive: accept non-canonical encodings, accept low-order points, and use the batch equation. I have 3 main reasons for this choice:

The batch equation allows more powerful optimisations, most notably batch verification.
Rejecting low-order points and non-canonical encodings requires more code and provides no benefit, not even security.
Accepting signatures that used to be rejected provides better compatibility than rejecting signatures that used to be accepted.

Conclusion

Making a good API is hard. Getting it exactly right the first time is nearly impossible. I hate to admit it, but unless we’re willing to acrete warts over time, APIs have to change. How else could I have covered up my mistakes?

On the other hand, we also need to avoid changing too much. It is very tempting to change things just because it “feels better”, but that makes us vulnerable to mere mood swings. To avoid that, I tried my best to come up with solid justifications about my changes making things not merely different, but objectively better.

Which is hard, especially for bike shedding like function names. But I think it was worth it. Indeed I broke the API, but I’m really happy with how it currently is, and believe this time it’s really going to last.

All thanks to Kufatec paying me for Ed25519ph.

(Discuss on Hacker News, Reddit, Lobsters)