@ Loup's Impossible? Like that would stop me.

August 2023

Fixing the TPM: Hardware Security Modules Done Right

I’ve worked with TPM 2.0 for a year. It’s a bloated mess.

The core concept behind the TPM is fairly simple: it’s a Hardware Security Module (HSM) with a device secret, a small persistent memory bank, a serial connection with its host, and cryptographic services such as encryption, signatures, secure sessions… Not trivial, but not that complex either.

Yet the specs span hundreds of pages. The public API of the TPM2 Software Stack comprises close to 1200 functions, and took 80K lines of code to implement.

Why?

I used to blame committees. TPM 2.0 supports anything and everything, including redundant cryptographic primitives, arbitrary cipher suites, and many use cases. There isn’t even one true way to talk to the TPM: secure sessions and audit trails are optional. Fixing these would have made the TPM simpler, but imagine the endless negotiations about which cipher suite to keep, which use case to drop, what performance trade-off to make. And even then the result wouldn’t be that simple.

Then a couple of months ago I saw the TKey from Tillitis take a different approach, one that completely obsoletes TPM 2.0.

The problem

Once we remove committee bickering, the TPM’s core problem is the irreducible complexity of its software: there are simply too many use cases to support, too many competing standards to satisfy.

The TPM hardware, however, is a tiny general purpose computer. So how about letting users run arbitrary programs? That way we can do everything with tiny specs. Users would have to provide their own programs of course, but since each program would address a single use case, it would be much simpler than the one firmware to rule them all.

One little snag though: what’s stopping users from loading code that just extracts the TPM’s secret? This is no good: hardware security modules do not leak their internal secret. Their whole point is to keep their secret safe from user error or host compromise. In other words, we need a way out of the following contradiction: users must be free to run arbitrary programs, yet no program may ever get hold of the device secret.

The solution

Microsoft Research provided the key to solving this: DICE measured boot.

Their motivation, however, was different: they wanted to better protect the device secret. Many devices store their secret in a fuse bank, and changing it is often impossible. Leaking it is especially bad, because it can render the entire device forever unusable: no one wants an HSM with a leaked secret, or a remote IoT device that can be impersonated.

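Their solution is to drastically limit access to the device secret, and instead give derived secrets to the main program. It needs the following components: a Unique Device Secret (UDS), typically held in a fuse bank; a latch that turns off read access to the UDS until the next reboot; a small bootloader in ROM; and a key derivation function (KDF) that the bootloader uses to compute a Compound Device Identifier (CDI) from the UDS and the main program.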

The boot sequence works as follows:

  1. Hardware launches the bootloader.
  2. Bootloader computes CDI = KDF(UDS, program)
  3. Bootloader activates the latch. The UDS is gone until reboot.
  4. Bootloader launches the main program.
  5. The main program does the thing, with the help of the CDI.
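The steps above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the DICE specification: the `Device` class, its method names, and the choice of HMAC-SHA-256 as the KDF are all mine, picked only to show the order of operations.

```python
import hashlib
import hmac


class Device:
    """Toy model of a DICE-capable device (all names are hypothetical)."""

    def __init__(self, uds: bytes):
        self._uds = uds          # Unique Device Secret, e.g. from a fuse bank
        self._latched = False    # once set, only a reboot clears it

    def read_uds(self) -> bytes:
        if self._latched:
            raise PermissionError("UDS is locked until next reboot")
        return self._uds

    def activate_latch(self) -> None:
        self._latched = True

    def reboot(self) -> None:
        self._latched = False


def kdf(uds: bytes, program: bytes) -> bytes:
    # Assumption: HMAC-SHA-256 keyed with the UDS, over a hash of the program.
    measurement = hashlib.sha256(program).digest()
    return hmac.new(uds, measurement, hashlib.sha256).digest()


def boot(device: Device, program: bytes) -> bytes:
    """Steps 2 to 4: derive the CDI, lock the UDS, hand over the CDI."""
    cdi = kdf(device.read_uds(), program)   # step 2
    device.activate_latch()                 # step 3
    return cdi                              # step 4: all the main program gets
```

After `boot()` returns, any call to `read_uds()` fails until `reboot()`, which is the whole point: the main program only ever holds a secret derived from its own hash.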

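This neatly solves Microsoft’s problem: the main program never sees the UDS, only its CDI. If a vulnerable program leaks that CDI, the patched version hashes differently and gets a fresh one, so the device is rekeyed without touching the fuses.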

Curiously enough, Microsoft didn’t seem to realise how powerful DICE really is. Maybe they still thought of the main program as firmware: stored on the device, likely inconvenient to update. Maybe our problem wasn’t their priority. One way or another, Tillitis saw past that and added one crucial step:

Download the main program at each startup.

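The CDI doesn’t just distinguish between vulnerable and patched programs, or between legitimate and malicious programs. It gives every program its own unique secret, tied to the device. The contradiction we had to get past? It’s meaningless now: users can load whatever program they like, and the worst a malicious program can do is leak its own CDI, which no other program shares.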
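To make this concrete, here is a minimal sketch of how the CDI changes with the program and with the device. The all-zero `UDS`, the `cdi_for` helper, and the use of HMAC-SHA-256 as the KDF are illustrative assumptions, not anything DICE or Tillitis mandates.

```python
import hashlib
import hmac

UDS = bytes(32)  # placeholder device secret (all zeroes), illustration only


def cdi_for(program: bytes, uds: bytes = UDS) -> bytes:
    # Assumption: HMAC-SHA-256 stands in for the KDF.
    measurement = hashlib.sha256(program).digest()
    return hmac.new(uds, measurement, hashlib.sha256).digest()


# Same device, three different programs: three unrelated secrets.
ssh_agent = cdi_for(b"ssh agent, version 1")
patched = cdi_for(b"ssh agent, version 2")
extractor = cdi_for(b"malicious program that dumps its CDI")

# Same program, different device: yet another secret.
other_device = cdi_for(b"ssh agent, version 1", uds=b"\xff" * 32)
```

The malicious program can dump `extractor` all it wants. That value tells it nothing about `ssh_agent`, and even less about the UDS.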

With DICE-style measured boot, HSMs don’t need to be limited or complicated. With DICE-style measured boot we can have simplicity and flexibility and security, all at the same time. I appreciate the herculean efforts of the Trusted Computing Group to provide an international standard for everything and everyone, but the fixed-firmware approach is obsolete now. TPM 2.0 is obsolete, and I’m not touching it ever again.

Or the YubiKey for that matter.

Concrete example 1: the TKey

Tillitis already provides an excellent, detailed description of the TKey in their developer handbook. I’ll just provide an overview.

Hardware-wise, the core of the TKey is a minimal 32-bit RISC-V system on a chip with 128KiB of RAM. The instruction set supports the full 32 registers, compressed instructions, and multiplication. Though minimal and void of any cryptographic extension, this ISA is already enough to run meaningful cryptographic code fairly efficiently.

Communication with the host is handled by a separate microcontroller that translates between USB (host side) and UART (TKey side). Though it is soldered on the same PCB, we can mostly pretend this microcontroller is not part of the TKey, and just say the TKey talks to the host through a UART interface.

In addition to this core, the TKey ships with a number of goodies:

When we plug the TKey into our computer, the firmware starts up and waits for commands from the host. Communication with the firmware uses a very simple command/response protocol that allows the host to request some data (name, version, ID), and load the main program.

The way the TKey derives the program’s CDI is a bit different from what DICE specifies: we can insert a User Supplied Secret (USS) into that mix. The USS is either a 32-byte buffer representing a user secret (typically a password hash), or the empty string. In practice, CDI = BLAKE2s(UDS || USS || BLAKE2s(program)).
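Here is a rough Python rendition of that formula, using `hashlib.blake2s` from the standard library. It follows the concatenation order written above; the function names are mine, and I make no claim that this matches the actual firmware byte for byte.

```python
import hashlib


def tkey_cdi(uds: bytes, program: bytes, uss: bytes = b"") -> bytes:
    """CDI = BLAKE2s(UDS || USS || BLAKE2s(program)), as written above."""
    if uss and len(uss) != 32:
        raise ValueError("USS must be empty or exactly 32 bytes")
    program_digest = hashlib.blake2s(program).digest()  # 32 bytes by default
    return hashlib.blake2s(uds + uss + program_digest).digest()


def uss_from_password(password: str) -> bytes:
    # Assumption: a plain BLAKE2s hash keeps this sketch dependency free.
    # A real client would more likely use a slow password hash here.
    return hashlib.blake2s(password.encode("utf-8")).digest()
```

With this, `tkey_cdi(uds, app)` and `tkey_cdi(uds, app, uss_from_password("hunter2"))` give unrelated secrets: same device, same program, different user.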

This addition of the user secret might seem like a needless complication for the bootloader: after all, applications could load the USS and hash it together with their CDI themselves. On reflection, however, I think this is neat: the additional complexity for the bootloader is minimal (less than 20 lines of C code), and is useful for most TKey applications. Also, the TKey is a personal security dongle. Having the CDI represent not only the device, but also its user, just makes a ton of sense.

As for concrete applications, Tillitis already provides software to support SSH (the SSH-agent and various Git forges), and there are plans for FIDO and TOTP. But those are just examples. We could have the TKey sign files, help with encryption, support exotic cryptographic primitives & protocols…

What really gets me excited about the TKey though, is that it’s actually implemented on an FPGA. The regular locked down version is already extremely flexible as it is, but the unlocked version and associated programmer can neatly serve as a generic FPGA development board. Besides, I have some optimisations I’d like to try.

Concrete example 2: Measured Boot

The reason I worked on the TPM for a year was a connected EV charging station. The thing was supposed to sit there in the public space, so we had to assume the enemy may have physical access to the computer. The higher-ups decided we needed the TPM, and I was tasked with provisioning it.

Setting aside that hardware security is likely impossible to achieve through a discrete HSM (just insert something between the HSM and the main board, then lie about what is really executed in the main board), we don’t need the TPM specifically. Anything the TPM does could be done by something like the TKey instead.

Chances are though, we don’t even need a discrete chip: if like so many embedded systems we have an ARM SoC, we just need it to be DICE capable: a fuse bank (256 bits are enough), a latch that can turn off read access to that fuse bank until next reboot, and some ROM for a small bootloader.

Then it’s just a matter of having the bootloader hash the next stage (the Linux kernel in this case), then compute the CDI from that. From that CDI we can then derive a key pair that authenticates the machine to the company’s servers.
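A minimal sketch of that chain, under stated assumptions: SHA-256 as the measurement hash, HMAC-SHA-256 as the KDF, and a made-up `device-auth-v1` label are all my choices. The Python standard library has no signature scheme, so the sketch stops at a 32-byte seed, the sort of value one could feed to a scheme such as Ed25519, whose private keys are 32-byte seeds.

```python
import hashlib
import hmac


def measure(next_stage: bytes) -> bytes:
    """Hash of the next boot stage (the kernel image, in this scenario)."""
    return hashlib.sha256(next_stage).digest()


def derive_cdi(uds: bytes, measurement: bytes) -> bytes:
    # Assumption: HMAC-SHA-256 stands in for the bootloader's KDF.
    return hmac.new(uds, measurement, hashlib.sha256).digest()


def derive_key_seed(cdi: bytes, label: bytes = b"device-auth-v1") -> bytes:
    # Hypothetical domain separation label: lets the kernel derive other
    # independent secrets from the same CDI without reusing this one.
    return hmac.new(cdi, label, hashlib.sha256).digest()


def boot_identity(uds: bytes, kernel_image: bytes) -> bytes:
    """Seed for the key pair this exact kernel, on this exact device, gets."""
    return derive_key_seed(derive_cdi(uds, measure(kernel_image)))
```

A tampered kernel hashes differently, so it ends up with a different seed, and the company’s servers won’t recognise the resulting public key.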

Conclusion

The classical approach to hardware security modules such as the TPM and YubiKey is fundamentally flawed. Those things are general purpose computers. Freezing their firmware (or making it hard to update) cripples them beyond repair.

That one firmware to rule them all has to address many use cases for many users, making it bigger, more complex, and more likely to leak the device secret. And when that happens we’re often out of luck, because such secrets are often stored in fuses that cannot be changed: the HSM still works, but it’s forever compromised and useless.

The right approach to hardware security modules allows users to load arbitrary firmware. Different firmware, different use case, different secret. Such specialised firmware is smaller, simpler, and more secure. And if it does leak its own secret, fixing the bug automatically rekeys it. The HSM stays useful even if its root key was stored in fuses.

My recommendation? Imitate Tillitis (their whole thing is open source), or buy directly from them. Try to avoid old inflexible security tokens like the YubiKey.

And run the hell away from the TPM. I can forgive its allegiance to the Evil Empire in the war on general computation, but its complexity is just unacceptable.
