Last time I had an issue with standardization going in what was, in my opinion, the wrong direction, I wrote a blog post about it. Much to my own surprise, that actually worked, and as a reward you get more blog posts about somewhat convoluted standardization issues.
In order to entice you to at least skim the background section, here is a bit of a teaser: Do you like undefined behavior? Do you enjoy two standards-conforming implementations agreeing on all input bytes and all entropy the system produced, but disagreeing on the output bytes? Then you might be happy with the proposed “compromise”. Otherwise, read on to find out what this is all about.
The Problem
The current conundrum is about the format to use for the private key of ML-KEM/ML-DSA when using these keys for certificates. It turns out that NIST left a tiny hole in their definition of these formats, which now causes considerable problems. When defining the key generation algorithm of ML-KEM, NIST added this statement to the standard:
[Excerpt of FIPS 203, describing that the seed used to generate a key can be used instead of the expanded format also described in the standard.]

Similarly, for ML-DSA, there is an almost identical statement:
[Screenshot of FIPS 204, slightly edited to remove a page break.]

This means there are fundamentally, and NIST-approvedly, two different representations for ML-KEM/ML-DSA private keys: you can either store the random seed used as input to the respective key generation algorithm, or store the output of that algorithm.
Before we watch how we, as a community, managed to turn this rather benign issue into an absolute mess, let's take a slightly closer look at the two private key formats NIST has defined.
Seeds
First, there are the seeds: 64 bytes in the case of ML-KEM, and 32 bytes in the case of ML-DSA. Aside from being really small, especially when compared to anything else in PQC, seeds have a number of other advantages. What the key generation does, in both cases, is feed this seed into a number of cryptographic hash functions and an eXtendable-Output Function (XOF), basically a cryptographic hash function that produces any number of output bytes instead of the fixed number you would expect from a hash function. These outputs are then used to generate the actual key material.
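To make that pattern concrete, here is a minimal sketch of the “seed in, arbitrary key material out” idea, using SHAKE256 via OpenSSL as the XOF. This is purely illustrative: the real FIPS 203/204 key generation uses several domain-separated hash and XOF calls and maps their output into polynomials, rather than a single call like this.

```cpp
// Illustrative only: expand a small seed into an arbitrary amount of key
// material using a XOF (SHAKE256 via OpenSSL). Not the actual FIPS 203/204
// derivation, just the general pattern it is built from.
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <openssl/evp.h>

std::vector<uint8_t> expand_seed(const std::vector<uint8_t>& seed, size_t out_len) {
    std::vector<uint8_t> out(out_len);
    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    if (ctx == nullptr ||
        EVP_DigestInit_ex(ctx, EVP_shake256(), nullptr) != 1 ||
        EVP_DigestUpdate(ctx, seed.data(), seed.size()) != 1 ||
        EVP_DigestFinalXOF(ctx, out.data(), out.size()) != 1) {
        EVP_MD_CTX_free(ctx);
        throw std::runtime_error("SHAKE256 expansion failed");
    }
    EVP_MD_CTX_free(ctx);
    return out;  // every output bit depends on every bit of the seed
}
```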
This has a number of advantages. I already mentioned their compact size, but another property is that it is impossible to create a malformed key out of a seed. Since the seed is the input to hash functions and XOFs, the actual key material derived from it depends on every bit of the seed equally, and changing a single bit changes the key in its entirety; there is no way of independently choosing parts of the private key. This is why seeds defend against the attacks outlined in Kemmy Schmidt, and it is their main advantage over other formats (a small caveat here is that for ML-KEM, the seed actually consists of 2 separate, concatenated seeds, and the format therefore still allows for one of the two Kemmy Schmidt attacks. We could avoid that by deriving the ML-KEM seed from an even higher-level seed, but NIST opted not to do that). Module lattice schemes in particular have private keys s and e such that the public key is t = As + e. s and e have to be drawn from a specific distribution in order for the key to be secure, with the key generation algorithm describing how to do so, meaning that the seed enforces this specific distribution, which cannot easily be proven to be the case afterwards.
But also, like with any cryptographic hash function, there is no going back. The seed is high entropy, so there is no way of reversing the hash functions and XOFs, and the only way to get the seed later is by saving it.
Semi-Expanded Private Keys
So if seeds are so great, why aren't we using them everywhere? For elliptic curves, the reason is that the private key is extremely simple. Any integer in the correct range is a valid private key, so there isn't really much of a key generation algorithm. The parameters for popular curves are set to be very close to powers of two, so there isn't even much rejection sampling to be done. X25519 goes even further: it simply ignores the extremely small bias introduced by not rejection sampling and declares every 32-byte string to be a valid private key.
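As a rough sketch of how little “key generation” elliptic curves need: draw random bytes, interpret them as an integer, and retry in the rare case it falls outside the valid range. The helper functions here are hypothetical placeholders, not a real library API.

```cpp
// Sketch of elliptic-curve private key generation via rejection sampling.
// random_bytes() and scalar_in_valid_range() are hypothetical placeholders.
#include <cstdint>
#include <vector>

std::vector<uint8_t> random_bytes(size_t n);                        // hypothetical CSPRNG
bool scalar_in_valid_range(const std::vector<uint8_t>& candidate);  // hypothetical: 1 <= x < n

std::vector<uint8_t> generate_ec_private_key() {
    for (;;) {
        std::vector<uint8_t> candidate = random_bytes(32);
        // For curves whose order is close to 2^256, this loop almost never repeats;
        // X25519 skips the check entirely and treats any 32-byte string as a key.
        if (scalar_in_valid_range(candidate)) {
            return candidate;
        }
    }
}
```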
For RSA, the situation is the other way around. RSA key generation has to compute a substantial number of variables, including rejection sampling not one, but two large primes. Primality testing is fairly expensive, and numbers in this range (for RSA-2048) are only prime with a probability of roughly 1 in 700 (though this improves if you don't test even numbers). All in all, the RSA key generation algorithm is painfully slow, and you really want to avoid running it from scratch every time you load a key, which has led to RSA storing the results of the key generation algorithm as a cached computation. That being said, there would be a way to add seeds to RSA, by only noting down the successful rolls for p and q, but all in all this ship has sailed, circumnavigated the globe several times, and is about to be decommissioned.
This brings us to the semi-expanded key formats of ML-KEM/ML-DSA. As I mentioned, we need to sample some parts of the private key from a specific distribution, but unlike with RSA, this distribution is rather simple (centered binomial in ML-KEM's case, uniform in ML-DSA's case). In particular, no rejection sampling of the private key material takes place, as the parameters are chosen in such a way that everything works out with powers of two. This makes ML-KEM's/ML-DSA's key generation algorithm very fast, and not much of a burden to run on load. In fact, the slowest part of the key generation algorithm is sampling the matrix A. But this matrix is stored as a seed in both the public key and the private key format anyway, so you don't get around that step as is, and this is also the reason why I keep calling this key format semi-expanded. It expanded some aspects of the key, but left others to be recomputed.
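For concreteness, here is a rough sketch of what the expanded ML-KEM decapsulation key contains, with sizes for ML-KEM-768. FIPS 203 defines it as a flat concatenation of byte strings rather than a struct, and the field names below are mine; the point is that the matrix seed is still in there and still has to be expanded on load.

```cpp
// Rough layout of an ML-KEM-768 "semi-expanded" decapsulation key (2400 bytes),
// written as a struct for readability; FIPS 203 specifies it as a flat byte string.
#include <cstdint>

struct MlKem768ExpandedPrivateKey {
    uint8_t s_hat[1152];  // encoded secret vector s: the part that was expanded
    uint8_t t_hat[1152];  // encoded public vector t (part of the embedded public key)
    uint8_t rho[32];      // seed for the matrix A: still expanded via a XOF on every load
    uint8_t h_ek[32];     // cached hash of the encapsulation (public) key
    uint8_t z[32];        // implicit-rejection secret
};
```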
As far as I can tell, the only reason this format exists to begin with is as a compromise to make performance metrics look better in the competition, even though those metrics do not apply in practice.
I discussed before how cryptographic primitives are, mathematically speaking, tuples of functions, with KEMs in particular being a triple (KeyGen, Encaps, Decaps) and signatures being a triple (KeyGen, Sign, Verify),
and expanded how this mathematical definition warps and bends when hitting reality. To understand the difference between performance under competition conditions and performance in real life, we need to again look at the code that would implement such a triple, starting with the same code snippet from the last blog post:
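Roughly, that snippet boils down to the following sketch: messages and signatures are plain byte strings, while the key types are deliberately left as opaque objects.

```cpp
// Sketch of the bare signature API from the previous post: byte strings for
// messages and signatures, opaque objects for the keys.
typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);
```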
As you might have noticed when reading that previous blog post, I left some objects undefined in this API. While I specify that messages and signatures are binary strings, I say nothing about the keys, and leave them as generic objects. The competition simply defined these keys as byte strings as well, and used this API as their performance benchmark.
However, when in practice we want to perform multiple operations using the same key material, we would usually use some in-memory representation that, at the very minimum, fixes things like alignment, so forcing these objects to be byte arrays would be overly limiting for actual implementations. What we really want is a pair of functions to parse and serialize objects, as is common in object-oriented programming, leaving us with this type of API:
```cpp
typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

std::vector<uint8_t> serialize_private_key(PrivateKey sk);
PrivateKey parse_private_key(std::vector<uint8_t> sk);
std::vector<uint8_t> serialize_public_key(PublicKey pk);
PublicKey parse_public_key(std::vector<uint8_t> pk);
```

This is the API many libraries use in practice, and with this API you are usually not very concerned about slight performance regressions in the parsing operations, as they amortize. You are of course not completely isolated from them, so completely regenerating RSA keys is not going to fly, but the rather straightforward LWE keygen algorithm is not a problem. This leads to a situation where the competition produced a key format that is inferior in practical applications, and a good lesson in “you get what you measure”.
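To illustrate the amortization argument, here is a hedged usage sketch on top of that API: the stored key is parsed once, and whatever key expansion the parser has to do is paid for only once across many signing operations. The helper names (load_key_from_disk, pending_messages, publish) are made up for illustration.

```cpp
// Sketch: parse the serialized key once, then reuse the in-memory key.
// Any (cheap) key expansion done inside parse_private_key() amortizes
// over all subsequent sign() calls.
std::vector<uint8_t> stored_key = load_key_from_disk();  // hypothetical helper
PrivateKey sk = parse_private_key(stored_key);           // done once

for (const Message& m : pending_messages) {              // hypothetical message queue
    Signature s = sign(sk, m);
    publish(m, s);                                        // hypothetical helper
}
```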
The Options
With the problem now explained in unnecessary detail, we can finally discuss what is happening at the IETF. In case you skipped everything above, to recap: we have two formats, one short (seed) and one long (semi-expanded), and a one-way map between the two, going from seeds to semi-expanded keys, with no way to invert this mapping.
Which brings us to the IETF now having to define a key format. The most elegant choice would be to just always use the seed, as it is the more general format, and call it a day. But some folks had already started implementing HSMs before the standard was done, and they forgot to add the ability to read seeds. I have to admit, I don't have particularly large amounts of sympathy for that position; implementing a pre-standard comes with risks, and given that the vast majority of implementations of ML-KEM and ML-DSA, including in HSMs, are yet to be written, we should not bend over backwards to accommodate over-eager adopters. But I do have some sympathy, and there are ways we could make both sides happy.
The simplest would be to define two separate private key formats, using separate algorithm identifiers (OIDs) in PKCS8. The algorithm identifier in PKCS8 is shared between private key and public key, so this leads to the rather awkward outcome of having two public keys that use the exact same algorithm under different identifiers. I still think that this is an acceptable outcome, and will expand on why in the last section, but for now let us just accept that there was significant pushback against this idea.
The next simplest thing would be to have a way to switch between seed and semi-expanded key, using a construction like ASN.1 CHOICE. In other words, have a two-value enum as the first byte, followed by either the seed or the semi-expanded key, depending on the enum value. This format is somewhat awkward, but it is workable. It is, for example, how we encode elliptic curve points, with a leading byte distinguishing compressed from uncompressed points. In all likelihood, all that would happen is that we would see seeds winning out in the end as the more flexible option, with the occasional old hardware having to have its keys rewritten by a software layer.
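Schematically, and ignoring the actual ASN.1/DER encoding details, an importer for such a CHOICE-style format would look something like the following sketch; the enum values and helper functions are made up for illustration, and PrivateKey is the opaque type from the API above.

```cpp
// Schematic importer for a "tag byte, then either seed or semi-expanded key"
// format. Not the real ASN.1 encoding; helpers are hypothetical.
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class KeyEncoding : uint8_t { Seed = 0, Expanded = 1 };

PrivateKey expand_from_seed(std::vector<uint8_t> seed);        // hypothetical: re-run keygen
PrivateKey parse_expanded_key(std::vector<uint8_t> expanded);  // hypothetical

PrivateKey parse_choice_private_key(const std::vector<uint8_t>& blob) {
    if (blob.empty()) throw std::runtime_error("empty key");
    std::vector<uint8_t> body(blob.begin() + 1, blob.end());
    switch (static_cast<KeyEncoding>(blob[0])) {
        case KeyEncoding::Seed:     return expand_from_seed(body);
        case KeyEncoding::Expanded: return parse_expanded_key(body);
        default:                    throw std::runtime_error("unknown key encoding");
    }
}
```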
Which brings us to the last option, the one that prompted me to write this blog post. Making decisions is hard, so what about a standard that just, like, did not make one? The suggestion is to use a list of two optional fields, one for the seed and one for the semi-expanded key, and leave the interpretation up to the importing library. This has two edge cases, one that is easily fixed, and one that is inherent to this lack of decisiveness. First, the key could be empty. This can be fixed by disallowing that option, and only allowing seed, semi-expanded, or both to be present. But this leaves the other bad choice: both. If both a seed and a semi-expanded key are present, you have not one, but two supposedly equivalent keys in your format. And nothing guarantees that they actually are the same key. Sure, you could run the key generation algorithm and check, but if you do that, then why did you include the semi-expanded key to begin with? It is literally the output of this check.
In case you are wondering, NIST agrees with me on this one, and recommends running exactly this consistency check, regenerating the key from the seed and comparing, whenever you have both seed and semi-expanded key available. However, different from all other consistency checks, they opted to not make this check mandatory, mostly due to private key import being a rare occurrence, if I understand them correctly.
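In code, that check amounts to something like the following sketch: re-run key generation from the seed and require that it reproduces the stored semi-expanded key. keygen_from_seed() is a hypothetical stand-in, not a specific library's function.

```cpp
// Sketch of the seed/expanded-key consistency check: regenerate the expanded
// key from the seed and compare it to the one stored alongside it.
#include <cstdint>
#include <vector>

std::vector<uint8_t> keygen_from_seed(const std::vector<uint8_t>& seed);  // hypothetical

bool seed_matches_expanded_key(const std::vector<uint8_t>& seed,
                               const std::vector<uint8_t>& expanded_key) {
    return keygen_from_seed(seed) == expanded_key;
}
```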

And of course, since the semi-expanded key is superfluous whenever the seed is present and this seed consistency check is run, the proposal does not require the recipient to perform it, leaving us with implementation-defined behavior in the case of an inconsistent key.
On Well-Definedness
So why am I so vehemently opposed to any format that allows both seed and semi-expanded key material to be present in a single key blob without a consistency check? It boils down to the mathematical concept of well-definedness. Funnily enough, the concept of being well-defined is in itself not really well-defined, and depends on the context. In general it means that a definition should be internally consistent, and, depending on context, that the object defined should actually exist and be unique. For example, “the smallest integer bigger than 3.4” is a well-defined statement. This integer exists and is uniquely determined to be 4, so the definite article and the superlative in the statement are warranted. Compare this to the statement “the smallest real number bigger than 3.4” and you can see a statement that is not well-defined. The real numbers are not well-ordered: for any real number bigger than 3.4 there are infinitely many other real numbers, also bigger than 3.4, that are smaller than the chosen one. There is no “smallest” such number. In other words, when a mathematician says “this statement is not well-defined”, they are implying that what is said is simply nonsensical, not a legal statement to begin with, and so it can be considered no further.
But aside from offending mathematicians, for standards there is a very practical concern here: the whole point of a standard is to describe how to interpret something. If it were acceptable for two implementations of the same standard to have vastly different behaviors, then why write a standard to begin with?
Implementation-defined behavior has concrete security ramifications. The point of a private key format in particular is to transfer private key material from one system to another, which is already a fairly rare and not well-tested situation. It is easy to imagine one part of a system writing one field, and the other part of the system reading the other. In particular, since the seed is unrecoverable after the fact, a system that only has the semi-expanded key available might, instead of writing NULL in the seed field, write the string “NULL”, which the other system then prefers over the semi-expanded key that contains the actual key material, both compromising the system (as “NULL” is very brute-forceable) and potentially losing the actual private key forever.
This brings us to the last topic, the aforementioned resistance against using separate OIDs. The reason for this resistance has a historic predecessor: RSA. RSA keys have two different OIDs, one for “RSA” and the other for “RSA encryption”. This causes lots of pain, and through the lens of well-definedness it is somewhat immediately obvious why: “RSA” is not a well-defined algorithm, but actually four separate algorithms, and those algorithms aren't even fully specified by the key format. We have RSA-PKCS1 for signatures, RSA-PKCS1 for encryption, RSA-PSS for signatures, and RSA-OAEP for encryption. The “RSA” key type is used for all four, while the “RSA encryption” key type is used for only two of them. This is a classic case of not properly defining your algorithm and then being surprised by the confusion it causes. Compare that to the ML-KEM/ML-DSA situation, where, if we used two separate OIDs, we would merely have an alias, with the public key explicitly defined to be the same for both OIDs. While it would be annoying to write an explicit “handle this case like the other case” everywhere, it would be nowhere near as bad as “handle this case somewhat, but not exactly, like that other case”.