Why Your Simple Password Is a Mathematical Catastrophe


I recently came across this old New York Times article about Stefan Thomas, a programmer who made headlines back in 2021 for forgetting a password. This wasn't just any old password, but one that could unlock a digital wallet containing 7,002 bitcoins which, as of October 2025, are worth over $800 million. Part of me felt that, if I were in his shoes, I'd do everything within my power to crack that password.

As I read more about this, it became more than just a story about cryptocurrency and lost riches. It's about how we keep our information safe in an increasingly digital world. It's about the mathematical concepts that allow us to transform memorable information into something computationally secure. Anyone who's ever lost or forgotten a password may recognise the fundamental tension here. A more complex password designed to keep our valuable information from prying eyes can just as easily lock us out.

Humans have been trying to obscure information for as long as we can collectively remember. One of the earliest records of this takes us all the way back to Julius Caesar in the ancient Roman Empire. As the story goes, the goal was to develop an encryption technique to allow Caesar to communicate with his generals without enemies understanding the messages if they managed to intercept them. The solution they came up with was elegantly simple: shift every letter in the alphabet by three positions:

| Plain  | A | B | C | D | E |
| ------ | - | - | - | - | - |
| Cipher | D | E | F | G | H |

The Caesar cipher, as it was called, is one of humanity's oldest systematic approaches to cryptography. Cryptography is the mathematical transformation of readable information into unreadable information, with a key (in this case, the shift amount) allowing authorised recipients to reverse the transformation.

However, the cipher's simplicity was also its greatest weakness. Since there are only 25 possible shifts, anyone could crack the code simply by trying every key. This vulnerability forced thinkers to ask deeper questions that would eventually become the bedrock of information theory: How do you measure the predictability of the original message? How many possible keys are enough to be secure? And how much work would it take to break a code just by guessing? These are the core problems of entropy, key size, and computational difficulty.

Modern Mathematics of Information

In 1948, Claude Shannon published a blockbuster paper titled "A Mathematical Theory of Communication" in which he laid the foundation for the information age. The impact of this paper was far-reaching and it fundamentally changed how we think about information itself. In it, he didn't just create information theory, but he mathematically defined what information actually is.

Shannon's breakthrough came from the realisation that communication is fundamentally about surprise and uncertainty. A more surprising message contains more information, i.e. there's more in there that you don't already know. A message that says "the Earth is round" contains very little information because it is completely predictable. One containing the exact weather in Kwekwe, Zimbabwe on 29 June 2039 contains an enormous amount of information because it's highly unpredictable.

This insight would prove to be crucial for the field of cryptography and guide our understanding of passwords, as Shannon also showed us how to measure this uncertainty through a concept called entropy.

The Mathematics of Surprise

Shannon entropy is defined by the formula:

H(X) = -Σ P(x) log₂ P(x)

Don't let this scare you. Our focus is mainly on P(x), which is the probability of each possible outcome. In other words, it tells us how surprising (or not) the information from a source is. Let's see how this relates to passwords.

Let's say you want to choose a password by flipping a fair coin four times (heads = 1, tails = 0). You might get "0100". On each flip, there's a 50% chance of getting heads or tails, so each position contributes exactly 1 bit of entropy. The total entropy of this password is 4 bits, which gives us 2⁴ (16) equally likely possibilities.
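The formula is easy to play with directly. Here's a minimal sketch (the `shannon_entropy` helper is my own name, not a library function) that reproduces the coin-flip calculation and shows how a skewed source carries far less information than a uniform one:

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum(P(x) * log2(P(x))), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four fair coin flips: 16 equally likely 4-bit passwords -> 4 bits of entropy.
print(shannon_entropy([1 / 16] * 16))            # 4.0

# A skewed source, where one outcome dominates, carries much less information.
print(shannon_entropy([0.9] + [0.1 / 15] * 15))  # ≈ 0.86 bits
```

Note how the skewed distribution has the same sixteen possible outcomes but far lower entropy, which is exactly the situation with human-chosen passwords.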

Here is where it gets fascinating: if you choose a four-character password from the English language, like "love", your selection pool isn't 26⁴ equally likely possibilities. It's actually much smaller: the set of actual four-letter English words. And, within that set, some words are more likely than others.

1 in 1,000 people might choose the word "love", while 1 in 10 million people might choose "qdft". Not all four-letter passwords are created equal. Same length, very different entropy. With that, Shannon's mathematical framework gives us a way to measure the true security of a password. Not just its length, but its genuine unpredictability.

I have been playing Wordle a lot lately, and there are similarities here. If you had to find a random five-letter combination from the entire alphabet, the game would be unimaginably difficult. Instead, it is constrained to a list of actual five-letter words, which significantly shrinks the pool of possibilities.

A Computational Revolution

Continuing with the "love" password example, an unintelligent, brute-force attempt to guess the password (i.e. one that doesn't consider the likelihood of real English words) would take, in the worst case, 26⁴ (456,976) tries. That may sound like a lot, but it really isn't.
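To make that concrete, here's a toy illustration of how small that search space really is; the `brute_force` helper is mine, not a real cracking tool:

```python
import itertools
import string

def brute_force(target):
    """Try every lowercase combination in alphabetical order until target is found."""
    combos = itertools.product(string.ascii_lowercase, repeat=len(target))
    for attempt, guess in enumerate(combos, start=1):
        if "".join(guess) == target:
            return attempt

print(brute_force("love"))  # 203351 — out of a worst case of 26**4 = 456,976
```

A modern laptop exhausts the entire space in well under a second.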

The emergence of computers didn't just make cryptography faster; it fundamentally changed the mathematics of what was possible. Where a human would take hours trying a few dozen password combinations manually, a computer could try millions in a second. Those 456,976 combinations could be exhausted instantly.

On the other hand, computers also made it practical to implement mathematical operations that would have been impossible by hand. This led to one of the most elegant discoveries in computer science: cryptographic hash functions.

The Mathematical Miracle of One-Way Functions

A hash function is a mathematical operation that takes any input and produces a fixed-size output that appears completely random. For example, SHA-256 (Secure Hash Algorithm 256) generates an exactly 256-bit string (a digest) from any given input:

NotSecure123 => 2955d9720f90b9557a5a43e9ca6f3e593d6f54bf5486635cc4edfdaa11c29ecd

If you took the time to check the length of the digest, you may have noticed that the "256-bit" digest is only 64 characters long. This isn't a contradiction; it's just a more compact way of writing down the data.

The hash is in the hexadecimal (base-16) number system which uses the characters 0-9 and a-f. Each of these characters serves as a shorthand for a unique group of 4 bits.

Therefore, the 64 characters perfectly represent the full 256 bits of underlying data (64 characters × 4 bits/character = 256 bits).

This process is deterministic, so the same input will always produce the same output. You can try it, too, using an online hash generator. However, just changing a single character in the input will dramatically change the output. As already mentioned, the output is also always 256 bits (in this case, 64 characters), whether the input is a single letter or an entire novel. Most importantly, the output hash is computationally irreversible.
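You can check all of these properties yourself with Python's standard `hashlib` module; this sketch demonstrates the fixed length, determinism, and the avalanche effect from changing a single character:

```python
import hashlib

def sha256_hex(text):
    """Hex-encoded SHA-256 digest of a string."""
    return hashlib.sha256(text.encode()).hexdigest()

h1 = sha256_hex("NotSecure123")
h2 = sha256_hex("NotSecure124")           # one character changed

print(len(h1))                            # 64 hex characters = 256 bits
print(h1 == sha256_hex("NotSecure123"))   # True: deterministic
print(h1 == h2)                           # False: avalanche effect
```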

The last property is what makes hash functions so remarkable. They are easy to compute in one direction, but computationally infeasible to reverse.

Computationally Infeasible?

You may be wondering what that even means. This brings us to one of the deepest questions in computer science: the relationship between mathematical proof and computational reality.

When we talk of SHA-256 being a secure algorithm, this statement is not backed up by mathematical proof in the traditional sense. Instead, the assertion is based on the computational complexity of the best known algorithm for finding an input that would produce a given hash i.e. reverse engineering the hash. This would require 2²⁵⁶ operations in the worst case as that is the number of different possible outputs.

This number is unfathomably large. It's approximately 10⁷⁷, almost as large as the estimated number of atoms in the observable universe (around 10⁸⁰). Even with top-of-the-line classical processing hardware today, it would take longer than the current age of the universe to try every possible input.

This is the mathematical foundation of digital security. These systems are built not on mathematical impossibility (which is much harder to achieve), but on computational impracticality at a cosmic scale.

As with anything, there's always a fatal flaw, a chink in the armour, so to speak. In this case, the Achilles' heel of this elegant mathematical construction is the simple fact that humans don't pick random passwords. We choose passwords that have some meaning to us, as that makes them more memorable, and they often reflect our psychology in predictable ways.

For example, the RockYou database breach of 2009 exposed over 32 million passwords in plaintext, giving security researchers an unprecedented window into human password behaviour. What they found was that nearly 60% of people chose passwords from a dictionary of just 1,000 common passwords. Not only that, but the top 5,000 most common passwords accounted for 20% of all passwords in the list.

Storing passwords in plaintext was a major failure on RockYou's part; it is generally frowned upon, and platforms should store only password hashes.

So, if passwords were chosen truly randomly from lowercase and uppercase letters, numbers, and symbols (approximately 95 characters), an 8-character password would have:

95⁸ ≈ 2⁵³ ≈ 6.63 × 10¹⁵ possibilities

But if 60% of users choose from 1,000 common passwords, the effective entropy becomes:

log₂(1000) ≈ 10 bits

So, instead of roughly 53 bits of entropy (for a truly random 8-character password), we're left with a measly 10 bits. Instead of requiring about 2⁵² guesses on average to crack (half the 2⁵³ search space), an attacker would only need about 500 guesses on average (half of 1,000).
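The arithmetic is easy to verify; the figures below are approximations:

```python
import math

# Entropy of a truly random 8-character password over a 95-character alphabet:
random_bits = 8 * math.log2(95)
print(round(random_bits, 1))        # 52.6 bits

# Effective entropy when drawn from a pool of 1,000 common passwords:
common_bits = math.log2(1000)
print(round(common_bits, 1))        # 10.0 bits

# Average guesses needed is roughly half the search space:
print(round(2 ** random_bits / 2))  # ~3.3 quadrillion
print(round(2 ** common_bits / 2))  # 500
```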

Analysing Human Predictability

The RockYou analysis revealed deeper mathematical patterns in human behaviour. Password creation follows something called "Zipf's law", meaning the frequency of passwords follows a power-law distribution: the nth most popular password appears with frequency proportional to 1/n. In other words, the second most popular password (rank 2) appears about half (1/2) as often as the first, and the third most popular (rank 3) about one-third (1/3) as often.

[Figure: Zipf's law distribution of password frequencies]

What we see is that a small number of passwords makes up the largest proportion of all chosen passwords. We see this same mathematical relationship in word frequencies in natural languages.
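To get a feel for the numbers, here's a rough simulation of an idealised Zipf distribution; the pool size of one million distinct passwords is a made-up figure for illustration, not RockYou's actual count:

```python
# Idealised Zipf's law: the password at rank n is chosen with
# probability proportional to 1/n.
def zipf_share(top_k, pool_size):
    """Fraction of all users covered by the top_k most popular passwords."""
    weights = [1 / n for n in range(1, pool_size + 1)]
    return sum(weights[:top_k]) / sum(weights)

print(f"{zipf_share(1000, 1_000_000):.0%}")  # ≈ 52% of users use a top-1,000 password
```

Even under this idealised model, a dictionary of just 1,000 entries covers roughly half of all users, which is strikingly close to the 60% observed in the real breach data.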

This is a cryptographically damning conclusion. A smart attacker does not need to search the entire space of possible passwords uniformly. They can get away with a dictionary attack, targeting the most common passwords first, which gives them access to a significant number of accounts very quickly. It also means they have no reason to go after the most difficult passwords at all. As a user, your password doesn't have to be the most secure; it just has to be more secure than most other people's, i.e. you don't have to outrun the bear, you just have to outrun the slowest person at the campsite.

Password Cracking

Modern password cracking is really one of the most sophisticated applications of parallel computing and algorithmic optimisation. When attackers obtain a database of password hashes, they are essentially trying to invert a one-way function which, as we already saw, should be computationally infeasible.

This means they have to be smart. For instance, instead of computing hashes in real-time, they use rainbow tables, which are just massive databases that store pre-computed hashes of millions of common passwords. They also use powerful graphics processing units (GPUs) which are designed for parallel computation and are exceptionally good at the repetitive mathematical operations required for password cracking. A modern, high-end GPU can compute billions of SHA-1 hashes per second. And, if they do this on multiple machines simultaneously, they effectively multiply their computational power.
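Here's a toy version of the precomputed-lookup idea. A real rainbow table stores hash *chains* to trade storage for computation, but the end result is the same: common passwords are cracked by lookup rather than by brute force. The password list below is illustrative:

```python
import hashlib

# Precompute hashes of common passwords once, up front.
common_passwords = ["123456", "password", "qwerty123", "iloveyou", "letmein"]
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in common_passwords}

# A hash obtained from a breached database is then cracked by a dictionary lookup:
leaked_hash = hashlib.sha256(b"qwerty123").hexdigest()
print(precomputed.get(leaked_hash))  # qwerty123 — recovered instantly
```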

Salting and Key Stretching

The cryptographic community has had to be equally smart. The primary response was to make the one-way function deliberately slow and to ensure that each password hash is unique, even if the underlying passwords are the same.

This can be done through salting and key stretching.

Salting involves appending a random string to a given password before hashing it. So, two users could choose "qwerty123", and their hashes would be completely different because of the different salts. This defeats the rainbow tables strategy by forcing attackers to crack each password individually.

The second part of it, key stretching, means hashing a password thousands or even millions of times. The bcrypt algorithm, for example, can be configured to require 2ⁿ iterations, where n depends on available computing power.
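Python's standard library doesn't ship bcrypt, but its `hashlib.pbkdf2_hmac` function implements the same salt-plus-stretch idea, so here's a sketch using that instead (the 600,000-iteration default is a commonly recommended cost for PBKDF2-HMAC-SHA256, and the helper names are mine):

```python
import hashlib
import os

def hash_password(password, salt=None, iterations=600_000):
    """Salt and stretch a password with PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify(password, salt, expected, iterations=600_000):
    """Recompute the stretched hash with the stored salt and compare."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations) == expected

# Two users picking the same password end up with completely different hashes:
salt_a, hash_a = hash_password("qwerty123")
salt_b, hash_b = hash_password("qwerty123")
print(hash_a == hash_b)                     # False — the salts differ
print(verify("qwerty123", salt_a, hash_a))  # True
```

The iteration count is the key stretching knob: raising it makes each verification (and therefore each guess) proportionally slower.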

Mathematically, this is a beautiful solution. If it takes an attacker's computer 1 millisecond to try a single password, and there are 1 billion possible passwords in their dictionary, it would take about 11.6 days to try them all. Apply a 1,000x slowdown, stretching each attempt to a full second, and that same attack now takes 31.7 years.

This introduces an elegant asymmetry. The extra waiting time for a user to get their password evaluated when logging in is basically negligible (e.g. 100 milliseconds instead of 0.1 milliseconds). For an attacker, however, trying millions of passwords, this slowdown is devastating.

This brings us to the most counterintuitive aspect of password security: how individual choices create systemic vulnerabilities that transcend mathematics.

Imagine that you use the same password for your email, bank, social media, and streaming accounts. Chances are each one of those systems is (almost) perfectly secure. The bank uses proper salting and key stretching, your email provider has robust security, etc.

But, what you've created is what we'd call a "single point of failure". When any one of those systems is compromised, they all become vulnerable. Not because of any technical weaknesses, but because of this systemic interconnectedness.

Credential Stuffing

When a bad actor obtains username/password combinations from a single data breach, they systematically try them across thousands of other services. This attack, called credential stuffing, succeeds not because of cryptographic weakness, but because of human behaviour patterns.

Studies show that credential stuffing attacks succeed up to 2% of the time. This might sound low, but when attackers try millions of stolen credentials across thousands of services, the absolute number of successful breaches can be enormous. For example, if an attacker had 10 million stolen credentials, even a 1% success rate would give them access to 100,000 accounts.

From the attacker's perspective, this is an easy decision given the potential big wins. The cost of computational resources for credential stuffing attacks is minimal compared to the potential reward from compromising even a handful of accounts.

Generating True Randomness

The solution, as I wrote about before, is to remove humans from the password-generating equation entirely. True randomness can only be found in physical sources of entropy: radioactive decay, atmospheric noise, quantum fluctuations, etc.

When a password manager generates a password like "K7$nM@2vB9!qL5z8W", each character is chosen from the full alphabet of available characters with equal probability. You can't get the same outcome by mashing randomly at your keyboard: some characters will invariably be over-represented. For example, you'll likely hit more keys with your dominant hand than with the other.
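In Python, drawing uniformly from the full character pool looks like this, using the `secrets` module (which is backed by the operating system's cryptographic randomness, unlike `random`); the helper name is mine:

```python
import secrets
import string

def generate_password(length=17):
    """Each character drawn uniformly from ~94 printable characters."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())  # e.g. something of the same shape as K7$nM@2vB9!qL5z8W
```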

Securing Password Managers

You may wonder if we're not just creating a catastrophic single point of failure by having all our passwords in one place.

Modern password managers use multiple layers of security to keep your passwords safe, including client-side encryption, zero-knowledge architecture, and key-derivation functions. Without getting into the technical details, this means that your unencrypted passwords never even reach the password manager's servers, and the provider cannot decrypt your passwords even if they wanted to. That's why many password managers don't offer traditional password recovery, and instead rely on recovery keys.

The encryption typically uses AES-256; cracking it by brute force would require approximately 2²⁵⁶ operations which, as we have already seen, means cosmic timescales.

A Coming Revolution

Everything we have seen so far relies on some assumptions about computational difficulty. But quantum computers, when they become practical, will change what's computationally feasible.

For example, Shor's algorithm can factor large integers on a quantum computer far faster than any known classical algorithm. This presents a fundamental threat to the public-key cryptography schemes upon which many internet security protocols are built.

However, symmetric encryption (like the AES used by password managers) and cryptographic hash functions (like SHA-256) are more resistant to quantum attacks. Grover's algorithm can speed up brute-force attacks on symmetric encryption, but it only halves the effective key length: AES-256 would become equivalent to AES-128 against quantum computers, which is still computationally secure for the foreseeable future. In that sense, AES-256 is generally considered quantum resistant.

Security vs Usability

At its core, password security represents a fundamental tension in information theory: the trade-off between entropy (unpredictability) and memorability (human cognitive limitations).

What Shannon showed us was that high-entropy information is, by definition, unpredictable and therefore unmemorable to humans. The most secure password is indistinguishable from random noise which makes it psychologically impossible for humans to remember.

This is what cryptographers call the password paradox: the requirements for security are fundamentally incompatible with human cognitive architecture.

Understanding password security requires grappling with concepts from information theory, computational complexity, probability theory, and systems thinking. In our increasingly digital world, this mathematical literacy becomes a form of citizenship.

Just as we expect citizens in a democracy to understand basic civics, we might expect citizens in a digital society to understand the mathematical foundations of the systems they rely on daily.

Our journey has shown us some of the remarkable achievements of modern cryptography. We've learned to transform human-memorable information into computationally impregnable barriers through mathematical functions that are easy to compute but practically impossible to reverse. And we have even built these systems to scale with the fundamental limits of computation itself. We have, however, also seen the flip side: a simple character string known to a single person can create a barrier stronger than any physical vault, and, as Stefan Thomas discovered, that barrier locks out its owner just as effectively.

Every time you input a password, you are engaging with concepts from the deepest depths of mathematics and computer science: information theory, computational complexity, cryptographic theory, and the limits of what can be computed. You become a participant in a system of mathematical trust that would have seemed like magic to our ancestors.

In our digital age, understanding the mathematics of passwords isn't just about following security advice, it's also about understanding the mathematical foundations of the world we've built and inhabit. It's about appreciating the elegant complexity hidden in the most mundane aspects of our digital lives.
