Understanding the basics of quantum error correction and fault tolerance.
Authors
Robert Davis
Olivia Lanes
John Watrous
Today’s utility-scale quantum computers are performing calculations so complex that classical computers can only approximate their outputs. From a practical perspective, however, quantum computers are still quite sensitive to errors, and those errors limit the size and depth of the circuits they can handle.
To realize the full potential of quantum computing and solve problems too complex for classical computing alone, we need to build a fault-tolerant quantum computer capable of running larger, deeper circuits. Simply adding more qubits to the device isn’t going to solve all these challenges, because the errors will grow alongside the qubit counts in our processors—or, rather, the errors will grow if we don’t have a plan to attack them. That’s why we need fault tolerance. So what is fault tolerance, exactly, and what will we need to achieve it?
A fault-tolerant quantum computer is a quantum computer designed to operate correctly even in the presence of errors. Because quantum bits (“qubits”) are extremely sensitive to their environment, they are prone to errors like decoherence and noise. Fault tolerance is about detecting and correcting these errors in real time.
There are organizations that claim to have already built fault-tolerant quantum computers, but so far, all of these claims have been at very small scales. For fault tolerance to be meaningful—and more importantly, useful—we must demonstrate it on large-scale devices capable of running circuits with hundreds of millions of logical operations on hundreds or thousands of qubits. The goal is not just to create a proof of principle, but to build a real, working machine capable of running algorithms that require fault tolerance.
In this blog, we’ll take a look at what large-scale fault tolerance really entails, why it's fundamentally different from just scaling quantum processors up to larger qubit counts, and what it will take to get us there.
Fault tolerance in classical computing
In 1947, while working at Bell Labs, Richard Hamming made an incredible discovery: the world's first error-correcting code, now known as the Hamming code. Its purpose was to make computations more reliable. If even a single bit inside a computer errantly "flips" during a computation — meaning that it switches from 0 to 1 or vice versa, without the knowledge or control of the computer operator — it can, in principle, ruin the entire computation. While today’s advanced chip technology makes errors like this quite rare, that was not the case back in 1947. Bit flips were a major issue, in fact, and Richard Hamming was tired of returning to work on Monday mornings only to discover that the computations he'd run over the weekend had failed.
The way the Hamming code works is pretty simple: it uses redundancy to protect strings of bits against errors. For every four bits that we want to protect, we add three additional bits, called parity bits, whose values depend on the four bits we started with. They're called parity bits because their values are determined by taking the parity (or, in other words, the exclusive-OR) of different subsets of the four original bits. By comparing the values of the three parity bits with the four original bits, we can gain clues about whether errors may have occurred, and possibly correct them.
In particular, if one of the seven bits flips — whether it’s one of the original four bits or a parity bit — we can infer which one it was and flip it back. What this means is that if the rate of errors is sufficiently low, so that it's very unlikely that more than one out of every seven bits will flip during a computation, then we can use this code to perform computations reliably. Hamming implemented and later generalized his code, and it allowed him to perform his computations far more reliably.
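To make this concrete, here is a minimal sketch of the (7,4) Hamming code in plain Python, purely for illustration. The function names and the particular bit layout are just one common convention, not anything Hamming prescribed.

```python
# A minimal sketch of the (7,4) Hamming code, for illustration only.
# Codeword positions are numbered 1..7, with parity bits at positions 1, 2, and 4.

def hamming_encode(d1, d2, d3, d4):
    """Encode four data bits into a seven-bit codeword."""
    p1 = d1 ^ d2 ^ d4   # parity of positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity of positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity of positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_correct(codeword):
    """Recompute the parities; the syndrome points at the flipped bit (0 = no error)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1   # flip the offending bit back
    return c

# Flip any single one of the seven bits and the code recovers the original word.
word = hamming_encode(1, 0, 1, 1)
corrupted = list(word)
corrupted[5] ^= 1                    # simulate a single bit flip
assert hamming_correct(corrupted) == word
```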
Hamming's discovery also kicked off a revolution in telecommunications, data storage, and computation. Today, classical error-correcting codes serve as an invisible foundation underlying much of our digital technology infrastructure — enabling satellite and deep-space communication, reliable data storage and computation, high-speed internet, and more.
Birth of quantum error correction theory
When the field of quantum computing started to take off after Peter Shor's 1994 discovery of a quantum algorithm for integer factorization, a natural question arose: could we somehow use error-correcting codes to protect qubits from errors, so that quantum computations could be made more resilient against noise?
Even back then, long before quantum computers existed, it was widely suspected that noise and decoherence would be major obstacles to building reliable quantum computers. It was not at all obvious that error-correcting codes could actually help, and it was Shor who first showed that they could.
Quantum error correction isn't easy, though, because quantum information is extremely fragile. This places much greater demands on quantum error-correcting codes — they have to protect delicate quantum states, not just strings of bits. The basic idea is similar to Hamming's code, where we encode four bits into seven by adding three parity bits, but in the quantum case it's not as simple as adding parity bits.
Instead, we start with one or more logical qubits — meaning the qubits representing the quantum information we'd like to protect and perform computations on — and we encode their state into a larger number of physical qubits using entanglement. In effect, the state of the logical qubits is distributed across the physical qubits in such a way that it's made more resilient to noise. Certain measurements can then be performed on the physical qubits to provide information about what errors might have occurred, and based on that information we can then try to correct the errors.
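As a toy illustration of what encoding looks like, here is the three-qubit repetition code, which protects a single logical qubit against bit flips. It is far simpler than the codes used on real hardware, and the snippet assumes Qiskit is installed, but the pattern is the same: one logical qubit is spread across several entangled physical qubits.

```python
# A toy encoding circuit for the three-qubit bit-flip (repetition) code.
# Illustrative only; assumes Qiskit is available (pip install qiskit).
from qiskit import QuantumCircuit

encode = QuantumCircuit(3, name="encode")
# Qubit 0 starts in the logical state a|0> + b|1> that we want to protect.
# Two CNOTs entangle it with two fresh physical qubits, producing a|000> + b|111>.
encode.cx(0, 1)
encode.cx(0, 2)
print(encode.draw())
```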
No quantum error-correcting code can possibly handle every error, but when we model noise mathematically, different codes can tolerate different error rates. Just as the Hamming code works only so long as we never have more than one bit flip among any seven bits, quantum error-correcting codes only work when the error rate is sufficiently low.
Shor's very first quantum error-correcting code, known as the 9-qubit Shor code because it encodes one qubit into nine, was a breakthrough. It's also a great example for learning about quantum error-correcting codes because it's simple and intuitive. It's not a practical code, though; it can only tolerate a minuscule error rate.
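For the curious, the Shor code's encoding circuit is short enough to sketch here, again with Qiskit and purely for illustration. It simply concatenates the three-qubit phase-flip code with three copies of the three-qubit bit-flip code shown above.

```python
# A sketch of the encoding circuit for the nine-qubit Shor code (illustrative only).
# Qubit 0 carries the state to protect; the other eight physical qubits start in |0>.
from qiskit import QuantumCircuit

shor = QuantumCircuit(9, name="shor_encode")
# Outer layer: a three-qubit phase-flip code across qubits 0, 3, and 6.
shor.cx(0, 3)
shor.cx(0, 6)
for q in (0, 3, 6):
    shor.h(q)
# Inner layer: each of those three qubits is protected by a three-qubit bit-flip code.
for block in (0, 3, 6):
    shor.cx(block, block + 1)
    shor.cx(block, block + 2)
print(shor.draw())
```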
More quantum error-correcting codes followed. Another breakthrough came a couple of years later in 1997, when Alexei Kitaev discovered the toric code, which could tolerate a much higher error rate than the codes that came before it. Soon after, follow-up work by Kitaev and Sergey Bravyi, now an IBM Fellow, led to surface codes, which can be viewed as more practical implementations of the toric code. Surface codes allow single qubits to be encoded into square grids of qubits of varying sizes, and like the toric code they can tolerate a relatively high error rate.
Surface codes do have a shortcoming, however. Namely, they need lots of physical qubits to encode each logical qubit. This is why we're excited about the gross code, which we debuted last year in our landmark error correction paper in Nature. The gross code is related to surface codes and tolerates a similar level of noise, but is much more efficient, allowing several times as many logical qubits to be encoded with a given number of physical qubits. In simple terms, the gross code scales much better.
Quantum error correction fundamentals
The process of error correction itself can be broken down into three basic steps. The first step is to measure the physical qubits — but this is done indirectly, so as to leave the encoded state intact. More specifically, in error correction we introduce some new, freshly initialized qubits, have them interact with the physical qubits being used for the encoding, and then measure those new qubits.
This gives us a string of bits called the syndrome, and the process of obtaining that bit string is referred to as syndrome extraction. Much like the medical term from which it takes its name, the syndrome doesn't tell us exactly which errors took place (i.e., the underlying cause), but it gives us helpful information about what might have gone wrong (analogous to a list of symptoms).
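Continuing the toy repetition-code example from earlier, syndrome extraction might look something like the sketch below: two freshly initialized ancilla qubits each record the parity of a pair of data qubits, and only the ancillas are measured. Real codes measure many such checks, over and over.

```python
# A sketch of syndrome extraction for the three-qubit bit-flip code (illustrative only).
from qiskit import QuantumCircuit

extract = QuantumCircuit(5, 2)   # qubits 0-2 hold the encoded state; 3-4 are ancillas
# Each ancilla picks up the parity of a pair of data qubits...
extract.cx(0, 3)
extract.cx(1, 3)                 # ancilla 3 learns whether qubits 0 and 1 agree
extract.cx(1, 4)
extract.cx(2, 4)                 # ancilla 4 learns whether qubits 1 and 2 agree
# ...and only the ancillas are measured, leaving the encoded state intact.
extract.measure(3, 0)
extract.measure(4, 1)
print(extract.draw())
```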
The second step is decoding, which means processing the syndrome with a classical computation to try to figure out what correction operations, if any, to apply to the physical qubits to get them back to their original encoded state. This step is performed on a classical computer, the faster the better.
The last step is to actually perform the correction operations on the physical qubits and attempt to get them back to their original encoded state. The entire error correction process is repeated over and over in a race against noise to store quantum information reliably.
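For the two-check toy code above, decoding can be as simple as a lookup table, as in the following sketch. Real decoders face astronomically many possible syndromes and must produce corrections within tight latency budgets, but the job is the same: map a syndrome to the most likely correction.

```python
# A lookup-table decoder for the two-bit syndrome of the toy code above (illustrative only).
# Each syndrome maps to the data qubit most likely to have flipped, or None.
DECODER = {
    (0, 0): None,   # both checks pass: no correction needed
    (1, 0): 0,      # only qubits 0 and 1 disagree: qubit 0 likely flipped
    (1, 1): 1,      # both checks fail: qubit 1 likely flipped
    (0, 1): 2,      # only qubits 1 and 2 disagree: qubit 2 likely flipped
}

def correction(syndrome):
    """Return the index of the physical qubit that should receive an X gate, or None."""
    return DECODER[syndrome]

print(correction((1, 1)))   # -> 1: apply X to qubit 1 to undo the most likely error
```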
However, in a quantum computer, every operation is noisy, which means any operation can fail. We have to perform syndrome extraction and correction operations using components that themselves have a chance of being inaccurate or not working correctly, and we also have to be very careful that our error correction protocols don't end up magnifying the rate of errors.
And of course, we don't just want to “protect” quantum information using quantum error-correcting codes; we also want to perform computations on logical qubits. Once again, we face the same problem. Computations must be performed with quantum computing components that are imperfect and can magnify error rates if not done properly.
It’s really a fascinating theoretical and engineering challenge, and one of the reasons quantum error correction has inspired decades of research. How do you build something reliable out of unreliable parts?
Additional requirements for fault-tolerant quantum computations
Beyond merely making our quantum computations reliable, fault-tolerant quantum computing also requires the implementation of a so-called universal set of quantum gates on the logical qubits we've encoded.
In simple terms, this is a collection of gates rich enough to give us the full computational power of quantum computing. For example, Hadamard gates, T gates, and controlled-NOT gates together give us one version of a universal gate set. If you can perform these three operations, then you can combine them to create circuits that closely approximate any quantum computation.
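As a small, hardware-agnostic illustration, the numpy check below verifies two handy identities for the single-qubit gates in that set: two T gates compose into an S (phase) gate, and sandwiching a T gate between Hadamards turns a rotation about the Z axis into one about the X axis. The matrices are written out by hand here purely for illustration.

```python
# A quick numpy sanity check on the single-qubit gates in the {H, T, CNOT} set (illustrative only).
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
T = np.diag([1, np.exp(1j * np.pi / 4)])       # T gate
S = np.diag([1, 1j])                           # S (phase) gate

def rx(theta):
    """Rotation about the X axis by angle theta."""
    return np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
                     [-1j * np.sin(theta / 2), np.cos(theta / 2)]])

# Two T gates compose into an S gate...
assert np.allclose(T @ T, S)
# ...and H T H is a rotation about the X axis (up to a global phase),
# so H and T alone already generate rotations about two different axes.
assert np.allclose(H @ T @ H, np.exp(1j * np.pi / 8) * rx(np.pi / 4))
```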
As it turns out, some operations are easier to perform on encoded quantum information than others, and which operations are easy and which are hard can depend on the quantum error-correcting code we choose. But no matter how you slice it — and there's a mathematical theorem, known as the Eastin-Knill theorem, that tells us this — for whatever code you choose, there will always be at least one gate in any universal gate set that is quite tricky to implement.
This is where so-called magic states, another discovery by Bravyi and Kitaev, come into play. The idea is that certain operations on encoded quantum information can be performed in a fault-tolerant way — meaning that they can be done with unreliable components without magnifying the error rate and spoiling the computation — provided that we have additional qubits initialized to these very special states.
One way to think about magic states is that they're tiny little quantum programs pre-loaded into qubits. (Technically speaking, they work through a process known as gate teleportation.) A critically important aspect of magic states is that they can be prepared and tested independently; then, once we have confidence that they're good, we can interact them with our physical qubits, with the end result that computations are performed, fault-tolerantly, on the logical qubits those physical qubits encode. They're not really magic, of course, but it is pretty amazing that this is possible in theory.
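To see the gist of gate teleportation in the simplest possible setting, the numpy sketch below injects a T gate into a single-qubit state using the magic state T|+⟩. It works on bare, unencoded qubits, and the amplitudes 0.6 and 0.8 are just an arbitrary test state, so treat this as a sketch of the idea rather than how it is done on encoded logical qubits.

```python
# A bare-bones statevector sketch of T-gate injection via a magic state (illustrative only).
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])   # the gate we want to apply
S = np.diag([1, 1j])
X = np.array([[0, 1], [1, 0]])

psi = np.array([0.6, 0.8])                  # an arbitrary data state a|0> + b|1>
magic = T @ np.array([1, 1]) / np.sqrt(2)   # the magic state T|+>

# Joint state |data, magic>, then a CNOT controlled by the magic-state qubit
# and targeting the data qubit.
cnot_magic_to_data = np.array([[1, 0, 0, 0],
                               [0, 0, 0, 1],
                               [0, 0, 1, 0],
                               [0, 1, 0, 0]])
state = cnot_magic_to_data @ np.kron(psi, magic)

# Measuring the data qubit then leaves T|psi> on the magic-state qubit:
# outcome 0 needs no fix-up, outcome 1 needs an X followed by an S.
branch0 = state[:2] * np.sqrt(2)            # data qubit measured as 0
branch1 = S @ X @ (state[2:] * np.sqrt(2))  # data qubit measured as 1, after fix-up
assert np.allclose(branch0, T @ psi)
assert np.allclose(branch1, np.exp(1j * np.pi / 4) * (T @ psi))  # same state, up to global phase
```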
From an implementation perspective, there are many obstacles that we must overcome to reach the long-sought goal of building a reliable, fault-tolerant quantum computer through the methods described above.
Error rates for physical qubits must be sufficiently low for the error correcting code to work. The direct connections between qubits must allow both syndrome extraction and fault-tolerant operations to be performed reliably and quickly. Decoding must take place with extremely low latency, and the process of creating magic states and interacting them with physical qubits to perform computations on logical qubits must be perfected.
IBM is rising to this challenge, and we look forward to sharing our plans for achieving fault-tolerant quantum computing with you soon.