By Jacob Strieb.
Published on November 4, 2025.
I had a friendly disagreement the other day. Imagine looking for ARM (Thumb mode) instructions in a stream of random bytes. A friend claimed that the random stream is more likely to contain DEFLATE-compressed Thumb instructions than it is to contain uncompressed Thumb instructions. I disagreed.
The gist of his argument was that random bytes have high Shannon entropy in expectation, and that compressed bytes have higher entropy than raw assembled instructions. Thus, compressed streams of assembled instructions should be more likely to occur in sequences of random bytes than raw assembled instructions. I don’t debate the premise, but the conclusion does not follow.
I argued that the joint probability of random bytes being a valid DEFLATE stream—and that stream inflating to valid instructions—is lower than the probability of random bytes being valid instructions. Having written a DEFLATE/INFLATE implementation from scratch, I know how many ways there are for compressed data to be invalid. I also know that ARM’s Thumb instruction set is designed for high code density, which directly implies that a large proportion of possible byte sequences disassemble as valid Thumb instructions. Just because the entropy of typical Thumb code in the wild is lower than that of typical compressed data doesn’t mean that the theoretical maximum entropy of valid Thumb is lower too. He was not convinced. I was nerd sniped.
Since he has a PhD in an applied field of computer science, I figured he is most likely to be compelled by strong experimental results in a poorly-written, LaTeX-typeset paper with lots of graphs that he can skim and pretend to have read fully. (I’m only mostly joking.)
Here is a link to the typeset PDF research paper version of this post.
AI Usage Disclosure
Large Language Models (LLMs) were not used for any part of this research. The code and prose are entirely human-written. I wrote a separate post detailing some opinions about LLMs that you should read if you want to know more.
We write Zig code to generate random byte sequences. Then we try to disassemble the bytes using Capstone. We also try to inflate and then disassemble the bytes. We record successes and failures to output statistics at the end.
We test code samples with Zig version 0.14.1. Zig changes fast, so it is likely that code samples will need to be modified to work on other versions.
The full code repository is available on GitHub.
Capstone and Zig
Zig’s build system makes it easy to automatically clone, compile, and link against Capstone. Assuming a build.zig file already exists in the working directory (use zig init to create an empty project if not), the following command updates the build.zig.zon file with Capstone as a dependency.
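The command looks something like the following; the upstream Capstone repository URL is shown as an example of what to pin, and the exact tag or commit to pin is up to you:

```
zig fetch --save git+https://github.com/capstone-engine/capstone
```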
Then we add the following to build.zig.
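A minimal sketch of the build.zig additions, assuming the dependency ended up named capstone in build.zig.zon and that its build script exposes a capstone library artifact (both names are assumptions):

```zig
// In build.zig, after the executable step `exe` has been created.
const capstone = b.dependency("capstone", .{
    .target = target,
    .optimize = optimize,
});
exe.linkLibrary(capstone.artifact("capstone"));
exe.linkLibC();
```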
In our main.zig file, we import and use Capstone. Our Capstone.disassemble method returns a usize that represents the percentage of disassembled instruction bytes (between 0 and 100, inclusive). We only care about the proportion of bytes that successfully disassemble; we don’t care about the disassembled instructions themselves.
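The wrapper in the repository is the authoritative version; a minimal sketch of how such a wrapper might look, using Capstone’s cs_disasm_iter to count how many leading bytes decode before the first failure:

```zig
const std = @import("std");
const c = @cImport(@cInclude("capstone/capstone.h"));

pub const Capstone = struct {
    handle: c.csh,

    pub fn init() !Capstone {
        var handle: c.csh = undefined;
        if (c.cs_open(c.CS_ARCH_ARM, c.CS_MODE_THUMB, &handle) != c.CS_ERR_OK) {
            return error.CapstoneOpenFailed;
        }
        return .{ .handle = handle };
    }

    pub fn deinit(self: *Capstone) void {
        _ = c.cs_close(&self.handle);
    }

    /// Percentage (0-100) of the buffer, starting from the first byte, that
    /// disassembles as valid Thumb instructions before the first failure.
    pub fn disassemble(self: Capstone, buffer: []const u8) usize {
        const insn = c.cs_malloc(self.handle);
        if (insn == null) return 0;
        defer c.cs_free(insn, 1);

        var code: [*c]const u8 = buffer.ptr;
        var size: usize = buffer.len;
        var address: u64 = 0;
        // cs_disasm_iter advances `code` and decrements `size` for every
        // instruction it successfully decodes.
        while (c.cs_disasm_iter(self.handle, &code, &size, &address, insn)) {}
        return (buffer.len - size) * 100 / buffer.len;
    }
};
```

Because cs_disasm_iter reuses a single cs_malloc-allocated instruction buffer, it avoids the per-call array allocations that cs_disasm would make in the hot loop.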
Generating Random Bytes
We generate random bytes using ChaCha seeded by cryptographic randomness. ChaCha is a fast CSPRNG: it produces high-quality random bytes without slowing down our hot loop.
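A sketch of the generator setup, using std.Random.DefaultCsprng (which is ChaCha-based in Zig 0.14) seeded from std.crypto.random:

```zig
const std = @import("std");

pub fn main() !void {
    // Seed the ChaCha-based CSPRNG once from OS-provided cryptographic
    // randomness, then draw every trial buffer from it.
    var seed: [std.Random.DefaultCsprng.secret_seed_length]u8 = undefined;
    std.crypto.random.bytes(&seed);
    var csprng = std.Random.DefaultCsprng.init(seed);
    const random = csprng.random();

    var buffer: [128]u8 = undefined;
    random.bytes(&buffer); // one random 128-byte trial buffer
}
```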
Inflating Bytes
In Zig 0.14.1, the standard library INFLATE implementation operates on reader and writer streams. We build those streams out of pre-allocated buffers, one of which contains the random input data.
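A sketch of that plumbing, with the function name and buffer handling chosen for illustration rather than copied from the repository:

```zig
const std = @import("std");

/// Try to inflate `compressed` as a raw DEFLATE stream into `out`,
/// returning the number of inflated bytes or the decompression error.
fn tryInflate(compressed: []const u8, out: []u8) !usize {
    var input = std.io.fixedBufferStream(compressed);
    var output = std.io.fixedBufferStream(out);
    // Malformed headers, Huffman trees, and matches all surface as errors
    // here; overflowing `out` surfaces as error.NoSpaceLeft.
    try std.compress.flate.decompress(input.reader(), output.writer());
    return output.getWritten().len;
}
```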
Parallelizing
To maximize throughput, we run our main loop across all CPU cores. We keep statistics local to each thread, only compiling global results from the thread-local results when the loop terminates. That way the hot loop is lock-free, and we avoid overhead from contention. The downside of this strategy is that we can’t print aggregated statistics while the trials are running.
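A sketch of that structure (the struct and function names are illustrative, not the repository’s exact code):

```zig
const std = @import("std");

// Per-thread tallies; each worker owns exactly one Stats slot, so the
// hot loop never touches shared state.
const Stats = struct {
    disassembled: u64 = 0,
    decompressed: u64 = 0,
    both: u64 = 0,
};

fn worker(iterations: u64, stats: *Stats) void {
    var local = Stats{};
    var i: u64 = 0;
    while (i < iterations) : (i += 1) {
        // ... generate a random buffer, try to disassemble and inflate it,
        // and update `local` accordingly ...
    }
    stats.* = local;
}

fn runTrials(allocator: std.mem.Allocator, total: u64) !Stats {
    const n = try std.Thread.getCpuCount();
    const threads = try allocator.alloc(std.Thread, n);
    defer allocator.free(threads);
    const results = try allocator.alloc(Stats, n);
    defer allocator.free(results);

    for (threads, results) |*thread, *slot| {
        slot.* = .{};
        thread.* = try std.Thread.spawn(.{}, worker, .{ total / n, slot });
    }

    // Aggregate only after every thread has finished.
    var merged = Stats{};
    for (threads, results) |thread, slot| {
        thread.join();
        merged.disassembled += slot.disassembled;
        merged.decompressed += slot.decompressed;
        merged.both += slot.both;
    }
    return merged;
}
```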
Collecting Decompression Errors
To collect decompression errors, we use Zig compile-time (comptime) metaprogramming. At comptime, we build an array of all possible errors returned from the decompression function. We use that array to initialize a corresponding array of counts, where we tally the number of occurrences of each error.
Notice that to reify an error from its name, we use @field(anyerror, name). This works because anyerror can be considered an enum that collects all errors reachable in the compilation unit. Hence, every possible Zig error is a field of that global enum.
Reifying errors from their names is important for inlining the mapping between returned errors and their corresponding index in the counts array. It is faster to check the equality of the received error and the expected error directly than to serialize the received error to a name and then do string comparison.
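A sketch of the pattern, parameterized over an arbitrary error set; the real code applies it to the error set of the decompression call:

```zig
const std = @import("std");

/// Build a tally type with one counter per error in `ErrorSet`, computed at
/// comptime.
fn ErrorCounts(comptime ErrorSet: type) type {
    const errors = @typeInfo(ErrorSet).error_set.?;
    return struct {
        counts: [errors.len]u64 = [_]u64{0} ** errors.len,

        pub fn record(self: *@This(), err: ErrorSet) void {
            inline for (errors, 0..) |e, i| {
                // Reify the expected error from its name and compare errors
                // directly instead of comparing error-name strings at runtime.
                if (err == @field(anyerror, e.name)) {
                    self.counts[i] += 1;
                    return;
                }
            }
        }

        pub fn dump(self: @This()) void {
            inline for (errors, 0..) |e, i| {
                std.debug.print("{s}: {d}\n", .{ e.name, self.counts[i] });
            }
        }
    };
}
```

Each thread can keep its own ErrorCounts value, call record in its catch branch, and have its counts array summed into the global tally once the threads join.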
Final Script
The full script that combines each of these parts is available in the GitHub repository.
The most important result is that successful disassembly is over 125x more common than successful decompression, and over 350x more common than decompression and disassembly.
| Result | Buffers |
| --- | --- |
| Total | 1,000,000,000 |
| Disassembled | 642,628,412 |
| Decompressed | 5,037,627 |
| Decompressed and disassembled | 1,810,170 |
Disassembly Results
Given that about 65% of 128-byte buffers consist of at least 95% valid Thumb instructions, there are two natural questions: since most Thumb instructions are two bytes long, what proportion of 2-byte sequences are valid Thumb instructions? And what percentage of 128-byte buffers consist entirely of valid Thumb instructions?
| Result | Buffers |
| --- | --- |
| Total | 1,000,000,000 |
| Disassembled | 892,800,538 |
According to our measurements, any 2-byte sequence has an 89.3% chance of disassembling as a valid Thumb instruction. Evidently, Thumb has very high code density.
Note that some Thumb instructions assemble to four bytes, so we cannot assume that every pair of bytes has an independent, identically distributed (Bernoulli) probability of being part of a valid instruction sequence. Our measurements indicate that 855,008,565 (or 85.5%) of 1,000,000,000 4-byte buffers disassembled completely as valid Thumb instructions, compared with the 0.8928² ≈ 79.7% rate we would expect if all Thumb instructions were two bytes long.
Though we can’t assume every pair of bytes is independent of every other pair, we may assume that every 4-byte sequence is independent and identically distributed in terms of the probability of successful disassembly. Using this assumption, we can approximately predict the proportion of fully disassembled 128-byte sequences.
0.855008565^(128/4) = 0.855008565^32 ≈ 0.006653572, or about 0.67%
Compare our prediction with the actual, measured incidence: 4.47% of 128-byte buffers disassembled completely, noticeably more than the independence assumption predicts.
| Result | Buffers |
| --- | --- |
| Total | 1,000,000,000 |
| Disassembled | 44,726,130 |
| Decompressed | 5,036,173 |
| Decompressed and disassembled | 1,577,559 |
Even when we require 100% of bytes to disassemble as valid Thumb instructions, there are still an order of magnitude more buffers that disassemble than decompress.
If we assume groups of four consecutive bytes have an independent and identically distributed probability of successful disassembly, we expect the likelihood of completely disassembling a buffer to decay exponentially relative to the size of the buffer. In other words, the probability of finding a sequence that does not disassemble should quickly approach 1 as the size of the buffer increases. The experimental results validate that theoretical prediction.
Also, unsurprisingly, as the minimum threshold for successful disassembly decreases, the proportion of successes increases for each buffer size.
Decompression Results
Based on our experiments, we know why disassembly is likely to succeed. But, independent of successful disassembly, why is decompression likely to fail?
The rate of successful decompression is fairly constant for buffers above 32 bytes in size.
This suggests that the majority of errors from unsuccessful decompression happen in the first 32 bytes of the buffer, which makes sense, since that is where the small DEFLATE header would be found for valid compressed streams. In other words, it’s unlikely that random bytes make a valid DEFLATE header. But if the header is valid, the rest is likely to successfully decompress, regardless of buffer size.
The errors returned when decompression fails corroborate the hypothesis that failures are primarily caused by invalid DEFLATE metadata.
| Outcome | Count |
| --- | --- |
| Total | 1,000,000,000 |
| Success | 5,030,965 |
| Wrong Stored Block Nlen | 251,255,219 |
| Invalid Block Type | 251,254,609 |
| Invalid Match | 214,744,118 |
| Oversubscribed Huffman Tree | 146,011,334 |
| Incomplete Huffman Tree | 74,024,330 |
| Invalid Dynamic Block Header | 31,141,638 |
| Invalid Code | 26,438,043 |
| Missing End of Block Code | 95,816 |
| End of Stream | 3,928 |
To understand these errors completely, we must explore the structure of the DEFLATE stream.
DEFLATE Structure
A DEFLATE stream consists of one or more blocks. The first bit of every block denotes whether it is the final block in the stream. If the first bit is 1, the decompressor terminates when it sees an “end of block” token. On the other hand, if the first bit is 0, the decompressor will look for another valid DEFLATE header after an end of block token in the stream. Thus, in that case, there is an additional opportunity for any of the errors that could have caused decompression to fail on the first block header.
As a result, we can reduce the failure rate for most classes of error by fixing the first bit of the stream to be 1, so that only one valid header is required, instead of two or more.
| Outcome | First bit 0 | First bit 1 |
| --- | --- | --- |
| Total | 1,000,000,000 | 1,000,000,000 |
| Success | 56,372 | 10,010,648 |
| Wrong Stored Block Nlen | 252,530,381 | 250,005,035 |
| Invalid Block Type | 252,495,113 | 249,999,800 |
| Invalid Match | 215,816,449 | 213,698,259 |
| Oversubscribed Huffman Tree | 146,737,762 | 145,281,110 |
| Incomplete Huffman Tree | 74,398,039 | 73,635,423 |
| Invalid Dynamic Block Header | 31,290,580 | 30,987,133 |
| Invalid Code | 26,574,767 | 26,283,441 |
| Missing End of Block Code | 96,424 | 95,216 |
| End of Stream | 4,113 | 3,935 |
The second and third bits of the DEFLATE block header denote how the data is encoded. There are four possible options for these two bits, and three of the four options are considered valid.
A value of 11 for the second and third bits is considered invalid, and the decompressor will return an “invalid block type” error in that case. Because this pattern occurs for one in four random streams (in expectation), it accounts for approximately 25% of the attempts that end in decompression failure. Setting the second and third bits to 11 makes 100% of decompression attempts fail.
| Outcome | Count |
| --- | --- |
| Total | 1,000,000,000 |
| Success | 0 |
| Invalid Block Type | 1,000,000,000 |
If the second and third bits are 00, the data is included “raw” – it is not encoded or compressed before being included in the stream. In this case, the second and third bytes of the stream represent the length of the raw data, and the two bytes after that are their ones’ complement (logical negation). According to the DEFLATE specification, the data after the first byte in a raw DEFLATE stream looks like this:
```
  2   3   4   5   6...
+---+---+---+---+=================================+
|  LEN  | NLEN  |... LEN bytes of literal data ...|
+---+---+---+---+=================================+
```

The likelihood of the fourth and fifth bytes being the exact ones’ complement of the second and third bytes is:

1 / 2^16 ≈ 0.0015258789%
The extremely low probability of success for the raw block type explains another ~25% of decompression attempts that end in failure with the “wrong stored block nlen” error. Setting the second and third bits to 00 makes almost all of the decompression attempts fail.
| Outcome | Count |
| --- | --- |
| Total | 1,000,000,000 |
| Success | 24 |
| Wrong Stored Block Nlen | 999,984,702 |
| End of Stream | 15,274 |
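For reference, the check that produces this error boils down to something like the following sketch (Zig’s actual std.compress.flate internals are structured differently):

```zig
/// LEN and NLEN are the two little-endian u16 fields following the header
/// byte of a stored block. NLEN must be the ones' complement of LEN, which
/// random bytes satisfy with probability 1 / 2^16.
fn checkStoredBlockLength(len: u16, nlen: u16) !u16 {
    if (nlen != ~len) return error.WrongStoredBlockNlen;
    return len;
}
```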
In the other two block modes, raw data is first LZ77-encoded, then Huffman-encoded to compress it. In Huffman coding, an efficient bit representation of each symbol is generated from the frequency distribution of symbols in the uncompressed data. Symbols that appear more frequently will be represented using fewer bits. The rules for decoding symbols from bits are stored as a “Huffman tree” data structure.
If the second and third bits of the DEFLATE stream are 01, the data is decoded using a hard-coded (“static”) Huffman tree whose structure is given in the DEFLATE specification. If the second and third bits are 10 instead, the Huffman tree was built dynamically, and is included in the stream itself. Using the hard-coded Huffman tree typically leads to worse compression ratios than using a dynamic Huffman tree, but the static tree is more useful for our purpose of decompressing random bytes, since it presents fewer opportunities for erroneous header data.
| Outcome | Dynamic Huffman Code | Static Huffman Code |
| --- | --- | --- |
| Total | 1,000,000,000 | 1,000,000,000 |
| Success | 0 | 40,055,267 |
| End of Stream | 233 | 570 |
| Invalid Code | 15,254 | 105,125,696 |
| Oversubscribed Huffman Tree | 581,086,654 | 0 |
| Incomplete Huffman Tree | 294,586,455 | 0 |
| Missing End of Block Code | 380,197 | 0 |
| Invalid Match | 0 | 854,818,467 |
| Invalid Dynamic Block Header | 123,931,207 | 0 |
In order to refute the strongest version of the claim that decompressing and then disassembling random bytes is more likely to succeed than just disassembling random bytes, we re-run our initial test after fixing the first three bits of the random buffer to 110. That way the otherwise random bytes are always considered the final block in the stream, and always use static Huffman codes to avoid header errors.
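Because DEFLATE packs header fields starting from the least-significant bit, fixing the first three stream bits to 1, 1, 0 means forcing the low three bits of the first byte to 0b011, which matches the --first-bits 0b011 value passed to the decompression-only script shown later. A sketch of the masking:

```zig
/// Force BFINAL = 1 and BTYPE = 01 (static Huffman codes) in a random buffer.
/// The first three stream bits 1, 1, 0 occupy the low three bits of byte 0.
fn forceStaticFinalBlock(buffer: []u8) void {
    buffer[0] = (buffer[0] & ~@as(u8, 0b111)) | 0b011;
}
```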
| Result | Buffers |
| --- | --- |
| Total | 1,000,000,000 |
| Disassembled | 44,724,849 |
| Decompressed | 40,056,259 |
| Decompressed and disassembled | 12,543,383 |
Though the rates of success are much closer, even in this most generous case, disassembly is more likely than decompression, and once again much more likely than decompression and disassembly.
Other Architectures
Besides ARM in Thumb mode, Capstone supports disassembling a wide variety of other architectures and modes. Using C macro reflection in Zig, we can re-run our randomized disassembly across all available architectures and modes.
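One way this might look is to reflect at comptime over the declarations that @cImport translates (enum values and macros alike) and collect everything prefixed with CS_ARCH_; handling of the per-architecture modes is omitted here, and the repository’s approach may differ:

```zig
const std = @import("std");
const c = @cImport(@cInclude("capstone/capstone.h"));

/// Names of every CS_ARCH_* constant exposed by the Capstone header,
/// collected at comptime (this also picks up CS_ARCH_MAX and CS_ARCH_ALL,
/// which a real harness would skip).
const arch_names = blk: {
    var names: []const [:0]const u8 = &.{};
    for (@typeInfo(c).@"struct".decls) |decl| {
        if (std.mem.startsWith(u8, decl.name, "CS_ARCH_")) {
            names = names ++ &[_][:0]const u8{decl.name};
        }
    }
    break :blk names;
};

pub fn main() void {
    // Each name can be turned back into its numeric value with @field(c, name)
    // inside an inline for, and handed to cs_open.
    inline for (arch_names) |name| {
        std.debug.print("{s} = {d}\n", .{ name, @field(c, name) });
    }
}
```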
In many architectures, instruction disassembly is stateful and conditioned on previous instructions (especially when instructions have variable length, as in x86). As a result, many other architectures do not admit the assumption that aligned, fixed-size buffer subsections have independent probabilities of successful disassembly.
Performance
Overall, the Monte Carlo code that powers these “experiments” performs well. Zig is a fast language, and even without any deliberate optimization, the code can disassemble and decompress one billion 128-byte buffers in about half an hour on the 10-year-old, 48-core server we used for testing.
```
jacob@jacob:random-instructions$ time ./zig-out/bin/random_instructions --total-iterations 1_000_000_000 --disassembly-threshold 100
[ ... ]

real    30m35.158s
user    1396m34.236s
sys     1m0.426s
```

Most of the computation time was spent in Capstone. Without profiling in detail, we suspect that overhead from dynamic memory allocation performed inside of Capstone during disassembly accounts for a significant portion of that time. The code is fast enough for testing that it wasn’t worth experimenting with different allocator implementations in Capstone. It likewise wasn’t worth doing any detailed profiling.
The program’s running times scaled approximately linearly with the size of the input buffer. There was no evidence of the disassembly threshold impacting the running time, nor was there an impact on decompression time resulting from fixing particular starting bits.
Adding evidence to the theory that Capstone was the slowest part of the code, the throughput of a different script that only decompressed without attempting to disassemble was much higher. In a randomly chosen run, that script decompressed one billion 128-byte buffers in 5 minutes and 46 seconds.
```
jacob@jacob:random-instructions$ time ./zig-out/bin/random_inflate --total-iterations 1_000_000_000
[ ... ]

real    5m46.619s
user    261m22.503s
sys     1m6.833s

jacob@jacob:random-instructions$ time ./zig-out/bin/random_inflate --total-iterations 1_000_000_000 --first-bits 0b011 --num-bits 3
[ ... ]

real    5m47.705s
user    262m18.896s
sys     1m6.702s
```

In other words, without any particular optimizations, that program had a decompression throughput upper-bounded by:
(128 × 1,000,000,000 bytes) / (5 × 60 + 46 seconds) ≈ 369,942,197 bytes per second ≈ 370 MB/s
The code and raw data used to generate the graphs are on GitHub.
Our theoretical and experimental results point clearly to one conclusion: Thumb instructions have very high code density, and are much more likely to occur in sequences of random bytes than compressed data (especially more likely than compressed data containing valid Thumb instructions).
Even when we artificially modify the random data to make it as likely as possible to decompress, it is still more likely to disassemble than to decompress.
Zig makes it fun to write fast code, and was pretty awesome to use for this exploratory analysis. If you’ve not heard of it, you should definitely check it out.
Other ISAs supported by Capstone show varying code density, and it would be worth researching better candidates than Thumb for maximizing the likelihood of successful disassembly of random byte sequences. For example, does M680X actually have absurdly high code density, or was there a methodological error or code bug artificially inflating its disassembly success rate?
We have done no work to assess whether the instruction sequences appearing in our random buffers are of particular interest. It would be unsurprising if an instruction set with relatively low code density (and therefore a low rate of random disassembly success) had a higher likelihood of useful instruction sequences appearing in random byte streams than a dense instruction set with a high likelihood of disassembly.
There might be other ways to cook up the header of a DEFLATE stream to maximize the decompression likelihood of an otherwise random stream. For example, is it possible to craft a “dynamic” Huffman tree (that we insert at the beginning of every random sequence) that maximizes the likelihood of successful decompression?
If anyone continues this research, please email me or use my contact form to notify me of your results.
Nice try, but this isn’t actually a real research paper. I hope you weren’t hoping for APA-style citations or any such nonsense. The important references are linked inline.
Thanks to Logan Snow and Amy Liu for reading early versions of this post.
Shout out to my anonymous friend whose incorrect—but strongly-held—convictions inspired this work in the first place.