Recently I came across an interesting paper, *Coz: Finding Code that Counts with Causal Profiling*. Its core idea is counterintuitive: optimizing the hottest code often doesn't make your program faster.
Why? Not all time-consuming work lies on the critical path. Some functions run in parallel with others, do background tasks, or sit blocked on I/O. Speeding them up might just make idle threads wait longer, with no real throughput gain.
Instead of directly optimizing code, Coz injects tiny delays into the rest of the program to simulate what would happen if a specific function were faster:
- Suppose function F accounts for a large share of total runtime.
- Coz adds microsecond-scale sleeps everywhere except F.
- Everything else gets slightly slower, so F becomes *relatively* faster: a "virtual speedup".
By measuring how throughput or latency changes with these simulated speedups, Coz estimates how much F truly affects performance.
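To see why this works, here's a tiny back-of-the-envelope model I put together (an illustrative sketch with made-up numbers `f`, `g`, and `delta`, not Coz's actual implementation). Two threads each do one unit of work per transaction and then synchronize, so an iteration costs the slower of the two; delaying everything except F and then subtracting the inserted delay predicts exactly what a real speedup of F would do:

```c
/* Toy model of a virtual speedup experiment (illustrative only).
 * Per iteration, thread 1 runs F (f time units) and thread 2 runs G
 * (g units) in parallel, then they synchronize, so one iteration
 * costs max(f, g). */
#include <stdio.h>

static double max2(double a, double b) { return a > b ? a : b; }

int main(void) {
    double f = 4.0, g = 10.0; /* F is "hot" but off the critical path */
    double delta = 2.0;       /* hypothetical speedup of F            */
    int n = 1000;             /* transactions                         */

    double baseline = n * max2(f, g);

    /* What actually happens if we optimize F by delta: */
    double real = n * max2(f - delta, g);

    /* Virtual speedup: delay G by delta whenever F runs, then
     * subtract the total inserted delay from the measured time. */
    double measured  = n * max2(f, g + delta);
    double predicted = measured - n * delta;

    printf("baseline : %.0f\n", baseline);
    printf("real     : %.0f (%+.1f%%)\n", real, 100 * (real / baseline - 1));
    printf("predicted: %.0f (%+.1f%%)\n", predicted,
           100 * (predicted / baseline - 1));
    return 0;
}
```

With these numbers, F is off the critical path, so both the real and predicted changes are 0%; set `f` above `g` and both report the same improvement. That equivalence is what lets Coz measure a speedup without ever performing it.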
In one SQLite experiment, Coz flagged three small functions with unusually high indirect-call overhead. The functions themselves did very little, but reaching them through function pointers was expensive. Replacing those indirect calls with direct ones produced a nice speedup. Pretty wild for something so small.
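For intuition, here's roughly what that kind of change looks like (an invented sketch; `mem_methods`, `real_mem_size`, and friends are my names, not SQLite's actual code):

```c
#include <stdio.h>
#include <string.h>

/* A trivial helper: read a size header stored just before p. */
static int real_mem_size(void *p) {
    int n;
    memcpy(&n, (char *)p - sizeof n, sizeof n);
    return n;
}

/* Indirect: dispatch through a method table, as a pluggable allocator
 * interface might. The compiler can't inline through the pointer, and
 * the CPU has to predict the call target. */
struct mem_methods { int (*size)(void *); };
static struct mem_methods g_mem = { real_mem_size };

static int mem_size_indirect(void *p) { return g_mem.size(p); }

/* Direct: if only one implementation is ever installed, hard-wiring
 * the call lets the compiler inline the tiny callee entirely. */
static int mem_size_direct(void *p) { return real_mem_size(p); }

int main(void) {
    char buf[sizeof(int) + 8];
    int n = 8;
    memcpy(buf, &n, sizeof n);
    void *p = buf + sizeof n;
    printf("%d %d\n", mem_size_indirect(p), mem_size_direct(p));
    return 0;
}
```

When the callee does almost nothing, the dispatch itself dominates, which is why such a small change showed up at all.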
It's easy to assume profilers already track wall-clock time or critical paths, but doing that efficiently, with low overhead, is hard, so most focus on CPU time instead. Causal profiling is a clever way to see what actually drives performance. The code is open source, so you can try it out, though it doesn't work with interpreted languages yet.
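If you do try it, the workflow in the project's README is short: include `coz.h`, mark a progress point where one unit of work completes, compile with debug info, and run under `coz`. A minimal sketch (`process_transaction` is a stand-in for your real work):

```c
#include <coz.h>      /* ships with Coz; provides the COZ_* macros */
#include <unistd.h>

static void process_transaction(void) {
    usleep(1000);     /* stand-in for real work */
}

int main(void) {
    for (int i = 0; i < 100000; i++) {
        process_transaction();
        COZ_PROGRESS; /* progress point: Coz measures this rate */
    }
    return 0;
}
```

Build with `-g` so Coz can map results back to source lines, then run `coz run --- ./app`; the output lands in a `profile.coz` file you load into Coz's web-based viewer.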

