A high-performance, generic Hierarchical Timing Wheel implementation in Go for efficient timer management at scale.
It is built for when you need to manage thousands or millions of timers (timeouts, TTLs, scheduled tasks). Go's standard time.Timer is excellent, but it faces challenges at high volumes.
Example: How a Timing Wheel Efficiently Expires 10 Million TTL-Based Cache Keys
Traditional cache cleanup scans every key (O(n)), which works fine at small scale — but completely breaks down once you hit millions of entries.
With a Timing Wheel, expiration becomes O(1) per tick, processing only the keys that are actually due right now.
Read Stall Comparison (10 Million Keys)
| Metric           | Naive Scan | Timing Wheel |
|------------------|------------|--------------|
| Avg Read Latency | 4.68 ms    | 3.15 µs      |
| Max Read Stall   | 500 ms     | ≈ 2 ms       |
At 10 million keys, a naive cleanup can stall reads for half a second or more, while the Timing Wheel glides through them in microseconds.
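To make that concrete, here is a minimal sketch of the pattern, assuming the API shown in the Quick Start below; the mutex-guarded map cache and the `session:42` key are illustrative, not part of the library. Every cache write schedules a timer whose value carries the key, and the tick callback deletes only the keys that are due:

```go
package main

import (
	"sync"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	// Illustrative TTL cache: a plain map guarded by a mutex.
	var mu sync.Mutex
	data := map[string]string{}

	intervals := []time.Duration{10 * time.Millisecond, 1 * time.Second}
	slots := []int{100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Each tick delivers only the timers that are due; the callback deletes
	// exactly those keys instead of scanning the whole map.
	stop := wheel.Start(10*time.Millisecond, func(timer *taskwheel.Timer[string]) {
		mu.Lock()
		delete(data, timer.Value)
		mu.Unlock()
	})
	defer stop()

	// Store a key with a 200ms TTL, then schedule its expiration.
	mu.Lock()
	data["session:42"] = "cached payload"
	mu.Unlock()
	wheel.AfterTimeout("session:42", "session:42", 200*time.Millisecond)

	time.Sleep(500 * time.Millisecond)
}
```

Because the wheel only hands back due timers, the delete work per tick is proportional to the number of expirations, not to the total key count.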
Read the full story on Medium:
Killing O(n): How Timing Wheels Expire 10 Million Keys Effortlessly in Golang
go get github.com/ankur-anand/taskwheel
package main

import (
	"fmt"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	// Create a hierarchical timing wheel
	intervals := []time.Duration{10 * time.Millisecond, 1 * time.Second}
	slots := []int{100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Start the wheel
	stop := wheel.Start(10*time.Millisecond, func(timer *taskwheel.Timer[string]) {
		fmt.Printf("Timer fired: %s\n", timer.Value)
	})
	defer stop()

	// Schedule timers
	wheel.AfterTimeout("task1", "Process payment", 100*time.Millisecond)
	wheel.AfterTimeout("task2", "Send email", 500*time.Millisecond)

	time.Sleep(1 * time.Second)
}
High-Throughput Usage (10,000+ timers/sec)
For production systems with high timer volumes, use StartBatch() with a worker pool:
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	intervals := []time.Duration{10 * time.Millisecond, 100 * time.Millisecond, time.Second}
	slots := []int{10, 100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Create worker pool
	workerPool := make(chan *taskwheel.Timer[string], 1000)
	var wg sync.WaitGroup
	numWorkers := runtime.NumCPU() * 2
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for timer := range workerPool {
				// Process the timer
				processTask(timer)
			}
		}()
	}

	// Start with a batch callback
	stop := wheel.StartBatch(10*time.Millisecond, func(timers []*taskwheel.Timer[string]) {
		for _, t := range timers {
			workerPool <- t
		}
	})

	// Schedule timers
	for i := 0; i < 10000; i++ {
		wheel.AfterTimeout(
			taskwheel.TimerID(fmt.Sprintf("task-%d", i)),
			fmt.Sprintf("Task %d", i),
			time.Duration(i)*time.Millisecond,
		)
	}

	time.Sleep(15 * time.Second)
	stop()
	close(workerPool)
	wg.Wait()
}

func processTask(timer *taskwheel.Timer[string]) {
	// business logic here
}
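The benchmark output below compares taskwheel against native time.Timers on an Apple M2 Pro. Assuming the benchmark functions ship with the module, the numbers should be reproducible with standard Go tooling, e.g. `go test -bench . -benchmem`.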
goos: darwin
goarch: arm64
pkg: github.com/ankur-anand/taskwheel
cpu: Apple M2 Pro
BenchmarkNativeTimers/1K_timers_100ms-10 10 102317362 ns/op 136000 B/op 2000 allocs/op
BenchmarkNativeTimers/10K_timers_100ms-10 10 113182400 ns/op 1360000 B/op 20000 allocs/op
BenchmarkNativeTimers/100K_timers_1s-10 1 1093996708 ns/op 13600000 B/op 200000 allocs/op
BenchmarkTimingWheelAfterTimeout/1K_timers_100ms-10 10 110006242 ns/op 214488 B/op 1056 allocs/op
BenchmarkTimingWheelAfterTimeout/10K_timers_100ms-10 10 110001250 ns/op 1973784 B/op 10181 allocs/op
BenchmarkTimingWheelAfterTimeout/100K_timers_100ms-10 9 121230764 ns/op 18755629 B/op 101093 allocs/op
BenchmarkMemoryComparison/Native_10K_timers-10 10 111245146 ns/op 1336540 B/op 20001 allocs/op
BenchmarkMemoryComparison/TimingWheel_10K_timers-10 10 109956992 ns/op 1973784 B/op 10181 allocs/op
BenchmarkTimingWheel_Memory/Timers_100000-10 67 19544636 ns/op 9665561 B/op 600001 allocs/op
BenchmarkTimingWheel_Memory/Timers_1000000-10 6 199790202 ns/op 96065897 B/op 6000003 allocs/op
BenchmarkTimingWheel_Memory/Timers_10000000-10 1 1807187459 ns/op 960067416 B/op 60000009 allocs/op
BenchmarkNativeTimer_Memory/Timers_100000-10 140 12572099 ns/op 15329470 B/op 100001 allocs/op
BenchmarkNativeTimer_Memory/Timers_1000000-10 13 145825920 ns/op 151103566 B/op 1000001 allocs/op
BenchmarkNativeTimer_Memory/Timers_10000000-10 1 1309542667 ns/op 1705383936 B/op 10000002 allocs/op
| Workload           | Metric    | NativeTimers | TimingWheel | Difference  |
|--------------------|-----------|--------------|-------------|-------------|
| 1K timers (100ms)  | Time/op   | 102 ms       | 110 ms      | +8% slower  |
| 1K timers (100ms)  | Mem/op    | 136 KB       | 214 KB      | +58% more   |
| 1K timers (100ms)  | Allocs/op | 2.0 K        | 1.1 K       | -47% fewer  |
| 10K timers (100ms) | Time/op   | 113 ms       | 110 ms      | -3% faster  |
| 10K timers (100ms) | Mem/op    | 1.36 MB      | 1.97 MB     | +45% more   |
| 10K timers (100ms) | Allocs/op | 20.0 K       | 10.2 K      | -49% fewer  |
| 100K timers (1s)   | Time/op   | 1.09 s       | 0.12 s      | -89% faster |
| 100K timers (1s)   | Mem/op    | 13.6 MB      | 18.8 MB     | +38% more   |
| 100K timers (1s)   | Allocs/op | 200.0 K      | 101.1 K     | -50% fewer  |
At very small scales (≈1K timers), TimingWheel shows a slight overhead (+8% slower, more memory). But once you reach 10K+ timers, it matches or beats native performance, consistently cuts allocations by ~50%, and at 100K timers it is ~9× faster with half the allocations.
type Task struct {
	UserID   string
	Action   string
	Priority int
}

wheel := taskwheel.NewHierarchicalTimingWheel[Task](intervals, slots)

wheel.AfterTimeout("task1", Task{
	UserID:   "user123",
	Action:   "send_email",
	Priority: 1,
}, 5*time.Second)
Priority-Based Processing
stop := wheel.StartBatch(10*time.Millisecond, func(timers []*taskwheel.Timer[Task]) {
	// Sort by priority (highest first)
	sort.Slice(timers, func(i, j int) bool {
		return timers[i].Value.Priority > timers[j].Value.Priority
	})
	for _, t := range timers {
		processTask(t)
	}
})
MIT License - see LICENSE file for details
Inspired by: