A high-performance, generic Hierarchical Timing Wheel implementation in Go for efficient timer management at scale.
It is built for when you need to manage thousands or millions of timers (timeouts, TTLs, scheduled tasks). Go's standard time.Timer is excellent, but it faces challenges at high volumes.
Example: How a Timing Wheel Efficiently Expires 10 Million TTL-Based Cache Keys
Traditional cache cleanup scans every key (O(n)), which works fine at small scale — but completely breaks down once you hit millions of entries.
With a Timing Wheel, expiration becomes O(1) per tick, processing only the keys that are actually due right now.
Read Stall Comparison (10 Million Keys)
| Metric           | Naive Scan | Timing Wheel |
|------------------|------------|--------------|
| Avg Read Latency | 4.68 ms    | 3.15 µs      |
| Max Read Stall   | 500 ms     | ≈ 2 ms       |
At 10 million keys, a naive cleanup can stall reads for half a second or more, while the Timing Wheel glides through them in microseconds.
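To make that concrete, here is a minimal sketch of the pattern, assuming the API shown in the Quick Start below; the mutex-guarded map cache and the `session:42` key are illustrative, not part of the library. Every cache write schedules a timer whose value carries the key, and the tick callback deletes only the keys that are due:

```go
package main

import (
	"sync"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	// Illustrative TTL cache: a plain map guarded by a mutex.
	var mu sync.Mutex
	data := map[string]string{}

	intervals := []time.Duration{10 * time.Millisecond, 1 * time.Second}
	slots := []int{100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Each tick delivers only the timers that are due; the callback deletes
	// exactly those keys instead of scanning the whole map.
	stop := wheel.Start(10*time.Millisecond, func(timer *taskwheel.Timer[string]) {
		mu.Lock()
		delete(data, timer.Value)
		mu.Unlock()
	})
	defer stop()

	// Store a key with a 200ms TTL, then schedule its expiration.
	mu.Lock()
	data["session:42"] = "cached payload"
	mu.Unlock()
	wheel.AfterTimeout("session:42", "session:42", 200*time.Millisecond)

	time.Sleep(500 * time.Millisecond)
}
```

Because the wheel only hands back due timers, the delete work per tick is proportional to the number of expirations, not to the total key count.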
Read the full story on Medium:
Killing O(n): How Timing Wheels Expire 10 Million Keys Effortlessly in Golang
go get github.com/ankur-anand/taskwheel
package main

import (
	"fmt"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	// Create a hierarchical timing wheel
	intervals := []time.Duration{10 * time.Millisecond, 1 * time.Second}
	slots := []int{100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Start the wheel
	stop := wheel.Start(10*time.Millisecond, func(timer *taskwheel.Timer[string]) {
		fmt.Printf("Timer fired: %s\n", timer.Value)
	})
	defer stop()

	// Schedule timers
	wheel.AfterTimeout("task1", "Process payment", 100*time.Millisecond)
	wheel.AfterTimeout("task2", "Send email", 500*time.Millisecond)

	time.Sleep(1 * time.Second)
}
High-Throughput Usage (10,000+ timers/sec)
For production systems with high timer volumes, use StartBatch() with a worker pool:
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"

	"github.com/ankur-anand/taskwheel"
)

func main() {
	intervals := []time.Duration{10 * time.Millisecond, 100 * time.Millisecond, time.Second}
	slots := []int{10, 100, 60}
	wheel := taskwheel.NewHierarchicalTimingWheel[string](intervals, slots)

	// Create worker pool
	workerPool := make(chan *taskwheel.Timer[string], 1000)
	var wg sync.WaitGroup
	numWorkers := runtime.NumCPU() * 2
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for timer := range workerPool {
				// Process the timer
				processTask(timer)
			}
		}()
	}

	// Start with a batch callback
	stop := wheel.StartBatch(10*time.Millisecond, func(timers []*taskwheel.Timer[string]) {
		for _, t := range timers {
			workerPool <- t
		}
	})

	// Schedule timers
	for i := 0; i < 10000; i++ {
		wheel.AfterTimeout(
			taskwheel.TimerID(fmt.Sprintf("task-%d", i)),
			fmt.Sprintf("Task %d", i),
			time.Duration(i)*time.Millisecond,
		)
	}

	time.Sleep(15 * time.Second)
	stop()
	close(workerPool)
	wg.Wait()
}

func processTask(timer *taskwheel.Timer[string]) {
	// business logic here
}
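The benchmark output below compares taskwheel against native time.Timers on an Apple M2 Pro. Assuming the benchmark functions ship with the module, the numbers should be reproducible with standard Go tooling, e.g. `go test -bench . -benchmem`.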
goos: darwin
goarch: arm64
pkg: github.com/ankur-anand/taskwheel
cpu: Apple M2 Pro
BenchmarkNativeTimers/1K_timers_100ms-10 10 102317362 ns/op 136000 B/op 2000 allocs/op
BenchmarkNativeTimers/10K_timers_100ms-10 10 113182400 ns/op 1360000 B/op 20000 allocs/op
BenchmarkNativeTimers/100K_timers_1s-10 1 1093996708 ns/op 13600000 B/op 200000 allocs/op
BenchmarkTimingWheelAfterTimeout/1K_timers_100ms-10 10 110006242 ns/op 214488 B/op 1056 allocs/op
BenchmarkTimingWheelAfterTimeout/10K_timers_100ms-10 10 110001250 ns/op 1973784 B/op 10181 allocs/op
BenchmarkTimingWheelAfterTimeout/100K_timers_100ms-10 9 121230764 ns/op 18755629 B/op 101093 allocs/op
BenchmarkMemoryComparison/Native_10K_timers-10 10 111245146 ns/op 1336540 B/op 20001 allocs/op
BenchmarkMemoryComparison/TimingWheel_10K_timers-10 10 109956992 ns/op 1973784 B/op 10181 allocs/op
BenchmarkTimingWheel_Memory/Timers_100000-10 67 19544636 ns/op 9665561 B/op 600001 allocs/op
BenchmarkTimingWheel_Memory/Timers_1000000-10 6 199790202 ns/op 96065897 B/op 6000003 allocs/op
BenchmarkTimingWheel_Memory/Timers_10000000-10 1 1807187459 ns/op 960067416 B/op 60000009 allocs/op
BenchmarkNativeTimer_Memory/Timers_100000-10 140 12572099 ns/op 15329470 B/op 100001 allocs/op
BenchmarkNativeTimer_Memory/Timers_1000000-10 13 145825920 ns/op 151103566 B/op 1000001 allocs/op
BenchmarkNativeTimer_Memory/Timers_10000000-10 1 1309542667 ns/op 1705383936 B/op 10000002 allocs/op
| Workload           | Metric    | NativeTimers | TimingWheel | Difference  |
|--------------------|-----------|--------------|-------------|-------------|
| 1K timers (100ms)  | Time/op   | 102 ms       | 110 ms      | +8% slower  |
| 1K timers (100ms)  | Mem/op    | 136 KB       | 214 KB      | +58% more   |
| 1K timers (100ms)  | Allocs/op | 2.0 K        | 1.1 K       | -47% fewer  |
| 10K timers (100ms) | Time/op   | 113 ms       | 110 ms      | -3% faster  |
| 10K timers (100ms) | Mem/op    | 1.36 MB      | 1.97 MB     | +45% more   |
| 10K timers (100ms) | Allocs/op | 20.0 K       | 10.2 K      | -49% fewer  |
| 100K timers (1s)   | Time/op   | 1.09 s       | 0.12 s      | -89% faster |
| 100K timers (1s)   | Mem/op    | 13.6 MB      | 18.8 MB     | +38% more   |
| 100K timers (1s)   | Allocs/op | 200.0 K      | 101.1 K     | -50% fewer  |
At very small scales (≈1K timers), TimingWheel shows a slight overhead (+8% slower, more memory). But once you reach 10K+ timers, it matches or beats native performance, consistently cuts allocations by ~50%, and at 100K timers it is ~9× faster with half the allocations.
type Task struct {
	UserID   string
	Action   string
	Priority int
}

wheel := taskwheel.NewHierarchicalTimingWheel[Task](intervals, slots)

wheel.AfterTimeout("task1", Task{
	UserID:   "user123",
	Action:   "send_email",
	Priority: 1,
}, 5*time.Second)
Priority-Based Processing
stop := wheel.StartBatch(10*time.Millisecond, func(timers []*taskwheel.Timer[Task]) {
	// Sort by priority (highest first)
	sort.Slice(timers, func(i, j int) bool {
		return timers[i].Value.Priority > timers[j].Value.Priority
	})
	for _, t := range timers {
		processTask(t)
	}
})
MIT License - see LICENSE file for details
Inspired by: