Earlier this month, while benchmarking 🌶️ hot, an in-memory caching library, I discovered that the primary bottleneck was the time.Now() call used for expiration checking.
In data-intensive applications, every nanosecond matters. Cache layers can handle thousands of requests per second per core, making performance critical. While time.Now() is the standard choice for time measurement, it carries hidden performance costs that accumulate significantly in high-throughput systems.
Every call to time.Now() triggers a system call to the operating system kernel. System calls are expensive operations that involve:
- Context switching from user space to kernel space
- Kernel execution overhead
- Context switching back to user space
- CPU cache invalidation
In high-frequency scenarios where time measurements occur between thousands and millions of times per second, these system calls can become a significant bottleneck.
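To make this concrete, here is a minimal sketch of the kind of read path where the cost shows up. The types below are hypothetical, not 🌶️ hot's actual API; the point is the time.Now() call on every lookup:

```go
package cache

import "time"

type item struct {
	value     any
	expiresAt time.Time
}

type Cache struct {
	items map[string]item
}

// Get returns the value if present and not expired.
func (c *Cache) Get(key string) (any, bool) {
	it, ok := c.items[key]
	if !ok {
		return nil, false
	}
	// One time.Now() per lookup: the hidden cost this article measures.
	if time.Now().After(it.expiresAt) {
		delete(c.items, key)
		return nil, false
	}
	return it.value, true
}
```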
Go’s time package implements a sophisticated dual-clock system as described in the official documentation. The time.Time type contains two distinct clock readings:
**Wall clock:**

- Represents the actual calendar time
- Subject to system clock adjustments (NTP synchronization, manual changes)
- Can move backwards or jump forward
- Essential for timestamps and scheduling

**Monotonic clock:**

- Always moves forward at a constant rate
- Measures time since an arbitrary reference point (e.g., process start)
- Unaffected by system clock adjustments
- Perfect for measuring durations and intervals
- Provides consistent elapsed time measurements
When you call time.Now(), Go reads both clocks and stores both values in the returned time.Time struct, with additional processing. However, for simple interval measurements, we often only need the monotonic component.
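You can observe both readings directly: printing a time.Time shows the monotonic offset as a trailing m=... component, and, as the time package documentation notes, Round(0) strips the monotonic reading:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.Now()
	// Prints the wall clock value plus an "m=+0.000000001"-style
	// monotonic offset at the end.
	fmt.Println(t)

	// Round(0) strips the monotonic reading, leaving only wall time.
	fmt.Println(t.Round(0))
}
```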
Since we only need one clock reading instead of two, we can optimize our measurements by directly accessing the monotonic clock, avoiding the overhead of the dual-clock system.
Method 1: reading the monotonic clock via an explicit clock_gettime syscall:

```go
import "golang.org/x/sys/unix"

func getMonotonicTimeSyscall() int64 {
	var ts unix.Timespec
	// error ignored for brevity
	unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts)
	return ts.Nano()
}
```

Method 2: reading the wall clock via gettimeofday, for comparison:

```go
import "syscall"

func getWallTimeSyscall() int64 {
	var tv syscall.Timeval
	// error ignored for brevity
	syscall.Gettimeofday(&tv)
	return int64(tv.Sec)*1e6 + int64(tv.Usec) // microseconds
}
```

Method 3: linking directly against the runtime's internal nanotime():

```go
import _ "unsafe" // required by go:linkname

//go:linkname nanotime runtime.nanotime
func nanotime() int64

func getMonotonicNanos() int64 {
	return nanotime()
}
```

nanotime() is an internal Go runtime function that directly returns monotonic time in nanoseconds. It is the fastest method, but it requires the //go:linkname directive and carries maintenance risk since it is an internal API. (Note: a body-less declaration like this only compiles if the package also contains a .s file, even an empty one.)
nanotime() has OS-specific implementations. Use this abstraction rather than calling underlying syscalls directly.
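Interval measurement with nanotime() then boils down to subtracting two readings. Below is a minimal, self-contained sketch; work() is a hypothetical stand-in for the code being timed, and recent Go toolchains may additionally require -ldflags=-checklinkname=0 to allow linking against runtime internals:

```go
package main

import (
	"fmt"
	"time"
	_ "unsafe" // required by go:linkname
)

// See the .s file caveat above.
//
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
	start := nanotime()
	work() // hypothetical workload being timed
	// Differences between two readings are valid durations; the absolute
	// value is only meaningful relative to an OS-specific origin.
	fmt.Println("elapsed:", time.Duration(nanotime()-start))
}

func work() {
	s := 0
	for i := 0; i < 1_000_000; i++ {
		s += i
	}
	_ = s
}
```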
Method 4: measuring against a fixed start time, using only the standard library:

```go
import "time"

// costly, but executed once, at boot
var startTime = time.Now()

func getMonotonicDuration() time.Duration {
	return time.Since(startTime)
}
```

As you can see in the Go source code, time.Since() first tries to measure the duration using only the monotonic clock:
```go
// https://cs.opensource.google/go/go/+/refs/tags/go1.25.1:src/time/time.go;l=1223
func Since(t Time) Duration {
	if t.wall&hasMonotonic != 0 && !runtimeIsBubbled() {
		// Common case optimization: if t has monotonic time, then Sub will use only it.
		return subMono(runtimeNano()-startNano, t.ext)
	}
	return Now().Sub(t)
}
```

Some CPU architectures and operating systems offer additional methods:
- VDSO-optimized calls
- rdtsc
- CPU performance counters (TSC)
- ASM
- CGO
- ...
Since these approaches conflict with the requirements of 🌶️ hot, my in-memory caching library, they won't be covered here.
Monotonic clocks typically offer higher precision and lower jitter compared to wall clocks, making them ideal for performance measurements and fine-grained timing operations.
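As a rough way to check this on your own machine, you can sample back-to-back readings of each clock and look at the smallest non-zero delta. minDelta below is a hypothetical helper, and this is only a crude probe of resolution, not a rigorous jitter measurement:

```go
package main

import (
	"fmt"
	"time"
)

// minDelta samples consecutive readings of a clock and returns the
// smallest non-zero delta observed: a rough proxy for usable resolution.
func minDelta(read func() int64) int64 {
	best := int64(1 << 62)
	prev := read()
	for i := 0; i < 100_000; i++ {
		cur := read()
		if d := cur - prev; d > 0 && d < best {
			best = d
		}
		prev = cur
	}
	return best
}

func main() {
	start := time.Now()
	fmt.Println("monotonic min delta (ns):",
		minDelta(func() int64 { return int64(time.Since(start)) }))
	fmt.Println("wall min delta (ns):",
		minDelta(func() int64 { return time.Now().UnixNano() }))
}
```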
```go
// go test -benchmem -benchtime=10000000x -bench=. ./...
package bench

import (
	"syscall"
	"testing"
	"time"
	_ "unsafe" // required by go:linkname

	"golang.org/x/sys/unix"
)

//go:linkname nanotime runtime.nanotime
func nanotime() int64

var startTime = time.Now()

func BenchmarkTime(b *testing.B) {
	b.Run("timeTime", func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			_ = time.Now()
		}
	})
	b.Run("Method1", func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			var ts unix.Timespec
			_ = unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts)
		}
	})
	b.Run("Method2", func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			var tv syscall.Timeval
			_ = syscall.Gettimeofday(&tv)
		}
	})
	b.Run("Method3", func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			_ = nanotime()
		}
	})
	b.Run("Method4", func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			_ = time.Since(startTime).Nanoseconds()
		}
	})
}
```

Results:
```
goos: darwin
goarch: arm64
pkg: github.com/samber/hot/bench
cpu: Apple M3
timeTime-8    10000000    36.82 ns/op    0 B/op    0 allocs/op
Method1-8     10000000    37.24 ns/op    0 B/op    0 allocs/op
Method2-8     10000000    12.31 ns/op    0 B/op    0 allocs/op
Method3-8     10000000    12.32 ns/op    0 B/op    0 allocs/op
Method4-8     10000000    13.03 ns/op    0 B/op    0 allocs/op
```
The benchmark shows that time.Now(), which retrieves the wall clock through its underlying syscall, costs roughly three times as much as reading the monotonic clock via nanotime() (Method 3) or time.Since() (Method 4). Method 4 should be preferred: it is cross-platform and insulated from future changes to Go's internal APIs.
While time.Now() serves as an excellent general-purpose time function, understanding its dual-clock nature and associated costs opens up optimization opportunities. For applications that frequently measure time intervals, bypassing the wall clock component and directly accessing monotonic time can provide significant performance improvements.
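As an illustration, here is the monotonic counterpart of the naive sketch from the beginning of the article, built on Method 4: deadlines are stored as monotonic nanoseconds since process start. The types are hypothetical and not necessarily how 🌶️ hot implements it:

```go
package cache

import "time"

// time.Now() is called once, at boot; afterwards every check costs
// only a monotonic clock read via time.Since.
var startTime = time.Now()

// nowMono returns nanoseconds elapsed since process start.
func nowMono() int64 { return int64(time.Since(startTime)) }

type item struct {
	value         any
	expiresAtMono int64 // monotonic deadline, in ns since process start
}

type Cache struct {
	items map[string]item
}

func New() *Cache {
	return &Cache{items: make(map[string]item)}
}

// Set stores a value with a TTL, converted to a monotonic deadline.
func (c *Cache) Set(key string, v any, ttl time.Duration) {
	c.items[key] = item{value: v, expiresAtMono: nowMono() + int64(ttl)}
}

// Get returns the value if present and not expired; the expiration
// check no longer pays for a wall clock reading.
func (c *Cache) Get(key string) (any, bool) {
	it, ok := c.items[key]
	if !ok || nowMono() > it.expiresAtMono {
		return nil, false
	}
	return it.value, true
}
```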
I benchmarked several popular in-memory caching libraries with key expiration enabled. To date, https://github.com/samber/hot 🌶️ is comfortably among the three fastest. I will publish my results very soon. ;)
PS: I made the benchmark during a massive heat wave in France. My computer might be slower than yours! 🥵