Most caching libraries get TTL expiration wrong. They focus on per-key complexity while missing the patterns that actually prevent production outages: cache stampedes after bulk operations, blocking refresh latency, and synchronized warmup expirations.
After hitting these limitations in existing libraries, I built 🌶️ hot to implement the sophisticated expiration strategies production systems actually need.
Here are the 3 critical techniques that will make your cache bulletproof.
The critical problem jitter solves happens after cache.SetMany(...). Load 10,000 keys simultaneously? They all expire at exactly the same moment, creating a cache stampede that can crash your database.
```go
cache := hot.NewHotCache[string, *Product](hot.LRU, 100_000).
    WithTTL(10 * time.Minute).
    Build()

// Load 10,000 products at startup
products := fetchAllProducts()

// DANGER: All keys expire simultaneously in exactly 10 minutes
cache.SetMany(products)

// t+10min: 10,000 simultaneous cache misses = database meltdown
```

🌶️ hot uses an exponential distribution to desynchronize expirations:
```go
cache := hot.NewHotCache[string, *Product](hot.LRU, 100_000).
    WithTTL(10 * time.Minute).
    WithJitter(2.0, 5*time.Minute). // lambda=2.0, spread over 5min
    Build()

// Same bulk operation
cache.SetMany(products)

// Result: Keys expire between 10-15 minutes
// Instead of 10,000 simultaneous misses → ~2,000 per minute
```

Why the exponential distribution works:
Most keys expire near base TTL (maintains freshness)
Some expire later (prevents synchronization)
Creates a natural decay curve (no spikes)
A uniformly distributed jitter over the same 5-minute window would simply shift the average expiration by 2 minutes and 30 seconds, effectively reintroducing the synchronization you were trying to avoid.
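To make that decay curve concrete, here is a minimal sketch of how an exponentially distributed jitter can be computed. This is an illustration of the idea only, not 🌶️ hot's internal implementation; the `jitteredTTL` helper name, the use of `rand.ExpFloat64`, and the truncation at the spread boundary are assumptions for the example.

```go
package main

import (
    "fmt"
    "math/rand"
    "time"
)

// jitteredTTL is a hypothetical helper: it adds an exponentially
// distributed delay (rate = lambda) to the base TTL, capped at spread.
// Most keys land close to baseTTL; a thinning tail lands later.
func jitteredTTL(baseTTL, spread time.Duration, lambda float64) time.Duration {
    // rand.ExpFloat64() has mean 1; dividing by lambda gives mean 1/lambda.
    extra := time.Duration(rand.ExpFloat64() / lambda * float64(spread))
    if extra > spread {
        extra = spread // truncate the tail at the configured spread
    }
    return baseTTL + extra
}

func main() {
    for i := 0; i < 5; i++ {
        fmt.Println(jitteredTTL(10*time.Minute, 5*time.Minute, 2.0))
    }
}
```

With lambda = 2.0 and a 5-minute spread, most of the extra delay falls in the first couple of minutes past the base TTL, with a thinning tail out to the 5-minute cap.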
Monitor with hit ratio: If your hit ratio stays stable over time, jitter is working. Sudden drops indicate stampedes.
Traditional caching libraries block on refresh: miss → wait 50ms for the loader → return fresh data. Revalidation instead serves the stale value immediately and refreshes it in the background.
Perfect for frequently accessed keys where blocking latency hurts user experience.
```go
loader := func(keys []string) (map[string]*User, error) {
    return fetchUsersFromDatabase(keys) // May take 100-500ms
}

cache := hot.NewHotCache[string, *User](hot.LRU, 100_000).
    WithTTL(15 * time.Minute).                    // Base TTL
    WithRevalidation(3*time.Minute, loader).      // Stale window: 3min
    WithRevalidationErrorPolicy(hot.KeepOnError). // Keep stale on errors
    Build()
```

Timeline for popular keys:
t+0 to t+15min: Fresh data served from cache
t+15 to t+18min: Stale data served **instantly** while background refresh runs
Refresh success: New fresh data available
Refresh failure: Stale data continues (KeepOnError policy)
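From the caller's point of view the stale window is invisible: a read during revalidation returns immediately with the cached value. Here is a hedged usage sketch, assuming the cache and loader built above and a `Get` that returns value/found/error (check the library's documentation for the exact signature); `render` is a placeholder for your own handler logic.

```go
// Reads look the same whether the entry is fresh or stale.
// During the stale window the cached value is returned immediately
// while the loader refreshes it in the background.
user, found, err := cache.Get("user-42")
if err != nil {
    return err // loader failed and no cached value could be served
}
if found {
    render(user) // placeholder for your own handler logic
}
```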
Never exceed 20% of the base TTL for the revalidation window: 3 minutes on a 15-minute TTL is exactly 20%. Going higher serves data that's too stale.
hot.KeepOnError: Perfect for centralized caches (Redis, Memcached). When the central system fails, stale local data is better than no data.
hot.DropOnError: Use for data that must be fresh (real-time prices, inventory). Stale data could cause business problems.
Warmup preloads frequently accessed keys at startup. **But you MUST pair it with jitter, or you'll create a massive stampede when all warmed keys expire simultaneously.**
```go
// DANGEROUS: Warmup without jitter
cache := hot.NewHotCache[string, *Product](hot.LRU, 100_000).
    WithTTL(30 * time.Minute).
    WithWarmUp(func() (map[string]*Product, []string, error) {
        return fetchTop1000Products() // Load 1000 popular products
    }).
    Build()

// t+0: App starts, loads 1000 products
// t+30min: ALL 1000 expire simultaneously → database crash
```

With jitter and a warmup timeout, the same preload becomes safe:

```go
warmupLoader := func() (map[string]*Product, []string, error) {
    // Load most popular products based on analytics
    return fetchPopularProducts(2000)
}

cache := hot.NewHotCache[string, *Product](hot.LRU, 200_000).
    WithTTL(5 * time.Minute).                            // Base TTL
    WithJitter(2.0, 1*time.Minute).                      // CRITICAL: 5-6min spread
    WithWarmUpWithTimeout(10*time.Second, warmupLoader). // 10s timeout
    Build()
```

Analytics-driven warmup: Use monitoring to identify your actual hot keys, not guesses.
Even with perfect TTL and jitter, concurrent cache misses can trigger simultaneous refreshes. To prevent this, you need request deduplication. Otherwise, 100 concurrent requests for the same expired key could result in 100 parallel database queries.
🌶️ hot already implements deduplication under the hood.
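You don't need to wire this up yourself, but if you're curious what the pattern looks like, here is a minimal sketch of request deduplication built on golang.org/x/sync/singleflight. It illustrates the concept only; it is not 🌶️ hot's internal code, and `fetchUserFromDatabase` is a hypothetical loader.

```go
import "golang.org/x/sync/singleflight"

var group singleflight.Group

// loadUser collapses concurrent requests for the same key into a single
// database query; all other callers block briefly and share the result.
func loadUser(id string) (*User, error) {
    v, err, _ := group.Do(id, func() (interface{}, error) {
        return fetchUserFromDatabase(id) // runs once per key at a time
    })
    if err != nil {
        return nil, err
    }
    return v.(*User), nil
}
```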
Hit Ratio: The most critical metric. Stable hit ratio over time = jitter working correctly. Sudden drops = stampedes happening.
```promql
# Hit ratio over 1 minute
rate(hot_hit_total[1m]) / (rate(hot_hit_total[1m]) + rate(hot_miss_total[1m]))
```

```yaml
# Alert if below 80%
alert: CacheHitRatioLow
expr: rate(hot_hit_total[5m]) / (rate(hot_hit_total[5m]) + rate(hot_miss_total[5m])) < 0.8
```

In cache-heavy applications, always track the 99th percentile latency. The average latency mostly captures cache hits, but the 99th percentile exposes the impact of cache misses.
Jitter is mandatory after SetMany() operations to prevent stampedes
Background revalidation makes cache refresh non-blocking for hot keys
Warmup + jitter prevents startup stampedes when all warmed keys expire
Monitor hit ratio to verify jitter effectiveness
These patterns work together to create cache systems that maintain performance under load while protecting your backend from overload. The 🌶️ hot library implements all these strategies with simple, composable APIs that make robust caching accessible for production systems.
Click on the “subscribe” buttons!