Loggr: Processing 250M logs in 11.5s on a laptop with on-the-fly 5× compression



[Figure: Loggr live telemetry in a real test case]


Benchmarks and technical deep-dive into a high-performance C logging library

TL;DR
We built Loggr: a tiny (170 KB, no external dependencies) native C logging library that preprocesses, batches, and compresses logs at line rate. On a Lenovo P14s developer laptop (Ryzen 5 Pro, NVMe) we processed 250,000,000 synthetic web-style logs in 11.52 seconds (21.71 million logs/second), achieving roughly 5× end-to-end (to disk) compression (preprocessing + LZ4) while keeping RAM usage low, with zero logs lost. This article explains the architecture, test methodology, exact parameters, benchmark data, limitations, and how to reproduce the tests.

Author:
François Gauthier — Founder & Software Architect, Superwired-labs

The Challenge

Most logging systems assume infinite cloud resources, but we wanted to test fundamental limits: how many logs can you process on consumer hardware while maintaining business value?

Our goal: handle 250 million logs with on-the-fly compression and live observability on a standard developer laptop, measured from memory to disk.

Why This Matters

Modern observability pipelines often ignore egress costs and storage multipliers. Loggr moves efficient preprocessing upstream — reducing volume before data leaves the host while preserving temporal order and fidelity.

Positioning in the Observability Ecosystem:
Loggr is not designed to replace full-featured platforms like Datadog or Splunk, but to serve as an upstream gateway — compressing logs at the source before transmission to storage or downstream analysis pipelines. This creates a cost-efficient two-stage architecture where Loggr handles the “heavy lifting” of data reduction, dramatically cutting egress and storage costs while maintaining compatibility with existing tools.

Technical Architecture

Core Design:

  • 170 KB C DLL, zero dependencies, AVX2 CPU required
  • Lock-free MPMC queues and ring buffers, highly parallelized hot path (a simplified sketch follows this list)
  • Preprocessing + LZ4 batch compression
  • Configurable memory footprint (from 20 MB up)
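Loggr's internal queues are proprietary, so purely as a rough illustration of the ring-buffer idea, here is a minimal single-producer/single-consumer sketch using C11 atomics; the lock-free MPMC variant described above is substantially more involved:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 1024  /* power of two so we can mask instead of mod */

typedef struct {
    _Atomic uint64_t head;   /* next slot to write (producer side) */
    _Atomic uint64_t tail;   /* next slot to read (consumer side) */
    void *slots[RING_CAP];
} spsc_ring;

/* Producer: returns false instead of blocking when full, which is how
   a logger can count a backpressure event without taking a lock. */
static bool ring_push(spsc_ring *r, void *item) {
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAP)
        return false;                           /* full */
    r->slots[head & (RING_CAP - 1)] = item;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: the batcher drains items and hands them to compression. */
static bool ring_pop(spsc_ring *r, void **item) {
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;                           /* empty */
    *item = r->slots[tail & (RING_CAP - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}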

Key Innovation: Smart Preprocessing-First + Temporal Caching
Instead of throwing raw logs at LZ4, we transform the stream into a low-entropy representation, using the natural temporal proximity of log values as a caching layer.

This preprocessing reduces entropy, allowing fast compression to achieve ratios that would otherwise require heavyweight algorithms.
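The actual preprocessing format is not public. As a hypothetical sketch of the dictionary idea (slot layout, hash, and sizes are invented for illustration), replacing recurring strings with small integer IDs is what hands LZ4 a low-entropy stream:

#include <stdint.h>
#include <string.h>

#define DICT_SLOTS 16384  /* mirrors the DICT_16K setting used in the tests */

typedef struct {
    char     key[128];    /* recently seen resource/endpoint string */
    uint32_t id;          /* small integer emitted in place of the string */
} dict_slot;

static dict_slot dict[DICT_SLOTS];
static uint32_t  next_id = 1;     /* 0 means "slot empty" */

/* FNV-1a: a cheap hash for short strings */
static uint32_t fnv1a(const char *s, uint32_t len) {
    uint32_t h = 2166136261u;
    for (uint32_t i = 0; i < len; i++) { h ^= (uint8_t)s[i]; h *= 16777619u; }
    return h;
}

/* Return a small ID for the string, inserting on first sight. Hot strings
   (the temporal-cache effect) hit the same slot again and again, so the
   downstream LZ4 input is mostly repeated small integers. Sketch assumes
   len < 128 and a never-full table; the real thing needs eviction and
   bounded probing. */
static uint32_t intern(const char *s, uint32_t len) {
    uint32_t i = fnv1a(s, len) & (DICT_SLOTS - 1);
    for (;;) {
        if (dict[i].id == 0) {                        /* empty slot: insert */
            memcpy(dict[i].key, s, len);
            dict[i].key[len] = '\0';
            dict[i].id = next_id++;
            return dict[i].id;
        }
        if (memcmp(dict[i].key, s, len) == 0 && dict[i].key[len] == '\0')
            return dict[i].id;                        /* cache hit */
        i = (i + 1) & (DICT_SLOTS - 1);               /* linear probe */
    }
}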

API Snapshot

The primary API used in tests:

DLL_API uint8_t LogInit(
    const char* log_path,          // Primary log path
    const char* backup_path,       // Backup path, instant hot-swap
    const char* error_path,        // Error message path (text file)
    uint8_t enable_seq,            // Cross-thread atomic counter
    Anon_lvl ip_anon_lvl,          // On-the-fly IPv4/6 anonymization level
    uint8_t truncate_url_params,   // If applicable, truncate params
    uint16_t flush_per_file,       // File size control
    Lz4CompressionLevel level,     // Compression level
    BatchSize buffer_size,         // Logs per batch
    DictSize dict_size);           // Dictionary size

DLL_API uint16_t LogWrite(
    uint8_t operation_id,          // HTTP method, ICMP type, DNS opcode, etc.
    const char* resource,          // URL, domain name, ICMP message, etc.
    uint32_t resource_len,         // Length of the resource string
    uint32_t status_code,          // HTTP status, DNS RCODE, ICMP code/type, etc.
    const char* endpoint,          // IPv4/IPv6, hostname, DNS server, etc.
    uint32_t endpoint_len,         // Length of the endpoint string
    uint16_t duration_ms,          // Duration in milliseconds (0–65535)
    uint16_t data_size_bucket,     // Data size bucket (0 = 1–5 KB, 1 = 5–10 KB, etc.)
    uint8_t flags,                 // Free bitmask
    uint64_t timestamp);           // High-resolution timestamp in microseconds
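For context, a minimal end-to-end usage sketch based on the prototypes above; the operation_id value and the size-bucket mapping beyond the documented examples are assumptions, and LogShutdown() is the flush call mentioned in the durability notes further down:

#include <stdint.h>
#include <string.h>
#include "loggr.h"   /* hypothetical header exposing the prototypes above */

int main(void) {
    LogInit("C:\\logs", "C:\\AltLogspath", "C:\\logs\\error",
            1,                     /* atomic numbering on */
            ANON_IP_NONE,          /* no IP anonymization */
            0,                     /* no param truncation */
            32,                    /* 32 batches per file */
            COMPRESSION_BALANCED, BATCH_2MB, DICT_16K);

    const char *url = "/forum/thread/12345.html";
    const char *ip  = "172.16.18.116";

    LogWrite(1,                    /* operation_id: assumed HTTP GET */
             url, (uint32_t)strlen(url),
             200,                  /* HTTP status */
             ip, (uint32_t)strlen(ip),
             37,                   /* duration_ms */
             2,                    /* bucket 2: presumably 10-15 KB */
             0,                    /* flags: unused here */
             1730000000000000ULL); /* microsecond timestamp */

    LogShutdown();                 /* full flush on shutdown */
    return 0;
}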

Test Environment & Methodology

Tests were run on a stock Lenovo ThinkPad P14s Gen5 (Ryzen 5 Pro, 96 GB RAM, NVMe SSD) under Windows 11 x64. The load generator is a natively compiled executable; compression uses LZ4 v1.10 (default API). Timing is measured with QueryPerformanceCounter from producer startup to producer termination. Writes use WriteFile() with FILE_FLAG_WRITE_THROUGH, and FlushFileBuffers() is issued at each file rotation to ensure persistence at rotation boundaries.
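In Win32 terms, that persistence path corresponds to something like the following sketch (buffer names hypothetical, error handling elided):

#include <windows.h>

/* Open the log file with write-through so WriteFile bypasses the OS
   write cache, as described in the methodology above. */
HANDLE h = CreateFileA("C:\\logs\\batch_000.lz4",
                       GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                       FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                       NULL);

DWORD written = 0;
WriteFile(h, compressed_batch, compressed_len, &written, NULL);

/* At each file rotation, force everything down to stable storage. */
FlushFileBuffers(h);
CloseHandle(h);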
Workload: 250M synthetic web-style logs
Loggr: live telemetry (15 s refresh by default) and atomic numbering enabled

  • Resources: 1,000 unique URLs preloaded in RAM, selected randomly at run time
  • Endpoints: 5,000 unique IPs preloaded in RAM, selected randomly at run time
  • Other fields (numeric): generated randomly on the fly (a sketch of the generator loop follows this list)
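A hypothetical reconstruction of one producer thread's generator loop, based on the description above (rand_u32() and now_us() stand in for the test harness's PRNG and microsecond clock, which are not published):

#include <stdint.h>
#include <string.h>

extern const char *urls[1000];   /* 1,000 unique URLs preloaded in RAM */
extern const char *ips[5000];    /* 5,000 unique IPs preloaded in RAM */
extern uint32_t rand_u32(void);  /* hypothetical fast PRNG */
extern uint64_t now_us(void);    /* hypothetical microsecond clock */

void producer_thread(uint64_t logs_to_write) {
    for (uint64_t i = 0; i < logs_to_write; i++) {
        const char *res = urls[rand_u32() % 1000];  /* random resource */
        const char *ep  = ips[rand_u32() % 5000];   /* random endpoint */

        LogWrite((uint8_t)(rand_u32() % 32),        /* random operation id */
                 res, (uint32_t)strlen(res),
                 100 + rand_u32() % 500,            /* random status code */
                 ep, (uint32_t)strlen(ep),
                 (uint16_t)rand_u32(),              /* duration_ms */
                 (uint16_t)(rand_u32() % 16),       /* data size bucket */
                 (uint8_t)rand_u32(),               /* flags */
                 now_us());                         /* µs timestamp */
    }
}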

Sample log values:

[248][2025-10-17T15:50:20.721988Z][/forum/thread/12345.html][172.16.18.116][506][SEARCH][59466ms][16802b][174]
[547][2025-10-29T12:27:41.019960Z][/path_0499.png][10.0.1.204][506][UNCHECKOUT][32090ms][52819b][0]
[548][2025-10-29T12:27:41.019960Z][/path_0242.pdf][172.16.2.207][208][REPORT][48896ms][55970b][224]
[549][2025-10-29T12:27:41.019960Z][/admin/dashboard/overview.jpeg][172.16.14.222][414][PUT][37213ms][64923b][151]

“High-Performance” Configuration:

LogInit(
    "C:\\logs",            // Main path
    "C:\\AltLogspath",     // Backup path
    "C:\\logs\\error",     // Errors
    1,                     // Activate atomic numbering
    ANON_IP_NONE,          // No IP anonymization
    0,                     // No param truncation
    32,                    // 32 batches per file before rotation
    COMPRESSION_BALANCED,  // LZ4 default
    BATCH_2MB,             // 2 MB == 65,536 logs per batch
    DICT_16K               // Dictionary: 16,384 slots
);
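For scale, and assuming the 65,536 logs-per-batch figure in the comments is exact: 250,000,000 ÷ 65,536 ≈ 3,815 batches over the run, and at 32 batches per file that means roughly 120 log files.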

“Economy” Configuration:

LogInit(
    "C:\\logs",            // Main path
    "C:\\AltLogspath",     // Backup path
    "C:\\logs\\error",     // Errors
    1,                     // Activate atomic numbering
    ANON_IP_NONE,          // No IP anonymization
    0,                     // No param truncation
    32,                    // 32 batches per file before rotation
    COMPRESSION_BALANCED,  // LZ4 default
    BATCH_500KB,           // 500 KB == 16,000 logs per batch
    DICT_16K               // Dictionary: 16,384 slots
);

Benchmark Results

High-Performance Mode (6 caller threads):

  • 250,000,000 logs in 11.52 seconds
  • 21.71M logs/second sustained
  • ~105MB RAM footprint
  • LZ4-only ratio: 1.5× (67% of original)
  • End-to-end ratio: ~5.16× (preprocessing + LZ4: 19.4% of the original)
  • 0 logs lost, 0 backpressure events

Economy Mode (single caller thread):

  • 250,000,000 logs in 29.52 seconds
  • 8.47M logs/second
  • ~16MB RAM footprint
  • LZ4-only ratio: 1.5× (67% of original)
  • End-to-end ratio: ~4.66× (preprocessing + LZ4: 21.5% of the original)
  • 0 logs lost, 0 backpressure events

Now let’s try an adversarial workload:
250M synthetic web-style logs

  • Resources: 1,300,000 unique filesystem paths and randomly formed URLs preloaded in RAM, selected randomly at run time
  • Endpoints: 5,000 unique IPs preloaded in RAM, selected randomly at run time
  • Other fields (numeric): generated randomly on the fly

Sample log values:
[249328055] [2025-10-28T17:39:52.556279Z] [http://jmtbfkgosr.com/khbceyfz] [14:11:14:58:9a:a4:8c:cc] [416] [MOVE] [20542ms] [22240b] [168]
[249328056] [2025-10-28T17:39:52.556280Z] [file:///C:/Windows/WinSxS/wow64_microsoft-windows-w..for-management-core_31bf3856ad364e35_10.0.26100.1_none_d1adc81249f528d5/WsmAgent.mof] [46.104.145.156] [402] [UNBIND] [15124ms] [62786b] [138]
[249328057] [2025-10-28T17:39:52.556281Z] [file:///C:/Windows/WinSxS/Manifests/wow64_system.printing_31bf3856ad364e35_4.0.15912.0_none_5105107b3d632fd3.manifest] [57.32.93.105] [503] [MOVE] [55300ms] [20270b] [111]
[249328058] [2025-10-28T17:39:52.556282Z] [http://psgecnxhwf.com/ubwyxdiz] [122.34.41.214] [418] [MKWORKSPACE] [30828ms] [5704b] [134]

“High-Performance” Configuration (adapted to the load):

LogInit(
    "C:\\logs",            // Main path
    "C:\\AltLogspath",     // Backup path
    "C:\\logs\\error",     // Errors
    1,                     // Activate atomic numbering
    ANON_IP_NONE,          // No IP anonymization
    0,                     // No param truncation
    32,                    // 32 batches per file before rotation
    COMPRESSION_BALANCED,  // LZ4 default
    BATCH_8MB,             // 8 MB == 262,144 logs per batch
    DICT_256K              // Dictionary: 262,144 slots
);

“Economy” Configuration (same as before):

LogInit(
    "C:\\logs",            // Main path
    "C:\\AltLogspath",     // Backup path
    "C:\\logs\\error",     // Errors
    1,                     // Activate atomic numbering
    ANON_IP_NONE,          // No IP anonymization
    0,                     // No param truncation
    32,                    // 32 batches per file before rotation
    COMPRESSION_BALANCED,  // LZ4 default
    BATCH_500KB,           // 500 KB == 16,000 logs per batch
    DICT_16K               // Dictionary: 16,384 slots
);

Benchmark Results (Adversarial Workload)

High-Performance Mode (6 caller threads):

  • 250,000,000 logs in 47.81 seconds
  • 5.23M logs/second sustained
  • ~1.5GB RAM footprint
  • LZ4-only ratio: 1.54× (65% of original)
  • End-to-end ratio: ~2.60× (preprocessing + LZ4: 38.5% of the original)
  • 0 logs lost, 0 backpressure events

Economy Mode (single caller thread):

  • 250,000,000 logs in 144.12 seconds
  • 1.73M logs/second
  • ~16MB RAM footprint
  • LZ4-only ratio: 1.54× (65% of original)
  • End-to-end ratio: ~2.14× (preprocessing + LZ4: 46.7% of the original)
  • 0 logs lost, 0 backpressure events

One last test: a maximum-compression configuration, using the adversarial dataset (same as above).
“Max-Compression” Configuration:

LogInit(
    "C:\\logs",            // Main path
    "C:\\AltLogspath",     // Backup path
    "C:\\logs\\error",     // Errors
    1,                     // Activate atomic numbering
    ANON_IP_NONE,          // No IP anonymization
    0,                     // No param truncation
    32,                    // 32 batches per file before rotation
    COMPRESSION_BALANCED,  // LZ4 default
    BATCH_512MB,           // 512 MB == 16,777,216 logs per batch
    DICT_2M                // Dictionary: 2,097,152 slots
);

Max-Compression Mode (single caller thread):

  • 250,000,000 logs in 168.24 seconds
  • 1.49M logs/second sustained
  • ~3.5GB RAM footprint
  • LZ4-only ratio: 1.4× (71% of original)
  • End-to-end ratio: ~5.99× (preprocessing + LZ4: ~16.7% of the original)
  • 0 logs lost, 0 backpressure events

Compression Analysis

Why preprocessing multiplies effectiveness:

  • Raw logs: LZ4 alone ≈ 1.5×
  • With preprocessing: ≈ 5.07× total
  • Preprocessing reduces entropy, enabling LZ4 to find longer repeated patterns while preserving throughput and hardware resources.

CPU usage is directly tied to the number of calling threads (1 thread ≈ 18% of total CPU in this stress test; actual utilization varies with data volume and patterns).
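The two stages compound multiplicatively: with LZ4 alone at 1.5× and the end-to-end figure at ≈ 5.07×, preprocessing contributes roughly 5.07 ÷ 1.5 ≈ 3.4× on its own, and the on-disk size works out to 1 ÷ 5.07 ≈ 19.7% of the input.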

Test Consistency

Across multiple runs, variance was <4% for throughput and <6% for compression ratios. The presented numbers represent typical sustained performance.

Tradeoffs and Tuning

  • Batch size: larger = better compression, but more RAM and more in-flight logs
  • Dictionary size: helps absorb cardinality, but works even undersized thanks to the temporal cache
  • Producer threads: in our tests, scaling from 1 to 6 threads delivered approximately 2.5× throughput improvement, with diminishing returns expected beyond 1.5× core count due to CPU bottlenecks
  • CPU vs RAM: configurable balance
  • No cache resizing and very limited dynamic allocation: the system is designed to handle any workload within fixed, customizable resource constraints, ensuring predictability even under stress, as demonstrated in the tests
  • Log files are composed of autonomous batches, meaning every written batch is readable without needing a complete file (see the sketch after this list)
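The real batch framing is not documented; purely to illustrate why autonomous batches make a truncated file readable, here is a sketch that assumes a hypothetical [compressed size][raw size] header in front of each LZ4 block (error handling abbreviated):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "lz4.h"

/* Hypothetical framing, not Loggr's actual format:
   each frame = u32 compressed size, u32 raw size, then the LZ4 block. */
int read_next_batch(FILE *f, char **out, uint32_t *out_len) {
    uint32_t csize, rsize;
    if (fread(&csize, 4, 1, f) != 1 || fread(&rsize, 4, 1, f) != 1)
        return 0;                            /* clean EOF or truncated header */

    char *cbuf = malloc(csize);
    char *rbuf = malloc(rsize);
    if (fread(cbuf, 1, csize, f) != csize) { /* partial trailing batch: skip */
        free(cbuf); free(rbuf);
        return 0;
    }

    /* Each batch decompresses on its own, with no state carried over from
       earlier batches, so an incomplete file still yields every finished one. */
    int n = LZ4_decompress_safe(cbuf, rbuf, (int)csize, (int)rsize);
    free(cbuf);
    if (n < 0) { free(rbuf); return 0; }

    *out = rbuf;
    *out_len = (uint32_t)n;
    return 1;
}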

On Test Cardinality and Real-World Relevance

Our benchmark scenarios were designed to reflect actual production environments:

  • Standard Workload: Simulates a typical web service with fixed endpoints (1,000 URLs) serving many clients (5,000 IPs) — representing the common case where Loggr excels.
  • Adversarial Workload: Models high-cardinality scenarios like file servers, parameter-heavy APIs, or enumeration attacks (1.3M unique paths) — demonstrating graceful degradation under stress.

The random generation of numeric fields (status codes, durations, data sizes) ensures our tests reflect the entropy of real telemetry data, while the preloaded string pools control for specific cardinality impacts.

Limitations and Caveats

  • AVX2 required — no software fallback
  • No transactional durability — in-memory logs are vulnerable to system failure (batches already dumped to disk are safe even if the file is not complete). A full flush is done on log-file rotation and on the LogShutdown() call. For true crash consistency in production, we recommend enterprise storage with power-loss protection.
  • Optimized for structured logs — very high-cardinality data typically yields 2–3× compression; allowing bigger batches can lead to (much) better ratios
  • Synthetic tests — real production data will vary
  • Benchmarked on a mid-level laptop; production hardware may behave differently

Reproducibility

The detailed methodology, parameters, and dataset descriptions above provide everything needed to understand Loggr’s performance characteristics and design principles.

For organizations conducting formal technical evaluations, a limited demo DLL is available under NDA.

We encourage the technical community to use these specifications as a reference for building their own high-performance logging solutions.

When This Approach Fits

Appropriate for:

  • High-throughput services (100K+ events/sec)
  • Cost-constrained environments (edge, multi-region)
  • Detailed audit trails with size constraints

Less appropriate for:

  • Low-volume setups where integration overhead isn’t justified
  • Environments without AVX2 support

Sample Telemetry

[STAT] cache_hit resource L1 : 249937807
[STAT] cache_hit resource L1 % : 99.98
[STAT] cache_hit resource L2 : 62193
[STAT] cache_hit resource L2 % : 0.02
[STAT] cache_hit resource L3 : 0
[STAT] cache_hit resource L3 % : 0.00
[STAT] cache_hit endpoint L1 : 250000000
[STAT] cache_hit endpoint L1 % : 100.00
[STAT] cache_hit endpoint L2 : 0
[STAT] cache_hit endpoint L2 % : 0.00
[STAT] cache_hit endpoint L3 : 0
[STAT] cache_hit endpoint L3 % : 0.00
[STAT] resource_cache_probes_total : 206998729
[STAT] endpoint_cache_probes_total : 4944735
[STAT] resource cache_insert_to_step_ratio : 1.21
[STAT] endpoint cache_insert_to_step_ratio : 50.56
[STAT] resource cache_probes_depth_max : 32
[STAT] endpoint cache_probes_depth_max : 0
[STAT] resource_cache_fullprobescan_total : 0
[STAT] endpoint_cache_fullprobescan_total : 0
[STAT] log_processed_total : 250000000
[STAT] batch_flushed_total : 15
[STAT] batch_compressed_total : 15
[STAT] batch_written_total : 15
[STAT] writer_waitfile_max : 0
[STAT] backpressure_count : 0
[STAT] compression_ratio_avg : 0.71
[STAT] compression_failure_total : 0
[STAT] lost_logs_total : 0
[STAT] log_rotation : 1
[STAT] log_refused_total : 0
[STAT] rotation_resync_total : 0
[STAT] throughput: 1.49 million logs/s

Sample telemetry is taken from the Max-Compression test above.

Conclusion

By rethinking logging as a data compression problem rather than a formatting and I/O challenge, Loggr demonstrates that order-of-magnitude improvements are still possible in fundamental infrastructure components.

This write-up aims for full technical transparency. Hardware/software details are included to facilitate reproduction. The library is available for evaluation by enterprise customers by contacting the author.

All benchmarks measured on described hardware; results will vary with workload and environment.
