Research → Open Source → Product
Sempress learns patterns in IoT sensor readings, time-series metrics, and ML features, delivering 50-125% better compression ratios than gzip while preserving precision.
5.8× average compression ratio on numeric-heavy data
125% improvement over gzip on IoT telemetry
100% lossless preservation of locked columns
Real-World Performance
Tested on 400K+ rows across IoT sensors, ML features, and financial data
Telemetry (IoT): 8.08× Sempress vs. 3.58× gzip
Sensor Physics: 5.88× Sempress vs. 2.76× gzip
ML Features: 5.46× Sempress vs. 3.09× gzip
Financial Data: 3.80× Sempress vs. 2.51× gzip
Results vary by data characteristics. Best for numeric-heavy tables (>60% numeric columns).
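The percentage claims on this page can be checked against the ratios listed above. The short sketch below recomputes them, assuming each figure is a compression ratio (original size divided by compressed size). The function names are illustrative and not part of Sempress. It also separates two quantities that are easy to confuse: the relative gain in compression ratio, and how much smaller the output is than gzip's.

```python
# Compression ratios quoted in the benchmark above, as (Sempress, gzip).
RATIOS = {
    "Telemetry (IoT)": (8.08, 3.58),
    "Sensor Physics": (5.88, 2.76),
    "ML Features": (5.46, 3.09),
    "Financial Data": (3.80, 2.51),
}


def improvement_pct(sempress: float, gzip: float) -> float:
    """Relative gain in compression ratio over gzip, in percent."""
    return (sempress / gzip - 1.0) * 100.0


def size_saving_pct(sempress: float, gzip: float) -> float:
    """How much smaller the Sempress output is than the gzip output, in percent."""
    return (1.0 - gzip / sempress) * 100.0


def summarize(ratios):
    """Map each dataset to (ratio gain %, size saving vs. gzip %), rounded."""
    return {
        name: (round(improvement_pct(s, g)), round(size_saving_pct(s, g)))
        for name, (s, g) in ratios.items()
    }


if __name__ == "__main__":
    for name, (gain, saving) in summarize(RATIOS).items():
        print(f"{name}: {gain}% better ratio, {saving}% smaller than gzip")
```

With these inputs the ratio gain runs from about 51% (financial data) to about 126% (telemetry), matching the 50-125% range. The corresponding output is about 34% to 56% smaller than gzip's.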
Built for Modern Analytics
Reduce storage costs and transfer times for data-intensive workloads
🌐 IoT & Telemetry
Compress sensor data at more than twice gzip's ratio (8.08× vs. 3.58× on telemetry). Suited to industrial IoT, smart cities, and fleet management, where millions of numeric readings flow continuously.
🤖 ML Feature Stores
Reduce S3 costs for training data. Store high-dimensional continuous features with near-zero error, enabling efficient model training at scale.
💰 Financial Analytics
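Archive tick data losslessly by locking precision-critical columns or storing residuals; other columns keep a bounded reconstruction error, which helps with compliance requirements. On the financial benchmark the output is about a third smaller than gzip's (3.80× vs. 2.51×).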
How It Works
Semantic compression via learned vector quantization
1. Learn Structure
Per-column k-means vector quantization (VQ) learns semantic patterns in numeric data. For example, temperatures cluster around 20-25°C and prices follow smooth distributions.
2. Preserve Fidelity
Auto-locks strings and categoricals for lossless storage. Optional residuals eliminate quantization error on precision-critical columns.
3. Package Smart
Msgpack + Zstd container with uncertainty tracking. Self-describing format includes schema and reconstruction metadata.
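The three steps can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the Sempress implementation or its API. Every function name here is hypothetical. Numeric columns are assumed to be integers in fixed-point units (for example hundredths of a degree) so that residuals round-trip exactly. To stay within the standard library, `json` and `zlib` stand in for Msgpack and Zstd.

```python
import json
import random
import zlib


def nearest(codebook, value):
    """Index of the centroid closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(value - codebook[i]))


def learn_codebook(values, k, iters=20, seed=0):
    """1-D k-means (Lloyd's algorithm) on integers; returns sorted integer centroids."""
    unique = sorted(set(values))
    centroids = random.Random(seed).sample(unique, min(k, len(unique)))
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in values:
            buckets[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [round(sum(b) / len(b)) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(set(centroids))


def encode_column(values, k=8, residuals=False):
    """Step 1 and 2: quantize numeric columns, lock everything else."""
    if not all(isinstance(v, int) for v in values):
        # Strings and categoricals are "locked": stored verbatim, never quantized.
        return {"kind": "locked", "values": list(values)}
    codebook = learn_codebook(values, k)
    codes = [nearest(codebook, v) for v in values]
    errors = [v - codebook[c] for v, c in zip(values, codes)]
    column = {
        "kind": "vq",
        "codebook": codebook,
        "codes": codes,
        # Worst-case reconstruction error, recorded as simple uncertainty metadata.
        "max_abs_error": 0 if residuals else max((abs(e) for e in errors), default=0),
    }
    if residuals:
        column["residuals"] = errors  # centroid + residual restores the exact value
    return column


def decode_column(column):
    """Rebuild a column from its encoded form."""
    if column["kind"] == "locked":
        return list(column["values"])
    approx = [column["codebook"][c] for c in column["codes"]]
    residuals = column.get("residuals")
    if residuals is None:
        return approx
    return [a + r for a, r in zip(approx, residuals)]


def pack(table, k=8, exact=()):
    """Step 3: serialize all columns with their metadata, then compress."""
    payload = {name: encode_column(vals, k, residuals=name in exact)
               for name, vals in table.items()}
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def load_payload(blob):
    """Decompress and parse the self-describing payload without decoding columns."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def unpack(blob):
    """Decode every column in the container back into a table."""
    return {name: decode_column(col) for name, col in load_payload(blob).items()}
```

Calling `pack(table, exact={"price_cents"})` on a dict of column lists would store `price_cents` with residuals so it decodes exactly, while other numeric columns decode to their nearest centroid with the recorded `max_abs_error`. Storing residuals costs space, so in a design like this they make sense only for columns that must be exact.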
Research Paper
Sempress: Semantic Compression for Numeric Tabular Data
Traditional compression algorithms treat tabular data as byte streams, ignoring semantic structure. We present Sempress, which achieves 50-125% better compression than gzip on numeric-heavy datasets through column-wise vector quantization.
Published: January 2025
Authors: Keaton Anderson (Independent Researcher)
License: Open access
Open Source
Install with pip, integrate in minutes
Ready to compress smarter?
Join the research community building the future of semantic compression