Show HN: DeltaGlider – Store 4TB of build artifacts in 5GB
Store 4TB of similar files in 5GB. No, that's not a typo.
DeltaGlider is a drop-in replacement for the aws s3 CLI and boto3 client that can achieve up to 99.9% size reduction for versioned compressed artifacts, backups, and release archives through intelligent binary delta compression (via xdelta3).
🌟 Star the repo if you like this, or leave a message in Issues - we are listening!
You're storing hundreds of versions of your software releases. Each 100MB build differs by <1% from the previous version. You're paying to store 100GB of what's essentially 100MB of unique data.
Data migration: deltaglider migrate s3://origin-bucket s3://dest-bucket
DeltaGlider is great for compressed archives of similar content, such as multiple releases of the same software, database backups, etc.
We don't expect significant benefit for multimedia content like videos, but we haven't tested it.
DeltaGlider ships as an SDK and a CLI, and we also have a GUI.
DeltaGlider stores the first file in a directory (deltaspace) as a reference and subsequent similar files as tiny deltas (differences). When you download, it reconstructs the original file perfectly using the reference + delta.
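For example, with the Python SDK shown later in this README (a minimal sketch - bucket, prefix, and file names are illustrative), two similar archives uploaded under the same prefix end up as one reference plus a tiny delta, and the download comes back byte-identical:

```python
from deltaglider import create_client

client = create_client()  # documented SDK entry point; uses AWS credentials automatically

# First upload to the "builds/" deltaspace becomes the reference;
# the second, nearly identical archive should be stored as a small delta.
client.put_object(Bucket='releases', Key='builds/my-app-v1.0.0.zip',
                  Body=open('my-app-v1.0.0.zip', 'rb'))
client.put_object(Bucket='releases', Key='builds/my-app-v1.0.1.zip',
                  Body=open('my-app-v1.0.1.zip', 'rb'))

# get_object reconstructs the full original from reference + delta transparently.
obj = client.get_object(Bucket='releases', Key='builds/my-app-v1.0.1.zip')
with open('my-app-v1.0.1.restored.zip', 'wb') as f:
    f.write(obj['Body'].read())
```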
Intelligent File Type Detection
DeltaGlider automatically detects file types and applies the optimal strategy:
| File Type | Strategy | Typical Compression | Why It Works |
|---|---|---|---|
| .zip, .tar, .gz | Binary delta | 99%+ for similar versions | Archive structure remains consistent between versions |
| .dmg, .deb, .rpm | Binary delta | 95%+ for similar versions | Package formats with predictable structure |
| .jar, .war, .ear | Binary delta | 90%+ for similar builds | Java archives with mostly unchanged classes |
| .exe, .dll, .so | Direct upload | 0% (no delta benefit) | Compiled code changes unpredictably |
| .txt, .json, .xml | Direct upload | 0% (use gzip instead) | Text files benefit more from standard compression |
| .sha1, .sha512, .md5 | Direct upload | 0% (already minimal) | Hash files are unique by design |
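In code, this routing amounts to a small extension-to-strategy map. The sketch below only illustrates the idea from the table; it is not DeltaGlider's actual detection logic:

```python
from pathlib import Path

# Extensions that delta well between versions, per the table above
DELTA_EXTENSIONS = {".zip", ".tar", ".gz", ".dmg", ".deb", ".rpm", ".jar", ".war", ".ear"}

def choose_strategy(filename: str) -> str:
    """Return 'delta' for archive-like files and 'direct' for everything else."""
    return "delta" if Path(filename).suffix.lower() in DELTA_EXTENSIONS else "direct"

print(choose_strategy("my-app-v1.0.1.zip"))  # delta
print(choose_strategy("release.sha512"))     # direct
```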
AWS CLI Replacement: Same commands as aws s3 with automatic compression
boto3-Compatible SDK: Works with existing boto3 code with minimal changes
Zero Configuration: No databases, no manifest files, no complex setup
Data Integrity: the original file's SHA256 checksum is stored in S3 metadata and verified on every reconstruction (see the sketch after this list)
S3 Compatible: Works with AWS S3, MinIO, Cloudflare R2, and any S3-compatible storage
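The integrity claim is easy to check end-to-end with nothing beyond hashlib and the documented SDK calls (a sketch; file and key names are illustrative):

```python
import hashlib
from deltaglider import create_client

def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

client = create_client()
client.put_object(Bucket='releases', Key='builds/my-app.zip', Body=open('my-app.zip', 'rb'))

obj = client.get_object(Bucket='releases', Key='builds/my-app.zip')
with open('my-app.restored.zip', 'wb') as f:
    f.write(obj['Body'].read())

# The reconstructed file must hash identically to the original upload.
assert sha256('my-app.zip') == sha256('my-app.restored.zip')
```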
# Copy files to/from S3 (automatic delta compression for archives)
deltaglider cp my-app-v1.0.0.zip s3://releases/
deltaglider cp s3://releases/my-app-v1.0.0.zip ./downloaded.zip
# Recursive directory operations
deltaglider cp -r ./dist/ s3://releases/v1.0.0/
deltaglider cp -r s3://releases/v1.0.0/ ./local-copy/
# List buckets and objects
deltaglider ls # List all buckets
deltaglider ls s3://releases/ # List objects
deltaglider ls -r s3://releases/ # Recursive listing
deltaglider ls -h --summarize s3://releases/ # Human-readable with summary
# Remove objects
deltaglider rm s3://releases/old-version.zip # Remove single object
deltaglider rm -r s3://releases/old/ # Recursive removal
deltaglider rm --dryrun s3://releases/test.zip # Preview deletion
# Sync directories (only transfers changes)
deltaglider sync ./local-dir/ s3://releases/ # Sync to S3
deltaglider sync s3://releases/ ./local-backup/ # Sync from S3
deltaglider sync --delete ./src/ s3://backup/ # Mirror exactly
deltaglider sync --exclude "*.log" ./src/ s3://backup/ # Exclude patterns
# Get bucket statistics with intelligent S3-based caching
deltaglider stats my-bucket # Quick stats (~100ms with cache)
deltaglider stats s3://my-bucket # Also accepts s3:// format
deltaglider stats s3://my-bucket/ # With or without trailing slash
deltaglider stats my-bucket --sampled # Balanced (one sample per deltaspace)
deltaglider stats my-bucket --detailed # Most accurate (slower, all metadata)
deltaglider stats my-bucket --refresh # Force cache refresh
deltaglider stats my-bucket --no-cache # Skip caching entirely
deltaglider stats my-bucket --json # JSON output for automation
# Integrity verification & maintenance
deltaglider verify s3://releases/file.zip # Validate stored SHA256
deltaglider purge my-bucket # Clean expired .deltaglider/tmp files
deltaglider purge my-bucket --dry-run # Preview purge results
deltaglider purge my-bucket --json # Machine-readable purge stats
# Migrate existing S3 buckets to DeltaGlider compression
deltaglider migrate s3://old-bucket/ s3://new-bucket/ # Interactive migration
deltaglider migrate s3://old-bucket/ s3://new-bucket/ --yes # Skip confirmation
deltaglider migrate --dry-run s3://old-bucket/ s3://new/ # Preview migration
deltaglider migrate s3://bucket/v1/ s3://bucket/v2/ # Migrate prefixes
# Works with MinIO, R2, and S3-compatible storage
deltaglider cp file.zip s3://bucket/ --endpoint-url http://localhost:9000
# All standard AWS flags work
deltaglider cp file.zip s3://bucket/ \
--endpoint-url http://localhost:9000 \
--profile production \
--region us-west-2
# DeltaGlider-specific flags
deltaglider cp file.zip s3://bucket/ --no-delta # Disable delta compression for specific files
deltaglider cp file.zip s3://bucket/ --max-ratio 0.8 # Only use delta if compression > 20%
- name: Upload Release with 99% compression
  run: |
    pip install deltaglider
    deltaglider cp dist/*.zip s3://releases/${{ github.ref_name }}/
    # Or recursive for entire directories
    deltaglider cp -r dist/ s3://releases/${{ github.ref_name }}/
#!/bin/bash
# Daily backup with automatic deduplication
tar -czf backup-$(date +%Y%m%d).tar.gz /data
deltaglider cp backup-*.tar.gz s3://backups/
# Only changes are stored, not the full backup
# Clean up old backups
deltaglider rm -r s3://backups/2023/
DeltaGlider provides a boto3-compatible API for core S3 operations (21 methods covering 80% of use cases):
from deltaglider import create_client

# Drop-in replacement for boto3.client('s3')
client = create_client()  # Uses AWS credentials automatically

# Identical to boto3 S3 API - just works with 99% compression!
response = client.put_object(
    Bucket='releases',
    Key='v2.0.0/my-app.zip',
    Body=open('my-app-v2.0.0.zip', 'rb')
)
print(f"Stored with ETag: {response['ETag']}")

# Standard boto3 get_object - handles delta reconstruction automatically
response = client.get_object(Bucket='releases', Key='v2.0.0/my-app.zip')
with open('downloaded.zip', 'wb') as f:
    f.write(response['Body'].read())

# Smart list_objects with optimized performance
response = client.list_objects(Bucket='releases', Prefix='v2.0.0/')
for obj in response['Contents']:
    print(f"{obj['Key']}: {obj['Size']} bytes")

# Paginated listing for large buckets
response = client.list_objects(Bucket='releases', MaxKeys=100)
while response.get('IsTruncated'):
    for obj in response['Contents']:
        print(obj['Key'])
    response = client.list_objects(
        Bucket='releases',
        MaxKeys=100,
        ContinuationToken=response.get('NextContinuationToken')
    )

# Delete and inspect objects
client.delete_object(Bucket='releases', Key='old-version.zip')
client.head_object(Bucket='releases', Key='v2.0.0/my-app.zip')
No boto3 required! DeltaGlider provides complete bucket management:
from deltaglider import create_client

client = create_client()

# Create buckets
client.create_bucket(Bucket='my-releases')

# Create bucket in specific region (AWS only)
client.create_bucket(
    Bucket='my-regional-bucket',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
)

# List all buckets
response = client.list_buckets()
for bucket in response['Buckets']:
    print(f"{bucket['Name']} - {bucket['CreationDate']}")

# Delete bucket (must be empty)
client.delete_bucket(Bucket='my-old-bucket')
For N versions of an S MB file with D% difference between versions:
Traditional S3: N × S MB
DeltaGlider: S + (N-1) × S × D% MB
Example: 100 versions of 100MB files with 1% difference:
Traditional: 10,000 MB
DeltaGlider: 199 MB
Savings: 98%
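A quick sanity check of that example in plain Python (numbers taken from above):

```python
def storage_mb(n_versions: int, size_mb: float, diff_ratio: float) -> tuple[float, float]:
    """Storage for N versions: traditional S3 vs. the DeltaGlider model above."""
    traditional = n_versions * size_mb
    deltaglider = size_mb + (n_versions - 1) * size_mb * diff_ratio
    return traditional, deltaglider

traditional, dg = storage_mb(n_versions=100, size_mb=100, diff_ratio=0.01)
print(traditional, dg, f"{1 - dg / traditional:.0%} saved")  # 10000 199.0 98% saved
```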
| Solution | Compression | Speed | Integration | Cost |
|---|---|---|---|---|
| DeltaGlider | 99%+ | Fast | Drop-in | Open source |
| S3 Versioning | 0% | Native | Built-in | $$ per version |
| Deduplication | 30-50% | Slow | Complex | Enterprise $$$ |
| Git LFS | Good | Slow | Git-only | $ per GB |
| Restic/Borg | 80-90% | Medium | Backup-only | Open source |
Architecture & Technical Deep Dive
Why xdelta3 Excels at Archive Compression
Traditional diff algorithms (like diff or git diff) work line-by-line on text files. Binary diff tools like bsdiff or courgette are optimized for executables. But xdelta3 is uniquely suited for compressed archives because:
Block-level matching: xdelta3 uses a rolling hash algorithm to find matching byte sequences at any offset, not just line boundaries. This is crucial for archives where small file changes can shift all subsequent byte positions.
Large window support: xdelta3 can use reference windows up to 2GB, allowing it to find matches even when content has moved significantly within the archive. Other delta algorithms typically use much smaller windows (64KB-1MB).
Compression-aware: When you update one file in a ZIP/TAR archive, the archive format itself remains largely identical - same compression dictionary, same structure. xdelta3 preserves these similarities while other algorithms might miss them.
Format agnostic: Unlike specialized tools (e.g., courgette for Chrome updates), xdelta3 works on raw bytes without understanding the file format, making it perfect for any archive type.
When you rebuild a JAR file with one class changed:
Text diff: 100% different (it's binary data!)
bsdiff: ~30-40% of original size (optimized for executables, not archives)
xdelta3: ~0.1-1% of original size (finds the unchanged parts regardless of position)
This is why DeltaGlider achieves 99%+ compression on versioned archives - xdelta3 can identify that 99% of the archive structure and content remains identical between versions.
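You can reproduce this locally by shelling out to the xdelta3 CLI directly (a sketch assuming xdelta3 is installed and two versions of an archive exist; file names are illustrative):

```python
import subprocess
from pathlib import Path

ref, new, delta = "my-app-v1.0.0.zip", "my-app-v1.0.1.zip", "my-app-v1.0.1.zip.vcdiff"

# Encode: store only the byte differences between the new archive and the reference
subprocess.run(["xdelta3", "-e", "-f", "-s", ref, new, delta], check=True)
print(f"{Path(new).stat().st_size} B -> {Path(delta).stat().st_size} B delta")

# Decode: reconstruct the new archive byte-for-byte from reference + delta
subprocess.run(["xdelta3", "-d", "-f", "-s", ref, delta, "my-app-v1.0.1.restored.zip"], check=True)
```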
DeltaGlider intelligently stores files within DeltaSpaces - S3 prefixes where related files share a common reference file for delta compression:
Binary diff engine: xdelta3 for optimal compression
Intelligent routing: Automatic file type detection
Integrity verification: SHA256 on every operation
Local caching: Fast repeated operations
Zero dependencies: No database, no manifest files
Modular storage: The storage layer is pluggable - you could easily replace S3 with a filesystem driver (using extended attributes for metadata) or any other backend
The codebase follows a ports-and-adapters pattern where core business logic is decoupled from infrastructure, with storage operations abstracted through well-defined interfaces in the ports/ directory and concrete implementations in adapters/.
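As an illustration of that pattern (the names below are hypothetical, not DeltaGlider's actual port definitions), a storage port can be as small as:

```python
from typing import Protocol

class StoragePort(Protocol):
    """Hypothetical port: the operations the core delta logic needs from a backend."""

    def put(self, key: str, data: bytes, metadata: dict[str, str]) -> None: ...
    def get(self, key: str) -> bytes: ...
    def head(self, key: str) -> dict[str, str]: ...
    def delete(self, key: str) -> None: ...

# An S3 adapter, a filesystem adapter (extended attributes for metadata),
# or an in-memory test double can each satisfy this protocol; the core
# compression logic never imports boto3 directly.
```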
✅ Perfect for:
Software releases and versioned artifacts
Container images and layers
Database backups and snapshots
Machine learning model checkpoints
Game assets and updates
Any versioned binary data
❌ Not ideal for:
Already compressed unique files
Streaming or multimedia files
Frequently changing unstructured data
Files smaller than 1MB
Migrating from aws s3 to deltaglider is as simple as changing the command name:
| AWS CLI | DeltaGlider | Compression Benefit |
|---|---|---|
| aws s3 cp file.zip s3://bucket/ | deltaglider cp file.zip s3://bucket/ | ✅ 99% for similar files |
| aws s3 cp -r dir/ s3://bucket/ | deltaglider cp -r dir/ s3://bucket/ | ✅ 99% for archives |
| aws s3 ls s3://bucket/ | deltaglider ls s3://bucket/ | - |
| aws s3 rm s3://bucket/file | deltaglider rm s3://bucket/file | - |
| aws s3 sync dir/ s3://bucket/ | deltaglider sync dir/ s3://bucket/ | ✅ 99% incremental |
Migrating Existing S3 Buckets
DeltaGlider provides a dedicated migrate command to compress your existing S3 data:
# Migrate an entire bucket
deltaglider migrate s3://old-bucket/ s3://compressed-bucket/
# Migrate a prefix (preserves prefix structure by default)
deltaglider migrate s3://bucket/releases/ s3://bucket/archive/
# Result: s3://bucket/archive/releases/ contains the files
# Migrate without preserving source prefix
deltaglider migrate --no-preserve-prefix s3://bucket/v1/ s3://bucket/archive/
# Result: Files go directly into s3://bucket/archive/
# Preview migration (dry run)
deltaglider migrate --dry-run s3://old/ s3://new/
# Skip confirmation prompt
deltaglider migrate --yes s3://old/ s3://new/
# Exclude certain file patterns
deltaglider migrate --exclude "*.log" s3://old/ s3://new/
Key Features:
Resume Support: Migration automatically skips files that already exist in the destination
Progress Tracking: Shows real-time migration progress and statistics
Safety First: Interactive confirmation shows file count before starting
EC2 Cost Optimization: Automatically detects EC2 instance region and warns about cross-region charges
✅ Green checkmark when regions align (no extra charges)
ℹ️ INFO when auto-detected mismatch (suggests optimal region)
⚠️ WARNING when user explicitly set wrong --region (expect data transfer costs)
Disable with DG_DISABLE_EC2_DETECTION=true if needed
AWS Region Transparency: Displays the actual AWS region being used
Prefix Preservation: By default, source prefix is preserved in destination (use --no-preserve-prefix to disable)
S3-to-S3 Transfer: Both regular S3 and DeltaGlider buckets are supported
Use --no-preserve-prefix to place files directly in destination without the source prefix
The migration preserves all file names and structure while applying DeltaGlider's compression transparently.
✅ Battle tested: 200K+ files in production
✅ Data integrity: SHA256 verification on every operation
✅ Cost optimization: Automatic EC2 region detection warns about cross-region charges - 📖 EC2 Detection Guide
✅ S3 compatible: Works with AWS, MinIO, Cloudflare R2, etc.
✅ Atomic operations: No partial states
✅ Concurrent safe: Multiple clients supported
✅ Thoroughly tested: 99 integration/unit tests, comprehensive test coverage
✅ Type safe: Full mypy type checking, zero type errors
✅ Code quality: Automated linting with ruff, clean codebase
# Clone the repo
git clone https://github.com/beshu-tech/deltaglider
cd deltaglider
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests (99 integration/unit tests)
uv run pytest
# Run quality checks
uv run ruff check src/ # Linting
uv run mypy src/ # Type checking
# Run with local MinIO
docker-compose up -d
export AWS_ENDPOINT_URL=http://localhost:9000
deltaglider cp test.zip s3://test/
Q: What if my reference file gets corrupted?
A: Every operation includes SHA256 verification. Corruption is detected immediately.
Q: How fast is reconstruction?
A: Sub-100ms for typical files. The delta is applied in-memory using xdelta3.
Q: Can I use this with existing S3 data?
A: Yes! DeltaGlider can start optimizing new uploads immediately. Old data remains accessible.
Q: What's the overhead for unique files?
A: Zero. Files without similarity are uploaded directly.
Q: Is this compatible with S3 encryption?
A: Yes, DeltaGlider respects all S3 settings including SSE, KMS, and bucket policies.
We welcome contributions! See CONTRIBUTING.md for guidelines.
Key areas we're exploring:
Cloud-native reference management
Rust implementation for 10x speed
Automatic similarity detection
Multi-threaded delta generation
WASM support for browser usage
MIT - Use it freely in your projects.
"We reduced our artifact storage from 4TB to 5GB. CI is also much faster, due to smaller uploads."
— ReadonlyREST Case Study
Try it now: Got versioned files in S3? See your potential savings: