Speed up your data queries by caching remote files locally. The QuackStore extension uses block-based caching to automatically store frequently accessed file portions in a local cache, dramatically reducing load times for repeated queries on the same data.
When you query remote files (like CSV files from the web), DuckDB normally downloads them every time. With QuackStore, the first query downloads and caches the file locally. Subsequent queries use the cached version, making them much faster.
Key Benefits:
- ✅ Block-based caching: Only caches the parts of files you actually access (blocks)
- ✅ Persistent cache: Cache survives database restarts and is stored on disk
- ✅ Data integrity: Automatic corruption detection and recovery - corrupt blocks are automatically evicted and re-fetched from source
- ✅ Seamless integration: Works with existing file systems without breaking compatibility
- ✅ Smart caching: LRU eviction automatically manages cache size; frequently accessed blocks stay cached longer
- ✅ Perfect for: Scenarios where data files are accessed repeatedly or where network I/O is a bottleneck
Enable the extension (if not already loaded):

```sql
INSTALL quackstore FROM community;
LOAD quackstore;
```

Configure cache location:

```sql
SET GLOBAL quackstore_cache_path = '/tmp/my_duckdb_cache.bin';
SET GLOBAL quackstore_cache_enabled = true;
```

Use cached file access by adding the quackstore:// prefix:

```sql
-- Slow: downloads every time
SELECT * FROM 'https://example.com/data.csv';

-- Fast: cached after the first download
SELECT * FROM 'quackstore://https://example.com/data.csv';
```
Note: Cache path, size, and enabled settings are global-only because the cache is shared across all database sessions. Currently, it's not possible to have multiple per-session caches.
✅ Good for:
- Large files you query repeatedly
- Remote files (HTTP/HTTPS URLs, S3, etc.)
- Slow network connections
- Both static and changing data (use appropriate quackstore_data_mutable setting)
❌ Don't use for:
- Local files (already fast)
- One-time queries on small files
The quackstore_data_mutable parameter controls how aggressively the cache validates data freshness:
For immutable data (recommended for most analytics workloads):
- Files are assumed not to change once cached
- Maximum performance (no validation checks)
- Perfect for: Historical data, archived files, static datasets
For mutable data (default behavior):
- Cache validates file freshness on access
- Performs lightweight metadata checks (modification time and file size) - these are small requests that don't download the whole file
- Use for: Live datasets, frequently updated files
Session vs Global scope:
- SET GLOBAL - affects all database connections
- SET (session) - affects only current connection
- Useful for mixed workloads where different queries need different behaviors
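For example, a mixed workload might make immutable behavior the global default and switch a single session back to mutable validation when it queries a live file. A minimal sketch, assuming quackstore_data_mutable is a boolean where false means "treat cached data as immutable" (the URL is illustrative):

```sql
-- Global default: skip freshness checks for maximum performance
-- (assumes false = immutable; verify against the setting's documentation).
SET GLOBAL quackstore_data_mutable = false;

-- This session queries a live file, so re-enable freshness validation
-- for the current connection only.
SET quackstore_data_mutable = true;

SELECT count(*) FROM 'quackstore://https://example.com/live_feed.csv';
```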
quackstore_clear_cache(): Removes all cached data and resets the cache
- Safe to call multiple times, even when the cache doesn't exist
- Works even when caching is disabled
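A minimal usage sketch, assuming the function is invoked with CALL (the same syntax used for the eviction function below):

```sql
-- Remove all cached blocks and reset the cache file.
CALL quackstore_clear_cache();
```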
quackstore_evict_files(['quackstore://file1', 'quackstore://file2', ...]): Removes specific files from the cache
- Important: File paths must include the quackstore:// prefix, exactly as used in queries
- Takes a list of file paths with the prefix: ['quackstore://https://example.com/data.csv']
- Useful for removing outdated files without clearing the entire cache
- Safe to call with non-existent files (no error)
- Validates input parameters and provides clear error messages for invalid arguments
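For example, to evict a single outdated file while keeping the rest of the cache (the URL is illustrative):

```sql
-- The quackstore:// prefix must match the path exactly as it was queried.
CALL quackstore_evict_files(['quackstore://https://example.com/data.csv']);
```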
- Cache Location: Store the cache on fast storage (SSD) for best performance
- Cache Size: Set it large enough for your working dataset, but remember it's block-based - you don't need space for entire files
- Access Patterns: Sequential reads within 1MB boundaries are most efficient
- Block Alignment: Works best with files larger than 1MB (the internal block size)
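Putting these recommendations together, a typical configuration might look like the sketch below. The path is illustrative, and the size value assumes quackstore_cache_size is expressed in bytes; check the setting's documentation for the actual unit.

```sql
-- Keep the cache file on fast local storage (illustrative SSD path).
SET GLOBAL quackstore_cache_path = '/mnt/ssd/quackstore_cache.bin';

-- Size the cache for your working set; it is block-based, so it does not
-- need to hold entire files (assumes bytes: ~10 GiB here).
SET GLOBAL quackstore_cache_size = 10737418240;

SET GLOBAL quackstore_cache_enabled = true;
```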
The extension uses block-based caching with 1MB blocks. This means:
- Partial file caching: Only the portions of files you actually read are cached
- Efficient memory usage: Large files don't need to be fully downloaded if you only need part of them
- Block-level eviction: Individual blocks are evicted independently using LRU (least recently used)
- Whole files can span multiple blocks: A large file may be cached across many blocks, but some blocks might be evicted while others remain
When you access a cached file:
- First access: Downloads only the needed blocks and stores them in cache
- Subsequent access: Reads cached blocks from local storage (much faster)
- Cache pressure: Individual blocks are evicted as needed, not entire files
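In practice, a query that reads only part of a remote file warms only the corresponding blocks; a sketch (the URL is illustrative):

```sql
-- First run: fetches only the blocks needed to answer the query,
-- not the entire file.
SELECT * FROM 'quackstore://https://example.com/large.csv' LIMIT 100;

-- Second run: the same blocks are served from the local cache.
SELECT * FROM 'quackstore://https://example.com/large.csv' LIMIT 100;
```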
Cache not working?
- Check that quackstore_cache_enabled = true
- Ensure quackstore_cache_path is set to a writable location
- Make sure you're using the quackstore:// prefix
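To verify the first two items from SQL, you can read the current values back, assuming the extension registers its settings with DuckDB so current_setting() can see them:

```sql
-- Inspect the active cache configuration.
SELECT current_setting('quackstore_cache_enabled') AS cache_enabled,
       current_setting('quackstore_cache_path')    AS cache_path;
```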
Cache file growing too large?
- Adjust quackstore_cache_size setting
- Use quackstore_clear_cache() to reset
Still slow?
- Cache works best for repeated queries on the same files
- First query will always be slow (downloading + caching)
- Very small files may not benefit much
Function errors?
- quackstore_evict_files requires file paths with quackstore:// prefix: use ['quackstore://https://example.com/file.csv']
- quackstore_evict_files requires a list of strings: use proper list syntax with quotes
- Empty lists need explicit casting: use []::VARCHAR[] instead of []
- NULL arguments are not allowed: ensure all parameters are valid
- For subqueries, extract to a variable first: SET VARIABLE files = (SELECT list(...)); CALL quackstore_evict_files(getvariable('files'));
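The patterns above, shown together in one sketch (the stale_files table and its url column are hypothetical):

```sql
-- Empty list: cast explicitly so the argument is typed as VARCHAR[].
CALL quackstore_evict_files([]::VARCHAR[]);

-- Subquery result: store it in a variable first, then pass it in
-- (stale_files / url are hypothetical names).
SET VARIABLE files = (SELECT list('quackstore://' || url) FROM stale_files);
CALL quackstore_evict_files(getvariable('files'));
```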
For most users, the extension can be installed directly:
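```sql
INSTALL quackstore FROM community;
LOAD quackstore;
```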
For building from source, see the build instructions in the original documentation.
See LICENSE.txt