I accidentally built a vector database using video compression
The lightweight, game-changing solution for AI memory at scale
Memvid revolutionizes AI memory management by encoding text data into videos, enabling lightning-fast semantic search across millions of text chunks with sub-second retrieval times. Unlike traditional vector databases that consume massive amounts of RAM and storage, Memvid compresses your knowledge base into compact video files while maintaining instant access to any piece of information.
(Demo video: mem.mp4)
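Conceptually, each text chunk is rendered as a QR code and stored as one frame of an ordinary video, so standard video codecs handle the compression (the qr_size parameter in the API reference below reflects this). Here is a rough sketch of that idea using the third-party qrcode and opencv-python packages; it is illustrative only and is not memvid's internal code:

# Conceptual sketch (not memvid's implementation):
# one text chunk -> one QR code -> one MP4 frame
import cv2
import qrcode

chunks = ["Important fact 1", "Important fact 2"]

writer = cv2.VideoWriter("concept.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (512, 512))
for i, chunk in enumerate(chunks):
    qrcode.make(chunk).save(f"frame_{i}.png")               # text chunk -> QR image
    frame = cv2.resize(cv2.imread(f"frame_{i}.png"), (512, 512))
    writer.write(frame)                                      # QR image -> video frame
writer.release()

Retrieval runs this pipeline in reverse: decode the frame, read the QR code, and get the original text back.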
🎥 Video-as-Database: Store millions of text chunks in a single MP4 file
🔍 Semantic Search: Find relevant content using natural language queries
💬 Built-in Chat: Conversational interface with context-aware responses
📚 PDF Support: Direct import and indexing of PDF documents
🚀 Fast Retrieval: Sub-second search across massive datasets
💾 Efficient Storage: 10x compression compared to traditional databases
🔌 Pluggable LLMs: Works with OpenAI, Anthropic, or local models
🌐 Offline-First: No internet required after video generation
🔧 Simple API: Get started with just 3 lines of code
📖 Digital Libraries: Index thousands of books in a single video file
🎓 Educational Content: Create searchable video memories of course materials
📰 News Archives: Compress years of articles into manageable video databases
🔬 Research Papers: Quick semantic search across scientific literature
📝 Personal Notes: Transform your notes into a searchable AI assistant
Video as Database: Store millions of text chunks in a single MP4 file
Instant Retrieval: Sub-second semantic search across massive datasets
10x Storage Efficiency: Video compression reduces memory footprint dramatically
Zero Infrastructure: No database servers, just files you can copy anywhere
Offline-First: Works completely offline once videos are generated
Minimal Dependencies: Core functionality in ~1000 lines of Python
CPU-Friendly: Runs efficiently without GPU requirements
Portable: Single video file contains your entire knowledge base
Streamable: Videos can be streamed from cloud storage
pip install memvid PyPDF2
Recommended Setup (Virtual Environment)
# Create a new project directory
mkdir my-memvid-project
cd my-memvid-project
# Create virtual environment
python -m venv venv
# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install memvid
pip install memvid
# For PDF support:
pip install PyPDF2
from memvid import MemvidEncoder, MemvidChat

# Create video memory from text chunks
chunks = ["Important fact 1", "Important fact 2", "Historical event details", ...]

encoder = MemvidEncoder()
encoder.add_chunks(chunks)
encoder.build_video("memory.mp4", "memory_index.json")

# Chat with your memory
chat = MemvidChat("memory.mp4", "memory_index.json")
chat.start_session()
response = chat.chat("What do you know about historical events?")
print(response)
Building Memory from Documents
from memvid import MemvidEncoder
import os

# Load documents
encoder = MemvidEncoder(chunk_size=512, overlap=50)

# Add text files
for file in os.listdir("documents"):
    with open(f"documents/{file}", "r") as f:
        encoder.add_text(f.read(), metadata={"source": file})

# Build optimized video
encoder.build_video(
    "knowledge_base.mp4",
    "knowledge_index.json",
    fps=30,          # Higher FPS = more chunks per second
    frame_size=512   # Larger frames = more data per frame
)
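Once the video is built, it can be searched directly with MemvidRetriever (documented in the API reference further down). A minimal sketch, assuming search returns the matching chunks:

from memvid import MemvidRetriever

retriever = MemvidRetriever("knowledge_base.mp4", "knowledge_index.json")

# Assumed to return the top_k most relevant chunks for the query
results = retriever.search("What do the documents say about pricing?", top_k=3)
for chunk in results:
    print(chunk)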
from memvid import MemvidInteractive

# Launch interactive chat UI
interactive = MemvidInteractive("knowledge_base.mp4", "knowledge_index.json")
interactive.run()  # Opens web interface at http://localhost:7860
Complete Example: Chat with a PDF Book
# 1. Create a new directory and set up environment
mkdir book-chat-demo
cd book-chat-demo
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Install dependencies
pip install memvid PyPDF2
# 3. Create book_chat.py
cat > book_chat.py << 'EOF'
from memvid import MemvidEncoder, chat_with_memory
import os

# Your PDF file
book_pdf = "book.pdf"  # Replace with your PDF path

# Build video memory
encoder = MemvidEncoder()
encoder.add_pdf(book_pdf)
encoder.build_video("book_memory.mp4", "book_index.json")

# Chat with the book
api_key = os.getenv("OPENAI_API_KEY")  # Optional: for AI responses
chat_with_memory("book_memory.mp4", "book_index.json", api_key=api_key)
EOF

# 4. Run it
export OPENAI_API_KEY="your-api-key"  # Optional
python book_chat.py
MemvidEncoder

encoder = MemvidEncoder(
    chunk_size=512,                  # Characters per chunk
    overlap=50,                      # Character overlap between chunks
    model_name='all-MiniLM-L6-v2'    # Sentence transformer model
)

# Methods
encoder.add_chunks(chunks: List[str], metadata: List[dict] = None)
encoder.add_text(text: str, metadata: dict = None)
encoder.build_video(video_path: str, index_path: str, fps: int = 30, qr_size: int = 512)
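For example, parallel chunk and metadata lists can be added in a single call. A small sketch, assuming metadata entries line up one-to-one with chunks (the example data is made up):

from memvid import MemvidEncoder

encoder = MemvidEncoder(chunk_size=512, overlap=50)

chunks = ["Mercury is the smallest planet.", "Venus has a dense CO2 atmosphere."]
metadata = [{"source": "planets.txt"}, {"source": "planets.txt"}]   # assumed: one dict per chunk

encoder.add_chunks(chunks, metadata=metadata)
encoder.build_video("planets.mp4", "planets_index.json")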
MemvidRetriever

retriever = MemvidRetriever(
    video_path: str,
    index_path: str,
    cache_size: int = 100    # Number of frames to cache
)

# Methods
results = retriever.search(query: str, top_k: int = 5)
context = retriever.get_context(query: str, max_tokens: int = 2000)
chunks = retriever.get_chunks_by_ids(chunk_ids: List[int])
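A short usage sketch tying these methods together; the exact return formats are assumptions (for example, get_context returning a ready-to-use string), so treat it as illustrative:

from memvid import MemvidRetriever

retriever = MemvidRetriever("memory.mp4", "memory_index.json", cache_size=100)

# Build an LLM-ready context window capped at roughly 2000 tokens
context = retriever.get_context("renewable energy storage", max_tokens=2000)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How is the energy stored?"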
from sentence_transformers import SentenceTransformer

# Use a custom embedding model
custom_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
encoder = MemvidEncoder(embedding_model=custom_model)
# For maximum compression
encoder.build_video(
    "compressed.mp4",
    "index.json",
    fps=60,              # More frames per second
    frame_size=256,      # Smaller frames
    video_codec='h265',  # Better compression
    crf=28               # Compression quality (lower = better quality)
)
# Process large datasets in parallel
encoder = MemvidEncoder(n_workers=8)
encoder.add_chunks_parallel(massive_chunk_list)
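For instance, massive_chunk_list above could be assembled from a directory of text files; the split_text helper here is hypothetical and written only for illustration:

import os

def split_text(text, size=512):
    # Hypothetical helper: naive fixed-size chunking, purely for illustration
    return [text[i:i + size] for i in range(0, len(text), size)]

massive_chunk_list = []
for name in os.listdir("corpus"):
    with open(os.path.join("corpus", name), "r") as f:
        massive_chunk_list.extend(split_text(f.read()))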
ModuleNotFoundError: No module named 'memvid'
# Make sure you're using the right Python
which python  # Should show your virtual environment path

# If not, activate your virtual environment:
source venv/bin/activate  # On Windows: venv\Scripts\activate
ImportError: PyPDF2 is required for PDF support
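The fix is the optional install from the setup section above:

pip install PyPDF2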
OpenAI API Key Issues
# Set your API key (get one at https://platform.openai.com)
export OPENAI_API_KEY="sk-..."   # macOS/Linux
# Or on Windows:
set OPENAI_API_KEY=sk-...
Large PDF Processing
# For very large PDFs, use smaller chunk sizes
encoder = MemvidEncoder()
encoder.add_pdf("large_book.pdf", chunk_size=400, overlap=50)
We welcome contributions! Please see our Contributing Guide for details.
# Run tests
pytest tests/
# Run with coverage
pytest --cov=memvid tests/
# Format code
black memvid/
🆚 Comparison with Traditional Solutions

| Feature            | Memvid     | Vector DBs   | Traditional DBs |
|--------------------|------------|--------------|-----------------|
| Storage Efficiency | ⭐⭐⭐⭐⭐      | ⭐⭐           | ⭐⭐⭐             |
| Setup Complexity   | Simple     | Complex      | Complex         |
| Semantic Search    | ✅          | ✅            | ❌               |
| Offline Usage      | ✅          | ❌            | ✅               |
| Portability        | File-based | Server-based | Server-based    |
| Scalability        | Millions   | Millions     | Billions        |
| Cost               | Free       | $$$$         | $$$             |
v0.2.0 - Multi-language support
v0.3.0 - Real-time memory updates
v0.4.0 - Distributed video sharding
v0.5.0 - Audio and image support
v1.0.0 - Production-ready with enterprise features