🧠 Intelligent Memory Layer for Large Language Models
Transform stateless LLMs into context-aware AI agents with persistent, optimized memory
HyperMind is a production-grade memory proxy that sits between your application and any LLM provider (OpenAI, Anthropic, Groq, Google). It automatically manages conversation context and long-term memory using cognitive science principles and advanced optimization techniques, preventing vector database bloat while maintaining intelligent memory retention. Try the demo at HyperMind Chat (experimental).
LLMs are stateless - they forget everything after each conversation
Vector databases grow indefinitely, causing performance degradation
Building persistent memory is complex and expensive
No intelligent filtering - everything gets stored, even irrelevant content
Context windows are limited and expensive to extend
HyperMind provides a universal memory layer with comprehensive optimization:
```bash
# Instead of calling providers directly:
curl https://api.openai.com/v1/chat/completions
curl https://api.anthropic.com/v1/messages
curl https://api.groq.com/openai/v1/chat/completions
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions

# Call HyperMind (same API, but with intelligent memory):
curl https://your-hypermind.workers.dev/router/v1/chat/completions
```
Your AI now remembers everything - while staying fast and cost-efficient.
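Because the router speaks the OpenAI wire format, you can also point an existing SDK at it. A minimal sketch using the official `openai` npm package; the URL and `x-hypermind-*` headers mirror the curl examples in this README:

```typescript
import OpenAI from "openai";

// Point the official OpenAI SDK at HyperMind instead of api.openai.com.
const client = new OpenAI({
  baseURL: "https://your-hypermind.workers.dev/router/v1",
  apiKey: process.env.GROQ_API_KEY!, // your own provider key; zero markup
  defaultHeaders: {
    "x-hypermind-user-id": "user123",
    "x-hypermind-provider": "groq",
  },
});

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "What am I working on these days?" }],
});
console.log(response.choices[0].message.content);
```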
🔌 Universal Proxy: Works with any LLM provider (OpenAI, Anthropic, Groq, Google)
🔄 Multi-Provider Support: Seamlessly switch between providers while maintaining memory
⚡ Low Latency: Transparent proxy adds <700ms overhead
💰 Cost Transparent: Uses your API keys, zero markup
Combines three search strategies for comprehensive memory retrieval, as sketched after this list:
🎯 Vector Search - Semantic similarity using embeddings
🕸️ Graph Traversal - Entity relationships and knowledge graphs
⏰ Chronological - Recent context and temporal relevance
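The three strategies can run in parallel and merge into one ranked list. A hypothetical sketch; the search helpers and weights below are illustrative assumptions, not HyperMind's actual internals:

```typescript
interface Scored { id: string; content: string; score: number }
declare function vectorSearch(userId: string, query: string): Promise<Scored[]>;
declare function graphTraverse(userId: string, query: string): Promise<Scored[]>;
declare function recentMemories(userId: string): Promise<Scored[]>;

async function hybridSearch(userId: string, query: string): Promise<Scored[]> {
  const [vector, graph, recent] = await Promise.all([
    vectorSearch(userId, query),  // semantic similarity via embeddings
    graphTraverse(userId, query), // follow entity/triplet links
    recentMemories(userId),       // newest-first chronological slice
  ]);

  // Weight each strategy, dedupe by memory id, keep the best score per id.
  const weighted = [
    ...vector.map(m => ({ ...m, score: m.score * 0.5 })),
    ...graph.map(m => ({ ...m, score: m.score * 0.3 })),
    ...recent.map(m => ({ ...m, score: m.score * 0.2 })),
  ];
  const best = new Map<string, Scored>();
  for (const m of weighted) {
    const prev = best.get(m.id);
    if (!prev || m.score > prev.score) best.set(m.id, m);
  }
  // Top 15 matches what the router injects (see the sequence diagram below).
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, 15);
}
```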
🎛️ Intelligent Memory Optimization
Prevents vector database bloat with advanced techniques:
🔗 Smart Deduplication - Detects and merges similar memories (90% similarity threshold)
📊 Significance Filtering - Skips low-value content (greetings, filler, acknowledgments)
📦 Tiered Archival - Moves old memories through Hot→Warm→Cold→Archived tiers
🔄 Memory Consolidation - Clusters and summarizes related memories
⚡ Batch Processing - Queues embeddings for efficient API usage
Result: 40-60% storage reduction, 2-3x faster search, 50-70% fewer API calls
🔗 Temporal Triplets: Subject-Predicate-Object relations with time validity
🏷️ Entity Extraction: Automatic extraction of people, places, and concepts
📝 Episodic Classification: Categorizes memories by type (comparison, question, definition, list, factual)
📉 Smart Decay: Different forgetting rates for different memory types
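For illustration, a temporal triplet can be modeled as a small record. The field names mirror the `temporal_triplets` table in the Database Schema section; `validTo` is assumed here for the time-validity window:

```typescript
// Shape of a temporal triplet (illustrative; validTo is an assumption).
interface TemporalTriplet {
  subject: string;   // e.g. "user123"
  predicate: string; // e.g. "prefers"
  object: string;    // e.g. "TypeScript"
  episodicType: "comparison" | "question" | "definition" | "list" | "factual";
  validFrom: string; // ISO date the fact became true
  validTo?: string;  // open-ended if still valid
}
```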
⏱️ Cognitive Science Integration
Based on Ebbinghaus' Forgetting Curve:
| Tier | Age | Vector Search | Status |
|------|-----|---------------|--------|
| 🔥 Hot | 0-7 days | Active | Full access |
| 🌡️ Warm | 7-30 days | Active | Full access |
| ❄️ Cold | 30-90 days | Active | Lower priority |
| 📦 Archived | 90+ days | Removed | D1 only |
| 🗄️ Ancient | 180+ days | Compressed | R2 storage (optional) |
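The decay itself follows the classic exponential form R(t) = e^(-t / S), where S is a stability constant that varies by memory type. A minimal sketch with assumed per-type stability values; HyperMind's actual scoring may differ:

```typescript
// Ebbinghaus-style exponential decay: R(t) = e^(-t / S).
// Larger stability S means the memory type is forgotten more slowly.
const STABILITY_DAYS: Record<string, number> = {
  factual: 90,    // assumed values, for illustration only
  definition: 60,
  comparison: 30,
  list: 30,
  question: 14,
};

function relevance(episodicType: string, ageDays: number): number {
  const s = STABILITY_DAYS[episodicType] ?? 30;
  return Math.exp(-ageDays / s); // 1.0 when fresh, approaches 0 with age
}
```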
1. Deploy HyperMind (1-click)
2. Get an API Key (sign up with any LLM provider)
3. Make Your First Request
curl -X POST " https://your-hypermind.workers.dev/router/v1/chat/completions" \
-H " Content-Type: application/json" \
-H " Authorization: Bearer YOUR_API_KEY" \
-H " x-hypermind-user-id: user123" \
-H " x-hypermind-provider: groq" \
-d ' {
"model": "llama-3.3-70b-versatile",
"messages": [
{"role": "user", "content": "I am building a quantum computing system with 127 qubits"}
]
}'
curl -X POST " https://your-hypermind.workers.dev/router/v1/chat/completions" \
-H " Authorization: Bearer YOUR_API_KEY" \
-H " x-hypermind-user-id: user123" \
-H " x-hypermind-provider: groq" \
-d ' {
"model": "llama-3.3-70b-versatile",
"messages": [
{"role": "user", "content": "What quantum computing project am I working on?"}
]
}'
Response: "You're building a quantum computing system with 127 qubits..." ✨
```mermaid
sequenceDiagram
participant App as Your Application
participant Router as Memory Router
participant Search as Hybrid Search
participant Storage as Storage Layer
participant LLM as LLM Provider
participant Optim as Optimization
App->>Router: Chat Request<br/>(user message)
Note over Router: Step 1: Memory Retrieval
Router->>Search: Find relevant memories
par Parallel Search
Search->>Storage: Vector Search (semantic)
Search->>Storage: Graph Traversal (entities)
Search->>Storage: Chronological (recent)
end
Storage-->>Search: Combined Results
Search-->>Router: Top 15 relevant memories
Note over Router: Step 2: Context Injection
Router->>Router: Inject memories into prompt
Note over Router: Step 3: LLM Request
Router->>LLM: Enhanced request<br/>(with context)
LLM-->>Router: Response
Router-->>App: Final Response<br/>(with memory)
Note over Router: Step 4: Background Storage
Router->>Optim: Store conversation async
Optim->>Optim: Analyze Significance<br/>(score: 0.0-1.0)
alt Low Significance (< 0.6)
Optim->>Optim: Discard ❌
else High Significance (>= 0.6)
Optim->>Optim: Check for duplicates<br/>(hash + similarity)
alt Similar Memory Found (> 0.9)
Optim->>Storage: Merge with existing 🔗
else New Memory
Optim->>Optim: Add to batch queue
Optim->>Storage: Store when batch full
end
end
Note over Storage: Tiered Storage
Storage->>Storage: Hot (0-7d): Active<br/>Warm (7-30d): Active<br/>Cold (30-90d): Active<br/>Archived (90d+): D1 only<br/>Ancient (180d+): R2
```
| Layer | Technology | Purpose | Data Retention |
|-------|------------|---------|----------------|
| Active Index | Cloudflare Vectorize | Semantic search on hot/warm/cold memories | 0-90 days |
| Primary DB | Cloudflare D1 (SQLite) | All memories, entities, triplets | Forever |
| Query Cache | Cloudflare KV | LLM analysis results | 1 hour TTL |
| Cold Archive | Cloudflare R2 (optional) | Compressed ancient memories | 180+ days |
```
Incoming Memory
      ↓
[Significance Analysis]
      ↓
Score < 0.6? → Discard ❌
      ↓
[Hash Check]
      ↓
Duplicate? → Skip ❌
      ↓
[Similarity Check]
      ↓
Similar (> 0.9)? → Merge 🔗
      ↓
[Batch Queue]
      ↓
Queue full (50)? → Process batch
      ↓
[Vector Storage]
      ↓
Stored ✅
```
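The same pipeline as a condensed code sketch; the thresholds mirror the configuration section below, and every helper is a hypothetical stand-in for HyperMind's internals:

```typescript
interface Memory { content: string; dedupHash: string }
declare function scoreSignificance(content: string): number;
declare function hashExists(hash: string): Promise<boolean>;
declare function findMostSimilar(m: Memory): Promise<{ existing: Memory; similarity: number } | null>;
declare function mergeMemories(existing: Memory, incoming: Memory): Promise<void>;
declare function embedAndStoreBatch(batch: Memory[]): Promise<void>;

const batchQueue: Memory[] = [];

async function storeMemory(memory: Memory): Promise<void> {
  if (scoreSignificance(memory.content) < 0.6) return;  // discard low-value ❌
  if (await hashExists(memory.dedupHash)) return;       // exact duplicate ❌
  const similar = await findMostSimilar(memory);
  if (similar && similar.similarity > 0.9) {
    return mergeMemories(similar.existing, memory);     // merge 🔗
  }
  batchQueue.push(memory);                              // queue for batching
  if (batchQueue.length >= 50) {
    await embedAndStoreBatch(batchQueue.splice(0, 50)); // flush full batch ✅
  }
}
```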
```bash
# Chat with memory (works with any provider)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "My favorite programming language is Python"}
    ]
  }'

# Switch to a different provider (memory persists)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: openai" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What programming language do I prefer?"}
    ]
  }'
```
```bash
# Store a memory manually
curl -X POST "https://your-hypermind.workers.dev/api/memories?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I prefer TypeScript over JavaScript",
    "metadata": {"source": "manual", "tags": ["programming"]}
  }'

# Search memories
curl -X POST "https://your-hypermind.workers.dev/api/search?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "programming preferences",
    "limit": 5
  }'
```
⚡ Performance & Optimization
| Feature | Impact | Description |
|---------|--------|-------------|
| Smart Deduplication | 20-30% reduction | Merges similar memories (cosine similarity > 0.90) |
| Significance Filtering | 30-40% reduction | Skips greetings, filler, low-value content |
| Tiered Archival | 2-3x faster search | Removes old memories from the active vector index |
| Memory Consolidation | 30-40% reduction | Clusters related memories into summaries |
| Batch Processing | 50-70% fewer API calls | Queues embeddings for batch processing |
Before Optimization:
Storage: Linear growth, indefinite
Search: 5-10s for 10k+ memories
API Calls: Every conversation = 1+ embedding calls

After Optimization:
Storage: 40-60% reduction
Search: 2-3s for 10k+ memories (2-3x faster)
API Calls: 50-70% reduction via batching
Customize optimization thresholds in `wrangler.toml`:

```toml
[vars]
DEDUP_SIMILARITY_THRESHOLD = "0.90"  # 0.85-0.95 recommended
MIN_SIGNIFICANCE_SCORE = "0.60"      # 0.5-0.7 recommended
CONSOLIDATION_ENABLED = "true"       # Enable memory consolidation
BATCH_EMBEDDING_SIZE = "50"          # Batch size: 10-100
ARCHIVE_COLD_AFTER_DAYS = "90"       # Days before archival: 60-180
```
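Inside the Worker these vars arrive as strings on the environment binding. A small sketch of how they might be parsed, assuming the standard Workers env pattern; only the binding names are taken from the config above:

```typescript
// Sketch: parsing the string vars above inside the Worker (assumption).
interface Env {
  DEDUP_SIMILARITY_THRESHOLD?: string;
  MIN_SIGNIFICANCE_SCORE?: string;
  BATCH_EMBEDDING_SIZE?: string;
  ARCHIVE_COLD_AFTER_DAYS?: string;
}

function loadConfig(env: Env) {
  return {
    dedupThreshold: parseFloat(env.DEDUP_SIMILARITY_THRESHOLD ?? "0.90"),
    minSignificance: parseFloat(env.MIN_SIGNIFICANCE_SCORE ?? "0.60"),
    batchSize: parseInt(env.BATCH_EMBEDDING_SIZE ?? "50", 10),
    archiveAfterDays: parseInt(env.ARCHIVE_COLD_AFTER_DAYS ?? "90", 10),
  };
}
```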
HyperMind runs automated tasks via cron triggers:
| Task | Schedule | Purpose |
|------|----------|---------|
| Forgetting Cycle | Daily 2 AM | Update relevance scores, archive old memories |
| Consolidation | Daily 3 AM | Cluster and summarize related memories |
| Batch Processing | Every 30 min | Process queued embeddings |
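A sketch of the corresponding Workers `scheduled()` handler, dispatching on the matched cron expression. The expressions and task helpers here are assumptions; the actual triggers live in `wrangler.toml` (types come from `@cloudflare/workers-types`):

```typescript
interface Env { /* D1, Vectorize, KV bindings */ }
declare function runForgettingCycle(env: Env): Promise<void>;
declare function runConsolidation(env: Env): Promise<void>;
declare function processBatchQueue(env: Env): Promise<void>;

export default {
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
    switch (controller.cron) {
      case "0 2 * * *":    // daily at 2 AM: forgetting cycle
        ctx.waitUntil(runForgettingCycle(env));
        break;
      case "0 3 * * *":    // daily at 3 AM: consolidation
        ctx.waitUntil(runConsolidation(env));
        break;
      case "*/30 * * * *": // every 30 minutes: flush embedding batch queue
        ctx.waitUntil(processBatchQueue(env));
        break;
    }
  },
};
```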
```bash
git clone https://github.com/yourusername/hypermind.git
cd hypermind
npm install
npm run dev
```
```bash
# Create Cloudflare resources
wrangler d1 create hypermind-prod
wrangler vectorize create hypermind-embeddings --dimensions=768 --metric=cosine
wrangler kv:namespace create CACHE

# Optional: Create R2 bucket for ancient memory archival
wrangler r2 bucket create hypermind-archive

# Update wrangler.toml with your resource IDs

# Apply migrations to production
wrangler d1 migrations apply hypermind-prod --remote
```
```bash
npm test              # Run tests
npm run test:coverage # With coverage
npm run lint          # Code quality
```
`memories`: Conversation storage with optimization metadata
`memory_consolidations`: Tracks consolidated memory summaries
`entities`: Extracted entities (people, places, concepts)
`temporal_triplets`: Subject-Predicate-Object relationships
`forgetting_config`: Per-user decay settings
```sql
-- New fields in the memories table
significance_score REAL DEFAULT 1.0  -- 0.0-1.0 importance score
consolidated INTEGER DEFAULT 0       -- Is this memory consolidated?
consolidated_into TEXT               -- Reference to summary memory
vector_archived INTEGER DEFAULT 0    -- Removed from vector index?
r2_archived INTEGER DEFAULT 0        -- Stored in R2?
dedup_hash TEXT                      -- Hash for duplicate detection

-- Example temporal triplet
INSERT INTO temporal_triplets (subject, predicate, object, episodic_type, valid_from)
VALUES ('user123', 'prefers', 'TypeScript', 'factual', '2024-01-01');
```
Build chatbots that remember user preferences, conversation history, and context across sessions - without bloating your database.
Use HyperMind as your vector store with automatic optimization for document-based AI applications.
Create AI tutors that remember student progress, learning patterns, and knowledge gaps - with intelligent consolidation.
Build AI assistants that remember customer interactions while archiving old, irrelevant data automatically.
Create NPCs with persistent memory that evolves and consolidates over time.
Prevents storing duplicate or near-duplicate memories:
```typescript
// Automatic similarity detection
const similarity = cosineSimilarity(newEmbedding, existingEmbedding);
if (similarity > 0.90) {
  // Merge with the existing memory instead of creating a new one
  await mergeMemories(existing, newContent);
}
```
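For completeness, a standard cosine similarity implementation (not necessarily the one HyperMind ships):

```typescript
// Cosine similarity of two equal-length embedding vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```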
Filters out low-value content automatically (a scoring sketch follows this list):
❌ Generic greetings: "hi", "hello", "thanks"
❌ Acknowledgments: "ok", "got it", "understood"
❌ Emoji-only messages
❌ Very short content (< 20 characters)
✅ Technical discussions (high significance score)
✅ Personal information (high significance score)
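A heuristic scorer implementing the rules above might look like this; the exact weights and word list are illustrative assumptions:

```typescript
// Illustrative significance heuristic, not HyperMind's actual scorer.
const FILLER = new Set(["hi", "hello", "thanks", "ok", "got it", "understood"]);

function scoreSignificance(content: string): number {
  const text = content.trim().toLowerCase();
  if (FILLER.has(text)) return 0.1;                              // greeting/ack
  if (text.length < 20) return 0.2;                              // too short
  if (/^[\p{Extended_Pictographic}\s]+$/u.test(text)) return 0.1; // emoji-only
  // Longer, substantive content scores higher (capped at 1.0).
  return Math.min(1.0, 0.6 + text.length / 1000);
}
```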
Automatically clusters and summarizes related memories:
```
// Daily consolidation process
1. Find related memories (cosine similarity > 0.70)
2. Group into clusters (3+ memories per cluster)
3. Generate summary memory
4. Mark originals as consolidated
5. Update vector index with summary
```

Result: 30-40% reduction in active corpus size
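A greedy clustering sketch of that daily pass, using the thresholds listed above; `summarize` stands in for a hypothetical LLM call:

```typescript
interface Mem { id: string; content: string; embedding: number[] }
declare function cosineSimilarity(a: number[], b: number[]): number; // defined earlier
declare function summarize(memories: Mem[]): Promise<string>;        // hypothetical LLM call

async function consolidate(memories: Mem[]): Promise<void> {
  const used = new Set<string>();
  for (const seed of memories) {
    if (used.has(seed.id)) continue;
    // Pull every unclustered memory within 0.70 cosine similarity of the seed.
    const cluster = memories.filter(
      m => !used.has(m.id) && cosineSimilarity(seed.embedding, m.embedding) > 0.70
    );
    if (cluster.length < 3) continue; // only consolidate clusters of 3+ memories
    cluster.forEach(m => used.add(m.id));
    const summary = await summarize(cluster); // one summary replaces many originals
    // ...store summary, mark originals consolidated, update the vector index
  }
}
```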
Automatically moves memories through storage tiers:
```
// Archival process
Hot (0-7d)       → Full vector search, all features active
Warm (7-30d)     → Full vector search, lower priority
Cold (30-90d)    → Vector search only if needed
Archived (90d+)  → Removed from vector index, D1 only
Ancient (180d+)  → Compressed, stored in R2 (optional)
```

Result: 2-3x faster search on large datasets
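Mapping a memory's age to its tier is then a straightforward lookup, following the tier table in the Cognitive Science section:

```typescript
type Tier = "hot" | "warm" | "cold" | "archived" | "ancient";

// Age thresholds taken from the tier table above.
function tierFor(ageDays: number): Tier {
  if (ageDays < 7) return "hot";
  if (ageDays < 30) return "warm";
  if (ageDays < 90) return "cold";
  if (ageDays < 180) return "archived";
  return "ancient";
}
```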
We welcome contributions! Please see our Contributing Guide for details.
Fork the repository
Create a feature branch (`git checkout -b feature/amazing-feature`)
Commit your changes (`git commit -m 'Add amazing feature'`)
Push to the branch (`git push origin feature/amazing-feature`)
Open a Pull Request
🐛 Bug fixes
✨ New features (LLM-powered summarization, multi-language support)
📚 Documentation improvements
🧪 Test coverage
🎨 UI/UX enhancements
⚡ Performance optimizations
This project is licensed under the MIT License - see [LICENSE](https://opensource.org/license/MIT) for details.
Ebbinghaus for the forgetting curve research
Cloudflare for the amazing Workers platform