Show HN: HyperMind – Experimental human-like memory layer for AI apps (OS)


🧠 Intelligent Memory Layer for Large Language Models


Transform stateless LLMs into context-aware AI agents with persistent, optimized memory


HyperMind is a production-grade memory proxy that sits between your application and any LLM provider (OpenAI, Anthropic, Groq, Google). It automatically manages conversation context and long-term memory using cognitive-science principles and advanced optimization techniques, preventing vector database bloat while retaining the memories that matter. Try the demo at HyperMind Chat (experimental).

The problem:

  • LLMs are stateless - they forget everything after each conversation
  • Vector databases grow indefinitely, causing performance degradation
  • Building persistent memory is complex and expensive
  • No intelligent filtering - everything gets stored, even irrelevant content
  • Context windows are limited and expensive to extend

HyperMind provides a universal memory layer with comprehensive optimization:

```bash
# Instead of calling providers directly:
curl https://api.openai.com/v1/chat/completions
curl https://api.anthropic.com/v1/messages
curl https://api.groq.com/openai/v1/chat/completions
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions

# Call HyperMind (same API, but with intelligent memory):
curl https://your-hypermind.workers.dev/router/v1/chat/completions
```

Your AI now remembers everything - while staying fast and cost-efficient.
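Because the endpoint stays OpenAI-compatible, you can likely point an existing SDK at HyperMind unchanged. A minimal sketch using the official `openai` Node package; the base URL and `x-hypermind-*` headers come from the examples in this README, and the hostname is a placeholder:

```typescript
import OpenAI from "openai";

// Standard OpenAI client, re-pointed at the HyperMind proxy.
// The x-hypermind-* headers pick the memory namespace and upstream provider.
const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY, // your own provider key; zero markup
  baseURL: "https://your-hypermind.workers.dev/router/v1",
  defaultHeaders: {
    "x-hypermind-user-id": "user123",
    "x-hypermind-provider": "groq",
  },
});

const reply = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "I am building a quantum computing system with 127 qubits" }],
});
console.log(reply.choices[0].message.content);
```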


  • 🔌 Universal Proxy: Works with any LLM provider (OpenAI, Anthropic, Groq, Google)
  • 🔄 Multi-Provider Support: Seamlessly switch between providers while maintaining memory
  • ⚡ Low Latency: Transparent proxy adds <700ms overhead
  • 💰 Cost Transparent: Uses your API keys, zero markup

Combines three search strategies for comprehensive memory retrieval; a merge sketch follows the list:

  1. 🎯 Vector Search - Semantic similarity using embeddings
  2. 🕸️ Graph Traversal - Entity relationships and knowledge graphs
  3. ⏰ Chronological - Recent context and temporal relevance
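A rough sketch of how the three result sets might be merged and re-ranked. The `Searcher` signature, weights, and scoring below are illustrative assumptions rather than HyperMind's actual internals; only the top-15 cutoff comes from the architecture diagram later in this README:

```typescript
interface Scored { id: string; content: string; score: number }
type Searcher = (query: string, userId: string) => Promise<Scored[]>;

// Merge the three strategies; memories surfaced by more than one
// strategy accumulate weighted score and rank higher.
async function hybridSearch(
  query: string,
  userId: string,
  vectorSearch: Searcher,   // semantic similarity via embeddings
  graphTraverse: Searcher,  // entity / temporal-triplet traversal
  chronological: Searcher,  // recent context
): Promise<Scored[]> {
  const [vec, graph, recent] = await Promise.all([
    vectorSearch(query, userId),
    graphTraverse(query, userId),
    chronological(query, userId),
  ]);

  const weighted: Array<[number, Scored[]]> = [[0.5, vec], [0.3, graph], [0.2, recent]];
  const merged = new Map<string, Scored>();
  for (const [weight, results] of weighted) {
    for (const r of results) {
      const prev = merged.get(r.id);
      merged.set(r.id, { ...r, score: (prev?.score ?? 0) + weight * r.score });
    }
  }
  // The top 15 memories get injected into the prompt.
  return [...merged.values()].sort((a, b) => b.score - a.score).slice(0, 15);
}
```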

🎛️ Intelligent Memory Optimization

Prevents vector database bloat with advanced techniques:

  1. 🔗 Smart Deduplication - Detects and merges similar memories (90% similarity threshold)
  2. 📊 Significance Filtering - Skips low-value content (greetings, filler, acknowledgments)
  3. 📦 Tiered Archival - Moves old memories through Hot→Warm→Cold→Archived tiers
  4. 🔄 Memory Consolidation - Clusters and summarizes related memories
  5. ⚡ Batch Processing - Queues embeddings for efficient API usage

Result: 40-60% storage reduction, 2-3x faster search, 50-70% fewer API calls

  • 🔗 Temporal Triplets: Subject-Predicate-Object with time validity
  • 🏷️ Entity Extraction: Automatic extraction of people, places, concepts
  • 📝 Episodic Classification: Categorizes memories by type (comparison, question, definition, list, factual)
  • 📉 Smart Decay: Different forgetting rates for different memory types

⏱️ Cognitive Science Integration

Based on Ebbinghaus' Forgetting Curve:

| Tier | Age | Vector Search | Status |
|------|-----|---------------|--------|
| 🔥 Hot | 0-7 days | Active | Full access |
| 🌡️ Warm | 7-30 days | Active | Full access |
| ❄️ Cold | 30-90 days | Active | Lower priority |
| 📦 Archived | 90+ days | Removed | D1 only |
| 🗄️ Ancient | 180+ days | Compressed | R2 storage (optional) |
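For reference, a minimal sketch of how Ebbinghaus-style decay could drive this tiering. The retention formula R = e^(-t/S) (age t in days, stability S) is the classical one; the per-type stability constants are assumptions standing in for the "smart decay" rates above, and the tier cutoffs copy the table:

```typescript
type Tier = "hot" | "warm" | "cold" | "archived" | "ancient";

// Assumed stability constants: higher S means slower forgetting.
const STABILITY_DAYS: Record<string, number> = {
  factual: 60,   // definitions and stated facts decay slowly
  episodic: 20,  // one-off events decay faster
  smalltalk: 5,  // filler decays fastest
};

// Ebbinghaus retention: R = e^(-t / S).
function retention(ageDays: number, episodicType: string): number {
  const s = STABILITY_DAYS[episodicType] ?? 30;
  return Math.exp(-ageDays / s);
}

// Tier cutoffs copied from the table above.
function tierFor(ageDays: number): Tier {
  if (ageDays < 7) return "hot";
  if (ageDays < 30) return "warm";
  if (ageDays < 90) return "cold";
  if (ageDays < 180) return "archived";
  return "ancient";
}
```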

1. Deploy HyperMind (1-click)

Deploy to Cloudflare Workers

2. Get an API Key

Sign up for any supported LLM provider (OpenAI, Anthropic, Groq, or Google).

3. Make Your First Request

```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "I am building a quantum computing system with 127 qubits"}
    ]
  }'
```

Later, in a new conversation, ask about it:

```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "What quantum computing project am I working on?"}
    ]
  }'
```

Response: "You're building a quantum computing system with 127 qubits..."


```mermaid
sequenceDiagram
    participant App as Your Application
    participant Router as Memory Router
    participant Search as Hybrid Search
    participant Storage as Storage Layer
    participant LLM as LLM Provider
    participant Optim as Optimization

    App->>Router: Chat Request<br/>(user message)
    Note over Router: Step 1: Memory Retrieval
    Router->>Search: Find relevant memories
    par Parallel Search
        Search->>Storage: Vector Search (semantic)
        Search->>Storage: Graph Traversal (entities)
        Search->>Storage: Chronological (recent)
    end
    Storage-->>Search: Combined Results
    Search-->>Router: Top 15 relevant memories
    Note over Router: Step 2: Context Injection
    Router->>Router: Inject memories into prompt
    Note over Router: Step 3: LLM Request
    Router->>LLM: Enhanced request<br/>(with context)
    LLM-->>Router: Response
    Router-->>App: Final Response<br/>(with memory)
    Note over Router: Step 4: Background Storage
    Router->>Optim: Store conversation async
    Optim->>Optim: Analyze Significance<br/>(score: 0.0-1.0)
    alt Low Significance (< 0.6)
        Optim->>Optim: Discard ❌
    else High Significance (>= 0.6)
        Optim->>Optim: Check for duplicates<br/>(hash + similarity)
        alt Similar Memory Found (> 0.9)
            Optim->>Storage: Merge with existing 🔗
        else New Memory
            Optim->>Optim: Add to batch queue
            Optim->>Storage: Store when batch full
        end
    end
    Note over Storage: Tiered Storage
    Storage->>Storage: Hot (0-7d): Active<br/>Warm (7-30d): Active<br/>Cold (30-90d): Active<br/>Archived (90d+): D1 only<br/>Ancient (180d+): R2
```
| Layer | Technology | Purpose | Data Retention |
|-------|------------|---------|----------------|
| Active Index | Cloudflare Vectorize | Semantic search on hot/warm/cold memories | 0-90 days |
| Primary DB | Cloudflare D1 (SQLite) | All memories, entities, triplets | Forever |
| Query Cache | Cloudflare KV | LLM analysis results | 1 hour TTL |
| Cold Archive | Cloudflare R2 (optional) | Compressed ancient memories | 180+ days |
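Inside the Worker, those four layers would typically surface as environment bindings. A hypothetical `Env` shape (the binding names are illustrative; the real ones are defined in wrangler.toml), with types from @cloudflare/workers-types:

```typescript
// Hypothetical bindings mirroring the storage layers above.
interface Env {
  VECTORIZE: VectorizeIndex; // Active Index: semantic search, 0-90 days
  DB: D1Database;            // Primary DB: all memories, entities, triplets
  CACHE: KVNamespace;        // Query Cache: LLM analysis results, 1 hour TTL
  ARCHIVE?: R2Bucket;        // Cold Archive: compressed ancient memories (optional)
}
```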
```
Incoming Memory
      ↓
[Significance Analysis]
      ↓
Score < 0.6? → Discard ❌
      ↓
[Hash Check]
      ↓
Duplicate? → Skip ❌
      ↓
[Similarity Check]
      ↓
Similar (>0.9)? → Merge 🔗
      ↓
[Batch Queue]
      ↓
Queue Full (50)? → Process Batch
      ↓
[Vector Storage]
      ↓
Stored ✅
```

```bash
# Chat with memory (works with any provider)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "My favorite programming language is Python"}
    ]
  }'

# Switch to a different provider (memory persists)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: openai" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What programming language do I prefer?"}
    ]
  }'
```
```bash
# Store a memory manually
curl -X POST "https://your-hypermind.workers.dev/api/memories?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I prefer TypeScript over JavaScript",
    "metadata": {"source": "manual", "tags": ["programming"]}
  }'

# Search memories
curl -X POST "https://your-hypermind.workers.dev/api/search?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "programming preferences",
    "limit": 5
  }'
```
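The same endpoints from TypeScript, as a thin `fetch` wrapper. The routes and payloads come from the curl examples above; the hostname is a placeholder:

```typescript
const BASE = "https://your-hypermind.workers.dev";

// Store a memory manually for a given user.
async function storeMemory(userId: string, content: string, metadata?: object) {
  const res = await fetch(`${BASE}/api/memories?userId=${encodeURIComponent(userId)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content, metadata }),
  });
  return res.json();
}

// Search a user's memories by free-text query.
async function searchMemories(userId: string, query: string, limit = 5) {
  const res = await fetch(`${BASE}/api/search?userId=${encodeURIComponent(userId)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, limit }),
  });
  return res.json();
}
```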

⚡ Performance & Optimization

| Feature | Impact | Description |
|---------|--------|-------------|
| Smart Deduplication | 20-30% reduction | Merges similar memories (cosine similarity > 0.90) |
| Significance Filtering | 30-40% reduction | Skips greetings, filler, low-value content |
| Tiered Archival | 2-3x faster search | Removes old memories from active vector index |
| Memory Consolidation | 30-40% reduction | Clusters related memories into summaries |
| Batch Processing | 50-70% fewer API calls | Queues embeddings for batch processing |

Before Optimization:

  • Storage: Linear growth, indefinite
  • Search: 5-10s for 10k+ memories
  • API Calls: Every conversation = 1+ embedding calls

After Optimization:

  • Storage: 40-60% reduction
  • Search: 2-3s for 10k+ memories (2-3x faster)
  • API Calls: 50-70% reduction via batching

Customize optimization thresholds in wrangler.toml:

```toml
[vars]
DEDUP_SIMILARITY_THRESHOLD = "0.90"  # 0.85-0.95 recommended
MIN_SIGNIFICANCE_SCORE = "0.60"      # 0.5-0.7 recommended
CONSOLIDATION_ENABLED = "true"       # Enable memory consolidation
BATCH_EMBEDDING_SIZE = "50"          # Batch size: 10-100
ARCHIVE_COLD_AFTER_DAYS = "90"       # Days before archival: 60-180
```
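Workers expose `[vars]` entries to code as strings, so the proxy presumably parses them before use. A hedged sketch (the `loadConfig` helper is hypothetical; the fallbacks match the defaults above):

```typescript
// Parse optimization vars from the Worker environment, falling back to
// the documented defaults when a var is unset.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    dedupThreshold: parseFloat(env.DEDUP_SIMILARITY_THRESHOLD ?? "0.90"),
    minSignificance: parseFloat(env.MIN_SIGNIFICANCE_SCORE ?? "0.60"),
    consolidationEnabled: env.CONSOLIDATION_ENABLED !== "false",
    batchSize: parseInt(env.BATCH_EMBEDDING_SIZE ?? "50", 10),
    archiveColdAfterDays: parseInt(env.ARCHIVE_COLD_AFTER_DAYS ?? "90", 10),
  };
}
```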

HyperMind runs automated tasks via cron triggers:

| Task | Schedule | Purpose |
|------|----------|---------|
| Forgetting Cycle | Daily 2 AM | Update relevance scores, archive old memories |
| Consolidation | Daily 3 AM | Cluster and summarize related memories |
| Batch Processing | Every 30 min | Process queued embeddings |
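In a Worker, these jobs would hang off the `scheduled` handler. A sketch in which the cron expressions mirror the schedule table, while the task functions (and their names) are assumptions about HyperMind's internals:

```typescript
// Task implementations are HyperMind internals; declared here as assumptions.
declare function runForgettingCycle(env: unknown): Promise<void>;
declare function runConsolidation(env: unknown): Promise<void>;
declare function processBatchQueue(env: unknown): Promise<void>;

export default {
  // Cloudflare invokes this for each cron trigger defined in wrangler.toml.
  async scheduled(event: ScheduledController, env: unknown, ctx: ExecutionContext) {
    switch (event.cron) {
      case "0 2 * * *":     // daily 2 AM: forgetting cycle
        ctx.waitUntil(runForgettingCycle(env));
        break;
      case "0 3 * * *":     // daily 3 AM: consolidation
        ctx.waitUntil(runConsolidation(env));
        break;
      case "*/30 * * * *":  // every 30 minutes: process queued embeddings
        ctx.waitUntil(processBatchQueue(env));
        break;
    }
  },
};
```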

```bash
# Local development
git clone https://github.com/yourusername/hypermind.git
cd hypermind
npm install
npm run dev
```
```bash
# Create Cloudflare resources
wrangler d1 create hypermind-prod
wrangler vectorize create hypermind-embeddings --dimensions=768 --metric=cosine
wrangler kv:namespace create CACHE

# Optional: create R2 bucket for ancient memory archival
wrangler r2 bucket create hypermind-archive

# Update wrangler.toml with your resource IDs
```
```bash
# Apply migrations to production
wrangler d1 migrations apply hypermind-prod --remote
```
```bash
npm test                 # Run tests
npm run test:coverage    # With coverage
npm run lint             # Code quality
```

Core D1 tables:

  • memories: Conversation storage with optimization metadata
  • memory_consolidations: Tracks consolidated memory summaries
  • entities: Extracted entities (people, places, concepts)
  • temporal_triplets: Subject-Predicate-Object relationships
  • forgetting_config: Per-user decay settings
```sql
-- New fields in memories table
significance_score REAL DEFAULT 1.0  -- 0.0-1.0 importance score
consolidated INTEGER DEFAULT 0       -- Is this memory consolidated?
consolidated_into TEXT               -- Reference to summary memory
vector_archived INTEGER DEFAULT 0    -- Removed from vector index?
r2_archived INTEGER DEFAULT 0        -- Stored in R2?
dedup_hash TEXT                      -- Hash for duplicate detection
```
```sql
-- Example temporal triplet
INSERT INTO temporal_triplets (subject, predicate, object, episodic_type, valid_from)
VALUES ('user123', 'prefers', 'TypeScript', 'factual', '2024-01-01');
```

Build chatbots that remember user preferences, conversation history, and context across sessions - without bloating your database.

Use HyperMind as your vector store with automatic optimization for document-based AI applications.

Create AI tutors that remember student progress, learning patterns, and knowledge gaps - with intelligent consolidation.

Build AI assistants that remember customer interactions while archiving old, irrelevant data automatically.

Create NPCs with persistent memory that evolves and consolidates over time.


Prevents storing duplicate or near-duplicate memories:

```typescript
// Automatic similarity detection
const similarity = cosineSimilarity(newEmbedding, existingEmbedding);
if (similarity > 0.90) {
  // Merge with the existing memory instead of creating a new one
  await mergeMemories(existing, newContent);
}
```
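`cosineSimilarity` and `mergeMemories` above are HyperMind internals; for reference, cosine similarity between two embedding vectors is just:

```typescript
// Cosine similarity: dot product over the product of vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```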

Filters out low-value content automatically; a toy scorer follows the list:

  • ❌ Generic greetings: "hi", "hello", "thanks"
  • ❌ Acknowledgments: "ok", "got it", "understood"
  • ❌ Emoji-only messages
  • ❌ Very short content (< 20 characters)
  • ✅ Technical discussions (high significance score)
  • ✅ Personal information (high significance score)
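A toy version of such a scorer, using the thresholds from this README; HyperMind's real rules and weights may well differ:

```typescript
// Toy significance scorer mirroring the rules above.
const FILLER = new Set(["hi", "hello", "thanks", "ok", "got it", "understood"]);

function significance(content: string): number {
  const text = content.trim().toLowerCase();
  if (/^[\p{Extended_Pictographic}\s]+$/u.test(text)) return 0.0; // emoji-only
  if (FILLER.has(text)) return 0.1;  // greetings and acknowledgments
  if (text.length < 20) return 0.2;  // very short content
  return 0.8; // real scoring would weigh entities, specificity, recurrence, ...
}

// Store only memories at or above MIN_SIGNIFICANCE_SCORE (default 0.60).
const shouldStore = (content: string) => significance(content) >= 0.6;
```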

Automatically clusters and summarizes related memories:

```
// Daily consolidation process
1. Find related memories (cosine similarity > 0.70)
2. Group into clusters (3+ memories per cluster)
3. Generate summary memory
4. Mark originals as consolidated
5. Update vector index with summary
```

Result: 30-40% reduction in active corpus size
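A greedy sketch of the clustering step, reusing the `cosineSimilarity` helper shown earlier and assuming precomputed embeddings. The 0.70 threshold and 3+ cluster size come from the process above; the rest is illustrative:

```typescript
interface Mem { id: string; embedding: number[] }

// Greedily group memories within 0.70 cosine similarity of a seed;
// only clusters of 3+ members are worth a summary memory.
function clusterForConsolidation(memories: Mem[]): Mem[][] {
  const clusters: Mem[][] = [];
  const assigned = new Set<string>();
  for (const seed of memories) {
    if (assigned.has(seed.id)) continue;
    const cluster = memories.filter(
      (m) => !assigned.has(m.id) && cosineSimilarity(seed.embedding, m.embedding) > 0.70,
    );
    if (cluster.length >= 3) {
      cluster.forEach((m) => assigned.add(m.id));
      clusters.push(cluster); // next: generate summary, mark originals consolidated
    }
  }
  return clusters;
}
```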

Automatically moves memories through storage tiers:

```
// Archival process
Hot (0-7d)       Full vector search, all features active
Warm (7-30d)     Full vector search, lower priority
Cold (30-90d)    Vector search only if needed
Archived (90d+)  Removed from vector index, D1 only
Ancient (180d+)  Compressed, stored in R2 (optional)
```

Result: 2-3x faster search on large datasets


We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (`git checkout -b feature/amazing-feature`)
  3. Commit your changes (`git commit -m 'Add amazing feature'`)
  4. Push to the branch (`git push origin feature/amazing-feature`)
  5. Open a Pull Request

Areas where contributions are welcome:

  • 🐛 Bug fixes
  • ✨ New features (LLM-powered summarization, multi-language support)
  • 📚 Documentation improvements
  • 🧪 Test coverage
  • 🎨 UI/UX enhancements
  • ⚡ Performance optimizations

This project is licensed under the MIT License - see the [LICENSE](https://opensource.org/license/MIT) file for details.


  • Hermann Ebbinghaus for the forgetting curve research
  • Cloudflare for the amazing Workers platform