Show HN: I built an LLM that never forgets – persistent user memory with RAG
An intelligent proxy for LLMs with long-term memory backed by PostgreSQL + pgvector. The system uses AI to automatically extract and remember facts about users, with no hardcoded triggers.
🤖 AI-Driven Fact Extraction
No hardcoded triggers - AI autonomously decides what's worth remembering
Single LLM call generates response AND extracts facts simultaneously
Request:
{
  "user_id": "john_doe",
  "message": "Hi! I'm a Python developer and I love FastAPI.",
  "save_to_memory": true
}
Response:
{
  "response": "Hello! Nice to meet you. FastAPI is indeed a great framework!",
  "relevant_memories": [
    {
      "content": "User is a Python developer",
      "type": "personal",
      "similarity": 0.87,
      "created_at": "2025-10-21T10:30:00",
      "importance": 1.5
    }
  ],
  "memory_saved": true
}
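The relevant_memories above come from a vector-similarity search over previously stored facts. A minimal sketch of such a lookup, assuming psycopg2 and a memories table as described further down (the DSN and table name here are illustrative, not the project's actual code):

import psycopg2

def retrieve_memories(user_id, query_embedding, limit=5):
    """Fetch a user's most similar memories via pgvector cosine distance (sketch)."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    conn = psycopg2.connect("dbname=memory_db")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            """SELECT content, memory_type, importance,
                      1 - (embedding <=> %s::vector) AS similarity
               FROM memories
               WHERE user_id = %s
               ORDER BY embedding <=> %s::vector
               LIMIT %s""",
            (vec, user_id, vec, limit),
        )
        return cur.fetchall()

The <=> operator is pgvector's cosine distance, so 1 - distance gives the similarity score shown in the response.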
Use the built-in chat tester for easy testing:
This will:
Ask for your user ID
Start an interactive chat session
Show AI responses and used memories
Display when new facts are saved
You can also test directly with curl:
# First message - AI will extract facts
curl -X POST http://localhost:8001/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alice", "message": "Hi! My name is Alice and I love hiking in the mountains. I work as a data scientist.", "save_to_memory": true}'

# Second message - AI will use remembered facts
curl -X POST http://localhost:8001/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alice", "message": "What do you know about me?", "save_to_memory": true}'
The system uses a single optimized LLM call that handles both response generation and fact extraction:
Input: User message + retrieved memories (context)
Single LLM call: AI generates structured JSON containing:
response - a natural reply to the user
facts - a list of extracted memories
Categorization: AI automatically assigns memory type for each fact
Importance scoring: AI determines value (0.5-2.0) for each fact
Validation: System validates JSON and saves facts to database
Example LLM output:
{
  "response": "Hello! Nice to meet a fellow Python enthusiast!",
  "facts": [
    {"content": "User is a Python developer", "memory_type": "personal", "importance": 1.5},
    {"content": "User works on AI projects", "memory_type": "personal", "importance": 1.4},
    {"content": "User loves FastAPI framework", "memory_type": "preference", "importance": 1.3}
  ]
}
⚡ Efficient: Only 1 API call per user message
🎯 Smart: AI decides what's worth remembering
📊 Structured: Returns both conversation response and extracted facts
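A condensed sketch of this flow, assuming an OpenAI-compatible client (e.g., a local llama.cpp or LM Studio endpoint). The function name mirrors generate_response_and_extract_facts mentioned in Troubleshooting below, but the body here is illustrative, not the repo's actual code:

import json

SYSTEM_PROMPT = """Reply to the user, then extract facts worth remembering.
Return ONLY JSON: {"response": "...", "facts": [{"content": "...",
"memory_type": "personal|preference|skill", "importance": 0.5-2.0}]}"""

def generate_response_and_extract_facts(client, message, memories):
    """One LLM call returns both the reply and any new facts (sketch)."""
    context = "\n".join(memories)
    completion = client.chat.completions.create(
        model="llama-3-8b",                    # illustrative model name
        max_tokens=600,                        # matches the default noted in Troubleshooting
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Known facts:\n{context}\n\nUser: {message}"},
        ],
    )
    raw = completion.choices[0].message.content
    try:
        data = json.loads(raw)                 # validate structure before saving facts
    except json.JSONDecodeError:
        print(f"Warning: could not parse LLM output: {raw[:100]}")
        return {"response": raw, "facts": []}  # fall back to a plain reply, save nothing
    return data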
Memory table schema:
id - SERIAL PRIMARY KEY
user_id - VARCHAR(255) - User identifier
content - TEXT - Memory content
memory_type - VARCHAR(50) - Type (preference/personal/skill/etc.)
embedding - VECTOR(768) - Vector embedding
created_at - TIMESTAMP - Creation date
importance - FLOAT - Importance weight (0.5-2.0)
Note: The conversations table has been removed - the system does not store full conversations.
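For reference, the schema above corresponds to DDL roughly like the following (a sketch; the actual table name in the repo may differ):

import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memories (          -- table name assumed
    id          SERIAL PRIMARY KEY,
    user_id     VARCHAR(255) NOT NULL,
    content     TEXT NOT NULL,
    memory_type VARCHAR(50),
    embedding   VECTOR(768),                   -- must match your embedding model
    created_at  TIMESTAMP DEFAULT NOW(),
    importance  FLOAT DEFAULT 1.0
);
"""

conn = psycopg2.connect("dbname=memory_db")    # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(DDL)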
AI not extracting facts correctly:
Check the logs - the system prints warnings about parsing errors
Increase max_tokens in generate_response_and_extract_facts (currently 600)
Use a stronger LLM (e.g., llama-3-8b instead of smollm-135m)
Adjust the prompt in generate_response_and_extract_facts
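Parsing failures often come from the model wrapping its JSON in extra prose or markdown fences. A tolerant extraction helper along these lines (illustrative, not the repo's code) can cut down on warnings:

import json
import re

def extract_json(raw):
    """Pull the first {...} block out of LLM output, tolerating surrounding prose."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None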
Database dimension errors:
If you get an "expected 384 dimensions, not 768" error:
# Clean up database and recreate with correct dimensions
python cleanup_db.py
# Restart proxy
python proxy.py
This happens when the database was created with the wrong embedding dimensions.
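To confirm the mismatch before wiping data, compare the stored column dimension against your embedding model's output. vector_dims is a standard pgvector function; the DSN and table name are illustrative:

import psycopg2

conn = psycopg2.connect("dbname=memory_db")    # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("SELECT vector_dims(embedding) FROM memories LIMIT 1")
    row = cur.fetchone()
    print("stored dimension:", row[0] if row else "table is empty")
# The embedding model should produce vectors of the same length, e.g. 768.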