RAG (Retrieval-Augmented Generation) lets LLMs answer questions about documents by fetching relevant content and adding it to the prompt. It's everywhere: customer support, enterprise search, legal discovery. But standard RAG breaks down in multi-user contexts where different users have different permissions. This repository shows how to fix that with ReBAC (relationship-based access control) using Ollama and Ory Keto, an open-source implementation of Google Zanzibar.
TL;DR: Most RAG systems leak private data across users. This repo demonstrates permission-aware RAG that guarantees the LLM never sees unauthorized documents. Think Google Zanzibar meets embeddings — fork it, break it, extend it.
The model never sees text the user isn't authorized for. No prompt injection can leak it.
Prerequisites: Docker, Go, and a C compiler (see the CGO note below).
Clone the repository, then run the demo.
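A typical sequence looks like this (the repository URL and the `demo` Make target are assumptions; use the actual clone URL and check the Makefile for target names):

```bash
git clone https://github.com/example/rerag.git  # hypothetical URL
cd rerag
make demo                                       # assumed target name
```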
Note: This project requires CGO (C compiler) for sqlite-vec integration. Ensure you have a C compiler installed:
- macOS: Install Xcode Command Line Tools (`xcode-select --install`)
- Linux: Install build-essential (`apt-get install build-essential`)
- Windows: Install MinGW-w64 or use WSL
This will:
- Start Ollama via Docker and pull required models (llama3.2:1b, nomic-embed-text)
- Install Keto and Go dependencies
- Start Keto and the application server
- Load demo documents
- Run permission-aware queries showing different results per user
The Ollama container runs as `rerag-ollama` on port 11434. To stop it, run `make reset`.
See `config.example.yaml` for all configuration options.
Standard RAG pulls all matching documents into context, then relies on the LLM to "respect" permissions. That's a compliance nightmare waiting to happen. This architecture:
- Filters at retrieval: Only authorized documents enter the vector search results
- Never leaks: Unauthorized content never reaches the LLM context window
- No prompt injection: Users can't trick the LLM into revealing data they shouldn't see
- Audit-ready: Every permission check is logged and traceable
- Transport security: Optional TLS/HTTPS encryption
- Data at rest: Optional SQLite database encryption
All open source, runs locally:
- Ory Keto: Google Zanzibar-based ReBAC for permissions
- Ollama: Local LLM runner via Docker (llama3.2:1b for inference, nomic-embed-text for embeddings)
- SQLite: Persistent vector storage with optional encryption
- sqlite-vec: Fast vector similarity search directly in SQLite using KNN
- Go: For performance and hackability (requires CGO for sqlite-vec)
- Docker: For running Ollama in a container
- TLS/HTTPS: Optional transport-layer encryption
- Upload: Documents tagged with owner metadata, embeddings stored in sqlite-vec
- Permissions: Relationships defined in Keto (who can see what)
- Query: User asks a question, embedding generated
- Vector Search: sqlite-vec performs efficient KNN search in SQLite
- Filter: Permission check ensures user can access retrieved documents
- Answer: LLM processes authorized subset only
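A minimal sketch of the permission check in the Filter step, using Keto's read API. The `documents` namespace, the `view` relation, and port 4466 (Keto's default read port) are assumptions; verify the endpoint shape against the Keto version in use:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// checkPermission asks Keto whether user may view docID before the
// document is allowed into the LLM context. Namespace and relation
// names are illustrative, not the demo's actual configuration.
func checkPermission(user, docID string) (bool, error) {
	q := url.Values{
		"namespace":  {"documents"},
		"object":     {docID},
		"relation":   {"view"},
		"subject_id": {user},
	}
	resp, err := http.Get("http://localhost:4466/relation-tuples/check?" + q.Encode())
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var body struct {
		Allowed bool `json:"allowed"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	return body.Allowed, nil
}

func main() {
	ok, err := checkPermission("alice", "doc-42")
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed:", ok)
}
```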
The system uses sqlite-vec for efficient vector similarity search directly in SQLite:
- Native SQL operations: Vector search happens in the database, not in application memory
- KNN algorithm: K-nearest neighbors search using cosine distance
- Efficient storage: Vectors stored in a vec0 virtual table with automatic indexing
- No memory overhead: Documents don't need to be loaded into memory for similarity computation
- Scales with SQLite: Leverages SQLite's proven performance and reliability
- Adaptive recursive search: Dynamically increases candidate pool when filtering reduces results
- Permission-aware filtering: Efficiently handles sparse permission scenarios without over-fetching
When searching with permission filters, the system uses an adaptive approach:
- Initial Search: Fetches topK × 2 candidates from sqlite-vec
- Filter Application: Applies permission filter to candidates
- Adaptive Expansion: If insufficient matches found:
- Recursively doubles the candidate pool (growth factor: 2.0)
- Continues until topK matches found or all documents searched
- Safety limit of 10 attempts prevents infinite recursion
- Optimization: Stops early when enough matches found or no more documents exist
This approach balances efficiency with completeness, adapting to different permission distributions without requiring manual tuning (see the Go sketch under "Adaptive Filtering Algorithm" below).
ReRAG supports flexible configuration via config files and environment variables.
Create a config.yaml file for persistent settings:
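For example (a sketch only; the key names below are assumptions, so check `config.example.yaml` for the real schema):

```yaml
# Illustrative sketch - key names are assumptions;
# see config.example.yaml for the authoritative schema.
server:
  port: 8080
ollama:
  url: http://localhost:11434
  chat_model: llama3.2:1b
  embed_model: nomic-embed-text
```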
Override any setting with environment variables:
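For instance (the `RERAG_` variable prefix and binary name are hypothetical placeholders):

```bash
RERAG_SERVER_PORT=9090 ./rerag
```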
For HTTPS support, generate certificates:
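A self-signed certificate for local testing can be generated with standard openssl (file names are up to you; point the TLS config at wherever you write them):

```bash
# Self-signed certificate for local testing only
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -days 365 -nodes -subj "/CN=localhost"
```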
Enable SQLite encryption for data at rest:
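For example, generate and supply a key via the environment (the variable name is a hypothetical placeholder):

```bash
export RERAG_DB_ENCRYPTION_KEY="$(openssl rand -hex 32)"
```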
⚠️ Important: Store encryption keys securely using environment variables or key management systems in production.
The system uses a dual-table approach for efficient storage and retrieval:
- documents table: Stores document metadata (id, title, content)
- vec_documents virtual table: Stores vector embeddings using sqlite-vec's vec0 module
This separation allows:
- Fast metadata queries without loading embeddings
- Efficient vector similarity search using native SQLite operations
- Dynamic embedding dimension support (auto-detected from first document)
- Adaptive search that scales with permission filtering requirements
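A sketch of the two tables (column names and the 768-dimension embedding size for nomic-embed-text are assumptions; the demo auto-detects the dimension at runtime):

```sql
-- Metadata table: fast lookups without touching embeddings
CREATE TABLE documents (
  id      INTEGER PRIMARY KEY,
  title   TEXT NOT NULL,
  content TEXT NOT NULL
);

-- sqlite-vec virtual table: one embedding per document, keyed by rowid
CREATE VIRTUAL TABLE vec_documents USING vec0(
  embedding float[768]  -- 768 matches nomic-embed-text
);
```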
The vector search implementation combines sqlite-vec's KNN algorithm with an adaptive recursive approach:
SQL Query Pattern:
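Roughly the following shape (a sketch of sqlite-vec's KNN syntax; the exact query in the code may differ):

```sql
-- :query_embedding is the query vector; :limit is the candidate pool size
-- (topK * 2 initially, doubled on each adaptive retry)
SELECT d.id, d.title, knn.distance
FROM (
  SELECT rowid, distance
  FROM vec_documents
  WHERE embedding MATCH :query_embedding
    AND k = :limit
) AS knn
JOIN documents AS d ON d.id = knn.rowid
ORDER BY knn.distance;
```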
Adaptive Filtering Algorithm:
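A minimal Go sketch of the loop described above; `searchKNN` and `allowed` are hypothetical stand-ins for the sqlite-vec query and the Keto permission check:

```go
package rerag

// Doc is a minimal stand-in for a retrieved document.
type Doc struct{ ID string }

// searchKNN and allowed are hypothetical stand-ins for the sqlite-vec
// KNN query (previous section) and the Keto permission check.
func searchKNN(query []float32, k int) []Doc { return nil /* KNN over vec_documents */ }
func allowed(user, docID string) bool        { return false /* Keto check */ }

// adaptiveSearch grows the candidate pool until topK authorized
// documents are found or the whole corpus has been considered.
func adaptiveSearch(query []float32, user string, topK, totalDocs int) []Doc {
	const growthFactor = 2.0
	const maxAttempts = 10 // safety limit against unbounded growth

	limit := topK * 2 // initial candidate pool: topK x 2
	var matches []Doc
	for attempt := 0; attempt < maxAttempts; attempt++ {
		matches = matches[:0] // re-filter the enlarged pool from scratch
		for _, d := range searchKNN(query, limit) {
			if allowed(user, d.ID) {
				matches = append(matches, d)
				if len(matches) == topK {
					return matches // enough authorized results: stop early
				}
			}
		}
		if limit >= totalDocs {
			break // every document has been searched; return what we have
		}
		limit = int(float64(limit) * growthFactor) // double the pool
	}
	return matches
}
```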
Example Scenario:
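An illustrative walk-through (not measured output): with topK = 5 over 1,000 documents of which the user can access roughly 10%, the first pass fetches 10 candidates and yields about 1 authorized match. The pool then doubles to 20, 40, and 80 candidates; at 80, roughly 8 authorized documents surface, so the search returns 5 results after four passes instead of scanning all 1,000 vectors.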
This approach is particularly efficient when:
- Users have access to a significant subset of documents (minimal recursion)
- Permission distribution is sparse but consistent (predictable growth)
- Document corpus is large but user access is limited (avoids loading all vectors)
The project requires CGO enabled for sqlite-vec:
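To build by hand rather than through the Makefile:

```bash
CGO_ENABLED=1 go build ./...
```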
The Makefile automatically sets CGO_ENABLED=1 for all build operations.
This is a working reference, not production code. Ideas for extensions:
- Real Auth: Replace mock tokens with OAuth2/OIDC (Ory Hydra pairs well with Ory Keto)
- Scale Storage: Swap SQLite for Pinecone/Weaviate/pgvector (keeping the same permission-aware filtering approach)
- Audit Trail: Add comprehensive logging for compliance
- Reverse Expand: Instead of filtering vector search results after the fact, use Keto to pre-compute the set of document IDs the user can access and restrict the search to those
- UI: Build a simple web interface for uploading/querying documents
- Vector Indexing: Add HNSW or other ANN indexes for larger datasets
The GitHub Actions workflow includes optimizations for faster CI runs:
- 🎯 Model Caching: Ollama models are cached between CI runs using GitHub's cache action
- ⚡ Simple Setup: Straightforward installation with minimal complexity
- 🔍 Quick Health Checks: Simple service readiness verification
- First run: Downloads and caches models (~3-4 minutes)
- Subsequent runs: Uses cached models (~1-2 minutes)
- Cache hit rate: 90%+ for models that don't change
| Problem | Solution |
| --- | --- |
| Ollama connection refused | Run `make install-ollama` or `docker start rerag-ollama` |
| Models missing | Run `docker exec rerag-ollama ollama pull llama3.2:1b` and `docker exec rerag-ollama ollama pull nomic-embed-text` |
| Keto not running | Check with `curl localhost:4467/health/ready` |
| Docker not found | Install Docker from https://www.docker.com/get-started |
| Port 11434 in use | Stop other Ollama instances: `docker stop rerag-ollama` |
| TLS certificate errors | Check cert file paths and permissions |
| Database encryption fails | Verify the encryption key and SQLite encryption support |
| Config validation errors | Check required fields when features are enabled |
| CGO build errors | Ensure a C compiler is installed (see requirements above) |
| sqlite-vec not found | Run `go mod tidy` and ensure CGO is enabled |
This is experimental code meant for learning and extending. PRs welcome!
Found this useful? Hit us with a star. Have ideas? Open an issue or PR.