RAG (Retrieval-Augmented Generation) lets LLMs answer questions about documents by fetching relevant content and adding it to the prompt. It's everywhere: customer support, enterprise search, legal discovery. But standard RAG breaks down in multi-user contexts where different users have different permissions. This repository shows how to fix that with ReBAC (relationship-based access control) using Ollama and Ory Keto, an open-source implementation of Google Zanzibar.
TL;DR: Most RAG systems leak private data across users. This repo demonstrates permission-aware RAG that guarantees the LLM never sees unauthorized documents. Think Google Zanzibar meets embeddings — fork it, break it, extend it.
The model never sees text the user isn't authorized for. No prompt injection can leak it.
Prerequisites: Docker, Go, and a C compiler (see the CGO note below).
Clone the repository, then run the demo.
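A typical sequence looks like this (the repository URL and the `demo` Make target are assumptions; use the actual clone URL and check the Makefile for target names):

```bash
git clone https://github.com/example/rerag.git  # hypothetical URL
cd rerag
make demo                                       # assumed target name
```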
Note: This project requires CGO (C compiler) for sqlite-vec integration. Ensure you have a C compiler installed:
- macOS: Install Xcode Command Line Tools (`xcode-select --install`)
- Linux: Install build-essential (`apt-get install build-essential`)
- Windows: Install MinGW-w64 or use WSL
This will:
- Start Ollama via Docker and pull required models (llama3.2:1b, nomic-embed-text)
- Install Keto and Go dependencies
- Start Keto and the application server
- Load demo documents
- Run permission-aware queries showing different results per user
The Ollama container runs as `rerag-ollama` on port 11434. To stop it, run `make reset`.
See `config.example.yaml` for all configuration options.
Standard RAG pulls all matching documents into context, then relies on the LLM to "respect" permissions. That's a compliance nightmare waiting to happen. This architecture:
- Filters at retrieval: Only authorized documents enter the vector search results
- Never leaks: Unauthorized content never reaches the LLM context window
- No prompt injection: Users can't trick the LLM into revealing data they shouldn't see
- Audit-ready: Every permission check is logged and traceable
- Transport security: Optional TLS/HTTPS encryption
- Data at rest: Optional SQLite database encryption
All open source, runs locally:
- Ory Keto: Google Zanzibar-based ReBAC for permissions
- Ollama: Local LLM runner via Docker (llama3.2:1b for inference, nomic-embed-text for embeddings)
- SQLite: Persistent vector storage with optional encryption
- sqlite-vec: Fast vector similarity search directly in SQLite using KNN
- Go: For performance and hackability (requires CGO for sqlite-vec)
- Docker: For running Ollama in a container
- TLS/HTTPS: Optional transport-layer encryption
- Upload: Documents tagged with owner metadata, embeddings stored in sqlite-vec
- Permissions: Relationships defined in Keto (who can see what)
- Query: User asks a question, embedding generated
- Vector Search: sqlite-vec performs efficient KNN search in SQLite
- Filter: Permission check ensures user can access retrieved documents
- Answer: LLM processes authorized subset only
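A minimal sketch of the permission check in the Filter step, using Keto's read API. The `documents` namespace, the `view` relation, and port 4466 (Keto's default read port) are assumptions; verify the endpoint shape against the Keto version in use:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// checkPermission asks Keto whether user may view docID before the
// document is allowed into the LLM context. Namespace and relation
// names are illustrative, not the demo's actual configuration.
func checkPermission(user, docID string) (bool, error) {
	q := url.Values{
		"namespace":  {"documents"},
		"object":     {docID},
		"relation":   {"view"},
		"subject_id": {user},
	}
	resp, err := http.Get("http://localhost:4466/relation-tuples/check?" + q.Encode())
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var body struct {
		Allowed bool `json:"allowed"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	return body.Allowed, nil
}

func main() {
	ok, err := checkPermission("alice", "doc-42")
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed:", ok)
}
```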
The system uses sqlite-vec for efficient vector similarity search directly in SQLite:
- Native SQL operations: Vector search happens in the database, not in application memory
- KNN algorithm: K-nearest neighbors search using cosine distance
- Efficient storage: Vectors stored in a vec0 virtual table with automatic indexing
- No memory overhead: Documents don't need to be loaded into memory for similarity computation
- Scales with SQLite: Leverages SQLite's proven performance and reliability
- Adaptive recursive search: Dynamically increases candidate pool when filtering reduces results
- Permission-aware filtering: Efficiently handles sparse permission scenarios without over-fetching
When searching with permission filters, the system uses an adaptive approach:
- Initial Search: Fetches topK × 2 candidates from sqlite-vec
- Filter Application: Applies permission filter to candidates
- Adaptive Expansion: If insufficient matches found:
- Recursively doubles the candidate pool (growth factor: 2.0)
- Continues until topK matches found or all documents searched
- Safety limit of 10 attempts prevents infinite recursion
- Optimization: Stops early when enough matches found or no more documents exist
This approach balances efficiency with completeness, adapting to different permission distributions without requiring manual tuning (see the Go sketch under "Adaptive Filtering Algorithm" below).
ReRAG supports flexible configuration via config files and environment variables.
Create a config.yaml file for persistent settings:
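For example (a sketch only; the key names below are assumptions, so check `config.example.yaml` for the real schema):

```yaml
# Illustrative sketch - key names are assumptions;
# see config.example.yaml for the authoritative schema.
server:
  port: 8080
ollama:
  url: http://localhost:11434
  chat_model: llama3.2:1b
  embed_model: nomic-embed-text
```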
Override any setting with environment variables:
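For instance (the `RERAG_` variable prefix and binary name are hypothetical placeholders):

```bash
RERAG_SERVER_PORT=9090 ./rerag
```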
For HTTPS support, generate certificates:
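A self-signed certificate for local testing can be generated with standard openssl (file names are up to you; point the TLS config at wherever you write them):

```bash
# Self-signed certificate for local testing only
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -days 365 -nodes -subj "/CN=localhost"
```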
Enable SQLite encryption for data at rest:
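For example, generate and supply a key via the environment (the variable name is a hypothetical placeholder):

```bash
export RERAG_DB_ENCRYPTION_KEY="$(openssl rand -hex 32)"
```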
⚠️ Important: Store encryption keys securely using environment variables or key management systems in production.
The system uses a dual-table approach for efficient storage and retrieval:
- documents table: Stores document metadata (id, title, content)
- vec_documents virtual table: Stores vector embeddings using sqlite-vec's vec0 module
This separation allows:
- Fast metadata queries without loading embeddings
- Efficient vector similarity search using native SQLite operations
- Dynamic embedding dimension support (auto-detected from first document)
- Adaptive search that scales with permission filtering requirements
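A sketch of the two tables (column names and the 768-dimension embedding size for nomic-embed-text are assumptions; the demo auto-detects the dimension at runtime):

```sql
-- Metadata table: fast lookups without touching embeddings
CREATE TABLE documents (
  id      INTEGER PRIMARY KEY,
  title   TEXT NOT NULL,
  content TEXT NOT NULL
);

-- sqlite-vec virtual table: one embedding per document, keyed by rowid
CREATE VIRTUAL TABLE vec_documents USING vec0(
  embedding float[768]  -- 768 matches nomic-embed-text
);
```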
The vector search implementation combines sqlite-vec's KNN algorithm with an adaptive recursive approach:
SQL Query Pattern:
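Roughly the following shape (a sketch of sqlite-vec's KNN syntax; the exact query in the code may differ):

```sql
-- :query_embedding is the query vector; :limit is the candidate pool size
-- (topK * 2 initially, doubled on each adaptive retry)
SELECT d.id, d.title, knn.distance
FROM (
  SELECT rowid, distance
  FROM vec_documents
  WHERE embedding MATCH :query_embedding
    AND k = :limit
) AS knn
JOIN documents AS d ON d.id = knn.rowid
ORDER BY knn.distance;
```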
Adaptive Filtering Algorithm:
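A minimal Go sketch of the loop described above; `searchKNN` and `allowed` are hypothetical stand-ins for the sqlite-vec query and the Keto permission check:

```go
package rerag

// Doc is a minimal stand-in for a retrieved document.
type Doc struct{ ID string }

// searchKNN and allowed are hypothetical stand-ins for the sqlite-vec
// KNN query (previous section) and the Keto permission check.
func searchKNN(query []float32, k int) []Doc { return nil /* KNN over vec_documents */ }
func allowed(user, docID string) bool        { return false /* Keto check */ }

// adaptiveSearch grows the candidate pool until topK authorized
// documents are found or the whole corpus has been considered.
func adaptiveSearch(query []float32, user string, topK, totalDocs int) []Doc {
	const growthFactor = 2.0
	const maxAttempts = 10 // safety limit against unbounded growth

	limit := topK * 2 // initial candidate pool: topK x 2
	var matches []Doc
	for attempt := 0; attempt < maxAttempts; attempt++ {
		matches = matches[:0] // re-filter the enlarged pool from scratch
		for _, d := range searchKNN(query, limit) {
			if allowed(user, d.ID) {
				matches = append(matches, d)
				if len(matches) == topK {
					return matches // enough authorized results: stop early
				}
			}
		}
		if limit >= totalDocs {
			break // every document has been searched; return what we have
		}
		limit = int(float64(limit) * growthFactor) // double the pool
	}
	return matches
}
```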
Example Scenario:
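An illustrative walk-through (not measured output): with topK = 5 over 1,000 documents of which the user can access roughly 10%, the first pass fetches 10 candidates and yields about 1 authorized match. The pool then doubles to 20, 40, and 80 candidates; at 80, roughly 8 authorized documents surface, so the search returns 5 results after four passes instead of scanning all 1,000 vectors.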
This approach is particularly efficient when:
- Users have access to a significant subset of documents (minimal recursion)
- Permission distribution is sparse but consistent (predictable growth)
- Document corpus is large but user access is limited (avoids loading all vectors)
The project requires CGO enabled for sqlite-vec:
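To build by hand rather than through the Makefile:

```bash
CGO_ENABLED=1 go build ./...
```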
The Makefile automatically sets CGO_ENABLED=1 for all build operations.
This is a working reference, not production code. Ideas for extensions:
- Real Auth: Replace mock tokens with OAuth2/OIDC (Ory Hydra pairs well with Ory Keto)
- Scale Storage: Swap SQLite for Pinecone/Weaviate/pgvector (keeping the same permission-aware filtering approach)
- Audit Trail: Add comprehensive logging for compliance
- Reverse Expand: Instead of filtering vector search results after the fact, use Keto to pre-compute the set of document IDs the user can access and restrict the search to those
- UI: Build a simple web interface for uploading/querying documents
- Vector Indexing: Add HNSW or other ANN indexes for larger datasets
The GitHub Actions workflow includes optimizations for faster CI runs:
- 🎯 Model Caching: Ollama models are cached between CI runs using GitHub's cache action
- ⚡ Simple Setup: Straightforward installation with minimal complexity
- 🔍 Quick Health Checks: Simple service readiness verification
- First run: Downloads and caches models (~3-4 minutes)
- Subsequent runs: Uses cached models (~1-2 minutes)
- Cache hit rate: 90%+ for models that don't change
| Problem | Solution |
| --- | --- |
| Ollama connection refused | Run `make install-ollama` or `docker start rerag-ollama` |
| Models missing | Run `docker exec rerag-ollama ollama pull llama3.2:1b` and `docker exec rerag-ollama ollama pull nomic-embed-text` |
| Keto not running | Check with `curl localhost:4467/health/ready` |
| Docker not found | Install Docker from https://www.docker.com/get-started |
| Port 11434 in use | Stop other Ollama instances: `docker stop rerag-ollama` |
| TLS certificate errors | Check cert file paths and permissions |
| Database encryption fails | Verify the encryption key and SQLite encryption support |
| Config validation errors | Check required fields when features are enabled |
| CGO build errors | Ensure a C compiler is installed (see requirements above) |
| sqlite-vec not found | Run `go mod tidy` and ensure CGO is enabled |
This is experimental code meant for learning and extending. PRs welcome!
Found this useful? Hit us with a star. Have ideas? Open an issue or PR.