I built a knowledge system that gives AI perfect codebase memory


© 2025 Muvon Un Limited (Hong Kong)


Octocode is a powerful code indexer and semantic search engine that builds intelligent knowledge graphs of your codebase. It combines advanced AI capabilities with local-first design to provide deep code understanding, relationship mapping, and intelligent assistance for developers.

  • Natural language queries across your entire codebase
  • Multi-mode search (code, documentation, text, or all)
  • Intelligent ranking with similarity scoring
  • Symbol expansion for comprehensive results

🕸️ Knowledge Graph (GraphRAG)

  • Automatic relationship discovery between files and modules
  • Import/export dependency tracking
  • AI-powered file descriptions and architectural insights
  • Path finding between code components
  • Supported languages: Rust, Python, JavaScript, TypeScript, Go, PHP, C++, Ruby, JSON, Bash, Markdown
  • Tree-sitter based parsing for accurate symbol extraction
  • Smart commit message generation
  • Code review with best practices analysis
  • Memory system for storing insights, decisions, and context
  • Semantic memory search with vector similarity
  • Memory relationships and automatic context linking
  • Multiple LLM support via OpenRouter
  • Built-in Model Context Protocol server
  • Seamless integration with AI assistants (Claude Desktop, etc.)
  • Real-time file watching and auto-reindexing
  • Rich tool ecosystem for code analysis

Performance & Flexibility

  • Optimized indexing: Batch metadata loading eliminates database query storms
  • Smart batching: 16 files per batch with token-aware API optimization
  • Frequent persistence: Data saved every 16 files (max 16 files at risk)
  • Fast file traversal: Single-pass progressive counting and processing
  • Local embedding models: FastEmbed and SentenceTransformer (macOS only)
  • Cloud embedding providers: Voyage AI (default), Jina AI, Google
  • Free tier available: Voyage AI provides 200M free tokens monthly
  • Lance columnar database for fast vector search
  • Incremental indexing and git-aware optimization
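
The "16 files per batch" figure above can be pictured with a generic shell sketch. This is not Octocode's internal code, only an illustration of fixed-size batching using the standard `xargs -n` option; the function name `batch_files` and the file names are hypothetical.

```sh
# Illustration only: group a list of paths into batches of 16,
# the batch size quoted above. Not Octocode's actual implementation.
batch_files() {
    # Reads whitespace-free paths (one per line) on stdin and
    # prints one batch per output line.
    xargs -n 16 echo
}

# Feed 40 made-up file names through the batcher and report batch sizes.
i=1
while [ "$i" -le 40 ]; do
    echo "src/file_$i.rs"
    i=$((i + 1))
done | batch_files | awk '{ print "batch " NR ": " NF " files" }'
```

With 40 inputs this yields three batches of 16, 16, and 8 files. If data is persisted after each batch, an interruption loses at most the batch in flight, which matches the "max 16 files at risk" claim.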

Download Prebuilt Binary (Recommended)

    # Universal install script (Linux, macOS, Windows) - requires curl
    curl -fsSL https://raw.githubusercontent.com/Muvon/octocode/master/install.sh | sh

Or download manually from GitHub Releases.

cargo install --git https://github.com/Muvon/octocode

Prerequisites:

    git clone https://github.com/Muvon/octocode.git
    cd octocode

    # macOS: Full build with local embeddings
    cargo build --release

    # Windows/Linux: Cloud embeddings only (due to ONNX Runtime issues)
    cargo build --release --no-default-features

Note: Prebuilt binaries use cloud embeddings only. Local embeddings require building from source on macOS.
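
The platform split in the build steps above can be wrapped in a small helper. This is a minimal sketch: the function name `octocode_build_cmd` is hypothetical, and it only prints the command from the build notes rather than running it.

```sh
# Print the cargo build command that matches the build notes above.
# Takes an optional OS name; defaults to the output of `uname -s`.
octocode_build_cmd() {
    os="${1:-$(uname -s)}"
    case "$os" in
        # macOS: full build with local embeddings
        Darwin) echo "cargo build --release" ;;
        # Everything else: cloud embeddings only
        *)      echo "cargo build --release --no-default-features" ;;
    esac
}

# Show the command for the current machine without executing it.
octocode_build_cmd
```

Run the printed command from inside the cloned `octocode` directory.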

🔑 Getting Started - API Keys

⚠️ Important: Octocode requires API keys to function. Local embedding models are only available on macOS builds.

Required: Voyage AI (Embeddings)

export VOYAGE_API_KEY="your-voyage-api-key"
  • Free tier: 200M tokens per month
  • Get API key: voyageai.com
  • Used for: Code and text embeddings (semantic search)

Optional: OpenRouter (LLM Features)

export OPENROUTER_API_KEY="your-openrouter-api-key"
  • Get API key: openrouter.ai
  • Used for: Commit messages, code review, GraphRAG descriptions
  • Note: Basic search and indexing work without this
  • Windows/Linux: Must use cloud embeddings (Voyage AI default)
  • macOS: Can use local embeddings (build from source) or cloud embeddings
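
Since a missing key is an easy mistake to make, a preflight check can be useful before indexing. The sketch below assumes the default cloud embedding setup described above; `check_octocode_keys` is a hypothetical helper, not part of Octocode.

```sh
# Hypothetical preflight check for the environment variables named above.
# Returns non-zero only when the embeddings key is missing.
check_octocode_keys() {
    if [ -z "${VOYAGE_API_KEY:-}" ]; then
        echo "error: VOYAGE_API_KEY is not set (needed for cloud embeddings)" >&2
        return 1
    fi
    if [ -z "${OPENROUTER_API_KEY:-}" ]; then
        # Optional key: search and indexing still work without it.
        echo "note: OPENROUTER_API_KEY is not set; commit, review and GraphRAG descriptions need it" >&2
    fi
    return 0
}
```

A typical use would be `check_octocode_keys && octocode index`. On a macOS build configured for local embeddings the Voyage key may not be needed, so treat the error there as advisory.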

1. Setup API Keys (Required)

    # Set Voyage AI API key for embeddings (free 200M tokens/month)
    export VOYAGE_API_KEY="your-voyage-api-key"

    # Optional: Set OpenRouter API key for LLM features (commit, review, GraphRAG)
    export OPENROUTER_API_KEY="your-openrouter-api-key"

Get your free API keys:

  • Voyage AI: voyageai.com
  • OpenRouter: openrouter.ai

    # Index your current directory
    octocode index

    # Search your codebase
    octocode search "HTTP request handling"

    # View code signatures
    octocode view "src/**/*.rs"

3. AI-Powered Git Workflow (Requires OpenRouter API Key)

    # Generate intelligent commit messages
    git add .
    octocode commit

    # Review code for best practices
    octocode review

4. MCP Server for AI Assistants

    # Start MCP server
    octocode mcp

    # Use with Claude Desktop or other MCP-compatible tools
    # Provides: search_code, search_graphrag, memorize, remember, forget

    # Store important insights and decisions
    octocode memory memorize \
      --title "Authentication Bug Fix" \
      --content "Fixed JWT token validation in auth middleware" \
      --memory-type bug_fix \
      --tags security,jwt,auth

    # Search your memory with semantic similarity
    octocode memory remember "JWT authentication issues"

    # Get memories by type, tags, or files
    octocode memory by-type bug_fix
    octocode memory by-tags security,auth
    octocode memory for-files src/auth.rs

    # Clear all memory data (useful for testing)
    octocode memory clear-all --yes

    # Enable GraphRAG with AI descriptions (requires OpenRouter API key)
    octocode config --graphrag-enabled true
    octocode index

    # Search the knowledge graph
    octocode graphrag search --query "authentication modules"

    # Watch for changes
    octocode watch

| Command | Description | Example |
|---------|-------------|---------|
| octocode index | Index the codebase | octocode index --reindex |
| octocode search <query> | Semantic code search | octocode search "error handling" |
| octocode graphrag <operation> | Knowledge graph operations | octocode graphrag search --query "auth" |
| octocode view [pattern] | View code signatures | octocode view "src/**/*.rs" --md |
| octocode commit | AI-powered git commit | octocode commit --all |
| octocode review | Code review assistant | octocode review --focus security |
| octocode memory <operation> | Memory management | octocode memory remember "auth bugs" |
| octocode mcp | Start MCP server | octocode mcp --debug |
| octocode watch | Auto-reindex on changes | octocode watch --quiet |
| octocode config | Manage configuration | octocode config --show |

Octocode includes a powerful memory system for storing and retrieving project insights, decisions, and context using semantic search and relationship mapping.

| Command | Description | Example |
|---------|-------------|---------|
| memorize | Store new information | octocode memory memorize --title "Bug Fix" --content "Details..." |
| remember | Search memories semantically | octocode memory remember "authentication issues" |
| forget | Delete specific memories | octocode memory forget --memory-id abc123 |
| update | Update existing memory | octocode memory update abc123 --add-tags security |
| get | Retrieve memory by ID | octocode memory get abc123 |
| recent | List recent memories | octocode memory recent --limit 10 |
| by-type | Filter by memory type | octocode memory by-type bug_fix |
| by-tags | Filter by tags | octocode memory by-tags security,auth |
| for-files | Find memories for files | octocode memory for-files src/auth.rs |
| stats | Show memory statistics | octocode memory stats |
| cleanup | Remove old memories | octocode memory cleanup |
| clear-all | Delete all memories | octocode memory clear-all --yes |
| relate | Create relationships | octocode memory relate source-id target-id |

Memory types:

  • code - Code-related insights and patterns
  • bug_fix - Bug reports and solutions
  • feature - Feature implementations and decisions
  • architecture - Architectural decisions and patterns
  • performance - Performance optimizations and metrics
  • security - Security considerations and fixes
  • testing - Test strategies and results
  • documentation - Documentation notes and updates

    # Store a bug fix with context
    octocode memory memorize \
      --title "JWT Token Validation Fix" \
      --content "Fixed race condition in token refresh logic by adding mutex lock" \
      --memory-type bug_fix \
      --importance 0.8 \
      --tags security,jwt,race-condition \
      --files src/auth/jwt.rs,src/middleware/auth.rs

    # Search for authentication-related memories
    octocode memory remember "JWT authentication problems" \
      --memory-types bug_fix,security \
      --min-relevance 0.7

    # Get all security-related memories
    octocode memory by-tags security --format json

    # Clear all memory data (useful for testing/reset)
    octocode memory clear-all --yes
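
The `--tags` and `--files` options in the examples above take comma-separated lists. When those values come from a script, a small joiner avoids hand-building the string. `join_csv` below is a hypothetical helper, not an Octocode command.

```sh
# Hypothetical helper: join all arguments with commas, producing the
# comma-separated form used by --tags and --files in the examples above.
join_csv() {
    old_ifs=$IFS
    IFS=,
    # "$*" joins the arguments using the first character of IFS.
    echo "$*"
    IFS=$old_ifs
}

# Prints the joined tag list for a few sample tags.
join_csv security jwt race-condition
```

It could then be used as `octocode memory by-tags "$(join_csv security auth)"`.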

Octocode stores configuration in ~/.local/share/octocode/config.toml.
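
To check whether a configuration file already exists at that location, a short sketch like the following can help. The function name `octocode_config_path` is hypothetical; the path is the one stated above, with `~` expanded through `$HOME`.

```sh
# Print the configuration path stated above, expanding ~ via $HOME.
octocode_config_path() {
    echo "${HOME}/.local/share/octocode/config.toml"
}

# Report whether the file exists; nothing is created or modified.
if [ -f "$(octocode_config_path)" ]; then
    echo "config found: $(octocode_config_path)"
else
    echo "no config file yet at: $(octocode_config_path)"
fi
```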

    # Set Voyage AI API key (required for embeddings)
    export VOYAGE_API_KEY="your-voyage-api-key"

    # Optional: Set OpenRouter API key for LLM features
    export OPENROUTER_API_KEY="your-openrouter-api-key"

    # View current configuration
    octocode config --show

    # Use local models (macOS only - requires building from source)
    octocode config \
      --code-embedding-model "fastembed:jinaai/jina-embeddings-v2-base-code" \
      --text-embedding-model "fastembed:sentence-transformers/all-MiniLM-L6-v2-quantized"

    # Use different cloud embedding provider
    octocode config \
      --code-embedding-model "jina:jina-embeddings-v2-base-code" \
      --text-embedding-model "jina:jina-embeddings-v2-base-en"

    # Enable GraphRAG with AI descriptions
    octocode config --graphrag-enabled true

    # Set custom OpenRouter model
    octocode config --model "openai/gpt-4o-mini"

  • Code embedding: voyage:voyage-code-2 (Voyage AI)
  • Text embedding: voyage:voyage-2 (Voyage AI)
  • LLM: openai/gpt-4o-mini (via OpenRouter)
  • Windows/Linux: Cloud embeddings only (Voyage AI, Jina AI, Google)
  • macOS: Local embeddings available (FastEmbed, SentenceTransformer) + cloud options
  • 🏠 Local-first option: FastEmbed and SentenceTransformer run entirely offline (macOS only)
  • 🔑 Secure storage: API keys stored locally, environment variables supported
  • 📁 Respects .gitignore: Never indexes sensitive files or directories
  • 🛡️ MCP security: Server runs locally with no external network access for search
  • 🌐 Cloud embeddings: Voyage AI and other providers process only file metadata, not source code
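
Before indexing, it can be reassuring to preview which files git itself does not ignore. The sketch below uses the standard `git ls-files` options; it approximates the `.gitignore` behaviour described above, and Octocode's own file walker may differ in details. The function name `list_unignored_files` is hypothetical.

```sh
# List tracked and untracked files that .gitignore rules do NOT exclude.
# This is git's view of the tree, used here as an approximation only.
list_unignored_files() {
    git ls-files --cached --others --exclude-standard
}

# Only run the preview when the current directory is inside a git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    list_unignored_files | head -n 20
fi
```

Any file that holds secrets and shows up in this list is a candidate for a new `.gitignore` entry.
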
| Language | Extensions | Features |
|----------|------------|----------|
| Rust | .rs | Full AST parsing, pub/use detection, module structure |
| Python | .py | Import/class/function extraction, docstring parsing |
| JavaScript | .js, .jsx | ES6 imports/exports, function declarations |
| TypeScript | .ts, .tsx | Type definitions, interface extraction |
| Go | .go | Package/import analysis, struct/interface parsing |
| PHP | .php | Class/function extraction, namespace support |
| C++ | .cpp, .hpp, .h | Include analysis, class/function extraction |
| Ruby | .rb | Class/module extraction, method definitions |
| JSON | .json | Structure analysis, key extraction |
| Bash | .sh, .bash | Function and variable extraction |
| Markdown | .md | Document section indexing, header extraction |
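
The extension mapping in the table above can be turned into a quick lookup, for example to check whether a given file falls under a supported language. This is a sketch built only from the table; `octocode_language_for` is a hypothetical helper and does not call Octocode.

```sh
# Map a file name to the language listed in the table above.
# Prints "unsupported" and returns non-zero for any other extension.
octocode_language_for() {
    case "$1" in
        *.rs)            echo "Rust" ;;
        *.py)            echo "Python" ;;
        *.js|*.jsx)      echo "JavaScript" ;;
        *.ts|*.tsx)      echo "TypeScript" ;;
        *.go)            echo "Go" ;;
        *.php)           echo "PHP" ;;
        *.cpp|*.hpp|*.h) echo "C++" ;;
        *.rb)            echo "Ruby" ;;
        *.json)          echo "JSON" ;;
        *.sh|*.bash)     echo "Bash" ;;
        *.md)            echo "Markdown" ;;
        *)               echo "unsupported"; return 1 ;;
    esac
}

# Example lookup for a Rust source file.
octocode_language_for src/auth/jwt.rs
```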

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ by the Muvon team in Hong Kong
