This document introduces the LLKV database system, explaining its purpose, architecture, and core components. LLKV is a SQL database engine built in Rust that provides ACID transactions with MVCC snapshot isolation, columnar storage, and SQL support spanning DDL, DML, queries, and transaction control.
For detailed information about specific subsystems:
- Architecture and component interactions: see Architecture
- SQL interface and query processing: see SQL Interface
- Transaction management: see Transaction Management
- Testing infrastructure: see Testing Infrastructure
Purpose and Scope
LLKV is a relational database system that executes SQL queries against columnar data structures. The system is organized as a Cargo workspace containing 15 specialized crates, each responsible for a distinct layer of the database stack, from SQL parsing and query planning through execution and storage.
The codebase emphasizes modularity, performance optimization through SIMD acceleration, and correctness validation via extensive SQL Logic Testing. The architecture follows a layered design where higher-level crates depend on lower-level abstractions, enabling independent testing and optimization of each subsystem.
Sources: Cargo.toml:1-81, llkv-sql/src/sql_engine.rs:141-185
System Architecture
The following diagram illustrates the major subsystems and their dependencies:
Crate Responsibilities:
| Crate | Responsibility |
| --- | --- |
| llkv | Root crate aggregating all subsystems |
| llkv-sql | SQL parsing, preprocessing, and SqlEngine API |
| llkv-plan | Logical query plan structures (SelectPlan, InsertPlan, etc.) |
| llkv-expr | Expression AST types (Expr, ScalarExpr, Literal) |
| llkv-runtime | Transaction orchestration (RuntimeEngine, RuntimeSession) |
| llkv-executor | Query plan execution and result streaming |
| llkv-table | Table abstraction, catalog management, schema handling |
| llkv-column-map | Columnar storage implementation |
| llkv-storage | Storage pager interface (Pager, BoxedPager, MemPager) |
| llkv-transaction | MVCC transaction management |
| llkv-aggregate | Aggregate function evaluation |
| llkv-join | Join operation execution |
| llkv-result | Common error types (Error, Result) |
| llkv-test-utils | Testing utilities and tracing initialization |
| llkv-slt-tester | SQL Logic Test harness |
Sources: Cargo.toml:61-80, Diagram 1 from high-level overview
Core Components
SqlEngine - Entry Point
The SqlEngine struct serves as the primary interface for executing SQL statements. It wraps a RuntimeEngine and handles SQL preprocessing, parsing, and statement routing.
Key methods:
- SqlEngine::new() - Create engine with storage pager
- SqlEngine::execute() - Execute one or more SQL statements
- SqlEngine::sql() - Execute single SELECT and return batches
Sources: llkv-sql/src/sql_engine.rs:272-314, llkv-sql/src/sql_engine.rs:685-760
RuntimeEngine - Transaction Coordinator
The RuntimeEngine manages transaction sessions and coordinates between the execution layer and storage. Each engine maintains a RuntimeContext (shared catalog and storage) and a RuntimeSession (per-connection transaction state).
Sources: Diagram 4 from high-level overview, llkv-sql/src/sql_engine.rs:55-60
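The split between shared and per-connection state can be pictured with a small standard-library sketch. This is not LLKV's implementation: the three type names are borrowed from the text, but every field and method below is hypothetical, and the "catalog" is reduced to a set of committed table names.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the shared state (catalog and storage).
/// Here it only tracks the names of committed tables.
#[derive(Default)]
struct RuntimeContext {
    committed_tables: Mutex<BTreeSet<String>>,
}

/// Hypothetical per-connection state: a handle to the shared context
/// plus a flag recording whether this session has an open transaction.
struct RuntimeSession {
    context: Arc<RuntimeContext>,
    in_transaction: bool,
}

/// Hypothetical engine: owns the shared context and hands out sessions.
#[derive(Default)]
struct RuntimeEngine {
    context: Arc<RuntimeContext>,
}

impl RuntimeEngine {
    /// Every session shares the same context but starts with its own
    /// independent transaction state.
    fn new_session(&self) -> RuntimeSession {
        RuntimeSession {
            context: Arc::clone(&self.context),
            in_transaction: false,
        }
    }
}

impl RuntimeSession {
    fn begin(&mut self) {
        self.in_transaction = true;
    }

    /// Record a table as committed; returns false if the name was taken.
    fn register_committed_table(&self, name: &str) -> bool {
        self.context
            .committed_tables
            .lock()
            .unwrap()
            .insert(name.to_string())
    }

    fn table_exists(&self, name: &str) -> bool {
        self.context.committed_tables.lock().unwrap().contains(name)
    }
}
```

In this sketch, calling `begin()` on one session leaves every other session's `in_transaction` flag untouched, while a table registered through one session is visible through all of them because they share one `Arc<RuntimeContext>`.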
Query Execution Pipeline
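The execution pipeline transforms SQL text into Arrow RecordBatch results in stages: SQL preprocessing and parsing (llkv-sql), plan construction (llkv-plan), and plan execution with result streaming (llkv-executor).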
Plan Types:
- SelectPlan - Query with projections, filters, joins, aggregates
- InsertPlan - Insert rows (literal values or SELECT source)
- UpdatePlan - Update rows with assignments and filter
- DeletePlan - Delete rows matching filter
- CreateTablePlan - Create table with column specifications
Sources: llkv-plan/src/plans.rs:1-1230, Diagram 2 from high-level overview
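To make the idea of routing a statement to a plan family concrete, here is a deliberately naive sketch. `PlanKind` and `classify` are hypothetical names invented for this example; a real engine parses the statement into an AST before building a plan, so keyword sniffing is for illustration only and misses cases such as `SELECT*` or statements that start with `WITH`.

```rust
/// Hypothetical tag for the plan families listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PlanKind {
    Select,
    Insert,
    Update,
    Delete,
    CreateTable,
}

/// Guess the plan family from the first one or two keywords of a statement.
/// Returns None for anything this toy classifier does not recognize.
fn classify(sql: &str) -> Option<PlanKind> {
    let mut words = sql.split_whitespace().map(|w| w.to_ascii_uppercase());
    let first = words.next()?;
    let second = words.next();
    match (first.as_str(), second.as_deref()) {
        ("SELECT", _) => Some(PlanKind::Select),
        ("INSERT", _) => Some(PlanKind::Insert),
        ("UPDATE", _) => Some(PlanKind::Update),
        ("DELETE", _) => Some(PlanKind::Delete),
        ("CREATE", Some("TABLE")) => Some(PlanKind::CreateTable),
        _ => None,
    }
}
```

For example, `classify("select a from t")` yields `Some(PlanKind::Select)`, while `CREATE INDEX ...` yields `None` because this sketch models only the five plan types named in the list.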
Data Representation
LLKV uses Apache Arrow for columnar data throughout the system.
The llkv-column-map crate implements columnar storage that maps Arrow schemas to storage blobs via the Pager interface. All query results flow through RecordBatch structures, ensuring consistent data representation across subsystem boundaries.
Sources: Cargo.lock:100-117 (Arrow dependencies), Diagram 3 from high-level overview
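For readers new to columnar layouts, the following standard-library sketch shows the general shape of a column-oriented batch. It does not use the Arrow crate and is not LLKV code: `Column` and `Batch` are hypothetical types, and nulls are modeled with `Option` where Arrow uses a separate validity bitmap.

```rust
/// Hypothetical, simplified column: all values of one field stored together.
enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Hypothetical batch: parallel vectors of column names and column data.
struct Batch {
    names: Vec<String>,
    columns: Vec<Column>,
}

impl Batch {
    /// Row count, taken from the first column (0 for an empty batch).
    fn num_rows(&self) -> usize {
        match self.columns.first() {
            Some(Column::Int64(values)) => values.len(),
            Some(Column::Utf8(values)) => values.len(),
            None => 0,
        }
    }

    /// Sum the non-null values of an integer column.
    /// Only the requested column is touched; other columns are never read.
    fn sum_int64(&self, name: &str) -> Option<i64> {
        let idx = self.names.iter().position(|n| n.as_str() == name)?;
        match &self.columns[idx] {
            Column::Int64(values) => Some(values.iter().flatten().sum()),
            Column::Utf8(_) => None,
        }
    }
}
```

The point of the layout is visible in `sum_int64`: an aggregate over one field scans a single contiguous vector instead of walking every row.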
Transaction Model
LLKV implements Multi-Version Concurrency Control (MVCC) with snapshot isolation:
| Component | Role |
| --- | --- |
| TxnIdManager | Generates monotonic transaction IDs and tracks watermarks |
| TransactionSnapshot | Captures txn_id + snapshot_id for visibility checks |
| MVCC Tags | Each row has created_by and deleted_by transaction IDs |
| Row Visibility Filter | Compares MVCC tags against snapshot to determine visibility |
Transactions maintain two execution contexts:
- Base Context - Access to committed tables with MVCC filtering
- Staging Context - Isolated in-memory storage for new tables created in the transaction
Sources: Diagram 4 from high-level overview, Cargo.toml:25 (llkv-transaction dependency)
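The visibility rule described in the table can be sketched as follows. This is a minimal model under stated assumptions, not LLKV's actual filter: the struct layouts are hypothetical, `deleted_by` is modeled as an `Option`, and the sketch assumes that every transaction with an ID at or below `snapshot_id` has committed, which ignores aborted and in-flight transactions.

```rust
type TxnId = u64;

/// Hypothetical MVCC tags attached to one row version.
struct RowVersion {
    created_by: TxnId,
    deleted_by: Option<TxnId>,
}

/// Hypothetical snapshot: the reader's own ID plus the newest ID it may see.
struct TransactionSnapshot {
    txn_id: TxnId,
    snapshot_id: TxnId,
}

impl TransactionSnapshot {
    /// A write is observable if this transaction made it, or if its writer
    /// falls at or below the snapshot point (assumed committed, see above).
    fn sees(&self, writer: TxnId) -> bool {
        writer == self.txn_id || writer <= self.snapshot_id
    }

    /// A row is visible when its creation is observable and
    /// no observable transaction has deleted it.
    fn is_visible(&self, row: &RowVersion) -> bool {
        let deleted = row.deleted_by.map_or(false, |d| self.sees(d));
        self.sees(row.created_by) && !deleted
    }
}
```

With a snapshot of `txn_id = 10` and `snapshot_id = 7`, a row created by transaction 5 and deleted by transaction 9 is still visible, because the deletion happened after the snapshot point; the same row deleted by transaction 10 is hidden, because a transaction sees its own writes.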
Storage Architecture
The storage layer provides pluggable persistence through the Pager trait.
The simd-r-drive crate provides SIMD-optimized storage operations. EntryHandle represents persistent blob references that the ColumnStore uses to serialize and deserialize Arrow arrays.
Sources: Cargo.toml:26-27 (simd-r-drive dependencies), llkv-sql/src/sql_engine.rs:34-35
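As a rough illustration of what "pluggable persistence" means, the sketch below defines a small blob-store trait and an in-memory backend. The names `BlobPager` and `InMemoryPager` are hypothetical stand-ins chosen to avoid confusion with LLKV's real `Pager` and `MemPager`, whose method signatures are not shown in this document and may differ substantially.

```rust
use std::collections::HashMap;

type PageKey = u64;

/// Hypothetical storage interface: a keyed store of opaque byte blobs.
trait BlobPager {
    fn put(&mut self, key: PageKey, blob: Vec<u8>);
    fn get(&self, key: PageKey) -> Option<&[u8]>;
    /// Remove a blob; returns true if the key was present.
    fn free(&mut self, key: PageKey) -> bool;
}

/// Hypothetical in-memory backend built on a HashMap.
#[derive(Default)]
struct InMemoryPager {
    blobs: HashMap<PageKey, Vec<u8>>,
}

impl BlobPager for InMemoryPager {
    fn put(&mut self, key: PageKey, blob: Vec<u8>) {
        self.blobs.insert(key, blob);
    }
    fn get(&self, key: PageKey) -> Option<&[u8]> {
        self.blobs.get(&key).map(|b| b.as_slice())
    }
    fn free(&mut self, key: PageKey) -> bool {
        self.blobs.remove(&key).is_some()
    }
}

/// Serialize an i64 column as little-endian bytes and store it.
/// Written against the trait, so any backend can be substituted.
fn store_i64_column(pager: &mut dyn BlobPager, key: PageKey, values: &[i64]) {
    let mut blob = Vec::with_capacity(values.len() * 8);
    for v in values {
        blob.extend_from_slice(&v.to_le_bytes());
    }
    pager.put(key, blob);
}

/// Load a column previously written by `store_i64_column`.
fn load_i64_column(pager: &dyn BlobPager, key: PageKey) -> Option<Vec<i64>> {
    let blob = pager.get(key)?;
    let values = blob
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_le_bytes(buf)
        })
        .collect();
    Some(values)
}
```

Because `store_i64_column` and `load_i64_column` depend only on the trait, swapping the in-memory backend for a disk-backed one would not change the column code, which is the property the Pager abstraction is meant to provide.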
Testing Infrastructure
LLKV includes comprehensive testing at multiple levels:
| Test Type | Tool | Scope |
| --- | --- | --- |
| Unit Tests | cargo test | Per-crate functionality |
| Integration Tests | cargo test | Cross-crate interactions |
| SQL Logic Tests | llkv-slt-tester | Thousands of SQL test cases |
| Benchmarks | Criterion + CodSpeed | Performance regression tracking |
The llkv-slt-tester crate implements a SQL Logic Test runner that executes .test files containing SQL statements and expected results. The harness supports test preprocessing (loop expansion, conditional filtering) and integrates with CI/CD for continuous validation.
Sources: Cargo.toml:20 (llkv-slt-tester), Diagram 5 from high-level overview
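For orientation, here is a minimal parser for the two most common record types of the generic sqllogictest format (`statement ok|error` and `query <types>` with a `----` separator). It is a sketch, not the llkv-slt-tester implementation: `Record` and `parse_records` are hypothetical, records are assumed to be separated by exactly one blank line with `\n` line endings, and the preprocessing features mentioned above (loop expansion, conditional filtering) are not modeled.

```rust
/// Hypothetical in-memory form of one test record.
#[derive(Debug, PartialEq)]
enum Record {
    Statement { expect_ok: bool, sql: String },
    Query { types: String, sql: String, expected: Vec<String> },
}

/// Parse a script into records; unknown directives are skipped.
fn parse_records(script: &str) -> Vec<Record> {
    let mut records = Vec::new();
    for block in script.split("\n\n") {
        // Drop comment lines and stray blank lines inside the block.
        let mut lines = block
            .lines()
            .map(str::trim_end)
            .filter(|l| !l.is_empty() && !l.starts_with('#'));
        let header = match lines.next() {
            Some(h) => h,
            None => continue,
        };
        let mut parts = header.split_whitespace();
        match parts.next() {
            Some("statement") => {
                let expect_ok = parts.next() == Some("ok");
                let sql = lines.collect::<Vec<_>>().join("\n");
                records.push(Record::Statement { expect_ok, sql });
            }
            Some("query") => {
                let types = parts.next().unwrap_or("").to_string();
                let body: Vec<&str> = lines.collect();
                // Everything before "----" is SQL, everything after is expected output.
                let split = body.iter().position(|l| *l == "----").unwrap_or(body.len());
                let sql = body[..split].join("\n");
                let expected = body.iter().skip(split + 1).map(|l| l.to_string()).collect();
                records.push(Record::Query { types, sql, expected });
            }
            _ => {}
        }
    }
    records
}
```

A runner built on top of this would execute each `sql` string against the engine and compare the outcome with `expect_ok` or `expected`.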
Key Features
SQL Capabilities:
- DDL: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, CREATE VIEW
- DML: INSERT, UPDATE, DELETE, TRUNCATE, SELECT
- Queries: Joins, subqueries, aggregates, GROUP BY, HAVING, ORDER BY, set operations
- Transaction control: BEGIN, COMMIT, ROLLBACK
Performance Optimizations:
- INSERT buffering: Batches multiple literal INSERT statements (up to 8,192 rows)
- SIMD acceleration: Vectorized operations via simd-r-drive
- Parallel execution: Rayon-based parallelism in table scans and aggregations
- Streaming results: Memory-efficient batch processing
- Roaring bitmaps: Efficient indexing
Concurrency:
- Snapshot isolation preventing dirty reads
- Optimistic concurrency control with conflict detection
- Session-isolated transactions
- Watermark-based visibility tracking
Sources: llkv-sql/src/sql_engine.rs:186-191 (MAX_BUFFERED_INSERT_ROWS), Diagram 3 from high-level overview
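The INSERT buffering idea can be sketched as a buffer that accumulates literal rows and flushes once a row limit is reached. Only the constant name and its value of 8,192 come from the text above; the `InsertBuffer` type, its fields, and its flush policy are assumptions made for illustration, and the "write" is simulated by recording batch sizes.

```rust
/// Row limit named in the source; the surrounding logic is hypothetical.
const MAX_BUFFERED_INSERT_ROWS: usize = 8192;

/// Hypothetical buffer for rows from consecutive literal INSERT statements.
#[derive(Default)]
struct InsertBuffer {
    rows: Vec<Vec<i64>>,
    /// Sizes of the batches flushed so far, standing in for storage writes.
    flushed_batches: Vec<usize>,
}

impl InsertBuffer {
    /// Append rows, flushing whenever the buffer reaches the limit.
    fn push_rows(&mut self, rows: Vec<Vec<i64>>) {
        for row in rows {
            self.rows.push(row);
            if self.rows.len() >= MAX_BUFFERED_INSERT_ROWS {
                self.flush();
            }
        }
    }

    /// Write out whatever is buffered as one batch; a no-op when empty.
    /// A real engine would also flush before statements that read the table.
    fn flush(&mut self) {
        if self.rows.is_empty() {
            return;
        }
        self.flushed_batches.push(self.rows.len());
        self.rows.clear();
    }
}
```

Pushing 10,000 single-column rows into this sketch produces one automatic flush of 8,192 rows and leaves 1,808 rows buffered until the next explicit `flush()`. The benefit is that many small INSERT statements become a few large columnar writes.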
Development Model
The workspace follows a modular design where each crate has a focused responsibility and minimal dependencies on peers. Shared types are defined in foundational crates:
- llkv-result - Common Result<T> and Error types used by all crates
- llkv-expr - Expression AST types shared between planning and execution
- llkv-plan - Plan structures consumed by the runtime and executor
This layering enables:
- Independent testing of each subsystem
- Clear separation of concerns
- Ability to swap storage backends via the Pager trait
- Performance optimization at individual layers
Sources: Cargo.toml:9-28 (workspace dependencies), Diagram 7 from high-level overview