I vibe coded a database engine from scratch

2 hours ago 1

Relevant source files

This document introduces the LLKV database system, explaining its purpose, architecture, and core components. LLKV is a SQL database engine built in Rust that provides ACID transactions with MVCC snapshot isolation, columnar storage, and comprehensive SQL support.

For detailed information about specific subsystems:

Architecture and component interactions: see Architecture
SQL interface and query processing: see SQL Interface
Transaction management: see Transaction Management
Testing infrastructure: see Testing Infrastructure

Purpose and Scope

LLKV is a relational database system that executes SQL queries against columnar data structures. The system is organized as a Cargo workspace containing 15 specialized crates, each responsible for a distinct layer of the database stack—from SQL parsing and query planning through execution and storage.

The codebase emphasizes modularity, performance optimization through SIMD acceleration, and correctness validation via extensive SQL Logic Testing. The architecture follows a layered design where higher-level crates depend on lower-level abstractions, enabling independent testing and optimization of each subsystem.

Sources: Cargo.toml1-81 llkv-sql/src/sql_engine.rs141-185

System Architecture

The following diagram illustrates the major subsystems and their dependencies:

Crate Responsibilities:

CratePurpose

llkv	Root crate aggregating all subsystems
llkv-sql	SQL parsing, preprocessing, and SqlEngine API
llkv-plan	Logical query plan structures (SelectPlan, InsertPlan, etc.)
llkv-expr	Expression AST types (Expr, ScalarExpr, Literal)
llkv-runtime	Transaction orchestration (RuntimeEngine, RuntimeSession)
llkv-executor	Query plan execution and result streaming
llkv-table	Table abstraction, catalog management, schema handling
llkv-column-map	Columnar storage implementation
llkv-storage	Storage pager interface (Pager, BoxedPager, MemPager)
llkv-transaction	MVCC transaction management
llkv-aggregate	Aggregate function evaluation
llkv-join	Join operation execution
llkv-result	Common error types (Error, Result)
llkv-test-utils	Testing utilities and tracing initialization
llkv-slt-tester	SQL Logic Test harness

Sources: Cargo.toml61-80 Diagram 1 from high-level overview

Core Components

SqlEngine - Entry Point

The SqlEngine struct serves as the primary interface for executing SQL statements. It wraps a RuntimeEngine and handles SQL preprocessing, parsing, and statement routing.

Key methods:

SqlEngine::new() - Create engine with storage pager
SqlEngine::execute() - Execute one or more SQL statements
SqlEngine::sql() - Execute single SELECT and return batches

Sources: llkv-sql/src/sql_engine.rs272-314 llkv-sql/src/sql_engine.rs685-760

RuntimeEngine - Transaction Coordinator

The RuntimeEngine manages transaction sessions and coordinates between the execution layer and storage. Each engine maintains a RuntimeContext (shared catalog and storage) and a RuntimeSession (per-connection transaction state).

Sources: Diagram 4 from high-level overview, llkv-sql/src/sql_engine.rs55-60

Query Execution Pipeline

The execution pipeline transforms SQL text into Arrow RecordBatch results through multiple stages:

Plan Types:

SelectPlan - Query with projections, filters, joins, aggregates
InsertPlan - Insert rows (literal values or SELECT source)
UpdatePlan - Update rows with assignments and filter
DeletePlan - Delete rows matching filter
CreateTablePlan - Create table with column specifications

Sources: llkv-plan/src/plans.rs1-1230 Diagram 2 from high-level overview

Data Representation

LLKV uses Apache Arrow for columnar data throughout the system:

The llkv-column-map crate implements columnar storage that maps Arrow schemas to storage blobs via the Pager interface. All query results flow through RecordBatch structures, ensuring consistent data representation across subsystem boundaries.

Sources: Cargo.lock100-117 (Arrow dependencies), Diagram 3 from high-level overview

Transaction Model

LLKV implements Multi-Version Concurrency Control (MVCC) with snapshot isolation:

ComponentRole

TxnIdManager	Generates monotonic transaction IDs and tracks watermarks
TransactionSnapshot	Captures txn_id + snapshot_id for visibility checks
MVCC Tags	Each row has created_by and deleted_by transaction IDs
Row Visibility Filter	Compares MVCC tags against snapshot to determine visibility

Transactions maintain two execution contexts:

Base Context - Access to committed tables with MVCC filtering
Staging Context - Isolated in-memory storage for new tables created in the transaction

Sources: Diagram 4 from high-level overview, Cargo.toml25 (llkv-transaction dependency)

Storage Architecture

The storage layer provides pluggable persistence through the Pager trait:

The simd-r-drive crate provides SIMD-optimized storage operations. EntryHandle represents persistent blob references that the ColumnStore uses to serialize and deserialize Arrow arrays.

Sources: Cargo.toml26-27 (simd-r-drive dependencies), llkv-sql/src/sql_engine.rs34-35

Testing Infrastructure

LLKV includes comprehensive testing at multiple levels:

Test TypeFrameworkCoverage

Unit Tests	cargo test	Per-crate functionality
Integration Tests	cargo test	Cross-crate interactions
SQL Logic Tests	llkv-slt-tester	Thousands of SQL test cases
Benchmarks	Criterion + CodSpeed	Performance regression tracking

The llkv-slt-tester crate implements a SQL Logic Test runner that executes .test files containing SQL statements and expected results. The harness supports test preprocessing (loop expansion, conditional filtering) and integrates with CI/CD for continuous validation.

Sources: Cargo.toml20 (llkv-slt-tester), Diagram 5 from high-level overview

Key Features

SQL Capabilities:

DDL: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, CREATE VIEW
DML: INSERT, UPDATE, DELETE, TRUNCATE, SELECT
Queries: Joins, subqueries, aggregates, GROUP BY, HAVING, ORDER BY, set operations
Transaction control: BEGIN, COMMIT, ROLLBACK

Performance Optimizations:

INSERT buffering: Batches multiple literal INSERT statements (up to 8,192 rows)
SIMD acceleration: Vectorized operations via simd-r-drive
Parallel execution: Rayon-based parallelism in table scans and aggregations
Streaming results: Memory-efficient batch processing
Roaring bitmaps: Efficient indexing

Concurrency:

Snapshot isolation preventing dirty reads
Optimistic concurrency control with conflict detection
Session-isolated transactions
Watermark-based visibility tracking

Sources: llkv-sql/src/sql_engine.rs186-191 (MAX_BUFFERED_INSERT_ROWS), Diagram 3 from high-level overview

Development Model

The workspace follows a modular design where each crate has a focused responsibility and minimal dependencies on peers. Shared types are defined in foundational crates:

llkv-result - Common Result<T> and Error types used by all crates
llkv-expr - Expression AST types shared between planning and execution
llkv-plan - Plan structures consumed by the runtime and executor

This layering enables:

Independent testing of each subsystem
Clear separation of concerns
Ability to swap storage backends via the Pager trait
Performance optimization at individual layers

Sources: Cargo.toml9-28 (workspace dependencies), Diagram 7 from high-level overview

Read Entire Article