Building a programming language using SQLite's VM – Pt 1

3 hours ago 1

SQLite is cool.

It’s the most deployed database in the world, it’s tiny, expressive and its architecture is quite different from traditional relational databases. SQLite uses a Virtual Machine (VM) called Virtual DataBase Engine (VDBE) to interpret SQL, rather than Volcano-style or vectorization. Do you know which other project uses VMs to interpret code? Lua, Java, Erlang and god forbid JavaScript.

But what if we used SQLite’s VM to build a programming language? That’s what this series is about.

Introduction

So, what’s a VM?

It’s basically an emulation of how computers work, you can have emulators for real world computer systems with full OS and driver support (like QEMU), simple CPU emulators (like this 8080 emulator) or even ✨abstract machines✨ that only exist on paper but can, through emulation, run real code.

For instance, the instructions ran by Lua are defined here, this is the instruction to add two numbers:

SQLite also defines its own instruction format, with simple instructions like Add and fancier ones like CreateBtree.

Choosing which instructions your Instruction Set should have is a non-trivial problem, it’s a balance between performance and simplicity specially considering other architecture decisions like: Stack vs Register based, dispatch strategy, whether or not to do JIT, etc.

VMs are powerful, they allow you to encode full Turing complete programs with a relatively simple structure, you just need a Program Counter (PC), an Instruction Set Architecture (ISA) and a form of storing your data/opcodes (e.g pilling them onto a stack or using virtual registers), and… you’re done.

We’ll start by building a tiny VM in Rust to keep things less abstract. Once we’ve got that foundation we’ll look at how SQLite’s VM works. but if you truly want to understand this I couldn’t recommend a better book than Crafting Interpreters – the second part :).

The DVM (Dumb Virtual Machine)

“What I cannot create I cannot understand” – Richard Feynman

We’re going to do a simple VM that just executes mathematical expressions. We can define our dumb VM and instruction format like this:

Running a program is dead simple, just execute the instruction in the position indicated by pc’s value and increment pc:

Yep, the heart of our VM is just a match inside a for loop, no magic at all. The task of our parser is to transform the language defined by our syntax in a sequence of instructions. A simple architecture for this is:

For instance, 2 + 2 results in the program:

You can check out the full implementation here

Now that you just did VMs 101, let’s take a deeper look into how SQLite’s works. Each SQL statement is transformed in a sequence of instructions, for example:

It works a bit different than our VM but the concept is the same, let’s break this step by step (pun intendent):

  1. Every program start at 0 with the Init instruction, the value in the register p2 indicates where should we go next, in this case to instruction 4;
  2. Here we define a integer constant, 1, in the register 2 r[2]. Note the code generator is smart enough to know that our lhs and rhs are the same number, so it only allocates a single constant.
  3. Goto jumps to the instruction with the index defined in p2.
  4. Add takes the value of p1 and p2 and stores the result in p3 (quite similar to what we did).
  5. Roghtly ResultRow takes the registers p1 through p1+p2-1 and builds a single row of results.
  6. Halt is equivalent to our Return, it ends the computation.

Now look what’s the program to create a new table:

I won’t go through every instruction but I think you got the idea.

Its implementation lies at vdbe.c, the equivalent of our run function is sqlite3VdbeExec, which is basically a “massive switch statement”. The main interface of SQLite is sqlite3_step, every time we call it it will perform a step in the computation inside the VM, until it reaches an error or ends the result. Cool right?

Now that we’re SQLite experts we can start hacking on it. What I want here is to detach the VM of SQL and be able to write programs like:

Excited? Me too. Stay tune to the second part where we’ll start to do it!

Read Entire Article