Written by Chris Zetter
Published 2025-06-16
This is about rgSQL, a test suite to help you learn about SQL and databases by building your own database server.
Why
Every day of my career as a software engineer I’ve worked with databases in some way. But I knew I had gaps in my understanding of them. Exactly how does a relational database implement a right join? Or why doesn’t SQL warn you when it evaluates NULL + 4?
Taking inspiration from projects such as Nand2Tetris and Building Git, I thought that the best way to gain a deeper understanding was to try to create my own database engine.
Test driven database development
Going from nothing to running a query that could sort, join and group data was a bit daunting. I started thinking about the smaller steps that could get me there.
I wrote test cases that covered the behaviour I wanted to implement. The cases started with simpler behaviour and gradually increased in complexity. I created a Python based test runner that could run these test cases.
One of the first test cases looks like this:
This test case runs SELECT 1; and then makes sure the output is 1. To make this, and the related cases pass you will need to start parsing statements and returning rows of values.
Later on tables are introduced which requires data to be persisted between statements:
The tests keep building on each other, getting you to evaluate expressions, join tables, group data and run aggregate functions, until you reach the last test that combines all of these ideas:
rgSQL has more than 200 test cases that are organized into 13 groups. Each group focuses on a particular aspect of SQL such as tables, expressions, joins and aggregate functions.
I chose to name the project ‘rgSQL’ as each time a test passes it goes from red to green (but I also think it’s really good).
Handling errors
There are many different kinds of errors that might happen when running SQL statements. rgSQL has tests that check that an error is returned when a statement cannot be parsed:
Or an error when the statement fails validation:
rgSQL also makes sure that the correct error is returned when references cannot be resolved and when a division by zero error occurs at runtime.
Not all SQL databases behave the same, especially when it comes to what queries pass validation and type checking. All of rgSQL tests mirror the behaviour of PostgreSQL.
Freedom to experiment
I used the tests in rgSQL to create my own database implementation.
The high-level tests gave me a lot of freedom to experiment in my implementation and helped me refactor my growing codebase as I went along.
SQL is well suited to this style of behavioral testing. SQL is an abstraction that gives databases lots of freedom to choose their low level implementation - databases don’t have to use a particular sort algorithm or store their data in a specific format.
There are similar projects that test running SQL against different databases such as sqltest and sqllogictest but these are designed to verify the behaviour of existing databases, not guide you through creating a new one.
The rgSQL test suite talks to the database under test using TCP. This means the tests can run against a database built in any programming language, as long as it can start a TCP server. Rather than use a binary protocol to communicate with the test suite, rgSQL uses human-readable JSON to make it easier to get started.
Sharing what I’ve learned
Each new set of functionality I added to my rgSQL implementation led me to new areas of computer science and database research:
- Parsing more complex statements led me to write a tokenizer and recursive descent parser.
- So I could handle joins efficiently, I investigated what algorithms other databases used and found out about sort-merge joins and hash joins.
- The need to validate statements made me write a type checker, and I learnt about type coercion in SQL and how SQLite handles types differently to databases such as PostgreSQL.
- I found out how some databases use iterators to process queries and how you can speed this up by working with batches of data, or replacing the iterator model with a JIT compiler.
These are a few examples. I got so much from using rgSQL to build my own databases that I have written a book to guide others through the same project.
You can still make use of rgSQL without the book. Just fork the repository on GitHub and follow the instructions in README.md.
The book explains the steps to building your own database server and has 30 additional ideas for extending rgSQL. It also has comparisons between rgSQL databases like SQLite, MySQL, PostgreSQL and DuckDB.
.png)


