So, confession, I love playing on the computer, but I’m terrible at SQL. I know that it’s Good or whatever and there’s this whole very sensible looking Relational Algebra behind it all, but like when I have to interact with it, I get the job done and then immediately wipe whatever I learned from my brain.
Enough is enough. It’s time to learn it properly. And what better way than to write an ANSI SQL database from scratch!
Disclaimers etc
Setting some expectations, this is very much a “fun” learning project, and we’re gonna get a lot of things wrong. We’ll be using Rust, because I write that a lot atm and so it’s what my brain thinks in, but I would definitely not read this hoping for a Rust tutorial because I am almost as bad at Rust as SQL. I am going to describe things in a very hand-wavy way that will probably annoy people that know what they’re talking about, but it’s my blog and I can do what I want.
So, let’s do it?
OK. So what we’re going to do in part 1 is:
- Make (steal) a SQL parser
- Do a table scan (From)
- Filter results from that table (Filter)
- Choose the fields we want to look at (Project)
We’re going to be concentrating on the Query Engine part of the database initially,mostly because I’m more interested in it, so our tables are going to be static JSON files taken from the chinook dataset. We are using this because it’s full of rock albums and it’s nice to be reminder that Led Zeppelin are a totally sick band from time to time.
Our types
We might not be implementing in Haskell but it’s spirit lives on - let’s start by defining the types of our query.
A parser
This could quickly become a parsing tutorial and we don’t want that, so we’re going to use the sqlparser crate. It takes a string input and returns either it’s own AST or an error. We’ll pattern match on this and extract only the things we support into a Query type we defined above.
Nothing about this is very interesting, so I will just link to it. Know that we parse some SQL and make the types above.
Our run_query function
Now we’ve worked out what the user wants, we need to run the query. Initially we’ll do this by matching on the Query type.
We’ll fill these todo! out one by one now.
Query::From
The first thing we’ll implement is a simple table scan. A table scan is “get all of the rows in the table”. If you’re thinking “that doesn’t sound wildly performant”, rest assured your Software Craftsperson spidey-sense is still working correctly. However, our tables only have ~300 items in them, so for now we’ll live with it until we start thinking about indexes.
Here is some code. Forgive me, Padre.
Let’s smash that into our run_query function:
It is not good code, but it is code. We’d test it, but it would still fail because of the other todo!. Oh well. Onwards.
Query::Filter
Call me a staunch traditionalist, but often when I am accessing a database I do not wish to download all of it’s data at once. We’re going to allow users to filter data using a where clause, which lets us define properties about rows we are interested in.
Let’s recap on our Expr type:
This is pretty limited, but it does let us express select * from Album where album_id = 1.
We start by defining a function for deciding whether we care about a row. It takes a row (which we store as a serde_json::Value) and an Expr, returning a bool telling us to keep the row or throw it in the bin.
Let’s use it in our run_query function:
We’d test a query, but it’d still fail. But nearly!
Query::Project
So far we return every single field from our table scan, so every select is a select * from .... We can do better than that, let’s implement Project, which is how we extract fields from rows. Eventually, we’ll allowing renaming things with aliases, but that’s quite boring and fiddly, so for now we’re just supporting stuff like select Title, ArtistId from Album.
Then we add it to run_query, completing it for now.
Bringing it together
We then add a basic CLI using clap, that takes a single argument --sql. Nothing surprising or interesting here, sorry.
This means we can run a query with cargo run --bin cli -- --sql 'select Title from Album where AlbumId = 48' | jq and look what we get:
Not bad, not bad at all.
What’s next?
If you’ve made it this far without being furious about my use of unwrap(), then in part two we’re going to add some JOINS.
Make sense? If not, get in touch!