I’m building another programming language - purple-garden - this time less lispy, but still s-expr based. It’s not inherently functional, but has pure functions. I wanted to make a fast programming language that’s fun to implement and fun (for me) to program in. So far, this is the result.
Some virtual machines are stack based and implement function calls via the stack, others use registers for all function arguments. I decided to go with a hybrid approach: registers for function calls with a single argument, and a stack for functions with n>1 arguments. This will hopefully allow me to keep the vm implementation simple, optimize for the common case of a single argument and use the least amount of instructions possible.
Since some consider programming languages, and implementing one, modern day magic, I’ll introduce some basic concepts while going over the architecture of purple garden’s interpretation pipeline.
An input like (@println "Hello World") or (* 3.1415 (* 1.5 1.5)) would, for a bytecode interpreter, generally pass through the following stages:
- Tokenisation
- Parsing
- Compiling to Bytecode
- Executing Bytecode
For instance, (* 3.1415 (* 1.5 1.5)):
1. Tokenisation
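For (* 3.1415 (* 1.5 1.5)) the token stream could look roughly like this (the token names are a sketch of the idea, not necessarily purple garden’s exact ones):

```text
T_DELIMITOR_LEFT
T_ASTERISKS
T_DOUBLE(3.1415)
T_DELIMITOR_LEFT
T_ASTERISKS
T_DOUBLE(1.5)
T_DOUBLE(1.5)
T_DELIMITOR_RIGHT
T_DELIMITOR_RIGHT
```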
Tokenisation refers to, as the name implies, splitting the input up into tokens with meaning. In purple garden a token is simply a tagged value container. Generally it would also hold data related to its source location, like a line and a column.
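Boiled down to a minimal sketch (the shape of the idea, not the actual purple garden definitions):

```c
#include <stddef.h>

// possible tags for the tokens shown above
typedef enum {
  T_DELIMITOR_LEFT,
  T_DELIMITOR_RIGHT,
  T_ASTERISKS,
  T_DOUBLE,
  T_STRING,
} TokenType;

typedef struct {
  TokenType type; // the tag of this tagged value container
  size_t line;    // source location, mostly for error messages
  size_t column;
  union {         // only atoms carry a payload
    double floating;
    const char *string;
  };
} Token;
```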
2. Parsing
The parser uses the tokens produced by the tokenizer to build an abstract syntax tree (AST):
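For our example this tree could be dumped like so (using the N_ATOM[T_DOUBLE](1.5) notation that also shows up below; the exact dump format is illustrative):

```text
N_BIN[T_ASTERISKS](
  N_ATOM[T_DOUBLE](3.1415),
  N_BIN[T_ASTERISKS](
    N_ATOM[T_DOUBLE](1.5),
    N_ATOM[T_DOUBLE](1.5)
  )
)
```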
Now we know:
- the precedence of the s-exprs: 1.5*1.5 needs to be evaluated before multiplying the result with 3.1415
- how many elements each s-expr applies to
- which element belongs to which s-expr
3. Compiling to Bytecode
Bytecode is generally defined as a list of operators and their respective arguments encoded as bytes.
Since this vm is register based, the bytecode operates on the accumulator register (r0) and its operand. The compiler emits bytecode for the nodes of the AST from bottom to top:
Atoms
N_ATOM[T_DOUBLE](1.5) compiles to:
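Sketched (the disassembly format is illustrative):

```asm
LOAD 3 ; r0 = global pool[3], here Double(1.5)
```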
Where the argument 3 refers to the index into the global pool at which Double(1.5) is stored.
“Binary” operators
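Consider the inner multiplication from the example, as an AST (again in illustrative notation):

```text
N_BIN[T_ASTERISKS](
  N_ATOM[T_DOUBLE](1.5),
  N_ATOM[T_DOUBLE](1.5)
)
```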
The atom children will be compiled as explained above, the only change is moving the temporary value needed for the multiplication into another register:
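Roughly, with assumed global pool indices:

```asm
LOAD 3  ; r0 = Double(1.5)
STORE 1 ; park the lhs in r1, freeing r0
LOAD 3  ; r0 = Double(1.5) (assuming the pool deduplicates)
```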
The multiplication itself is done by MUL. The lhs is assumed to be in r0, the rhs lives in the register passed as the opcode’s argument:
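For the example:

```asm
MUL 1 ; r0 = r0 * r1
```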
This means the inner multiplication s-expr (* 1.5 1.5) compiles to the above and leaves the result in the accumulator register r0. Applying this to the root s-expr, we get:
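Sketched in full, again with assumed pool indices:

```asm
; (* 1.5 1.5)
LOAD 3  ; r0 = Double(1.5)
STORE 1 ; r1 = 1.5
LOAD 3  ; r0 = Double(1.5)
MUL 1   ; r0 = 1.5 * 1.5 = 2.25
; (* 3.1415 <result of the inner s-expr>)
STORE 1 ; r1 = 2.25
LOAD 4  ; r0 = Double(3.1415) (assumed pool index)
MUL 1   ; r0 = 3.1415 * 2.25
```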
4. Executing Bytecode
At a high level the bytecode virtual machine:
- indexes into bytecode array at vm->pc (program counter) for the current op code
- indexes into bytecode array at vm->pc+1 for the current argument
- executes the corresponding logic for the opcode and manipulates registers or other internal state
The whole pipeline uses two byte tuples: OP and ARG. This keeps the fetching, decoding and emitting lean and fast.
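A minimal sketch of such a loop, assuming a hypothetical Vm struct and opcode set (not the actual purple garden implementation, and with globals simplified to doubles):

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_LOAD, OP_STORE, OP_MUL }; // hypothetical opcode numbering

typedef struct {
  size_t pc;           // program counter, indexes into bytecode
  double registers[8]; // r0 is the accumulator
  const double *globals;
  const uint8_t *bytecode;
  size_t bytecode_len;
} Vm;

void vm_run(Vm *vm) {
  // bounds checks beyond the loop condition omitted for brevity
  while (vm->pc + 1 < vm->bytecode_len) {
    uint8_t op  = vm->bytecode[vm->pc];     // current op code
    uint8_t arg = vm->bytecode[vm->pc + 1]; // its argument
    switch (op) {
    case OP_LOAD:  vm->registers[0] = vm->globals[arg];    break;
    case OP_STORE: vm->registers[arg] = vm->registers[0];  break;
    case OP_MUL:   vm->registers[0] *= vm->registers[arg]; break;
    }
    vm->pc += 2; // every instruction is a two byte OP+ARG tuple
  }
}
```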
Since some of the instruction meanings may not be obvious, here’s a table:
| Instruction | Meaning |
| ----------- | ------- |
| STORE <reg> | Moves the contents of r0 to reg |
| ARGS <amount> | Instructs the vm to expect amount arguments for calls and builtins |
| LEAVE | Leaves a scope |
| LOAD <global pool idx> | Loads a global into r0 |
| BUILTIN <hash> | Calls a builtin by the hash of its name |
| CALL <hash> | Calls a function by the hash of its name |
| VAR <reg> | Creates a variable, its name taken from r0 and its value from reg |
| LOADV <hash> | Loads a variable by its hash from the variable table into r0 |
| POP | Pops a value from the stack into r0 |
| PUSH | Pushes r0 onto the stack |
| PUSHG <global pool idx> | Pushes a global onto the stack |
PUSHG is an optimisation merging LOAD and PUSH, a common pattern in function call setup.
Since every good language needs a way to structure and reuse code, purple garden (pg) needs functions.
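For the examples below I’ll assume a definition syntax along these lines (the function names are made up and the exact pg syntax may differ):

```scheme
(fn no-args () (@println "no args"))
(fn single (a) (@println a))
(fn multiple (a b) (@println a b))
```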
And we just call them:
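```scheme
(no-args)
(single 1)
(multiple 1 2)
```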
Once a function is defined, the compiler emits bytecode for said function, including its body and the setup for all params. This differs for 0, 1 and n arguments:
No argument
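With the sketched syntax from above:

```scheme
(fn no-args () (@println "no args"))
```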
No argument, thus we can omit prep for argument passing:
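Roughly, with assumed pool indices and symbolic hashes:

```asm
LOAD 0         ; r0 = Str("no args") (assumed pool index)
BUILTIN <hash> ; @println, referenced by the hash of its name
LEAVE          ; leave the function's scope
```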
Single Argument
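Again with the sketched syntax:

```scheme
(fn single (a) (@println a))
```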
A singular argument is passed by storing a pointer to its value in r0; this makes calling functions with a single argument way faster than using the stack based method that enables n arguments:
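A sketch of the prep and body, with assumed pool indices and symbolic hashes:

```asm
STORE 1        ; the argument arrives in r0, move it into r1
LOAD 0         ; r0 = Str("a"), the parameter name
VAR 1          ; bind "a" to the value in r1
LOADV <hash>   ; body: load a back into r0
BUILTIN <hash> ; @println
LEAVE
```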
n Arguments
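And the n argument case:

```scheme
(fn multiple (a b) (@println a b))
```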
This is where the mix of registers and stack comes into play. The value for the last argument is always in r0, while any further arguments are stored on the stack:
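For (multiple 1 2) that distribution looks roughly like this:

```text
(multiple 1 2)
          ^ ^
          | '--> r0    (last argument)
          '----> stack (every argument before it)
```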
In the disassembled bytecode this results in simply one POP before commencing with the already known argument setup:
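Roughly (the exact binding order may differ):

```asm
STORE 1 ; bind the last argument, already in r0 ...
LOAD 0  ; r0 = Str("b")
VAR 1   ; ... as b
POP     ; the one POP: next argument from the stack into r0
STORE 1 ; then the already known setup ...
LOAD 1  ; r0 = Str("a")
VAR 1   ; ... binding a
; body and LEAVE follow as before
```

So much for the definition side - the callsites for the same three cases follow.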
No argument
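With the hypothetical no-args from above:

```scheme
(no-args)
```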
Since we don’t pass any arguments, the call is just OP_CALL and the bytecode index of the function definition, here 0:
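Which disassembles to just:

```asm
CALL 0 ; jump to the function definition at bytecode index 0
```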
Single Argument
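Calling the hypothetical single:

```scheme
(single 1)
```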
As introduced before, a single argument can just be used from r0; thus the callsite loads the global and invokes the call, no stack interaction necessary:
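Sketched, with assumed indices:

```asm
LOAD 0 ; r0 = Double(1) (assumed pool index)
CALL 0 ; invoke single (assumed bytecode index)
```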
n Arguments
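And the n argument callsite:

```scheme
(multiple 1 2)
```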
Like above, the last argument is always in r0, thus all other arguments need to be on the stack. This is implemented by PUSH (or PUSHG, if LOAD and PUSH would be emitted consecutively) at the callsite. For each argument except the last, a PUSH is required. The instruction before CALL or BUILTIN is ARGS, instructing the vm to pop the right amount of arguments off the stack.
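Sketched with assumed pool and bytecode indices:

```asm
PUSHG 0 ; push Double(1), LOAD and PUSH merged
LOAD 1  ; r0 = Double(2), the last argument stays in r0
ARGS 2  ; tell the vm how many arguments to expect
CALL 0  ; invoke multiple
```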
Warning
This is a microbenchmark done on a high noise & high load laptop system, so take everything here with a grain of salt.
TLDR: 25k function calls, each invoking a builtin (~50k calls total), run in ~4.2ms of vm runtime, which is ~15.5% of the total runtime. It also scales very well: 250k function calls (500k calls total) finish in ~13ms of vm runtime.