So, don’t freak out, but TypeScript’s type system is advanced enough to be able to host entire programming languages inside of it, using template literal types and conditional types, among other techniques.
This has been done before, to varying degrees of insanity. Check out ts-sql and typefuck.
As this is some mind-bogglingly crazy stuff, I had to start by testing if it’s even possible for more complex languages. At first, I tried creating a full programming language similar to Lua and Python. This was mostly successful but due to the fact that it required an operator precedence parser, it got complicated and slow pretty fast.
I decided to revisit that later and pursue something with a simpler grammar first. I settled on a dialect of Assembly inspired by RISC and ARMv7.
And thus, ts-asm was born. The only thing I ever wanted was to calculate the Fibonacci sequence in TypeScript type annotations using Assembly. I can sleep peacefully now.
It’s pretty cursed.
Don’t worry if you’re not familiar with Assembly though. At its core, it’s a relatively simple language of statements that instruct the processor which primitive operation to perform next, in a linear fashion. The processor has an array of registers, which store binary numbers that the processor can do calculations on. It can also read from and write to memory. Here’s an example of a simple Assembly program:
MOV r0, #00000010 MOV r1, #00000011 ADD r2, r0, r1 SUB r2, r2, #00000001My artificial TypeScript processor will operate on 8-bit binary numbers. As a side note, attempts have been made to simulate arithmetic operations on decimal numbers by generating tuples. This is not a viable option for larger numbers though, since for any number N, it recursively creates N type instances.
Regardless, that’s the way actual processors work. It should be able to perform bitwise logic on the numbers stored in the registers.
If you’re familiar with String Literal Types and Template Literal Types, you know that TypeScript treats string constants as their own types and allows us to transform them to a surprising degree.
For instance, we can make a function that only accepts these particular strings:
type VerticalDirection = 'north' | 'south'; type HorizontalDirection = 'west' | 'east'; type Direction = VerticalDirection | HorizontalDirection; const go = (direction: Direction) => {}; go('north'); go('up');We can also generate new literal types by combining them like so:
type MixedDirection = | Direction | `${VerticalDirection}-${HorizontalDirection}`; type MixedDirection = | 'north' | 'south' | 'west' | 'east' | 'north-west' | 'north-east' | 'south-west' | 'south-east';We can then use Conditional Types to transform types based on their value.
type FirstCharacter<T> = T extends `${infer First}${string}` ? First : never; FirstCharacter<'abcdef'> type LastCharacter<T> = T extends `${string}${infer Rest}` ? Rest extends '' ? T : LastCharacter<Rest> : never; LastCharacter<'abcdef'>All that to say, we can replicate exactly what a real processor does by manipulating strings of ones and zeros.
As a proof of concept, I began prototyping a Binary Adder, i.e. a circuit that, given two input bits, adds them together and spits out the result, as well as the carry.
Adder
While a real processor performs this operation using multiple logic gates, we can instead define a table of input and output values for our adder circuit and implement it using all the aforementioned techniques.
- A and B represent our input bits.
- C represents our input carry bit. This is used so that we can chain adders to create the Full Adder. So the adder actually adds three bits.
- result represents our result output bit.
- carry represents the carry output bit, which we can pipe into another adder.
The implementation is actually quite simple, as it directly follows this table.
type Adder<A, B, C> = [A, B, C] extends ['0', '0', '0'] ? {result: '0'; carry: '0'} : [A, B, C] extends ['1', '0', '0'] | ['0', '1', '0'] | ['0', '0', '1'] ? {result: '1'; carry: '0'} : [A, B, C] extends ['1', '1', '0'] | ['1', '0', '1'] | ['0', '1', '1'] ? {result: '0'; carry: '1'} : {result: '1'; carry: '1'}; Adder<'0', '0', '0'> Adder<'1', '0', '0'> Adder<'0', '1', '0'> Adder<'0', '0', '1'> Adder<'1', '1', '0'> Adder<'1', '0', '1'> Adder<'0', '1', '1'> Adder<'1', '1', '1'>Incredible complexity can arise from a set of simple rules. We can build surprisingly sophisticated systems on top of this adder.
Full Adder
Quite easily, we can create a Full Adder circuit that adds two 8-bit numbers together. First, however, we need to create a couple of utility types.
Recall how we extracted the last character from a string literal type. This way, we can add the two last bits of our two input numbers, and then use the carry to add the two second-to-last bits of our numbers. And so on. Our type needs to be thus recursive.
type Head<T> = type Head<T> = T extends `${infer R}${string}` ? R : never; type Tail<T> = T extends `${string}${infer R extends string}` ? R extends '' ? never : R : never; Tail<'abcdef'>The algorithm for our recursive Add<A, B> type is roughly as follows:
- Given binary string inputs A and B, if A and B are single-bit numbers, simply return their sum with no input carry bit, Adder<A, B, '0'>
- Otherwise, calculate FullAdder for the Tails of A and B (ATail and BTail). Using the result and the output carry bit of that, calculate the return value:
- Add the left-most bits of A and B, as well as the output carry bit of the tail (so, Adder<Head<A>, Head<B>, TailAdded['carry']>).
- For the result, concatenate the result of the above operation with the result of the tail.
- For the carry, simply use the carry of the above operation.
Quite convoluted, I know. Or I might just be bad at explaining. Let’s just jump into the implementation.
type Add<A, B> = [Tail<A>, Tail<B>] extends [infer ATail, infer BTail] ? [ATail, BTail] extends [never, never] ? Adder<A, B, '0'> : Add<ATail, BTail> extends infer TailAdded extends {carry: any; result: any} ? Adder< Head<A>, Head<B>, TailAdded['carry'] > extends infer Current extends {carry: any; result: any} ? { result: `${Current['result']}${TailAdded['result']}`; carry: Current['carry']; } : never : never : never;Voilà! We have a Full Adder that can handle binary numbers of any length. Let’s see it in practice.
Add<'01', '01'> Add<'00100110', '00000010'> Add<'11111111', '00000001'>The last test in particular shows that we can actually cause overflow! This could be a useful thing to track with flag registers, but in the scope of this project, we won’t be adding that complexity.
The ability to add two numbers is of course an important operation that the user should be able to perform. It is also however instrumental to how our artificial processor works. For instance, we will need it to increment the Program Counter.
With that behind us, we can move on to the more practical parts of the project.
In short, we want to transform each Assembly instruction into an object describing that instruction. Conveniently, an instruction is always one line long. The object will contain the instruction type, as well as its operands, be it the register numbers, or immediate (i.e. constant) values.
First things first, let’s then define those two operand types.
type Register = | 'r0' | 'r1' | 'r2' | 'r3' | 'r4' | 'r5' | 'r6' | 'r7' | 'sp' | 'lr' | 'pc'; type Immediate = string;Probably the simplest instruction to implement is the Move instruction. It has two variants, MOV (immediate) and MOV (register), which dictate the type of the second operand.
MOV r2, #00010110 MOV r0, r1As you can see, the instruction type is clearly stated at the beginning of the line. This makes parsing extremely easy. However, typically for Assembly dialects, including my dialect, we need to do some more work to figure out the actual variant, be it MOV (immediate) or MOV (register).
To make things easier and less bug-prone down the line, let’s define the object types for those two instruction variants.
type MovImmediateInstr< Rd extends Register, imm extends Immediate > = { type: 'MovImmediate'; Rd: Rd; imm: imm; }; type MovRegisterInstr< Rd extends Register, Rm extends Register > = { type: 'MovRegister'; Rd: Rd; Rm: Rm; };The terms Rd, Rm (also Rn, Rt, Rs) come from the ARM Assembly grammar. Typically,
- Rd stands for destination register,
- Rn and Rm refer to the first and second operation operands (so second and third instruction operands),
- Rt is used for instructions dealing with memory, and
- Rs stands for shift amount register.
Onto the parsing. We will define generic types (you can think of them as functions that take types as arguments and return a type) for every instruction and instruction variant.
If we find a match, the function should return the object type, such as a MovRegisterInstr type. Otherwise, it should return never. This is a clever trick that will let us narrow down on the matching parsing rule without resorting to nested conditional types.
type ParseInstr<T> = | ParseMovImmediateInstr<T> | ParseMovRegisterInstr<T>; type ParseMovImmediateInstr<T> = T extends `MOV ${ infer Rd extends Register }, #${ infer imm extends Immediate }` ? MovImmediateInstr<Rd, imm> : never; type ParseMovRegisterInstr<T> = T extends `MOV ${ infer Rd extends Register }, ${ infer Rm extends Register }` ? MovRegisterInstr<Rd, Rm> : never;In short, we differentiate between the two instructions with the presence or absence of the # character that indicates an immediate value for the second operand.
Let’s test these bad boys out.
ParseInstr<'MOV r3, r4'> ^? { type: 'MovRegister', Rd: 'r3', Rm: 'r4' } ParseInstr<'MOV r5, #00001000'> ^? { type: 'MovImmediate', Rd: 'r5', imm: '00001000' } ParseInstr<'ADD something, bla bla'> ^? neverPerfect! Oh, by the way, if you’re wondering about that union pattern in type ParseInstr<T>, it works because for this grammar, we should only have one instruction match at all times. Thus, only one parsing type (like ParseMovRegisterInstr or maybe ParseAddImmediateInstr) will return an object and all the others will return never.
This leverages the fact that TypeScript will remove the never variants from the union, leaving us with just the generated object.
type Foo = bar | buzz | never | yeet | never; ^? bar | buzz | yeet; type Bar = never | 123 | never | never; ^? 123Parsing multiple instructions
Obviously, our program needs to be able to parse more than one instruction at a time.
ParseProgram<` MOV r1, #00001111 MOV r0, r1 `>We just need a couple of utility types to achieve that. First, let’s get rid of the surrounding whitespace.
type TrimLeft<T> = T extends `${' ' | '\n'}${infer R}` ? TrimLeft<R> : T; type TrimRight<T> = T extends `${infer R}${' ' | '\n'}` ? TrimRight<R> : T; type Trim<T> = TrimRight<TrimLeft<T>>;Then let’s split our multi-line string into an array of single lines.
type Split<T, Sep extends string> = T extends `${infer Line}${Sep}${infer Rest}` ? [Line, ...Split<Rest, Sep>] : [T]; type SplitLines<T> = Split<T, '\n'>;Finally, gluing everything together, using Mapped Types.
type ParseLine<LineString> = ParseInstr<Trim<LineString>>; type ParseProgram<ProgramString> = SplitLines<Trim<ProgramString>> extends infer R ? { [K in keyof R]: ParseLine<R[K]> } : never;Big wow, it works 🎉.
ParseProgram<` MOV r1, #00001111 MOV r0, r1 `> ^? [ { type: 'MovImmediate', Rd: 'r1', imm: '00001111', }, { type: 'MovRegister', Rd: 'r0', Rm: 'r1', }, ]Instruction indices
Our parser could very well just work for a basic assembly format where instructions are executed from top to bottom, always. However, if we want our interpreter to support the Branch instruction, we need to design it around the concept of a Program Counter, which is a special register that stores the index of the current instruction.
Since the registers are using binary numbers stored in strings, we cannot simply index into the above array. Rather, we have to add the index of each element in binary format into the element itself.
Perhaps surprisingly, we will be using the Full Adder in the parser and not just in the interpreter itself.
Since our MovImmediateInstr and MovRegisterInstr types don’t have an idx (index) field, let’s create a utility wrapper type to handle that.
type Indexed<T, idx extends string> = T & {idx: idx};Then to actually index all of our instructions, we create a recursive type that increments the index in binary format on every recursive call.
type ParseLoop< T extends any[], idx extends string = '00000000' > = T extends [ infer Current, ...infer Rest, ] ? [ Indexed<ParseLine<Current>, idx>, ...ParseLoop< Rest, Add<idx, '00000001'>['result'] >, ] : []; type ParseProgram<T> = SplitLines<Trim<T>> extends infer R extends any[] ? ParseLoop<R> : never;Now our parsed program will contain this information.
ParseProgram<` MOV r1, #00001111 MOV r0, r1 `> ^? [ { type: 'MovImmediate', Rd: 'r1', imm: '00001111', idx: '00000000', }, { type: 'MovRegister', Rd: 'r0', Rm: 'r1', idx: '00000001', }, ]Our parser is far from being done. But we should already be able to evaluate the MOV instruction with an interpreter.
Registers and memory
As you know, our processor can operate on registers and memory. To evaluate an instruction, we pass our current registers and memory to the evaluating type (similar to our parsing types). After performing the necessary calculations, the type will return the new state of the registers and memory.
Let’s define those in a type.
type Context = { registers: { r0: string; r1: string; r2: string; r3: string; r4: string; r5: string; r6: string; r7: string; sp: string; lr: string; pc: string; }; memory: Record<string, string>; };Notice that our memory is an object instead of an array. This is because our indices are strings representing binary numbers. It also saves a lot of space, as only the used memory slots will be represented.
Evaluating one instruction
EvalInstr< ParseInstr<'MOV r0, #00001111'>, StartingContext >;No need to evaluate the whole program at once for now. Let’s see if we can create an interpreter that, given a starting Context, can manipulate it. In our case, we will start with r0 = 00000000, and we want to end up with r0 = 00001111.
We will use that same union pattern we used for the parser.
type EvalInstr<Instr, Ctx extends Context> = | EvalMovImmediateInstr<Instr, Ctx> | EvalMovRegisterInstr<Instr, Ctx>;Now, for the hard part. Remember that we want to transform the Ctx so that the return value contains our immediate value in r0. This means that we will need to overwrite the register value.
There isn’t a very easy way to do that in TypeScript, since the intersection operator & would not fit here. It’s time for another utility type.
type Overwrite< A extends Record<string, any>, B extends Record<string, any>, > = Omit<A, keyof B> & B; Overwrite<{r0: '00000000'}, {r0: '00001111'}>; ^? { r0: '00001111' }Finally, the evaluating type.
type EvalMovImmediateInstr<Instr, Ctx extends Context> = Instr extends MovImmediateInstr< infer Rd, infer imm > ? { memory: Ctx['memory']; registers: Overwrite< Ctx['registers'], Record<Rd, imm> >; } : never;Are you still here? Hope this isn’t too crazy. The evaluating type for our other variant, MOV (register) will be nearly identical.
type EvalMovRegisterInstr<Instr, Ctx extends Context> = Instr extends MovRegisterInstr< infer Rd, infer Rm > ? { memory: Ctx['memory']; registers: Overwrite< Ctx['registers'], Record< Rd, Ctx['registers'][Rm] > >; } : never;Please work, please work, please work.
type StartingContext = { registers: { r0: '00000000'; }; memory: {}; }; EvalInstr< ParseInstr<'MOV r0, #00001111'>, StartingContext >; ^? { registers: { r0: '00001111'; }; memory: {}; }We did it gamers coders. Now, onto –
Evaluating the whole program
Our program is simply an array of indexed instruction objects. As I mentioned multiple times now (hope you’re paying attention), the pc Program Counter register will contain the index of the current instruction to be processed.
Our instructions should then modify that pc register so that the interpreter can process the next instruction and we don’t end up with an infinite loop. For normal instructions, it should simply increment the pc register. Special instructions, such as Branch, will instead modify the Program Counter so the processor can continue execution from a different place.
type EvalMovImmediateInstr<Instr, Ctx extends Context> = Instr extends MovImmediateInstr< infer Rd, infer imm > ? { memory: Ctx['memory']; registers: Overwrite< Overwrite< Ctx['registers'], { pc: Add< Ctx['registers']['pc'], '00000001' >['result'] } >, Record<Rd, imm> >; } : never;I’ll spare you the implementation of the EvalMovRegisterInstr, as it is again nearly identical.
This pattern of incrementing pc will be so universal that we should abstract it away with a utility type.
type IncrementPC<Ctx extends Context> = Add<Ctx['registers']['pc'], '00000001'>['result'];Now we need our interpreter to actually have a program loop and pick the right instruction in each iteration. It should halt as soon as the instruction with a given index does not exist.
TypeScript was giving me a hard time with this one, so the code is a bit hacky. But it works.
type FindInstrByIdx<Instrs, PC> = { [K in keyof Instrs as K extends number ? K : never]: FindInstrByIdxPredicate< Instrs[K], PC >; }[any]; type FindInstrByIdxPredicate<T, PC> = T extends {idx: PC} ? T : never;Bask in the glory of the pièce de résistance.
type EvalProgram< Instrs, Ctx extends Context = StartingContext > = FindInstrByIdx< Instrs, Ctx['registers']['pc'] > extends never ? Ctx : EvalProgram< Instrs, EvalInstr< FindInstrByIdx< Instrs, Ctx['registers']['pc'] >, Ctx > >; EvalProgram< ParseProgram<` MOV r0, #00001111 MOV r1, #11110000 MOV r2, r0 `> >['registers'] ^? { r0: '00001111'; r1: '11110000'; r2: '00001111'; pc: '00000011'; }API glue
Since EvalProgram<ParseProgram<... is awkward to use, let’s define a type that glues everything together. This will also prove useful after we add more compiler steps into the mix.
type InterpretProgram<T> = EvalProgram< ParseProgram<T> >We can’t do much with just the MOV instruction. We should support a plethora of arithmetic, logical and memory operations, as well as implement branching so that we can write programs with loops and conditional statements.
Addition
Let’s start small — the ADD instruction. Just like MOV, it has an immediate and register variants, which determine the type of the third and last operand.
ADD r0, r1, #00001111 ADD r0, r1, r2The majority of the boilerplate will feel quite familiar. Let’s go through the most important stuff one last time.
type AddImmediateInstr< Rd extends Register, Rn extends Register, imm extends Immediate, > = { type: 'AddImmediate'; Rd: Rd; Rn: Rn; imm: imm; }; type AddRegisterInstr< Rd extends Register, Rn extends Register, Rm extends Register, > = { type: 'AddRegister'; Rd: Rd; Rn: Rn; Rm: Rm; }; type ParseAddImmediateInstr<T> = T extends `ADD ${ infer Rd extends Register }, ${ infer Rn extends Register }, #${ infer imm extends Immediate }` ? AddImmediateInstr<Rd, Rn, imm> : never; type ParseAddRegisterInstr<T> = T extends `ADD ${ infer Rd extends Register }, ${ infer Rn extends Register }, ${ infer Rm extends Register }` ? AddRegisterInstr<Rd, Rn, Rm> : never;Here’s where it’s a bit different. For the evaluation, instead of simply putting the value into the desired register, we use the Full Adder to apply addition.
type EvalAddImmediateInstr<Instr, Ctx extends Context> = Instr extends AddImmediateInstr< infer Rd, infer Rn, infer imm > ? { memory: Ctx['memory']; registers: Overwrite< Overwrite< Ctx['registers'], {pc: IncrementPC<Ctx>} >, Record< Rd, Add< Ctx['registers'][Rn], imm >['result'] > >; } : never;As you can see, it’s not too difficult to add more functionality to our parser and interpreter. Going forward, I will only show the nitty gritty stuff and skip this boilerplate.
InterpretProgram<` MOV r0, #00000110 MOV r1, #00001010 ADD r2, r0, r1 `>['registers']; ^? { r0: '00000110'; r1: '00001010'; r2: '00010000'; }Subtraction
To subtract one binary number from another, we need to add the minuend and the number opposite to the subtrahend, using Two’s Complement.
type OnesComplement<T> = T extends '' ? '' : T extends `0${infer Rest}` ? `1${OnesComplement<Rest>}` : T extends `1${infer Rest}` ? `0${OnesComplement<Rest>}` : never; type TwosComplement<T> = Add< OnesComplement<T>, '00000001' >['result']; type Subtract<A, B> = Add<A, TwosComplement<B>>; InterpretProgram<` MOV r0, #00000110 MOV r1, #00001010 SUB r2, r1, r0 `>['registers']; ^? { r0: '00000110'; r1: '00001010'; r2: '00000100'; }Memory instructions
We want to be able to store and load values from memory addresses into registers and the other way around. Easy enough. Since our processor architecture works with 8-bit registers, we don’t have to worry about memory alignment and different instructions for saving bytes, words, etc.
Let’s have a look at the syntax.
LDR r0, [r1] LDR r0, [r1, r2] LDR r0, [r1, #00000001] STR r0, [r1] STR r0, [r1, r2] STR r0, [r1, #00000001]For the LDR instruction, we simply get the desired value from memory and place it in the register.
type EvalLdrImmediateInstr<Instr, Ctx extends Context> = Instr extends LdrImmediateInstr< infer Rt, infer Rn, infer imm > ? { memory: Ctx['memory']; registers: Overwrite< Overwrite< Ctx['registers'], {pc: IncrementPC<Ctx>} >, Record< Rt, Ctx['memory'][ Add< Ctx['registers'][Rn], imm >['result'] ] > >; } : never;For STR, it’s the opposite — get the value from the register and put it in the desired place in memory.
type EvalStrImmediateInstr<Instr, Ctx extends Context> = Instr extends StrImmediateInstr< infer Rt, infer Rn, infer imm > ? { memory: Overwrite< Ctx['memory'], Record< Add< Ctx['registers'][Rn], imm >['result'], Ctx['registers'][Rt] > >; registers: Overwrite< Ctx['registers'], {pc: IncrementPC<Ctx>} >; } : never;We can now save values for later in memory, and even start the program with an allocated data section by modifying the StartingContext.
Here’s a program that swaps the values in two registers using the memory.
InterpretProgram<` MOV r0, #11110000 MOV r1, #00001111 MOV r2, #10000000 STR r0, [r2, #00000000] STR r1, [r2, #00000001] LDR r0, [r2, #00000001] LDR r1, [r2, #00000000] `>; ^? { registers:{ r0: '00001111'; r1: '11110000'; r2: '10000000'; }; memory: { '10000000': '11110000'; '10000001': '00001111' }; }There are still more instructions to be added. There is something we need to deal with first though.
As the interpreter’s complexity kept growing, I encountered a problem I hoped wouldn’t pop up in this project. It’s that same problem that plagued me in my original project of implementing a full programming language inside of TypeScript’s type system.
Type instantiation is excessively deep and possibly infinite. ts(2589)The error would present itself for programs that had about 30 instructions, or more. This is because our program loop is a recursive type that needs to evaluate one line at a time and pass the result to the next line.
Clearly, we’ve gone too far. . No human should be able to wield such power.
I found a solution. Just don’t tell anyone. Some say the best way to fix an error is to disable it. And so, I found the code responsible for generating that error in TypeScript’s codebase.
if ( instantiationDepth === 100 || instantiationCount >= 5000000 ) {Yeaaah, let’s just bring that up to a more sane number.
if ( instantiationDepth === 100 * 10 || instantiationCount >= 5000000 * 10 ) {Whaddaya know! We can now run about 300 instructions. We can also increase this number even more, if need be.
Right, where were we?
The time has come to revisit one of the other special registers I mentioned earlier, sp — the Stack Pointer register.
The user could of course manually manage such a structure. The way to do it would basically be to keep track of where in memory the stack starts, which is a constant, and where it ends, which changes based on how many elements in the stack there are.
The need for a stack structure though is so common, that processors tend to implement it anyway (not to mention it can be used by the processor itself).
One use for them is pushing values from registers before subroutine calls, and popping them back into registers to restore their state after the subroutine returns. We will see this in practice after we implement the Branch instruction.
In our case, the stack base is at the end of our memory – at address 11111111. The Stack Pointer is initialized to this value as well.
Pushing additional elements onto the stack places them at the memory position dictated by the Stack Pointer and post-decrements the Stack Pointer.
Popping does the opposite — it pre-increments the Stack Pointer and retrieves the value from the memory position dictated by the Stack Pointer.
The syntax of the two instructions is simple.
PUSH {r0} POP {r0}The implementation is somewhat similar to the LDR and STR instructions, with the caveat of having to manage the sp register.
type EvalPushInstr<Instr, Ctx extends Context> = Instr extends PushInstr<infer Rm> ? { memory: Overwrite< Ctx['memory'], Record< Ctx['registers']['sp'], Ctx['registers'][Rm] > >; registers: Overwrite< Ctx['registers'], { pc: IncrementPC<Ctx>; sp: Sub< Ctx['registers']['sp'], '00000001' >['result']; } >; } : never; type EvalPopInstr<Instr, Ctx extends Context> = Instr extends PopInstr<infer Rm> ? { memory: Ctx['memory']; registers: Overwrite< Overwrite< Ctx['registers'], { pc: IncrementPC<Ctx> sp: Add< Ctx['registers']['sp'], '00000001' >['result']; } >, Record< Rm, Ctx['memory'][ Add< Ctx['registers']['sp'], '00000001' >['result'] ] > >; } : never;Now we can swap two numbers using the memory much more easily.
InterpretProgram<` MOV r0, #11110000 MOV r1, #00001111 PUSH {r0} PUSH {r1} POP {r0} POP {r1} `>; ^? { registers:{ r0: '00001111'; r1: '11110000'; sp: '11111111'; }; memory: { '11111111': '11110000'; '11111110': '00001111'; }; }You made it this far. No reason to stop now.
What is branching, really? To jump to a different place in the program sounds an awful lot like overwriting the pc register. And that’s just what it is. The B instruction is just syntax sugar for a MOV pc, ....
00000000: B exit 00000001: ADD r0, r1, r2 00000010: exit: 00000000: MOV pc, #00000010 00000001: ADD r0, r1, r2 00000010: exit:Corporate needs you to find the differences between these.
Schloop. The ADD instruction was not executed. exit: is a label that we can use in Branch instructions as a way to point to that address in the program.
And so, we will need a way to actually link B exit to the exit: label. This step should happen after parsing and before evaluation.
The resolver
Since B is literally syntax sugar for MOV pc, ..., our interpreter doesn’t need to know any new functionality has been added. We can simply desugar the former into the latter.
type BranchInstr = <label extends string> = { type: 'Branch'; label: label; }; type ResolveLabels<Instrs> = { [K in keyof Instrs]: Instrs[K] extends Indexed< BranchInstr<infer label>, infer idx > ? Indexed< MovImmediateInstr< 'pc', FindLabel<Instrs, label>['idx'] >, idx > : Instrs[K]; };We use the trick from before to find the label instruction. Only this time, we’re filtering by the label name and not the instruction index.
type FindLabel<Instrs, name> = { [K in keyof Instrs as K extends number ? K : never]: FindLabelPredicate< Instrs[K], name >; }[any]; type FindLabelPredicate<T, name> = T extends {name: name} ? T : never;Then, we just need to make the label a no-op in the interpreter. And by no-op, I mean that it should simply increment the Program Counter.
type EvalLabelInstr<Instr, Ctx extends Context> = Instr extends LabelInstr<string> ? { memory: Ctx['memory']; registers: Overwrite< Ctx['registers'], {pc: IncrementPC<Ctx>} >; } : never;Some final touches...
type InterpretProgram<T> = EvalProgram< ResolveLabels< ParseProgram<T> > >Cool.
InterpretProgram<` MOV r1, #00001111 MOV r2, #11110000 B exit ADD r0, r1, r2 exit: `>['registers'] ^? { r0: '00000000', r1: '00001111', r2: '11110000', }Conditional branching
Branching by itself isn’t all that useful. To write if statements, we need a way to branch only when a condition is met. Let’s add two more instructions, Compare and Branch If Zero and Compare and Branch If Not Zero. They will read the value from the specified register to determine whether or not to branch.
MOV r0, #00000000 CBZ r0, mylabel CBNZ r0, mylabelThis time, we do need to make changes to the interpreter, as there’s no easy way to desugar this into any set of other instructions. Since the two instructions are similar, I used just one object type BranchIf to denote them.
type EvalBranchIfInstr<Instr, Ctx extends Context> = Instr extends BranchIf< infer type, infer Rn, infer address > ? { memory: Ctx['memory']; registers: Overwrite< Ctx['registers'], { pc: Ctx['registers'][Rn] extends '00000000' ? type extends 'BranchIfZero' ? address : IncrementPC<Ctx> : type extends 'BranchIfZero' ? IncrementPC<Ctx> : address; } >; } : never;The possibilities are now endless. Have a look at this program that multiplies a number by 2 raised to any power — r0 * 2^r1.
InterpretProgram<` MOV r0, #00000011 MOV r1, #00000100 loop: CBZ r1, exit ADD r0, r0, r0 SUB r1, r1, #00000001 B loop exit: `>['registers'] ^? { r0: '00110000', r1: '00000000', }Branching with links
You’ve waited long enough. Say hi to the lr Link Register. Labels? More like subroutines, am I right coders?
You can imagine a makeshift way to return from a subroutine like so. The program calls a subroutine to negate a number in r0.
MOV r0, #00001101 MOV lr, pc ADD lr, lr, #00000011 B negate B exit negate: MOV r1, r0 SUB r0, r0, r1 SUB r0, r0, r1 MOV pc, lr exit:With a little bit of syntax sugar, we can make this a lot nicer.
MOV r0, #00001101 BL negate B exit negate: MOV r1, r0 SUB r0, r0, r1 SUB r0, r0, r1 BR lr exit:The desugaring and evaluation of BL and BR is trivial. I won’t be boring you with it 😉.
Once a daunting task, now a distant memory. This article doesn’t cover everything, but you can check the whole codebase for the project in the official repository, ts-asm.
On the topic of programming languages and compiler theory, you might also want to read The journey of Queso, my programming language.
Thanks for reading.