How I stopped worrying and started loving the Assembly

Or why robots are playing DOOM now

Who am I

Hello, my name is Jonas Eschenburg. I feel like my path was pre-determined ever since my dad brought home this wonderful computer back in 1986. The computer was an Atari ST, originally intended for serious tasks like writing letters and articles. I soon found out I could also have a lot of fun with it. I was six years old back then and what started with playing games like Megaroids ended up with me wanting to learn programming in order to create my own games. By the age of ten I was creating simple games and demos in GfA BASIC.

Later in life I studied computer science and pursued a career as a software developer. At KUKA, I help create system software for industrial robots. As part of a large company it isn’t always easy to keep the spirit alive, as my influence on the products is obviously limited. At some point, I missed the feeling of actually having the world at my fingertips. It was time to go back to the roots.

What this is about

This is an experience report of a voyage into the land of retro programming. If I’m doing it right, I will instill a certain sensation in you, the fascination that kept me going and that made me write this article. An article about how to write software for the Atari ST, a computer system launched in 1985.

The Atari ST was part of the second major wave of home computers. The first wave, based on 8-bit CPUs, brought icons such as the Commodore 64 and the Sinclair ZX Spectrum into living rooms. The second wave was based on 16-bit CPUs and gave birth to icons such as the Apple Macintosh, the Commodore Amiga and, of course, the Atari ST.

Atari ST 1040 STF with color monitor Atari SC1224 as released in 1985.
By © Bill Bertram, 2006 — Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=500910

A typical Atari ST (like the 1040 STFM) would run on the Motorola 68000 CPU clocked at 8 MHz. It would sport 1 MB of RAM and load software from 3½ inch floppy disks. Its graphics capabilities would comprise a monochrome “high resolution” mode with 640x400 pixels on screen and a 16 color mode in 320x200 that turned out most popular for games.

The Atari ST’s GEM desktop featured an interface very similar to what Xerox and Apple had pioneered, with a file manager, desktop icons, drop-down menus and overlapping windows. Shown here is the monochrome high-resolution version.

If you don’t happen to own an actual Atari ST, you can simply emulate one on your modern computer. In fact, emulators have been around since the 90s. A good one is Hatari, which is available for most modern platforms. Using an emulator also gives you access to modern development tools. Whereas back in the day you had to use clunky development software loaded from floppy, nowadays you can do everything in VSCode and use a modern GCC compiler. You have access to modern image manipulation tools like GIMP and you can even ask the AI of your choice for help.

Hello World

Did I convince you? So let’s take our first steps in the shiny world of retro computing. Let’s open a text editor and write the famous Hello World (hello.c):

#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}

In order to compile, we need a C compiler. Luckily, there has been an ongoing effort to maintain an up-to-date GNU toolchain. You can download precompiled binaries for all major platforms (including the Atari ST itself, but that’s not what I’d recommend) on Thorsten Otto’s Crossmint site. You need GCC and binutils to compile hello.c:

m68k-atari-mint-gcc hello.c -o hello.prg

And there you have it: the iconic Hello World compiled for a 16-bit CPU.

HELLO.PRG running on the Atari ST. Due to the program terminating right after startup, the message is only visible for a fraction of a second.

What makes programming for the Atari ST so special?

One of the things I appreciate most about programming for the Atari ST is its simplicity. The Motorola 68000 CPU, often labeled as a 16-bit processor, actually offers much more. It features a set of sixteen 32-bit registers and supports flat 32-bit addressing and arithmetic operations, albeit with slightly reduced performance compared to 16-bit operations. This stands in stark contrast to the PC platform of the time, which was limited by the Intel 8086’s small set of 16-bit registers and a segmented memory model that made programming far more complex. In comparison, the M68K assembly language is quite elegant and user-friendly.

The system lacks virtual memory and offers no memory protection. Most of the hardware is accessible through memory-mapped I/O, allowing any program to directly interact with the registers of hardware components. This direct access becomes particularly valuable when working with graphics or sound. Additionally, there is no dedicated video memory. While an API exists for hardware-independent graphics operations, developers can also write directly to video memory if they choose.
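To give an idea of what memory-mapped I/O looks like in C, here is a minimal sketch that fills the ST's color palette. It is an illustration, not code from any of the projects described here. It assumes that the 16 palette registers start at address 0xFFFF8240 and that each holds a color in 0x0RGB form with three bits per component. The names st_color and st_set_palette are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on a real ST the 16 palette registers start at this address. */
#define ST_PALETTE_ADDR 0xFFFF8240UL

/* Pack 3-bit red, green and blue components (0..7 each) into a 0x0RGB word. */
uint16_t st_color(unsigned r, unsigned g, unsigned b)
{
    assert(r < 8 && g < 8 && b < 8);
    return (uint16_t)((r << 8) | (g << 4) | b);
}

/* Copy 16 color words into a block of palette registers.
 * `regs` is volatile so that the compiler performs every single store. */
void st_set_palette(volatile uint16_t *regs, const uint16_t colors[16])
{
    for (int i = 0; i < 16; i++)
        regs[i] = colors[i];
}
```

On the ST you would pass (volatile uint16_t *)ST_PALETTE_ADDR as the first argument. As far as I know the program has to switch to supervisor mode before touching hardware registers, so treat that part as an assumption to verify. On a development machine you can pass an ordinary array instead to try the function out.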

Another notable aspect is the near absence of a strict separation between user space and kernel space. For instance, it’s possible to install interrupt service routines directly from within a user-space program. This level of control and openness makes the ST a uniquely accessible and rewarding platform for low-level programming.

All that said, what really distinguishes programming for such a system from modern software development is of course not the system itself, but the radically reduced scope of viable projects. No web-based frontend, no HTTP, no client-server interaction, no JSON, no multithreaded async I/O backend. Just a computer and all its hardware, for the programmer to abuse.

A VoxelSpace adventure

My ambitions started with a game I played on my friend’s PC for the first time most likely around 1993. Comanche: Maximum Overkill was an action-filled military helicopter flight simulator that earned its place in history mostly for one aspect. Whereas previous flight simulators rendered the landscape and all surroundings in a very abstract way using flat-shaded polygons, Comanche introduced visuals unseen so far. Suddenly there were rolling hills, narrow canyons and steep mountains and meandering rivers to smoothly fly over. All due to an algorithm called VoxelSpace.

Screenshot of Comanche: Maximum Overkill

Comanche quickly earned a reputation as a system seller for high-end PCs (which at the time most likely featured an Intel 486 processor and between 4 and 8 MB of RAM). The time of humble systems like the Atari ST or its more advanced contender, the Commodore Amiga, was over and would never come back. Comanche would never run on these machines and nothing would ever come close.

One day in 2024 I had the crazy idea to make VoxelSpace run on the Atari ST. That was a terrible idea and those are sometimes the best.

I found terrain and texture data of the original Comanche in an easily usable format (Sebastian Macke’s excellent GitHub page). The data still needed a bit of fiddling in order to be usable. I created my own tooling based on the GIMP image manipulation program and its LISP-based scripting language “Script-Fu”. Using the Crossmint GCC compiler I created a hacky little demo program for the ST that loaded the terrain data into memory and brought it to the screen. This is what it looked like:

Picture of first voxel-st version

Of course it was slow but it did look nice. I even posted a screen capture on Twitter and received very positive responses. I kept iterating and optimizing and finally ended up with something that ran at interactive speeds. Had it been a game released in 1990 it would have blown a lot of minds.

A later version of voxel-st with a minimap and a lighted instrument panel.

Graphics on the Atari ST

But how did I actually get there? How do you even draw to the screen on an Atari ST? Turns out you just have to write data into the part of RAM that is used as video RAM. Easy, right? In fact, it appears easy, but as with many things in retro computing, there’s a twist. Time for some 16-bit computing folklore (technical content ahead!).

Remember the Atari ST has a resolution of 320x200 pixels in 16 colors? In order to distinguish 16 colors, you need 4 bits, right? This is a little bit unfortunate as the smallest addressable unit in a computer is the byte, which is famously composed of 8 bits. So there are two sane ways to map pixels to bytes:

1. Use only the lower four bits of every byte and leave the upper four bits unused (therefore wasting four bits of information per pixel but keeping a nice 1:1 pixel-to-byte mapping).
2. Stuff two pixels into one byte. This doesn’t waste any memory but adds overhead when drawing single pixels.

Unfortunately, the hardware designers at Atari went for option 3, which is an utterly insane choice: they used interleaved bit-planes.

The Atari ST’s video memory is laid out in an interleaved bitplane arrangement.

To understand a bit-plane, you decompose pixels into their bits and group the corresponding bits of neighboring pixels together. For example, take bit 0 from each of pixel 0, pixel 1, pixel 2, and so on. There are four bits per pixel, hence there are four distinct maps of bits (this is actually where the term “bitmap” comes from). Each map represents one bit position across all pixels. It helps to visualize these maps layered on top of each other, hence the term “bit-planes”.

To make matters more complicated, Atari’s designers came up with the idea of interleaving the bit-planes, making the graphics hardware simpler to implement. The interleaving works as follows: every group of 16 horizontally neighboring pixels is encoded into 8 bytes in the following way:

Bytes 0 and 1 contain the 16 bits of bitplane 0
Bytes 2 and 3 contain the 16 bits of bitplane 1
Bytes 4 and 5 contain the 16 bits of bitplane 2
Bytes 6 and 7 contain the 16 bits of bitplane 3

So every 8 bytes encode a span of 16 pixels. A screen line is 320 pixels wide so it takes 160 bytes to encode. There are 200 screen lines so a full screen occupies exactly 160*200 = 32000 bytes. Why group 16 pixels and not 8? That’s because the ST is a 16-bit machine and has a data bus that is 16 bits wide, which results in a general preference of 16-bit words over 8-bit bytes. However, this makes programming graphics on this platform a bit of a challenge.
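The layout described above can be turned into a small pixel-setting routine. This is a minimal sketch rather than an optimized implementation. It treats the screen as an array of 16-bit words and assumes that the leftmost pixel of each 16-pixel group is stored in the most significant bit of each word. The function name st_put_pixel is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ST_WIDTH        320
#define ST_HEIGHT       200
#define WORDS_PER_GROUP 4                                   /* one 16-bit word per bit-plane */
#define WORDS_PER_LINE  (ST_WIDTH / 16 * WORDS_PER_GROUP)   /* 80 words = 160 bytes */

/* Set the pixel at (x, y) to a color index in the range 0..15. */
void st_put_pixel(uint16_t *screen, int x, int y, unsigned color)
{
    assert(x >= 0 && x < ST_WIDTH && y >= 0 && y < ST_HEIGHT && color < 16);

    /* Find the four words that hold the 16-pixel group containing x. */
    uint16_t *group = screen + y * WORDS_PER_LINE + (x / 16) * WORDS_PER_GROUP;

    /* Assumption: the leftmost pixel of a group is the top bit of each word. */
    uint16_t mask = (uint16_t)(0x8000u >> (x % 16));

    /* Bit n of the color index goes into bit-plane n. */
    for (int plane = 0; plane < 4; plane++) {
        if (color & (1u << plane))
            group[plane] |= mask;
        else
            group[plane] &= (uint16_t)~mask;
    }
}
```

Setting one pixel touches four separate words, which shows why drawing pixel by pixel is expensive on this hardware and why fast code tends to work on whole 16-pixel groups instead.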

Enter the assembly

When programming for a retro system such as the Atari ST with its (ridiculously slow by today’s standards) 8MHz processor, efficiency is immensely important. While the GNU C Compiler outputs reasonably optimized code, it is very beneficial to keep a close eye on the actual assembly generated. I already knew some 68000 assembly when I started the VoxelSpace project, but I learned a lot more by analyzing what the compiler generated and refining the original C code to guide the compiler toward better output. Reading assembly code was difficult at first but, as with all things, it became easier with time.

When you study computer science, you are warned about the dangers of premature optimization. You learn to focus on algorithmic improvements first, that is, on lowering an algorithm's asymptotic complexity, expressed in the famous big-O notation. An algorithm that runs in O(N) scales better than one that runs in O(N²). However, the VoxelSpace algorithm leaves little room for asymptotic improvement. It is already bounded by O(N), where N is the number of samples to take from the terrain in each frame, and that’s a number you can choose freely to suit your needs.
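To make the O(N) claim concrete, here is a minimal sketch of how a single screen column can be rendered front to back. It follows the general idea found in public descriptions of VoxelSpace and is not taken from Voxel-ST. The map size, the number of steps, the projection scale and the function name render_column are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE   64   /* hypothetical tiny map; side length must be a power of two */
#define SCREEN_H   200
#define NUM_STEPS  32   /* N: samples taken per column, chosen by the programmer */
#define PROJ_SCALE 64   /* hypothetical projection scale factor */

/* Render one screen column front to back and return the number of pixels drawn.
 * height[] and color[] are MAP_SIZE * MAP_SIZE maps stored row by row.
 * column[] receives one color index per screen row. */
int render_column(const uint8_t *height, const uint8_t *color,
                  int cam_x, int cam_y, int cam_height, int horizon,
                  int dir_x, int dir_y,   /* ray step per sample, in map cells */
                  uint8_t column[SCREEN_H])
{
    assert(height && color && column);

    int lowest = SCREEN_H;   /* rows at or below this one are already filled */
    int drawn = 0;

    for (int z = 1; z <= NUM_STEPS; z++) {
        /* Wrap around the map edges with a cheap bit mask. */
        int mx = (cam_x + dir_x * z) & (MAP_SIZE - 1);
        int my = (cam_y + dir_y * z) & (MAP_SIZE - 1);
        int cell = my * MAP_SIZE + mx;

        /* Perspective projection. A tuned version would avoid this division. */
        int top = horizon + (cam_height - height[cell]) * PROJ_SCALE / z;
        if (top < 0)
            top = 0;

        /* Only the part not hidden by nearer terrain gets drawn. */
        for (int y = top; y < lowest; y++) {
            column[y] = color[cell];
            drawn++;
        }
        if (top < lowest)
            lowest = top;
    }
    return drawn;
}
```

The outer loop runs exactly NUM_STEPS times per column and every screen row is written at most once, so the work per frame is fixed in advance. What remains to be won is the cost of each individual step.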

Now is the time for assembly-level optimization. There are basically two ways to do that:

By inspecting the assembly code generated by the compiler. When compiling with option “-fverbose-asm”, you will get assembly code with a comment per line that explains which C variables a particular register maps to.
By embedding assembly code into your C code. This is most easily done using GCC’s inline assembly feature.

Using GCC’s inline assembly syntax can be challenging but unlocks the full potential of your target CPU.

Over the course of a couple of weeks I managed to speed up my VoxelSpace demo by at least an order of magnitude. Through experimentation and research I learned many optimization tricks. Here are just a few:

Concentrate on your inner loops. Profile as well as you can. Analyze the generated assembly code. Don’t just trust the compiler!
Simplify arithmetic. Avoid multiplication and division at all costs.
Use look-up tables. Memory access is relatively fast on old CPUs and caches were only introduced in later CPUs (that’s very different today).
Avoid bit-shift operations. The original Motorola 68000 has no barrel shifter, so a shift takes longer the more positions it shifts by. A left shift by one or two bits can be replaced with additions.
Pre-shift data that otherwise needs to be bit-shifted in tight loops.
Economize on registers. Using fewer registers avoids having to spill registers to the stack.
Use poor-man’s SIMD: Use 32-bit registers to perform two 16-bit additions at once. This comes in very handy if you do any kind of vector arithmetic and can live with the reduced precision.
Use 16-bit integers. While the Motorola 68000 CPU is perfectly able to perform arithmetic on 32-bit registers, working with the 16-bit variants of operations such as “add”, “asl” or “eor” can save a few cycles in ultra-tight loops.
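A few of these tricks can be sketched in plain C. The following example is illustrative and not taken from Voxel-ST. It shows a look-up table that replaces the multiplication y * 160, a packed addition that updates two 16-bit values with one 32-bit add, and a doubling done by addition instead of a shift. All function and table names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LINE 160
#define NUM_LINES      200

static uint16_t line_offset[NUM_LINES];   /* look-up table: y -> y * 160 */

/* Fill the table once at startup, using additions only. */
void init_line_offsets(void)
{
    uint16_t offset = 0;
    for (int y = 0; y < NUM_LINES; y++) {
        line_offset[y] = offset;
        offset += BYTES_PER_LINE;
    }
}

/* Byte offset of screen line y, without multiplying. */
uint16_t offset_of_line(int y)
{
    assert(y >= 0 && y < NUM_LINES);
    return line_offset[y];
}

/* Pack two 16-bit values into one 32-bit word. */
uint32_t pack2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* One 32-bit addition advances both halves at once. Caveat: a carry out of
 * the low half leaks into the high half, so the caller has to tolerate or
 * prevent that. This is the reduced precision mentioned above. */
uint32_t add2(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Shifting left by one bit is the same as adding a value to itself. */
uint16_t times_two(uint16_t x)
{
    return (uint16_t)(x + x);
}
```

Whether the compiler turns each of these into the instruction sequence you hoped for is exactly what reading the generated assembly will tell you.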

Modern Tooling

One might be inclined to think that programming for a retro system is a backward experience, using outdated tools and having a high barrier of entry. Thanks to the efforts of many great people, this is not at all the case. In fact, some of the possibilities are more advanced than what software developers on current systems experience daily.

It starts with the Integrated Development Environment. The recommendation is to use Visual Studio Code and Microsoft’s C/C++ extension which can be easily configured to use a cross-compiler.

The retrocomputing community has put a lot of effort into maintaining a branch of the GNU Compiler Collection (gcc). The ports are part of FreeMiNT, a larger project aiming to provide a UNIX-like operating system for Atari computers. However, the generated program files also work on plain TOS, the Atari ST’s original operating system. I recommend using the builds provided by Thorsten Otto on his m68k-atari-mint cross-tools page. It offers very up-to-date versions of GCC (at the time of this writing, the most recent version was gcc 15.2.0). You also need a build of GNU binutils, which contains an assembler and a linker.

Emulators are the tool of choice for actually running software written for the Atari ST. Of course you can use original hardware as well, but emulators make your life so much easier, especially for developing and debugging. Great emulators of the Atari ST platform have been available since the 90s. The one I use most is Hatari, which is available for Windows, macOS and Linux.

The main advantage of using an emulator for development purposes is that you can interrupt it at any time, enter debugging mode, and introspect the state of the emulated machine. You can disassemble the code currently being executed. You can single-step through everything the CPU does, including interrupt service routines. You can measure performance at a granularity way beyond what actual hardware is able to provide, even on modern computers. Built-in profiling will even tell you how much time is spent on executing individual instructions.

Screenshot of a profiling/debugging session in Hatari. While console-based, the introspection capabilities you get are second to none.

Doomed from the beginning

I kind of dug my own grave when I asked publicly on X whether DOOM had ever been ported to the Atari ST platform. This didn’t really come out of the blue — I’d been toying with the idea for a while. Why not apply the knowledge I had gained from my VoxelSpace demo and port a game to the Atari ST platform, a game that had not previously been ported?

The requirements for such a project were:

The game would have to be from an era before 3D accelerators became commonplace and before high-resolution graphics became standard.
A memory requirement of no more than 4MB of RAM.
Full source code would have to be available, mostly written in C or C++.
Game assets would need to be available for free, legally.

DOOM, released in 1993 by id Software, ticked all the boxes. The game was originally written to run on IBM-compatible PCs, with 256-color VGA graphics at a resolution of 320x200. DOOM was notorious for pushing the consumer PCs of its time, which had Intel 80486-class CPUs running at 25 MHz or more, to their limits, so porting it to a humble 1980s home computer would take a lot of hubris. It was clear that major concessions would have to be made, but that was part of the fun.

Source code for DOOM was famously made available in 1999 under the terms of the GNU General Public License (GPL), so everybody is free to tinker with it provided they also release their work under the same license.

After downloading the DOOM source code from GitHub, I opened Visual Studio Code and adapted the Makefile to work with my cross-compiler. Running make, to my surprise, resulted in successfully compiling most of the files right away. The places where the compiler did complain were, unsurprisingly, those with deeper ties into the operating system:

i_video.c: This is where the screen is set up. Since the Open Source release of DOOM is based on linuxdoom, it uses the API of the X Windowing System for this purpose. This is also where mouse and keyboard events are sourced from.
i_sound.c: This is where a sound device is opened, using Linux’s Open Sound System, for playback of sound effects.
i_net.c: This is where UDP-based multiplayer networking is implemented.

It was relatively easy to replace all of these with dummy code that would compile but do nothing. Starting DOOM then resulted in a number of runtime errors.

Using the debugging features of the Hatari emulator I was able to identify the exact parts of the source code that needed fixing. One was the function for measuring time (the function used could be replaced by a native counter that is incremented 200 times per second by TOS). Another was an instance of memory access that wasn’t properly word-aligned, a restriction of the Motorola 68000 CPU. There was also a place where memory was reserved en bloc for DOOM’s internal memory allocation routine and that memory had to be reduced from 6 MB to under 4 MB to fit into the ST’s limited RAM.
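Two of these fixes are easy to illustrate. The sketch below is not STDOOM's actual code. It assumes that DOOM counts time in tics of 1/35 second, as the TICRATE constant in its source suggests, and that the data being read is stored in little-endian byte order. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DOOM_TICRATE 35    /* game tics per second (assumption based on the DOOM source) */
#define TOS_HZ       200   /* rate of the TOS system counter */

/* Convert a reading of the 200 Hz counter into 35 Hz game tics.
 * The product stays within 32 bits for several days of uptime. */
uint32_t hz200_to_tics(uint32_t hz200)
{
    return hz200 * DOOM_TICRATE / TOS_HZ;
}

/* Read a 16-bit little-endian value one byte at a time. Byte accesses work
 * at any address, so this sidesteps the 68000's rule that word accesses
 * must be made at even addresses. */
uint16_t read_le16(const uint8_t *p)
{
    assert(p);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Reading byte by byte costs a few extra cycles, so it is best reserved for data that may really sit at an odd address.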

As you can imagine, running DOOM without any kind of graphics is not very spectacular, so I quickly began implementing that part. Luckily, DOOM renders all its graphics into an internal buffer of 64000 bytes storing 320x200 pixels. In this buffer, every byte corresponds to one pixel. The color of that pixel is determined by a palette of 256 colors each mapped to a distinct combination of red, green and blue components. The challenge was thus to convert DOOM’s internal video buffer to the Atari ST’s video memory which, as explained above, consists of 32000 bytes organized in a very different fashion.
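The core of such a conversion can be sketched for a single group of 16 pixels. This is a deliberately naive version for illustration, not the routine used in STDOOM. It assumes that each byte has already been reduced from DOOM's 256 colors to a color index between 0 and 15, a step that is not shown here. The name c2p16 is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Convert 16 "chunky" pixels (one color index 0..15 per byte) into the four
 * bit-plane words that describe the same 16 pixels on the ST. */
void c2p16(const uint8_t pixels[16], uint16_t planes[4])
{
    planes[0] = planes[1] = planes[2] = planes[3] = 0;

    for (int i = 0; i < 16; i++) {
        assert(pixels[i] < 16);

        /* Assumption: the leftmost pixel ends up in the top bit of each word. */
        uint16_t mask = (uint16_t)(0x8000u >> i);

        /* Bit p of the color index goes into bit-plane p. */
        for (int p = 0; p < 4; p++)
            if (pixels[i] & (1u << p))
                planes[p] |= mask;
    }
}
```

A full screen needs 4000 such groups per frame, so a real implementation would replace the inner loops with look-up tables or hand-written assembly.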

I chose to ignore the subject of color for now and went with a grayscale rendition. It took me one night of dedicated hacking to get something on screen.

The first version of STDOOM, still running in grayscale.

I proudly posted about it on X and made a small wave which motivated me to strive for more. First I wanted to move beyond the grayscale rendition and have proper colors on screen. The idea was to pick 16 colors from DOOM’s 256-color palette, then approximate the remaining 240 using color mixing. A technique called Bayer Dithering creates the illusion of rich color by blending a limited set of colors in a regular pattern. A few nights of hacking later I had a working prototype up and running.
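The principle behind Bayer dithering fits into a few lines. The following sketch uses the standard 4x4 Bayer threshold matrix to choose between two palette entries per pixel. How STDOOM selects its color pairs and mix levels is not described here, so the function dither_pick and its mix parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix containing the values 0..15. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Pick one of two palette entries for the pixel at (x, y).
 * mix = 0 yields only color_a and mix = 16 yields only color_b. Values in
 * between blend the two colors in a regular pattern. */
uint8_t dither_pick(int x, int y, uint8_t color_a, uint8_t color_b, int mix)
{
    assert(mix >= 0 && mix <= 16);
    return bayer4[y & 3][x & 3] < mix ? color_b : color_a;
}
```

With mix set to 8, exactly half of the pixels in every 4x4 block receive color_b, which the eye blends into an in-between shade at normal viewing distance.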

STDOOM finally running in 16 colors.

By mixing a chosen set of 16 colors (column on the right), STDOOM creates the illusion of a palette of 256 colors.

For a short time, STDOOM (a name I had foolishly chosen without giving it a second thought) became an internet celebrity. Articles appeared on Tom’s Hardware and Hackaday, and various YouTubers featured the unexpected novelty. Given that by then I had put relatively little effort into this port, the feedback was a great motivation to continue.

Article on Tom’s Hardware featuring the Atari ST port of DOOM.

During the next weeks STDOOM grew most of the features users expected: proper keyboard and mouse handling, lots and lots of speed improvements, and support for sound effects and music. I took a certain pride in eventually supporting all of the ST’s native resolutions, including the famous monochrome one, making this one of the platform’s most versatile games. Support for slightly faster platforms such as the Atari ST’s short-lived successor, the Atari TT, made sure that DOOM could actually be played at acceptable framerates, and for lower-end Ataris I provided modes with half and quarter resolution.

DOOM in 256, 16, 4 and 2 colors.

The only benchmark I ever trusted

Back in the day, people used DOOM as a benchmark. A special feature allowed the game to run in a non-interactive mode, playing a so-called demo file. Such a file contained mouse and keyboard input recorded during a previous playthrough. DOOM would replay the demo as fast as possible, and upon reaching the end, it would print out the total time taken.

After DOOM’s source code was released, enthusiasts expanded on the benchmark idea by creating a version of DOOM without any graphics, sound, or interactive gameplay. This stripped-down version became known as HeadlessDoom.

HeadlessDoom uses a well-known playthrough called “DOOM Done Quick” that completes not just one, but 32 levels in under 20 minutes when played in real-time. This requires a total of 56111 frames to be rendered entirely in RAM, so the graphics never appear on screen. Modern computers complete this benchmark in a few seconds, which I find pretty impressive.

HeadlessDoom is now also used at KUKA for assessing the performance of robot controllers. For me personally, it has become my go-to benchmark for single-threaded, integer-only compute loads.

HeadlessDoom playing 32 levels of DOOM in less than four seconds on my laptop.
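To compare machines it helps to turn a benchmark run into a frame rate. This is a trivial sketch with a hypothetical function name. The frame count comes from the text above, while the run time is whatever the benchmark prints on your machine.

```c
#include <assert.h>
#include <stdint.h>

#define HEADLESS_DOOM_FRAMES 56111u   /* frames in the benchmark run */

/* Average frames per second for a run that took `seconds` of wall-clock time. */
double frames_per_second(uint32_t frames, double seconds)
{
    assert(seconds > 0.0);
    return frames / seconds;
}
```

A run that takes four seconds works out to roughly 14,000 frames per second.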

What I learned

While working on Voxel-ST and STDOOM, I spent more than a few nights optimizing assembly code or completing new features. That’s the kind of motivation that is hard to come by in everyday work. Not only did this enthusiasm give me a lot of joy, I also learned a few things. Here’s my takeaway:

Legacy Development Can Still Feel Modern
Working on old systems isn’t just nostalgia. It teaches modern tooling and solid engineering under strict constraints. Take the IDE of your choice, use the latest compiler and profit from state-of-the-art profiling and introspection features. Granted, not everything feels modern, but some things are really next-level.
Assembly Teaches Hardware Limits
Always wanted to learn coding in Assembly language? Fortunately, the Motorola 68000’s instruction set is exceptionally easy to learn. You’ll be optimizing tight loops in no time and gain a clear sense of what the hardware can do and what its limits are.
Real-Time For Real
Tasks like audio processing can’t wait. Learning interrupts and writing custom handlers is essential and you’ll get instantaneous audible feedback (aka horrible noise) if something goes wrong.
The Community Is Strong
Legacy platforms still have passionate, active fan bases that keep the scene alive. You’ll receive a lot of encouragement and your work is recognized as a piece of art.
Don’t Be Afraid To Ask
There are places like the Atari-Forum where more than a few professionals are eager to provide help, hints and ideas. You’ll learn new skills and meet a few scene legends along the way.

Resources

STDOOM on GitHub: Source code and binary download.
Voxel-ST on GitHub: Source code and binary download.
Atari-Forum: A great place to ask for help and connect with the community.
Thorsten Otto’s compilation of GCC and other tools for developing software for the Atari ST.
Sebastian Macke’s tutorial of the VoxelSpace algorithm. Also hosts the original maps of the Comanche: Maximum Overkill flight simulator.
Hatari, a cross-platform Atari ST emulator.
The documentation for TOS (the Atari ST’s OS).
The Atari Profibuch, a digital version of a legendary German compendium, available for free download.
Headless DOOM on GitHub: DOOM as a benchmark.
Happy 40th Birthday, Atari ST! Interactive 3D web demo featuring STDOOM.