I Made Vim into an Actually Portable Executable

  • Replace vimrc with your own.

  • Copy your ~/.vim (extensions, etc.) inside vimfiles.

  • Run make.

  • vim.com is created.

What is an Actually Portable Executable?

Actually Portable Executable is a binary format by Justine Tunney that makes it possible to run a single executable on a variety of operating systems and architectures, including Windows, macOS and Linux on both arm64 and x86-64.

The APE format is used by Cosmopolitan, an implementation of the C standard library that reconfigures gcc and clang to output APE binaries and packs in an astonishing bag of tricks.

I am often connected to remote machines with different operating systems, and as an avid Vim user I'd like to bring my configuration with me. This project is part personal curiosity and part scratching my own itch.

The purpose of this README is to serve as a "post-hoc lab notebook" of sorts. I have no previous experience with the Cosmopolitan ecosystem, so most of this story, dear reader, might seem rather basic to you. I am writing this in the hope that it can be useful, perhaps to one of my future selves. At the very least I'm a noob you can laugh at, and that should have some entertainment value.

I'd like to acknowledge the work done by the developers at superconfigure, a repository of software built with Cosmopolitan. This work was developed independently, but I did occasionally "compare notes" with their work.

As suggested by the Cosmopolitan documentation, the easiest way to start is to download a binary build of the toolchain, available here. The compiler and associated tools are ridiculously easy to use, especially considering the black magic we're about to summon. If you have ever used gcc or clang you'll be right at home.

wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip -d cosmocc

At the Savior's command and formed by divine teaching, we dare write inside hello.c:

#include <stdio.h>

int main(void) {
    printf("hello, world\n");
}

Issuing the obvious command just works:

cosmocc/bin/cosmocc hello.c -o hello

Unlike a normal compiler, cosmocc produced three output executables:

$ ls -l hello*
-rwxr-xr-x. 1 edoardo edoardo  409294 Apr 4 15:30 hello
-rwx--x--x. 1 edoardo edoardo  689878 Apr 4 15:30 hello.aarch64.elf
-rw-r--r--. 1 edoardo edoardo      69 Apr 4 15:03 hello.c
-rwx--x--x. 1 edoardo edoardo 1056430 Apr 4 15:30 hello.com.dbg

This is because cosmocc is actually running a bunch of stuff under the hood. We can get more information about that by setting the BUILDLOG environment variable:

BUILDLOG=log cosmocc/bin/cosmocc hello.c -o hello
cat log

(cd /home/edoardo/tutorial; cosmocc/bin/x86_64-linux-cosmo-gcc -o/tmp/fatcosmocc.ekwu71tq5t2z2.o -D__COSMOPOLITAN__ -D__COSMOCC__ -D__FATCOSMOCC__ -include libc/integral/normalize.inc -fportcosmo -fno-semantic-interposition -fno-optimize-sibling-calls -mno-omit-leaf-frame-pointer -fno-schedule-insns2 -Wno-implicit-int -mno-tls-direct-seg-refs -fpatchable-function-entry=18,16 -fno-inline-functions-called-once -DFTRACE -DSYSDEBUG -fno-pie -nostdinc -isystem cosmocc/bin/../include -c hello.c -fno-omit-frame-pointer)
(cd /home/edoardo/tutorial; cosmocc/bin/aarch64-linux-cosmo-gcc -o/tmp/fatcosmocc.5zju32xc4e3l1.o -D__COSMOPOLITAN__ -D__COSMOCC__ -D__FATCOSMOCC__ -include libc/integral/normalize.inc -fportcosmo -fno-semantic-interposition -fno-optimize-sibling-calls -mno-omit-leaf-frame-pointer -fno-schedule-insns2 -Wno-implicit-int -ffixed-x18 -ffixed-x28 -fpatchable-function-entry=7,6 -fno-inline-functions-called-once -DFTRACE -DSYSDEBUG -fno-pie -nostdinc -isystem cosmocc/bin/../include -fsigned-char -c hello.c -fno-omit-frame-pointer)
(cd /home/edoardo/tutorial; cosmocc/bin/fixupobj /tmp/fatcosmocc.ekwu71tq5t2z2.o)
(cd /home/edoardo/tutorial; cosmocc/bin/fixupobj /tmp/fatcosmocc.5zju32xc4e3l1.o)
(cd /home/edoardo/tutorial; cosmocc/bin/x86_64-linux-cosmo-gcc -o/tmp/fatcosmocc.vokhorjqgmm52.com.dbg cosmocc/bin/../x86_64-linux-cosmo/lib/ape.o cosmocc/bin/../x86_64-linux-cosmo/lib/crt.o -static -nostdlib -no-pie -fuse-ld=bfd -Wl,-z,noexecstack -Wl,-z,norelro -Wl,--gc-sections -Lcosmocc/bin/../x86_64-linux-cosmo/lib -Lcosmocc/bin/../x86_64-linux-cosmo/lib -Wl,-T,cosmocc/bin/../x86_64-linux-cosmo/lib/ape.lds -Wl,-z,common-page-size=4096 -Wl,-z,max-page-size=16384 /tmp/fatcosmocc.ekwu71tq5t2z2.o -lcosmo)
(cd /home/edoardo/tutorial; cosmocc/bin/aarch64-linux-cosmo-gcc -o/tmp/fatcosmocc.7mbrfufn709n3.aarch64.elf cosmocc/bin/../aarch64-linux-cosmo/lib/crt.o -static -nostdlib -no-pie -fuse-ld=bfd -Wl,-z,noexecstack -Wl,-z,norelro -Wl,--gc-sections -Lcosmocc/bin/../aarch64-linux-cosmo/lib -Lcosmocc/bin/../aarch64-linux-cosmo/lib -Wl,-T,cosmocc/bin/../aarch64-linux-cosmo/lib/aarch64.lds -Wl,-z,common-page-size=16384 -Wl,-z,max-page-size=16384 /tmp/fatcosmocc.5zju32xc4e3l1.o -lcosmo)
(cd /home/edoardo/tutorial; cosmocc/bin/fixupobj /tmp/fatcosmocc.7mbrfufn709n3.aarch64.elf)
(cd /home/edoardo/tutorial; cosmocc/bin/fixupobj /tmp/fatcosmocc.vokhorjqgmm52.com.dbg)
(cd /home/edoardo/tutorial; cosmocc/bin/apelink -V -1 -l cosmocc/bin/ape-x86_64.elf -l cosmocc/bin/ape-aarch64.elf -M cosmocc/bin/ape-m1.c -o hello /tmp/fatcosmocc.vokhorjqgmm52.com.dbg /tmp/fatcosmocc.7mbrfufn709n3.aarch64.elf)
(cd /home/edoardo/tutorial; cosmocc/bin/pecheck hello)

Well, that's a mouthful. But reading it carefully illustrates what's happening. The first two lines are the compilation of the translation unit. The compiler is invoked twice, the first time to generate x86 object code, the second time to generate ARM object code. The two object files are given random names and put into a temporary directory.

With each of those files as input, fixupobj is invoked. It optimizes the code and modifies the layout so that the final compiled object can double as a zip archive to put assets inside. This is yet another trick up Cosmopolitan's sleeve, as we'll see later.

Then, the object files are linked with the libc and the C runtime. Nothing unusual, but Cosmopolitan has to tell the underlying compiler to use static linking and not to add its own builtin implementations. This results in ELF binaries. The two binaries are then merged into their final APE form with apelink, and as a last step pecheck is run on the result to check the PE (Windows) side of the file.

Black magic indeed.

Vis, roboris, robori, vim, vi, vis.

Vi is a venerable text editor. It has been implemented many times in many different variants, and ported to a staggering number of operating systems and architectures. The most common variation you'll find in the wild is vim, which is the Vi clone of choice in most Linux distributions and macOS. Other popular clones are neovim, a new project that bundles the Lua interpreter, a new terminal emulator and an LSP client among other things, and the traditional vi clone that comes with OpenBSD. There are many more: openvi, elvis, vile, nvi, and even Busybox embeds a tiny vi. The last one holds a special place in my heart: it was the first vi I ever used, and the editor I wrote my first "hello world" on. The focus of this project is Vim, not any of the other clones.

For a program I have used so much, I am ashamed to admit I had never built it myself, let alone taken a look at the source code. Getting the vim source code is easier than you might expect: it's on GitHub, a mere git clone away. The core project is written in C, and the code is rather readable and not too hard to hack on (foreshadowing is a powerful narrative technique). The only downside is that the years and the many platforms supported by the project have taken their toll, and as a result some sections have become a jungle of #ifdefs.

Vim is a GNU Autotools project, which should be easy to build with the Cosmopolitan machinery. The official documentation suggests setting the CC environment variable and a few related ones (CXX for the C++ compiler if needed, which Cosmopolitan also provides; INSTALL and AR), and then running the configure script with --prefix.

The last part is critical. Cosmopolitan uses an ABI that is not compatible with the rest of the libraries on your operating system (or indeed any operating system!). Looking back at the log output of our smoke test, we already see a clear indication of this in the name of the compiler itself:

x86_64-linux-cosmo-gcc

For all intents and purposes, we should treat that as a target triplet different from whatever our machine is using. An immediate consequence is that we can't rely on our OS package manager (dnf, apt, pacman, etc.) for libraries. If we wanted to write a program against, say, the Postgres library, we would ordinarily install it with a command like:

sudo dnf install libpq-devel

After our package manager is done, the net result will be that libpq.so is put in a designated library directory, the required headers in the include directory, and we can go our merry way linking our program with the shared object.

But in our case this will never work. Our package manager will give us correctly compiled object files for our machine's target triplet, which doesn't mix and match with Cosmopolitan. Whatever we need, we will have to build from scratch.

So, after cloning the source, we try to use autotools:

mkdir -p opt/include
mkdir -p opt/lib
export CC="$HOME/tutorial/cosmocc/bin/cosmocc -I$HOME/tutorial/opt/include -L$HOME/tutorial/opt/lib"
export CXX="$HOME/tutorial/cosmocc/bin/cosmoc++ -I$HOME/tutorial/opt/include -L$HOME/tutorial/opt/lib"
export INSTALL="$HOME/tutorial/cosmocc/bin/cosmoinstall"
export AR="$HOME/tutorial/cosmocc/bin/cosmoar"
cd vim/src
./configure --prefix=$HOME/tutorial/opt

A flurry of activity fills the terminal, as the configure script makes its sanity checks. A familiar sight appears: an error message.

...
...
...
no terminal library found
checking for tgetent()... configure: error: NOT FOUND!
      You need to install a terminal library; for example ncurses.
      On Linux that would be the libncurses-dev package.
      Or specify the name of the library with --with-tlib.

Vim depends on a "terminal library", a piece of software that knows the capabilities of the terminal it's running on (colors, ANSI support, etc.) and can correctly render the screen. The configure script suggests that ncurses is one such library, and it makes the normally sensible suggestion of installing it from the package manager. As we discussed before, this simple solution is unfortunately unavailable to us.

Our hero goes on a sidequest

Ncurses, unlike Vim itself, doesn't seem to be developed on a "forge" platform like GitHub or GitLab, but since it is a very popular piece of software its source code is mirrored everywhere, and it's easy enough to obtain a copy of the most recent stable release, for example from the GNU FTP server.

It is an Autotools project as well, which means we can recycle the same strategy as above to try and configure it. The only caveat is that we need to specify --disable-lib-suffixes as an option to the configure script, or otherwise the library will be compiled as ncursesw. Other than that, there are no problems, and the configure script terminates successfully:

...
...
...
** Configuration summary for NCURSES 6.5 20240427:

extended funcs: yes
xterm terminfo: xterm-new

bin directory: /home/edoardo/tutorial/opt/bin
lib directory: /home/edoardo/tutorial/opt/lib
include directory: /home/edoardo/tutorial/opt/include/ncursesw
man directory: /home/edoardo/tutorial/opt/share/man
terminfo directory: /home/edoardo/tutorial/opt/share/terminfo

** Include-directory is not in a standard location

Great! We can go on to compile and install the library with the usual make -j && make install, there won't be any problems whatsoev...

../ncurses/curses.priv.h:1941:28: error: implicit declaration of function 'wcwidth' [-Wimplicit-function-declaration]
 1941 | #define _nc_wacs_width(ch) wcwidth(ch)
      |

For all those not conversant in ancient bullshit, that error message is a fascinating piece of history. Before ISO C99, undeclared functions could be assumed by the compiler to have return type int and unspecified parameters. We did not specify a standard in our compilation options, therefore that program could technically compile. Fortunately, the compiler declines to maliciously comply and gracefully informs us of the problem: wcwidth is undeclared.

A quick glimpse at the manual reveals the problem: wcwidth is POSIX but not ISO C. We have two options: compile without wide character support, or provide a declaration for the missing function. Since Cosmopolitan does include wcwidth, it's easy enough to patch the code to make it compile: we just add the appropriate header at the top of curses.priv.h:

#ifdef __COSMOPOLITAN__
#include "libc/str/unicode.h"
#endif

We run it back, and apart from a few reminders that const in C is actually a lie:

../ncurses/./tinfo/lib_baudrate.c:105:27: warning: not sure if modding const structs is good
  105 | static struct speed const speeds[] =
      |

the process completes without too many issues. The library is now compiled and installed, and we can have another go at trying to build Vim.

Running back the configure script now works, and we can finally attempt to build our program again. The fan spins up as electricity fills the room. Eventually we are returned to the prompt. It's moving! It's alive! Now I know what it feels like to be God!

Inside opt/bin the lifeless mass of code has been imbued with the spark of life and transmuted into a vim executable, the +x bit already set. But trying to run it has a quite disappointing effect:

bash: ./vim: cannot execute binary file: Exec format error

Looking inside the source directory we see the raw ELF files which seem to run just fine. What happened? The mystery lies in the install target of the generated Makefile:

installvimbin: $(VIMTARGET) $(DESTDIR)$(exec_prefix) $(DEST_BIN)
	-if test -f $(DEST_BIN)/$(VIMTARGET); then \
	  mv -f $(DEST_BIN)/$(VIMTARGET) $(DEST_BIN)/$(VIMNAME).rm; \
	  rm -f $(DEST_BIN)/$(VIMNAME).rm; \
	fi
	$(INSTALL_PROG) $(VIMTARGET) $(DEST_BIN)
	$(STRIP) $(DEST_BIN)/$(VIMTARGET)
	chmod $(BINMOD) $(DEST_BIN)/$(VIMTARGET)

The penultimate step of the build process, just before making the binary executable, invokes strip. Cosmopolitan builds an Actually Portable Executable, a format that strip doesn't know; running it regardless will break the executable. We can disable this by replacing strip with a command that does nothing, for example by overriding the STRIP variable used in the recipe above with true when installing (make install STRIP=true).

It works! We finally have our very own Vim!

Are we actually portable yet?

The next step is what we're all here for: we move the executable to other machines, running different operating systems, and see what happens. It runs. The very same executable, compiled on an x86 Linux machine, can successfully run on x86 Windows, aarch64 Linux, even on an ARM Mac under macOS!

As technically impressive as this is, the end product still seems to have a few problems. For one, colors are missing, and attempting to set them with :color default results in an ominous error message. Many features are broken: for example, trying to open a directory with :e . simply doesn't work, while normally a nice TUI menu shows up.

This suggests there's a problem with the Vim runtime. Vim isn't just the executable, but also a set of runtime files that normally live under /usr/share/vim in a Unix system. When you install vim through a package manager, those files are magically dropped in place and seamlessly referenced by the main program. But on a different machine they might not exist! Let's investigate by typing :version:

... fall-back for $VIM: "/home/edoardo/tutorial/opt/share/vim" ...

Indeed, a hardcoded string. The problem here is twofold: we need to find a way to ship the runtime with the executable, and have the $VIM variable point to the right place.

Cosmopolitan gives us a massive hand. Believe it or not, APE binaries are also zip files, their contents accessible under the /zip path from the executable. This means that simply "updating" the executable as though it were a regular zip file:

cd opt
cp bin/vim vim.com
zip -qur vim.com share/vim share/terminfo

has the effect of bundling those directories with the executable! Running the program again and trying to open some of those files from within Vim confirms that they are indeed accessible.

We still need to find a way to replace the hardcoded string. The solution is a bit janky but hey, it works. Back when I was a kid, in a misguided attempt to "understand programs", I frequently tried to open binary executables with a plain text editor. Obviously most of the file was line noise, but I noticed some parts were plain text strings that I could modify. A bit of trial and error revealed that this worked as long as the replacement string did not exceed the previous one in length; violating this "rule" would cause the program to act weirdly and crash at random. At the time I had no way of knowing, but there is a pretty straightforward reason for this, which involves even more ancient bullshit.

Once upon a time, there was a computer called PDP-11. It was originally released in 1970, 55 years ago. The way this computer represented strings was as a series of ASCII-encoded characters terminated by a byte set to zero. This is a dreadful way to represent a string: it's inefficient and rife with security problems. So, why do we care about factoids on old pieces of junk? Because the PDP line is where C was born. So obviously, to this day, that's how strings are represented inside an ELF file.

When you define a string like this:

#include <stdio.h>

int main(void) {
    const char* s = "hello, world";
    printf("%s\n", s);
}

The string is put into a section of the ELF file called .rodata. Compiling the above program and extracting the section with objdump -s -j .rodata a.out shows the bare bytes:

a.out: file format elf64-x86-64

Contents of section .rodata:
 402000 01000200 00000000 00000000 00000000 ................
 402010 68656c6c 6f2c2077 6f726c64 00 hello, world.

If someone were to replace those bytes at that exact location with something else, as long as the content is still a null-terminated string, the program wouldn't even notice.

So, let's do just that. It's easy enough to hack this together in bash with regular expressions (see replace.sh for the actual script).
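The repository does this in bash; purely as an illustration of the idea, here is a minimal sketch of the same in-place replacement in C. The function name patch_cstring and its return convention are made up for this example and are not taken from replace.sh. It only accepts a replacement that is no longer than the original, and pads the leftover bytes with zeros so that no other offset in the file moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Overwrite every occurrence of the NUL-terminated string old_s found in
 * buf[0..len) with new_s, padding the leftover bytes with NULs so the
 * buffer size and all other offsets stay the same.
 * Returns the number of patched occurrences, or -1 if new_s does not fit.
 * Note: a longer string that merely ends with old_s is patched as well.
 */
long patch_cstring(unsigned char *buf, size_t len,
                   const char *old_s, const char *new_s)
{
    assert(buf != NULL && old_s != NULL && new_s != NULL);

    size_t old_len = strlen(old_s);
    size_t new_len = strlen(new_s);
    if (old_len == 0 || new_len > old_len)
        return -1; /* a longer string would overrun the neighbouring data */

    long patched = 0;
    /* Compare old_len characters plus the terminating NUL byte. */
    for (size_t i = 0; i + old_len + 1 <= len; i++) {
        if (memcmp(buf + i, old_s, old_len + 1) != 0)
            continue;
        memcpy(buf + i, new_s, new_len);
        memset(buf + i + new_len, 0, old_len + 1 - new_len);
        patched++;
        i += old_len; /* skip the region that was just rewritten */
    }
    return patched;
}
```

To use it on a real file, one would read the whole executable into memory, call patch_cstring with the old fall-back path and "/zip/share/vim", and write the buffer back unchanged in size.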

We end up with a patched executable where the $VIM variable finally points to the right path:

fall-back for $VIM: "/zip/share/vim"

Bundling plugins and .vimrc

A reason why Vim has such a large following is its ability to be easily customized via plugins. Normally, you would put your plugins inside ~/.vim and your configuration in a file named ~/.vimrc.

Vim will also automatically load anything under $VIM/vimfiles, and will use $VIM/vimrc (note the missing dot) as initial configuration.

You normally don't want to do this, since using your own home directory is safer and easier, but for the sake of practicality the zip trick we saw above can be used to bundle entire plugins!

It's sufficient to dump your ~/.vim folder under opt/share/vim/vimfiles, copy your ~/.vimrc as opt/share/vim/vimrc (so that both end up under $VIM inside the archive) and update the zip again with the same command as above:

cd opt
cp bin/vim vim.com
zip -qur vim.com share/vim share/terminfo

This repository has a fresh copy of the popular ALE plugin as a git submodule, the color scheme and the vimrc I normally use. That's obviously just an example of what can be done. Replace as desired.

What OS still gives problems, and why is it Windows?

On most operating systems, the above procedure results in a perfectly working and perfectly portable Vim distribution, contained in a single file. On Windows, there are a few annoyances that unfortunately require manually patching the code. They are mostly related to the way Vim handles the external shell.

A handy feature of modern versions of Vim is quickly spawning a terminal: it can be easily done with :term, :vert term or :tab term. This results in a new buffer that contains the shell and can be interacted with by means of the usual ctrl-W commands.

By default, on Windows this is broken. If Vim is launched as a console application from the Windows GUI, :term fails outright, offering no explanation other than error code 0xc0000142. Both DuckDuckGo and Google print pages of smoking crap if we try to search it, but Claude Sonnet suggests it refers to STATUS_DLL_INIT_FAILED, which could be caused by a variety of conditions, including "corrupted DLL files, incompatible applications, insufficient permissions, antivirus software interference, outdated or incompatible drivers".

If on the other hand Vim is launched from within a cmd.exe or Powershell session, the terminal buffer is created correctly, but won't echo typed characters and will ignore pressing return, only registering it when emulated with ctrl-J.

Reading the code, it turns out that the terminal handling code for Unix-like systems and Windows is effectively entirely separate, with the Windows code further split between the older "winpty" terminal interface and the newer "conpty". Cosmopolitan compiles the code as though the platform were a normal Unix, whereas the Windows codepath would require compiling against the Win32 API, which Cosmopolitan does not implement. As a result there is no easy fix for this problem. With a bit of trial and error it turns out that running the command shell (cmd.exe, powershell.exe, etc.) inside conhost.exe solves the problem. The downside of this hack is that it increases the likelihood that our executable is flagged as malware. Obviously, conhost.exe is a core component of Windows and not, contrary to the folk legend, in and of itself malware, but liberally spawning processes under conhost.exe is exactly what a virus would do!

The code that sets the default shell is inside option.c, under the function set_init_default_shell(). The code already has a compile-time ifdef depending on the platform. A simple pattern to introduce our fix is this:

  • Compiling with Cosmopolitan sets the __COSMOPOLITAN__ preprocessor variable. We introduce a new compile-time choice using #ifdef __COSMOPOLITAN__.

  • We then detect at runtime whether we are running under Windows. Cosmopolitan provides a very handy IsWindows() macro to figure that out.

An additional problem emerges with filters. An important feature of vi clones is the ability to "filter" text through an arbitrary shell command, which is done by applying :!command to the current selection. A popular application of this feature is having Vim double as a hex editor with :!xxd.

This is broken under Windows, with Vim complaining about a "file not found" type error. Investigating this problem is a good way to showcase another incredible feature of Cosmopolitan: every compiled program includes strace! Simply running the executable with --strace causes the program to print out every system call it performs via kprintf.

set KPRINTF_LOG=log
C:\vim.com --strace

There is, predictably, a lot of output. Searching for an execve should help us figure out something:

... "/c/windows/system32/cmd.exe", {"/c/windows/system32/cmd.exe", "-c", "(xxd) < /tmp/vnvcw59/0 > /tmp/vnvcw59/1", NULL}, {"=C:=C:\\", "ALLUSERSPROFILE=/C/ProgramData", "APPDATA=/C/Users/edo/AppData/Roaming", "COMMONPROGRAMFILES=/C/Program Files/Common Files", "COMMONPROGRAMFILES(X86)=/C/Program Files (x86)/Common Files", "COMMONPROGRAMW=/C/Program Files/Common Files", "COMPUTERNAME=DESKTOP-1A9HCN4", "COMSPEC=/...) ...

There's the problem. Vim implements its filter feature with shell redirection, in this way:

  • First, Vim determines the temporary directory, and opens a file with a random name inside it.

  • Second, the input for the filter command is written inside that file.

  • Third, Vim constructs a shell command that runs the program the user specified, reading its input from that file and redirecting its output to another temporary file.

  • Finally, Vim reads from the output file.
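The four steps above can be sketched in a few lines of C. This is not Vim's actual code: the function name filter_through_shell is invented for the example, mkstemp() stands in for however Vim picks its random file names, and system() stands in for Vim's own way of spawning the shell. It is also deliberately Unix-only, with /tmp hardcoded, which is exactly the assumption that falls apart on Windows.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() under strict -std=c99/c11 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Run cmd as a filter over input and copy what it prints into out.
 * Returns the number of bytes stored in out (NUL-terminated), or -1.
 */
long filter_through_shell(const char *cmd, const char *input,
                          char *out, size_t out_cap)
{
    char in_path[] = "/tmp/filter-in-XXXXXX";   /* Unix-only assumption */
    char out_path[] = "/tmp/filter-out-XXXXXX";
    char command[1024];
    long n = -1;

    if (out_cap == 0)
        return -1;

    /* Step 1: create temporary files with random names. */
    int in_fd = mkstemp(in_path);
    if (in_fd < 0)
        return -1;
    int out_fd = mkstemp(out_path);
    if (out_fd < 0) {
        close(in_fd);
        unlink(in_path);
        return -1;
    }

    /* Step 2: write the text to be filtered into the input file. */
    size_t in_len = strlen(input);
    int wrote_all = write(in_fd, input, in_len) == (ssize_t)in_len;
    close(in_fd);
    close(out_fd);

    /* Step 3: let the shell do the redirection; system() runs "sh -c". */
    int len = snprintf(command, sizeof command, "(%s) < %s > %s",
                       cmd, in_path, out_path);
    if (wrote_all && len > 0 && (size_t)len < sizeof command &&
        system(command) == 0) {
        /* Step 4: read the filtered text back from the output file. */
        FILE *f = fopen(out_path, "rb");
        if (f != NULL) {
            n = (long)fread(out, 1, out_cap - 1, f);
            out[n] = '\0';
            fclose(f);
        }
    }

    unlink(in_path);
    unlink(out_path);
    return n;
}
```

For instance, calling it with the command "tr a-z A-Z" and the input "hello\n" should leave the upper-cased line in the output buffer on a typical Unix system.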

This won't work for two reasons. First of all, the command is passed with the Unix-style -c flag, whereas cmd.exe expects /c. Additionally, /tmp doesn't exist under Windows; we need a way to generate a temporary path that works portably.

Indeed, looking around a bit we see the failing system calls that cause our observed error:

fstatat(AT_FDCWD, "/tmp/vnvcw59/0", [n/a], 0) → -1 ENOENT
...
fstatat(AT_FDCWD, "/tmp/vnvcw59/1", [n/a], 0) → -1 ENOENT

The first issue (improper flag) is very easily fixed. Grep reveals that the flag to add is set in optiondefs.h. We can recycle the "ifdef" pattern we used before to decide at runtime whether to add -c or /c.
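As a rough sketch of that pattern applied to both the default shell and the command flag: the struct and function names below are hypothetical, the "conhost.exe cmd.exe" string is only an illustrative spelling and not necessarily what the fork uses, and "libc/dce.h" is my assumption for the header that provides IsWindows(). The real changes live in Vim's option.c and optiondefs.h and look different.

```c
#include <stdbool.h>

#ifdef __COSMOPOLITAN__
#include "libc/dce.h" /* assumption: this header provides IsWindows() */
/* One binary runs everywhere, so the check has to happen at runtime. */
static bool host_is_windows(void) { return IsWindows(); }
#else
/* Outside Cosmopolitan the platform is already fixed at compile time. */
static bool host_is_windows(void) { return false; }
#endif

/* Hypothetical container for the two defaults discussed in the text. */
struct shell_defaults {
    const char *shell;    /* default for the 'shell' option */
    const char *cmd_flag; /* default for the 'shellcmdflag' option */
};

/* Pure function, so both branches can be exercised on any host. */
struct shell_defaults pick_shell_defaults(bool on_windows)
{
    struct shell_defaults d;
    if (on_windows) {
        d.shell = "conhost.exe cmd.exe"; /* illustrative value only */
        d.cmd_flag = "/c";
    } else {
        d.shell = "sh";
        d.cmd_flag = "-c";
    }
    return d;
}

/* What the running executable would actually use. */
struct shell_defaults shell_defaults_for_host(void)
{
    return pick_shell_defaults(host_is_windows());
}
```

Keeping the decision in a function that takes the platform as a parameter is just a convenience for the sketch: it lets the Windows branch be checked without a Windows machine.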

Finding a temporary path that works portably is a bit more difficult. Luckily Cosmopolitan comes to our help once again: it bundles a handy __get_tmpdir() that does just that. There are still a few details that need to be fixed, mainly the fact that Cosmopolitan internally represents Windows paths with a Unix-like notation, such as the /c/windows/system32/cmd.exe seen in the strace output, while Windows requires paths to start with the drive letter, as in C:\windows\system32\cmd.exe, so we'll have to convert between the two. Still, this is resolved with a fairly compact set of code changes, impacting very few files.
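A conversion in that direction could look roughly like the following. The helper name to_windows_path is made up for this sketch and is not the code from the fork; it only handles absolute paths of the /<drive letter>/... form and rejects everything else.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a Cosmopolitan-style path such as "/c/windows/system32" into
 * the drive-letter form "C:\windows\system32".
 * Returns 0 on success, -1 if the input does not start with
 * "/<drive letter>" or if out (of capacity cap) is too small.
 */
int to_windows_path(const char *unix_path, char *out, size_t cap)
{
    size_t len = strlen(unix_path);

    /* Expect '/', one letter, then either the end or another '/'. */
    if (len < 2 || unix_path[0] != '/' ||
        !isalpha((unsigned char)unix_path[1]) ||
        (unix_path[2] != '\0' && unix_path[2] != '/'))
        return -1;

    /* "/c" becomes "C:", so the output is exactly as long as the input. */
    if (cap < len + 1)
        return -1;

    out[0] = (char)toupper((unsigned char)unix_path[1]);
    out[1] = ':';
    for (size_t i = 2; i < len; i++)
        out[i] = unix_path[i] == '/' ? '\\' : unix_path[i];
    out[len] = '\0';
    return 0;
}
```

A path like /tmp/vnvcw59/0 is rejected rather than guessed at, since "tmp" is not a drive letter.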

Since there are changes to the upstream code, the easiest way to keep track of them and future-proof the project is to fork the official Vim repository. As mentioned before, the changes are relatively localized, always guarded by #ifdef __COSMOPOLITAN__ and not very obtrusive. They live on the vimape_master branch.

If for whatever reason you'd rather use the original unmodified source, that's an option too: simply use the master branch instead.

