AppImage from Scratch


An AppImage is a single file that contains an entire Linux application. In most cases, it doesn’t require any particular installation – the user just executes the file, and the AppImage takes care of the rest. AppImage doesn’t automate the collection or installation of an application’s dependencies – the AppImage file is expected simply to provide them. All of them. Technologies like Flatpak and Snap are superficially similar, but both require some management infrastructure on the computer where the application is to run. AppImage requires only the Linux desktop (sometimes not even that) and some fundamental Linux utilities.

AppImage is increasingly popular, because it’s a very simple technology for the end user. It’s not necessarily simple for the application packager, but there are tools to help with that. Although the tools are pretty well documented, I’ve not seen a lot of documentation about how the technology operates fundamentally. It’s possible that the developers of AppImage technology think it’s just too simple to document. If so, I disagree and, in this article, I set out to explain how AppImage works from first principles.

Fundamentals of the AppImage technology

We’re probably all familiar with the ‘executable installers’ and ‘self-extracting zipfiles’ that are commonplace in the Windows world. On Linux, however, applications usually come in some kind of package that has to be unpacked and installed. The installation process is usually coordinated by a package manager, in collaboration with a software repository. The application package states what its dependencies (often libraries) are, and the package management framework attempts to resolve, obtain, and install the dependencies.

Tools like yum and apt, and the repositories with which they interact, are very good at handling complex dependency relationships. But what should we do when dependencies are irreconcilably in conflict?

It’s also a problem that somebody has to provide packages for each Linux variant, and often each version of each variant. In practice, the maintainers of specific Linux distributions often shoulder this burden, leading to a situation in which a package exists for some Linux distributions and not for others. Sometimes the only way to get the latest version of an application is to upgrade the entire Linux installation.

AppImage solves these problems by supplying a single executable that contains the entire application, along with all its dependencies, in some executable, compressed format. It’s a bit like a self-extracting zipfile, without the actual extraction – the compressed data is loaded into memory every time the user runs the application.

So how can we supply a whole application, including all its dependencies, in a single file, requiring no installation?

AppImage technology relies on three fundamental features of Linux.

First, Linux doesn’t care what an executable file contains, so long as the start of the file is executable. An AppImage has a small executable header, followed by the application’s complete set of files in a compressed format.
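The first point is easy to check without any AppImage machinery. The following minimal experiment (the function name demo_trailing_data is just for illustration) writes a small header script, appends bytes that are not valid shell, runs the result, and then pulls the trailing data back out with tail, using the header’s length as the offset.

```shell
#!/bin/bash
# A runnable file may carry arbitrary trailing data: bash reads the script
# as it goes and stops at 'exit', so bytes appended after that line are
# never parsed as commands. demo_trailing_data is an illustrative name.
set -eu

demo_trailing_data() {
    local work header demo offset
    work=$(mktemp -d)
    header="$work/header.sh"
    demo="$work/demo"

    # A tiny header; its final 'exit' line ends with a newline.
    printf '#!/bin/bash\necho "header ran"\nexit 0\n' > "$header"

    # Header first, then a payload that is not valid shell syntax.
    cp "$header" "$demo"
    printf 'PAYLOAD\001\002 (( not shell\n' >> "$demo"
    chmod +x "$demo"

    # Running the combined file executes only the header.
    "$demo"

    # The payload starts right after the header, so its byte offset is the
    # header's size; 'tail -c +N' counts from 1, hence the +1.
    offset=$(( $(wc -c < "$header") ))
    tail -c +"$((offset + 1))" "$demo" | head -c 7
    echo

    rm -r "$work"
}

demo_trailing_data
```

Running it should print header ran followed by PAYLOAD: the shell never tries to parse the appended bytes.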

Second, Linux allows a file, or part of a file, to be mounted as a filesystem. This is called loopback mounting. A filesystem image is embedded in the AppImage at some known offset; at runtime, the AppImage tells Linux to mount that filesystem on some temporary directory.

Third, on modern Linux systems we can perform the loopback mount without elevated privileges. You don’t need to be root to mount the filesystem on a temporary directory. The technology that handles the mounting is called FUSE – Filesystem in USErspace.

At runtime, the AppImage header at the start of the AppImage file locates the embedded filesystem, and uses FUSE to mount it on a directory under /tmp. The header then runs a script in that directory called AppRun, which sets up and runs the application.

Building an AppImage-style application from scratch

To explain how this all works in detail, I’ll describe how to build an AppImage-style trivial application. Of course, you can build a real AppImage, using tooling designed for that purpose. But doing it from scratch is more educational. My example will, naturally, be a lot simpler than a real AppImage, but it will use exactly the same principles.

My application will be a simple shell script that dumps a text file. The application is AppImage-like because it embeds the script and the text file in a single executable. It has an AppImage header, but mine is just a shell script: real AppImages use a statically linked binary as the header, which does a lot more than my simple script.

I’ll show some of the source code in this article; the whole thing is available from my GitHub repository.

Compressing the filesystem

In my example, the application’s filesystem will start life as a directory called appdir/. My directory only contains two files – the script that comprises the application (run.sh) and the text file it dumps (test.txt). In a full-scale application I would lay out the source directory like a complete root filesystem, with subdirectories /usr/lib, /usr/bin, and so on.
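For concreteness, here is one way to create such a directory. This is a sketch, not necessarily what the repository does: the helper name make_appdir and the text in test.txt are placeholders, and the empty usr/bin and usr/lib directories only hint at the layout a larger application would need.

```shell
#!/bin/bash
# Hypothetical helper: create a minimal application directory for an
# AppImage-style build. usr/bin and usr/lib are empty placeholders showing
# where a larger application would put its binaries and libraries.
set -eu

# Usage: make_appdir DIR
make_appdir() {
    local appdir=$1
    mkdir -p "$appdir/usr/bin" "$appdir/usr/lib"

    # The application: print the bundled text file.
    cat > "$appdir/run.sh" <<'EOF'
#!/bin/bash
my_dir="$(readlink -f "$(dirname "$0")")"
echo Printing test.txt:
cat "$my_dir/test.txt"
EOF
    chmod +x "$appdir/run.sh"

    echo "Hello from inside the image" > "$appdir/test.txt"
}

make_appdir appdir
```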

We’ll compress this directory into a complete filesystem image. AppImages seem to use the SquashFS format; since the entire filesystem has to be loaded into memory, and will usually be read-only, I guess it makes sense to use a compressed format like this.

Turning appdir/ into a SquashFS filesystem is easy:

$ mkdir build
$ mksquashfs appdir/ build/appdir.img

mksquashfs might not be installed by default; it’s typically part of a package called squashfs-tools. The utility has hundreds of command-line switches, but the defaults are fine for this simple demonstration.
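One mksquashfs behaviour is worth knowing about: if the output file already holds a SquashFS image, mksquashfs adds to it rather than replacing it, which can be confusing when a build is re-run. The -noappend switch turns that off, and unsquashfs -l lists an image’s contents without extracting anything. A small sketch, assuming squashfs-tools is installed (the function names are hypothetical):

```shell
#!/bin/bash
# Hypothetical helpers around squashfs-tools (mksquashfs, unsquashfs).
set -eu

# Usage: build_image SOURCE_DIR IMAGE
build_image() {
    mkdir -p "$(dirname "$2")"
    # -noappend: replace any existing image instead of adding to it.
    mksquashfs "$1" "$2" -noappend
}

# Usage: list_image IMAGE
list_image() {
    # -l: list the image's contents without extracting them.
    unsquashfs -l "$1"
}
```

For example, build_image appdir/ build/appdir.img followed by list_image build/appdir.img; unsquashfs prints each path prefixed with squashfs-root/.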

appdir.img is the compressed image that mksquashfs outputs. If we wanted, we could mount this on a directory using squashfuse:

$ squashfuse build/appdir.img /tmp/some_directory
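When you have finished looking, the same unprivileged user can release the mount with fusermount -u, which is also what the header does at the end of its run (on some FUSE 3-only systems the command is installed as fusermount3). A sketch wrapping the whole round trip, assuming squashfuse and fusermount are installed; peek_image is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: mount a SquashFS image as an ordinary user, list its
# top level, then unmount it. Assumes squashfuse and fusermount exist and
# that the user may open /dev/fuse.
set -eu

# Usage: peek_image IMAGE
peek_image() {
    local mnt
    mnt=$(mktemp -d)
    squashfuse "$1" "$mnt"
    ls -l "$mnt"
    # Unprivileged users unmount FUSE filesystems with fusermount -u.
    fusermount -u "$mnt"
    rmdir "$mnt"
}
```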

In fact, in my example, it’s the AppImage(-style) header that will run squashfuse – this has to be done when running the application, not building it.

My build process appends the SquashFS filesystem image appdir.img to a file header which, in this demo, is just a shell script. Here it is, in its entirety:

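#!/bin/bash
# Mount the embedded SquashFS image on a per-process directory.
my_dir=/tmp/app_mount.$$
mkdir -p "$my_dir"
squashfuse -o offset=NNNN "$0" "$my_dir"
# Prefer libraries bundled in the image.
export LD_LIBRARY_PATH="$my_dir/usr/lib:$my_dir/usr/lib64"
"$my_dir/run.sh"
# Clean up: unmount and remove the mount point.
fusermount -u "$my_dir"
rmdir "$my_dir"
exit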

The SquashFS filesystem gets appended after the exit line, which is necessary to ensure that the shell doesn’t try to execute the filesystem data after the application has finished.

The first thing the script does is create a directory on which it will mount the SquashFS filesystem. To reduce the likelihood of different applications using the same directory, we append the process ID ($$).
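The process ID makes a clash unlikely, but mkdir -p will quietly reuse a directory that already exists. A possible alternative, not what the header does, is mktemp -d, which creates a new, randomly named directory atomically and fails rather than reuse an existing one; GNU mktemp also makes it accessible only to its owner. A short sketch:

```shell
#!/bin/bash
# Create a fresh, uniquely named mount point. mktemp -d replaces the
# XXXXXX with random characters and creates the directory atomically.
set -eu

my_dir=$(mktemp -d /tmp/app_mount.XXXXXX)
echo "mount point: $my_dir"
# ... mount the filesystem, run the application, unmount ...
rmdir "$my_dir"
```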

Then the script mounts the filesystem whose data follows the exit line. It uses squashfuse to do this, with an offset argument. While building the application’s executable, we must change

offset=NNNN

to the actual length of the header (and thus the start of the embedded filesystem). There are many ways to do this (see build.sh for how I do it, but I don’t claim it’s optimal).
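There is a small circularity here: writing the number into the header changes the header’s length, which changes the number. One way round it, sketched below and not necessarily how build.sh does it, is to substitute repeatedly until the length stops changing; this settles after a few passes. The function name build_executable and its arguments are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical build helper: replace 'offset=NNNN' in a header template with
# the header's own final length, then append the SquashFS image.
set -eu

# Usage: build_executable HEADER_TEMPLATE IMAGE OUTPUT
build_executable() {
    local template=$1 image=$2 output=$3
    local tmp="$output.header" offset=0 size

    grep -q 'offset=NNNN' "$template" ||
        { echo "no offset=NNNN placeholder in $template" >&2; return 1; }

    # Substituting the number changes the header's length, so repeat
    # until the length we wrote in equals the length we measure.
    while :; do
        sed "s/offset=NNNN/offset=$offset/" "$template" > "$tmp"
        size=$(( $(wc -c < "$tmp") ))
        if [ "$size" -eq "$offset" ]; then break; fi
        offset=$size
    done

    cat "$tmp" "$image" > "$output"
    rm "$tmp"
    chmod +x "$output"
    echo "embedded image starts at byte offset $offset"
}
```

For example, build_executable header.sh build/appdir.img build/myapp, where header.sh is a hypothetical file holding the header script with the literal text offset=NNNN.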

I should point out that not all Linux installations will have squashfuse by default (try apt install squashfuse or yum install squashfuse). Real AppImages don’t rely on this utility – I assume that the AppImage header replicates its functionality internally. The ‘real’ method is better, as it doesn’t rely on a Linux package that not everybody will have; but I couldn’t think of a way to replicate the behaviour of squashfuse in a shell script alone.
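As written, a header run without squashfuse prints a ‘command not found’ error and carries on, trying to run run.sh from an empty directory. A small guard can make the failure explicit; this is a sketch with a hypothetical helper name, require:

```shell
#!/bin/bash
# Hypothetical guard for the header: fail early, with a clear message, if a
# command the header relies on is not installed.
set -eu

# Usage: require COMMAND...
require() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "error: required command '$tool' is not installed" >&2
            return 1
        fi
    done
}

# In the header, before mounting:
#   require squashfuse fusermount || exit 1
```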

The script then sets the environment variable LD_LIBRARY_PATH, to tell the Linux loader where to look for shared library (.so) files. My example doesn’t actually need any such libraries, but most real applications will. The AppImage builder (whether that’s a person or a software tool) will usually put .so files in usr/lib or usr/lib64, as the maintainer of a traditional package would. The Linux loader will prefer the libraries in LD_LIBRARY_PATH over the default ones in /lib, etc., but will fall back on the defaults for libraries the AppImage doesn’t provide.
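Note that the header replaces any LD_LIBRARY_PATH the user already had. A gentler variant, sketched here with a hypothetical helper bundle_library_path, prepends the bundled directories and keeps any existing value. Running ldd on a bundled executable with the new path in effect then shows which copy of each library the loader would actually pick.

```shell
#!/bin/bash
# Hypothetical helper: build a library path that puts the bundle's
# directories first and keeps whatever LD_LIBRARY_PATH already held.
# ${VAR:+:$VAR} expands to nothing when VAR is unset or empty, avoiding a
# trailing ':' (an empty entry would make the loader search the current
# directory).
set -eu

# Usage: bundle_library_path MOUNT_DIR
bundle_library_path() {
    printf '%s\n' "$1/usr/lib:$1/usr/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# In the header:
#   export LD_LIBRARY_PATH="$(bundle_library_path "$my_dir")"
#   ldd "$my_dir/usr/bin/some_program"   # some_program is a placeholder
```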

Then the header runs the application – run.sh in this case. When the application completes (or is killed) the header unmounts the filesystem and deletes the temporary directory.
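One weakness of this sequence is that the cleanup only happens if the header itself survives. Pressing Ctrl-C sends SIGINT to the header as well as to the application, so the header can die before it reaches fusermount, leaving a stale mount behind. The sketch below is a variation, not the header used in this article: it moves the mount, run and cleanup steps into a hypothetical function run_mounted that registers the cleanup with an EXIT trap, and turns SIGINT and SIGTERM into ordinary exits so that the trap fires. It also passes the header’s arguments on to the application.

```shell
#!/bin/bash
# Hypothetical helper: mount IMAGE at byte OFFSET on a fresh directory, run
# COMMAND (a path relative to the mount) with ARGS, and clean up through an
# EXIT trap. The body is a subshell '( ... )', so the traps stay local.
# Usage: run_mounted IMAGE OFFSET COMMAND [ARGS...]
run_mounted() (
    image=$1 offset=$2 cmd=$3
    shift 3
    mounted=0
    my_dir=$(mktemp -d /tmp/app_mount.XXXXXX) || exit 1

    cleanup() {
        if [ "$mounted" -eq 1 ]; then fusermount -u "$my_dir"; fi
        rmdir "$my_dir"
    }
    trap cleanup EXIT
    trap 'exit 130' INT    # 128 + 2 (SIGINT): exit normally, so cleanup runs
    trap 'exit 143' TERM   # 128 + 15 (SIGTERM)

    squashfuse -o offset="$offset" "$image" "$my_dir" || exit 1
    mounted=1
    export LD_LIBRARY_PATH="$my_dir/usr/lib:$my_dir/usr/lib64"
    "$my_dir/$cmd" "$@"
)

# In the header, this would replace the mount, run and cleanup steps:
#   run_mounted "$0" NNNN run.sh "$@"
#   exit
```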

The application

My application is very simple: it just prints a text file.

my_dir="$(readlink -f "$(dirname "$0")")"
echo Printing test.txt:
cat "$my_dir/test.txt"

Note, though, that the application needs to work out where the text file actually is. Other than shared libraries, which are handled by setting LD_LIBRARY_PATH, the application will need to perform this computation for every file that is bundled with the application. This is potentially a significant limitation of the AppImage technology, which I’ll discuss later.
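One way to reduce the burden, sketched here as an assumption rather than part of my example, is for the header to export the mount point, say with export APPDIR="$my_dir" before running run.sh, so that the application only has to prefix its file names. Real AppImage runtimes set a variable with this same name, APPDIR. The fallback keeps the script working when it is run directly from appdir/; bundled_file is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper for the application script: locate a bundled file via
# $APPDIR, assumed to be exported by the header, falling back to the
# directory containing the script itself when APPDIR is unset or empty.
set -eu

# Usage: bundled_file RELATIVE_PATH
bundled_file() {
    local base="${APPDIR:-$(dirname "$(readlink -f "$0")")}"
    printf '%s/%s\n' "$base" "$1"
}

# For example:
#   echo Printing test.txt:
#   cat "$(bundled_file test.txt)"
```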

Building the AppImage

This is just basic Linux shell scripting – please see build.sh. All the build does is concatenate the AppImage(-style) header and the SquashFS filesystem, adjusting the header to indicate the offset of the filesystem in the final file.

AppImage in practice

My simple example works in the same way as a real AppImage, but it doesn’t have to manage any dependencies, and that’s where the real problems begin. In practice, I think that most AppImage maintainers use tools like linuxdeploy to handle the dependency management. This tool scans an executable, and tries to work out what libraries it depends on. It copies these libraries to a directory, which can then be used as the basis for the SquashFS filesystem.
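To get a feel for what such a tool automates, here is a deliberately crude sketch that asks ldd for an executable’s shared-library dependencies and copies them into usr/lib. It is not how linuxdeploy works internally: it misses libraries loaded with dlopen(), it copies core libraries such as libc that real tooling typically leaves out, and ldd should never be run on untrusted binaries, since some versions may execute them. The helper name collect_libs is hypothetical.

```shell
#!/bin/bash
# Hypothetical, crude dependency collector: copy the shared libraries that
# ldd reports for an executable into APPDIR/usr/lib. Libraries opened at
# runtime with dlopen() do not appear in ldd's output and are missed.
set -eu

# Usage: collect_libs EXECUTABLE APPDIR
collect_libs() {
    local exe=$1 appdir=$2 lib
    mkdir -p "$appdir/usr/lib"
    # ldd lines look like: libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)
    # Keep the resolved path (third field) when it is an absolute path;
    # this skips the vdso, the loader line and 'not found' entries.
    ldd "$exe" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' |
        while read -r lib; do
            # -L copies the file a symlink points to, under the link's name.
            cp -L "$lib" "$appdir/usr/lib/"
        done
}
```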

This scanning process isn’t foolproof, particularly if the application loads libraries explicitly at runtime (so library information is absent from the application’s executable). Still, it’s a start.

Another approach to managing AppImage dependencies is to leverage a platform’s existing dependency framework. If applications are available as packages (.deb, .rpm, etc), then the package file should already contain dependency information. It should be possible for AppImage tooling to resolve the dependencies in the same way that a platform’s package manager would.

However, not all applications are easy to convert into AppImage form, even if the dependencies are clear. To work as an AppImage, the application must usually be relocatable. A relocatable application, in this context, is one that could just be unpacked into an arbitrary directory and executed there. Any application that is written to look for its own files at specific locations will need to be modified, perhaps extensively, to use locations in the mounted SquashFS filesystem.
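A rough first check for relocatability, assuming you know the installation prefix the application was built for, is to search the bundled files, binaries included, for that absolute path. The prefix /usr/share/myapp and the helper name below are hypothetical; a hit is only a candidate for modification, not proof of a problem.

```shell
#!/bin/bash
# Hypothetical helper: list files under APPDIR that mention an absolute
# PREFIX, including strings compiled into binaries.
set -eu

# Usage: find_hardcoded_paths APPDIR PREFIX
find_hardcoded_paths() {
    # -r: recurse; -l: print file names only; -F: fixed string;
    # -a: treat binary files as text. grep exits 1 when nothing matches,
    # which is not an error here; status 2 (a real error) still fails.
    grep -rlFa -- "$2" "$1" || [ "$?" -eq 1 ]
}

# For example:
#   find_hardcoded_paths appdir /usr/share/myapp
```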

Some of this modification can be automated, but probably not all, because some files may have to be at specific filesystem locations. If an application uses configuration files in /etc/, for example, or $HOME/.config, these references shouldn’t be changed to files in the SquashFS filesystem. Apart from anything else, it’s read-only.
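A common pattern, following the XDG Base Directory convention, is to ship read-only defaults inside the image and look for the user’s own settings under $XDG_CONFIG_HOME (falling back to $HOME/.config), outside the mount. A sketch, with a hypothetical application name myapp and helper config_file:

```shell
#!/bin/bash
# Hypothetical helper: choose a configuration file, preferring the user's
# writable copy and falling back to the read-only default in the image.
# 'myapp' and 'settings.conf' are placeholder names.
set -eu

# Usage: config_file MOUNT_DIR
config_file() {
    local bundle=$1
    local user_conf="${XDG_CONFIG_HOME:-$HOME/.config}/myapp/settings.conf"
    if [ -f "$user_conf" ]; then
        printf '%s\n' "$user_conf"
    else
        printf '%s\n' "$bundle/usr/share/myapp/settings.conf"
    fi
}
```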

In practice, it takes a lot of self-discipline to maintain a complex application that is completely relocatable, and most likely it’s only the authors of the application that know how to do this. Converting an existing application – particularly a large one – to be relocatable can be hugely complicated.

In this article, I demonstrated how to build an AppImage-style application, using only shell scripts and commonplace utilities. My approach is conceptually similar to real AppImage technology, but a lot simpler.

Building an AppImage-like package from scratch does highlight a lot of the limitations of the technology, particularly those related to making the application relocatable. I don’t think anybody would use my all-manual approach to package a full-scale application, but I’m sure it would work, with sufficient patience. Real AppImages are typically built with the assistance of a lot of tooling.

When we see how AppImage works at the platform level, we can appreciate how inefficient this technology can be. It’s inefficient in storage because, in practice, multiple AppImage packages are likely to provide copies of exactly the same dependencies. It’s inefficient in resources, because the entire AppImage has to be loaded into memory at runtime. To be fair, the embedded filesystem is actually read on demand, so unused parts of the application use little to no RAM. Still, the run-time decompression of the parts that are used will use some CPU. Given how Linux caches filesystem data, that overhead is probably not significant on a modern desktop Linux system, although it might make AppImage unsuitable for low-resource or embedded Linux systems. For the desktop, many users will likely find the inefficient use of resources a small price to pay for the simplicity.

AppImage technology is less sophisticated than Flatpak, Snap, Docker, Podman, etc. Any sophistication has to be provided by the AppImage tooling, not the platform. For example, in lieu of a framework for keeping AppImage applications up to date, tooling can incorporate a complete auto-update mechanism into each AppImage application. While this is an interesting development, I can’t help thinking that there are better ways to do what AppImage does, for users who need that kind of automation.

It should also be clear that AppImage is not a container technology. AppImage applications are not sandboxed, or isolated from one another. Concerns about security are making the use of lightweight containers a popular way to run applications that the user doesn’t entirely trust. Of course, AppImage is no worse in this respect than the traditional method of packaging Linux applications: we still have to trust the supplier of the package.
