The .a File Is a Relic: Why Static Archives Were a Bad Idea All Along


I’ll start with a confession: the C programming language is (still) my preferred language for networking/embedded projects. After 13+ years of development in C, this feeling has faded a bit, yet it hasn’t changed.

That said, in the last 4+ years I’ve been a core developer in a commercial C-based SDK, and oh boy, I did not expect the can of worms I had to handle in the SDK world. At some point I even “gave up” and started learning Rust, believing the C compilation ecosystem is a lost cause (but this is a tale for a different post).

In this post I’ll try to cover the expectations our customers have from a C-based SDK, why static archives (.a files) will never be able to fulfil them, and my initial suggestion for a replacement static linking option.

Revised ELF header — Addition of the “Static Bundle Object” file type.

Just a brief disclaimer before we begin. While this article dives into lessons I learned, and bugs I witnessed, during my work as an SDK software engineer, it expresses my personal thoughts and has nothing to do with my employer. I do believe that an industry shift to a better static linking format will also benefit my employer (and surely make my life easier), yet this is still a personal opinion piece, and it does not reflect in any way the opinion or interest of my employer.

Now that we cleared this part, let’s begin.

An SDK is a Software Development Kit. In layman’s terms, it is a set of libraries, header files, compilation definition helpers, and reference code snippets, focused on a given software functionality. In our case, it provides a C-based programming API (Application Programming Interface) and ABI (Application Binary Interface) for customers to consume when they build a program that leverages said functionality. As an example, this could be the famous, open-source, networking-oriented DPDK, or the closed-source, GPU-oriented CUDA.

An important observation here is that an SDK is more than just a set of libraries. We want our customers to be able to easily use these libraries, hence we are expected to provide as many aids as possible:

Include headers (.h files) — Without these files, no one can really compile against the provided libraries.
Pkg-config compilation definitions (.pc files) — An industry standard for providing compilation & linking related information. This “standard” deserves its own dedicated article, yet this will wait for another day.
Reference coding samples/applications — Simple C programs that showcase how to use the SDK’s API. These could be small samples that showcase a single user-scenario (sending a packet), or more robust applications that demonstrate a real-life use case (building a firewall).

Now that we covered the scope of the SDK, it is time to dive into the overlooked part of it, the libraries themselves.

For most developers, this is the voodoo part that someone else on the team already set up. All that is left for us is to include the needed header files and write our code accordingly, hoping that the program will compile successfully.

Somewhere long ago, usually at the early stages of the project, an architect or a tech lead decided how we are going to link against the SDKs we use. Will we dynamically link against them? Or will we use them statically instead?

There are different philosophies for each approach. On one side of the spectrum we can find Red Hat’s stated dislike of static linking:

Static linking is emphatically discouraged for all Red Hat Enterprise Linux releases. Static linking causes far more problems than it solves, and should be avoided at all costs.

And on the other, we can find Google’s policy, dictating that everything used in GCP’s production environments will be statically linked. This policy is evident for everyone who ever tried to compile and use gRPC on their own.

From the perspective of an SDK provider, we must not limit our customers. As such, we are expected to provide both the dynamic linking option, as well as the static linking one. And what will this mean?

Dynamic linking — Provide Shared Object (.so) libraries, as well as matching compilation (.pc) definitions.
Static linking — Provide Static Archive (.a) files, as well as matching compilation (.pc) definitions.

When we bundle the installation of our SDK for some Linux distros, we might decide to adapt it to the expected delivery per distro. For instance, when one installs libpng on Red Hat Linux, the installation will not include a static archive (.a) file, although it will include it on an Ubuntu installation.

We will not dive into the pros and cons of each linking method, as this can fill up several operating systems lessons. Instead, the gist of it will be covered below.

With dynamic linking, the library is taken pretty much “as-is”, and mapped into the virtual address space of the program at load time.

One notable advantage is that security fixes can be provided through updated library versions, as long as the ABI was kept intact (no breaking changes). Applying this update simply requires restarting the program that uses the library, and does not require a full-blown recompilation.

The file format used for Shared Objects (.so) is the same ELF file format as used for “regular” executables, with some slight differences at the file’s header.

With static linking, the library is “swallowed” pretty much “as-is” into the target ELF executable at build time.

One notable advantage is that we no longer need to bother ourselves with ensuring our dependencies are installed on the target machine. We simply bring them bundled inside our own program.

The file format used for Static Archives (.a) is actually straightforward. This is literally an archive of raw object (.o) files.

Below is an example of extracting DPDK’s librte_eal.a file into the 66 .o files that are stored within it, one of which is the eal_common_eal_common_dev.c.o file, which we will revisit later on.

$ ar x librte_eal.a
$ ls -lh eal_*.o | wc -l
66
$ file eal_common_eal_common_dev.c.o
eal_common_eal_common_dev.c.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

While the simplistic overview looks good on paper, as always, the devil is in the details.

Given that a static archive is just an archive of plain object files, how does the linker interpret it? Well, not the way you would have expected it to. A simplified way to explain the linker’s resolution process can be described as follows:

For each symbol that still needs to be resolved, traverse the link flags, in order, and search for a file that could satisfy it.
If a shared object is found, check if this symbol is listed in the export table, and if so, mark this library as “needed” and “take it”.
If a static archive is found, extract all .o files from the archive.
Per .o file, check if the symbol is listed, and if so “take” only this .o file.
When a module is “taken”, add all the symbols it imports to the list of symbols we should resolve, and continue on until finished.
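To make the steps above concrete, here is a minimal sketch of the resolution loop in C. It is a toy model and not how any real linker is implemented: the toy_object struct, the toy_resolve() function and the file names are all made up for illustration, and a real linker consults the archive's symbol index rather than comparing lists of strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8

/* Hypothetical model of a single archive member (.o file). */
struct toy_object {
    const char *name;
    const char *defines[MAX_SYMS]; /* NULL-terminated: symbols it exports */
    const char *needs[MAX_SYMS];   /* NULL-terminated: symbols it imports */
    bool taken;                    /* was this member linked in? */
};

static bool list_has(const char *const *list, const char *sym)
{
    for (size_t i = 0; i < MAX_SYMS && list[i] != NULL; i++)
        if (strcmp(list[i], sym) == 0)
            return true;
    return false;
}

/* Resolve `sym`: "take" the first member that defines it, then resolve
 * every symbol that member imports. Returns false if `sym`, or any symbol
 * needed transitively, is not defined by any member. */
bool toy_resolve(struct toy_object *objs, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++) {
        if (!list_has(objs[i].defines, sym))
            continue;
        if (objs[i].taken)
            return true;      /* already linked in, nothing more to do */
        objs[i].taken = true; /* take only this .o file */
        bool ok = true;
        for (size_t j = 0; j < MAX_SYMS && objs[i].needs[j] != NULL; j++)
            ok = toy_resolve(objs, n, objs[i].needs[j]) && ok;
        return ok;
    }
    return false;
}

/* Demo: the program only needs log_write. Returns how many members
 * of the three-member archive end up being taken. */
int toy_demo_taken_count(void)
{
    struct toy_object archive[] = {
        { "log_api.o",  { "log_write" },  { "log_format" }, false },
        { "log_fmt.o",  { "log_format" }, { NULL },         false },
        { "log_init.o", { "log_ctor" },   { NULL },         false },
    };
    size_t n = sizeof(archive) / sizeof(archive[0]);
    int count = 0;

    toy_resolve(archive, n, "log_write");
    for (size_t i = 0; i < n; i++)
        count += archive[i].taken ? 1 : 0;
    return count;
}
```

In the demo, log_api.o and log_fmt.o are taken, while log_init.o is left behind because nothing ever asked for a symbol it defines.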

As can be seen above, the default behavior of the linker will be to break down our static archive back into a list of object files, later to pick-and-choose between them. While this has the advantage of possibly reducing the overall binary size (we only take what we need), it comes at a cost.

Who decides which function goes into which compiled object file? Well, the default behavior is that every source (.c) file is compiled into a single object file bearing the same name.

As an example, let’s take a look at a program that encrypts logs before sending them to a log server. Most cryptographic libraries (OpenSSL is just one example) will have a .c file per cryptographic building block. This file will include all the operations (encryption and decryption) of said block (AES256 for instance). This design decision at the source level means that our linked binary might not contain the logic for the 3DES building block, but it will still contain unused decryption functions for AES256.

At this point you might be thinking to yourselves that there is some downside, but it isn’t that bad. Yes, the reduction of the binary size might not be optimal. Yes, it relies on design decisions at the source level, and these considerations are not exactly a top priority when deciding which functions go into which source files. So what?

Well, what will happen with more sophisticated coding scenarios? Let’s say someone in our library designed the following logging module:

The logger registers a constructor function to initialize itself, and gives it a given priority.
The logging module exposes a MACRO to define a constructor function stub per C file, that will leverage the already initialized logging module. This ctor will be assigned a matching priority to ensure correct invocation order.

In a dynamic linking scenario, all ctor functions will be taken from the matching “init list” in the ELF file, and will be dynamically added to the init list of the executable program at load time.

However, what will happen in the static linking scenario? Our single object file will “pull in” some files from the logging module when the symbols are being resolved, as needed by the logger’s ctor stub in our object file. If all the logic in the logger module is bound to the same object file, we should be good to go. Yet, what if the logger’s ctor function is implemented in a different object file? Well, tough luck. No one requested this file, and the linker will never know it needs to link it into our static program. The result? A crash at runtime.
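For reference, such a logging module could look roughly like the sketch below. It assumes GCC or Clang (the constructor attribute is a compiler extension), and every name in it is hypothetical. Here everything lives in one file, so it works. The failure described above appears once logger_ctor() is moved into its own source file inside a static archive, where no symbol from that object file is ever requested.

```c
static int logger_ready;       /* set once the logger's ctor has run */
static int modules_registered; /* stubs that ran after the logger was ready */
static int modules_too_early;  /* stubs that ran before (or without) it */

/* The logger's own ctor. Lower priority numbers run first, and GCC
 * reserves priorities 0-100 for the implementation. */
static void __attribute__((constructor(101), used)) logger_ctor(void)
{
    logger_ready = 1;
}

/* Hypothetical helper macro: defines a ctor stub that registers a module
 * with the logger. Priority 102 makes it run after the logger's ctor. */
#define LOG_REGISTER_MODULE(name)                           \
    static void __attribute__((constructor(102), used))     \
    log_register_##name(void)                               \
    {                                                       \
        if (logger_ready)                                   \
            modules_registered++;                           \
        else                                                \
            modules_too_early++;                            \
    }

/* In a real project each of these would sit in its own .c file. */
LOG_REGISTER_MODULE(net)
LOG_REGISTER_MODULE(crypto)

int log_modules_registered(void) { return modules_registered; }
int log_modules_too_early(void)  { return modules_too_early; }
```

By the time main() starts, both stubs have found the logger ready. If logger_ctor() were missing from the link, the stubs would still run, only against a logger that was never initialized.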

While we can argue about the pros and cons of such a logging mechanism, this is just an example of the potential issues with constructors and destructors in both C and C++.

In most cases, the relevant software teams will say it is “too late” to change the software design, and force us to try and fix the issue purely at a linking level.

In the previous section we’ve seen that for some software design choices, the default behavior of the linker will result in an incomplete static linking of our library. Luckily for us, we also provide the users of the SDK with the compilation and linking definitions to be used with our libraries. If there is any linker flag that can modify this default linker behavior, we can just add it to our definition files and it will seamlessly work for our users.

I wish it were that simple.

Linkers do expose the following pair of flags for static archives:

-Wl,--whole-archive: From this point onward, whenever you encounter a static archive, take every object file from it, regardless of any symbol resolution decision.
-Wl,--no-whole-archive: From this point onward, revert to the default behavior when handling static archives, and take object files only on a need-to-use basis.

On paper, this looks great. We can wrap our static archive with the above flags, and all object files from it will be taken, case closed.

Or is it? If we re-read the above, we can see that an issue remains. The above states that we should:

… take every object file from it, regardless of any symbol resolution decision

This can also be rephrased as follows:

Take the entire static archive, even if no symbol from it is ever needed

And suddenly, this looks less promising.

First problematic scenario — Our SDK has 4 libraries that are bundled together under the same linking definition (.pc) file. This means that all of them will be taken, even if the user only needs one. In DPDK’s case, this means 190 libraries(!) to be taken, instead of around 10–20 that are actually needed.

$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/libdpdk.pc
prefix=/usr
includedir=${prefix}/include/dpdk
libdir=${prefix}/lib/x86_64-linux-gnu

Name: DPDK
Description: The Data Plane Development Kit (DPDK).
Note that CFLAGS might contain an -march flag higher than typical baseline.
This is required for a number of static inline functions in the public headers.
Version: 23.11.0
Requires: libdpdk-libs, libbsd
Requires.private: libmlx5, libibverbs, libxdp >= 1.2.2, libbpf, zlib, jansson, libmana, libmlx4, libpcap, libcrypto, libisal, libelf
Libs.private: -Wl,--whole-archive -L${libdir} -l:librte_common_cpt.a -l:librte_common_dpaax.a -l:librte_common_iavf.a -l:librte_common_idpf.a -l:librte_common_octeontx.a -l:librte_bus_auxiliary.a -l:librte_bus_cdx.a -l:librte_bus_dpaa.a -l:librte_bus_fslmc.a -l:librte_bus_ifpga.a -l:librte_bus_pci.a -l:librte_bus_platform.a -l:librte_bus_vdev.a -l:librte_bus_vmbus.a -l:librte_common_cnxk.a -l:librte_common_mlx5.a -l:librte_common_nfp.a -l:librte_common_qat.a -l:librte_common_sfc_efx.a -l:librte_mempool_bucket.a -l:librte_mempool_cnxk.a -l:librte_mempool_dpaa.a -l:librte_mempool_dpaa2.a -l:librte_mempool_octeontx.a -l:librte_mempool_ring.a -l:librte_mempool_stack.a -l:librte_dma_cnxk.a -l:librte_dma_dpaa.a -l:librte_dma_dpaa2.a -l:librte_dma_hisilicon.a -l:librte_dma_idxd.a -l:librte_dma_ioat.a -l:librte_dma_skeleton.a -l:librte_net_af_packet.a -l:librte_net_af_xdp.a -l:librte_net_ark.a -l:librte_net_atlantic.a -l:librte_net_avp.a -l:librte_net_axgbe.a -l:librte_net_bnx2x.a -l:librte_net_bnxt.a -l:librte_net_bond.a -l:librte_net_cnxk.a -l:librte_net_cpfl.a -l:librte_net_cxgbe.a -l:librte_net_dpaa.a -l:librte_net_dpaa2.a -l:librte_net_e1000.a -l:librte_net_ena.a -l:librte_net_enetc.a -l:librte_net_enetfec.a -l:librte_net_enic.a -l:librte_net_failsafe.a -l:librte_net_fm10k.a -l:librte_net_gve.a -l:librte_net_hinic.a -l:librte_net_hns3.a -l:librte_net_i40e.a -l:librte_net_iavf.a -l:librte_net_ice.a -l:librte_net_idpf.a -l:librte_net_igc.a -l:librte_net_ionic.a -l:librte_net_ipn3ke.a -l:librte_net_ixgbe.a -l:librte_net_mana.a -l:librte_net_memif.a -l:librte_net_mlx4.a -l:librte_net_mlx5.a -l:librte_net_netvsc.a -l:librte_net_nfp.a -l:librte_net_ngbe.a -l:librte_net_null.a -l:librte_net_octeontx.a -l:librte_net_octeon_ep.a -l:librte_net_pcap.a -l:librte_net_pfe.a -l:librte_net_qede.a -l:librte_net_ring.a -l:librte_net_sfc.a -l:librte_net_softnic.a -l:librte_net_tap.a -l:librte_net_thunderx.a -l:librte_net_txgbe.a -l:librte_net_vdev_netvsc.a -l:librte_net_vhost.a 
-l:librte_net_virtio.a -l:librte_net_vmxnet3.a -l:librte_raw_cnxk_bphy.a -l:librte_raw_cnxk_gpio.a -l:librte_raw_dpaa2_cmdif.a -l:librte_raw_ifpga.a -l:librte_raw_ntb.a -l:librte_raw_skeleton.a -l:librte_crypto_bcmfs.a -l:librte_crypto_caam_jr.a -l:librte_crypto_ccp.a -l:librte_crypto_cnxk.a -l:librte_crypto_dpaa_sec.a -l:librte_crypto_dpaa2_sec.a -l:librte_crypto_ipsec_mb.a -l:librte_crypto_mlx5.a -l:librte_crypto_nitrox.a -l:librte_crypto_null.a -l:librte_crypto_octeontx.a -l:librte_crypto_openssl.a -l:librte_crypto_scheduler.a -l:librte_crypto_virtio.a -l:librte_compress_isal.a -l:librte_compress_mlx5.a -l:librte_compress_octeontx.a -l:librte_compress_zlib.a -l:librte_regex_mlx5.a -l:librte_regex_cn9k.a -l:librte_ml_cnxk.a -l:librte_vdpa_ifc.a -l:librte_vdpa_mlx5.a -l:librte_vdpa_nfp.a -l:librte_vdpa_sfc.a -l:librte_event_cnxk.a -l:librte_event_dlb2.a -l:librte_event_dpaa.a -l:librte_event_dpaa2.a -l:librte_event_dsw.a -l:librte_event_opdl.a -l:librte_event_skeleton.a -l:librte_event_sw.a -l:librte_event_octeontx.a -l:librte_baseband_acc.a -l:librte_baseband_fpga_5gnr_fec.a -l:librte_baseband_fpga_lte_fec.a -l:librte_baseband_la12xx.a -l:librte_baseband_null.a -l:librte_baseband_turbo_sw.a -l:librte_node.a -l:librte_graph.a -l:librte_pipeline.a -l:librte_table.a -l:librte_pdump.a -l:librte_port.a -l:librte_fib.a -l:librte_pdcp.a -l:librte_ipsec.a -l:librte_vhost.a -l:librte_stack.a -l:librte_security.a -l:librte_sched.a -l:librte_reorder.a -l:librte_rib.a -l:librte_mldev.a -l:librte_regexdev.a -l:librte_rawdev.a -l:librte_power.a -l:librte_pcapng.a -l:librte_member.a -l:librte_lpm.a -l:librte_latencystats.a -l:librte_jobstats.a -l:librte_ip_frag.a -l:librte_gso.a -l:librte_gro.a -l:librte_gpudev.a -l:librte_dispatcher.a -l:librte_eventdev.a -l:librte_efd.a -l:librte_dmadev.a -l:librte_distributor.a -l:librte_cryptodev.a -l:librte_compressdev.a -l:librte_cfgfile.a -l:librte_bpf.a -l:librte_bitratestats.a -l:librte_bbdev.a -l:librte_acl.a -l:librte_timer.a 
-l:librte_hash.a -l:librte_metrics.a -l:librte_cmdline.a -l:librte_pci.a -l:librte_ethdev.a -l:librte_meter.a -l:librte_net.a -l:librte_mbuf.a -l:librte_mempool.a -l:librte_rcu.a -l:librte_ring.a -l:librte_eal.a -l:librte_telemetry.a -l:librte_kvargs.a -l:librte_log.a -Wl,--no-whole-archive -Wl,--export-dynamic -lIPSec_MB -lrt -latomic
Cflags: -I${includedir}

This is a classic, real-life, example of why static linking will result in a bloated binary size, instead of the promised binary size reduction.

OK, one possible solution is to follow in SPDK’s footsteps, and split the definition files of our SDK per group of libraries so as to avoid the above. In SPDK’s scenario, this resulted in more than 80 .pc files(!), which takes a toll on the developers who need to decide which linking files are needed for their program.

$ ls -lh /usr/local/lib/pkgconfig/spdk_*.pc | wc -l
83

Assuming that such a split will be accepted by our users, it is still highly likely that at least some of our SDK libraries would depend on a common library. When the user’s program accumulates the link flags across all needed libraries, the end result will list this common library multiple times. This is one of the main reasons why, in my personal opinion, the .pc file format is de facto broken.

As you might have guessed, having the same library multiple times in our link flags will cause the linker to try to link it multiple times. This will result in the all too familiar “multiple definition” error.

(.text+0x1b0): multiple definition of `rte_power_pause'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x1b0): first defined here
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o): in function `rte_power_monitor_wakeup':
(.text+0x1e0): multiple definition of `rte_power_monitor_wakeup'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x1e0): first defined here
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o): in function `rte_power_monitor_multi':
(.text+0x250): multiple definition of `rte_power_monitor_multi'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x250): first defined here

The linker has no flag that says “take this entire static archive, but only if at least one symbol from it is needed”. Hell, the linker doesn’t even have a flag for “don’t take this same file twice”. The end result is that defining the linking instructions for a project that consumes several SDKs has become an endless game of whack-a-mole.

Every operating system might have slightly different linking definitions for the SDK. Every operating system comes with a different version of pkg-config, each with its own quirks and issues. And every version upgrade of the SDK might introduce a new library dependency, suddenly creating yet another linking conflict that should be resolved.

Sadly, most of the “fixes” for these issues will be tailor-made workarounds. I have already seen my fair share of real-life fixes that include sed regex commands added to CMake files with the goal of capturing and removing the -Wl,--whole-archive statements from all but the first occurrence of a given static library. Obviously, this is not a sustainable solution.
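To give a feel for what these workarounds boil down to, here is a small sketch in C of a close cousin of those sed commands: it walks an accumulated list of link flags and drops every repeated occurrence of a static archive. The function names are hypothetical, and the sketch assumes archives are always spelled in the -l:libname.a form seen in the .pc file above. It is just as fragile as the regex version, since it knows nothing about link order or about which occurrence was wrapped in which flags.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if the token names a static archive, e.g. "-l:librte_eal.a". */
static int is_static_archive(const char *tok)
{
    size_t len = strlen(tok);

    return len > 5 &&
           strncmp(tok, "-l:", 3) == 0 &&
           strcmp(tok + len - 2, ".a") == 0;
}

/* Compacts `flags` in place: keeps the first occurrence of each static
 * archive token and drops the repeats. All other tokens are preserved in
 * their original order. Returns the new number of tokens. */
size_t dedup_static_archives(const char **flags, size_t count)
{
    size_t out = 0;

    for (size_t i = 0; i < count; i++) {
        int duplicate = 0;

        if (is_static_archive(flags[i])) {
            /* Look for the same archive among the tokens already kept. */
            for (size_t j = 0; j < out; j++) {
                if (strcmp(flags[j], flags[i]) == 0) {
                    duplicate = 1;
                    break;
                }
            }
        }
        if (!duplicate)
            flags[out++] = flags[i];
    }
    return out;
}
```

Feeding it two copies of a whole-archive block that both list -l:librte_eal.a leaves the archive only in the first block, which is enough to silence the “multiple definition” error in that specific case, and not much more.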

XKCD comic — I used regular expressions, now I have 100 problems

Let us assume that someone managed to come up with a silver bullet that addresses all of the above linking issues, maybe using proper link flags and a more robust pkg-config behavior. Sadly, this would still leave us with the broken-by-design structure of the static archives.

If we recall, a static archive is literally an archive of raw object files. A glimpse at one such object file (eal_common_eal_common_dev.c.o), as extracted from DPDK’s librte_eal.a archive, shows the following functions (l marks a local symbol, g marks a global one):

0000000000000000 l F .text 00000000000000a0 class_next_dev_cmp
00000000000000a0 l F .text 00000000000000ae build_devargs
0000000000000150 l F .text 000000000000000d cmp_dev_name
0000000000000160 l F .text 0000000000000112 bus_next_dev_cmp
0000000000000280 l F .text 0000000000000077 dev_str_sane_copy
0000000000000300 g F .text 0000000000000009 rte_driver_name
0000000000000310 g F .text 0000000000000009 rte_dev_bus
0000000000000320 g F .text 0000000000000009 rte_dev_bus_info
0000000000000330 g F .text 0000000000000009 rte_dev_devargs
0000000000000340 g F .text 0000000000000009 rte_dev_driver
0000000000000350 g F .text 0000000000000009 rte_dev_name
0000000000000360 g F .text 0000000000000008 rte_dev_numa_node
0000000000000370 g F .text 000000000000000f rte_dev_is_probed
0000000000000380 g F .text 000000000000013b local_dev_probe
00000000000004c0 g F .text 0000000000000032 local_dev_remove
0000000000000500 g F .text 00000000000000c6 rte_dev_probe
00000000000005d0 g F .text 0000000000000062 rte_eal_hotplug_add
0000000000000640 g F .text 000000000000012a rte_dev_remove
0000000000000770 g F .text 0000000000000043 rte_eal_hotplug_remove
00000000000007c0 g F .text 0000000000000166 rte_dev_event_callback_register
0000000000000930 g F .text 0000000000000150 rte_dev_event_callback_unregister
0000000000000a80 g F .text 00000000000000c1 rte_dev_event_callback_process
0000000000000b50 g F .text 00000000000000ef rte_dev_iterator_init
0000000000000c40 g F .text 00000000000001a2 rte_dev_iterator_next
0000000000000df0 g F .text 000000000000009c rte_dev_dma_map
0000000000000e90 g F .text 000000000000009c rte_dev_dma_unmap

From the naming convention, and without even looking at the source file, one can already deduce that the local_dev_probe() function is probably an internal function, and not a public API function. This can easily be verified by inspecting the header file, or examining the symbols of the matching .so file:

$ nm -D --defined librte_eal.so.24.0 | grep rte_ | wc -l
414
$ nm -D --defined librte_eal.so.24.0 | grep local_dev_probe | wc -l
0

Given the lack of proper scoping in the C programming language, we now have two critical issues:

Every non-static function in the SDK is suddenly a possible cause of naming conflict with all other functions in the user’s program.
Our closed-source SDK suddenly leaks the names of all internal, non-static functions.

Such a name conflict is demonstrated by the following simplified C program:

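#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <rte_dev.h>
#include <rte_eal.h>

/* Our own function, which happens to share its name with an internal DPDK function */
void local_dev_probe(void)
{
    printf("This is my project's device probe function\n");
}

int main(int argc, char **argv)
{
    printf("Invoking something from EAL Dev module\n");
    /* Public API call that pulls in eal_common_eal_common_dev.c.o */
    rte_dev_probe("");
    return 0;
}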

The program uses a public API function from the librte_eal.a library, specifically one that lives in the eal_common_eal_common_dev.c.o object file. Taking that object file causes our local_dev_probe() function to conflict with the library’s own internal function of the same name:

/usr/bin/ld: eal_common_eal_common_dev.c.o: in function `local_dev_probe':
(.text+0x380): multiple definition of `local_dev_probe'; main.o:main.c:(.text+0x0): first defined here

And if that wasn’t enough, we also have the second issue — leaking the names of all non-static C functions.

From my (extensive) background as a vulnerability researcher, I can tell you first-hand that this is a treasure trove of information for anyone trying to reverse engineer these binaries. Let’s just say that our product team wasn’t exactly thrilled to hear about this possible exposure of our internal logic.

While Shared Objects have built-in mechanisms to limit symbol visibility, static archives are just flawed by design. As long as we treat them as a bunch of raw object files, these symbols must be exposed, otherwise the linker might miss an additional object file that should be taken into the final static program.
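As a point of comparison, this is roughly how symbol visibility is limited for a Shared Object with GCC or Clang. The sketch assumes the file is built with -fvisibility=hidden, and all names in it (SDK_API, sdk_internal_checksum, sdk_packet_checksum) are hypothetical. In the resulting .so, only sdk_packet_checksum() shows up in the dynamic symbol table. Pack the very same object file into a .a, and sdk_internal_checksum() is still a global symbol as far as the static linker is concerned, so it can still clash with a user function of the same name.

```c
#include <stddef.h>

/* Marks a function as part of the public API, overriding the
 * -fvisibility=hidden default assumed for the rest of the file. */
#define SDK_API __attribute__((visibility("default")))

/* Internal helper that is shared between the SDK's source files, so it
 * cannot be declared static. In a shared object it stays hidden. */
unsigned int sdk_internal_checksum(const unsigned char *buf, size_t len)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Public API: explicitly exported from the shared object. */
SDK_API unsigned int sdk_packet_checksum(const unsigned char *buf, size_t len)
{
    return sdk_internal_checksum(buf, len) & 0xffffu;
}
```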

It is my personal belief that static archives are simply the wrong tool for handling static linking, at least at an SDK level. They are flawed by design, and the surrounding tooling for them (linker flags and pkg-config) is far from optimal.

If static linking is to be provided by C/C++ SDKs, we need a different solution. Something like a “Static Bundle Object” (.sbo) file, that will be closer to a Shared Object (.so) file, than to the existing Static Archive (.a) file.

This bundle file can enjoy the symbol visibility guarantees of a Shared Object, as it anyway bundles together all the .o files and doesn’t need to expose their internal relations.

Yes, we will lose the existing property of having a possibly reduced binary size. Yet, from my experience, this property was never given any attention when engineers design the source-level structure of their software. As such, this property was doomed to be suboptimal from the get-go.

From my biased perspective as an SDK developer, sacrificing the possible reduction in binary size, in favor of a robust and production-ready linking ecosystem, is a no-brainer. Judging by the number of workdays lost to chasing down these linking issues, I’m pretty sure that our customers will agree as well.

I’m aware that this will not be an easy effort. Such a change will take years to implement and integrate. Even after it is implemented across all linkers, linking helpers (pkg-config) and compiler wrappers (Meson, CMake, etc.), these updated versions should be adopted across all Linux distros, each with their own release cadence. Only then can a project reliably transition to this proposed solution.

I had my fair share of long integration efforts in the past, with both glibc (safe-linking) and a two-year-long wait for a FreeRDP major version (part one, part two). And yet, this will be at least one order of magnitude harder, if not more. I am aware of that.

Still, I believe this change is worth it. Every long journey begins with a small step, and I intend on starting this journey today.

To be continued.
