Load-time relocation of shared libraries (2011)

This article's aim is to explain how a modern operating system makes it possible to use shared libraries with load-time relocation. It focuses on the Linux OS running on 32-bit x86, but the general principles apply to other OSes and CPUs as well.

Note that shared libraries have many names - shared libraries, shared objects, dynamic shared objects (DSOs), dynamically linked libraries (DLLs - if you're coming from a Windows background). For the sake of consistency, I will try to just use the name "shared library" throughout this article.

Loading executables

Linux, similarly to other OSes with virtual memory support, loads executables to a fixed memory address. If we examine the ELF header of some random executable, we'll see an Entry point address:

$ readelf -h /usr/bin/uptime
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  [...] some header fields
  Entry point address:               0x8048470
  [...] some header fields

This is placed by the linker to tell the OS where to start executing the executable's code [1]. And indeed if we then load the executable with GDB and examine the address 0x8048470, we'll see the first instructions of the executable's .text segment there.

What this means is that the linker, when linking the executable, can fully resolve all internal symbol references (to functions and data) to fixed and final locations. The linker does some relocations of its own [2], but eventually the output it produces contains no additional relocations.

Or does it? Note that I emphasized the word internal in the previous paragraph. As long as the executable needs no shared libraries [3], it needs no relocations. But if it does use shared libraries (as do the vast majority of Linux applications), symbols taken from these shared libraries need to be relocated, because of how shared libraries are loaded.

Load-time relocation in action

To see the load-time relocation in action, I will use our shared library from a simple driver executable. When running this executable, the OS will load the shared library and relocate it appropriately.

Curiously, due to the address space layout randomization feature which is enabled in Linux, relocation is relatively difficult to follow, because every time I run the executable, the libmlreloc.so shared library gets placed at a different virtual memory address [9].

This is a rather weak deterrent, however. There is a way to make sense of it all. But first, let's talk about the segments our shared library consists of:

$ readelf --segments libmlreloc.so

Elf file type is DYN (Shared object file)
Entry point 0x3b0
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x004e8 0x004e8 R E 0x1000
  LOAD           0x000f04 0x00001f04 0x00001f04 0x0010c 0x00114 RW  0x1000
  DYNAMIC        0x000f18 0x00001f18 0x00001f18 0x000d0 0x000d0 RW  0x4
  NOTE           0x0000f4 0x000000f4 0x000000f4 0x00024 0x00024 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f04 0x00001f04 0x00001f04 0x000fc 0x000fc R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .eh_frame
   01     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .note.gnu.build-id
   04
   05     .ctors .dtors .jcr .dynamic .got

To follow the myglob symbol, we're interested in the second segment listed here. Note a couple of things:

In the section to segment mapping at the bottom, segment 01 is said to contain the .data section, which is the home of myglob.
The VirtAddr column specifies that the second segment starts at 0x1f04 and has size 0x10c, meaning that it extends until 0x2010 and thus contains myglob which is at 0x200C.

Now let's use a nice tool Linux gives us to examine the load-time linking process - the dl_iterate_phdr function, which allows an application to inquire at runtime which shared libraries it has loaded, and more importantly - take a peek at their program headers.

So I'm going to write the following code into driver.c:

#define _GNU_SOURCE
#include <link.h>
#include <stdlib.h>
#include <stdio.h>


static int header_handler(struct dl_phdr_info* info, size_t size, void* data)
{
    printf("name=%s (%d segments) address=%p\n",
            info->dlpi_name, info->dlpi_phnum, (void*)info->dlpi_addr);
    for (int j = 0; j < info->dlpi_phnum; j++) {
         printf("\t\t header %2d: address=%10p\n", j,
             (void*) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
         printf("\t\t\t type=%u, flags=0x%X\n",
                 info->dlpi_phdr[j].p_type, info->dlpi_phdr[j].p_flags);
    }
    printf("\n");
    return 0;
}


extern int ml_func(int, int);


int main(int argc, const char* argv[])
{
    dl_iterate_phdr(header_handler, NULL);

    int t = ml_func(argc, argc);
    return t;
}

header_handler implements the callback for dl_iterate_phdr. It will get called for all libraries and report their names and load addresses, along with all their segments. The driver's main function also invokes ml_func, which is taken from the libmlreloc.so shared library.

To compile and link this driver with our shared library, run:

gcc -g -c driver.c -o driver.o
gcc -o driver driver.o -L. -lmlreloc

Running the driver stand-alone we get the information, but for each run the addresses are different. So what I'm going to do is run it under gdb [10], see what it says, and then use gdb to further query the process's memory space:

Since driver reports all the libraries it loads (even implicitly, like libc or the dynamic loader itself), the output is lengthy and I will just focus on the report about libmlreloc.so. Note that the 6 segments are the same segments reported by readelf, but this time relocated into their final memory locations.

Let's do some math. The output says libmlreloc.so was placed at virtual address 0x12e000. We're interested in the second segment, which as we've seen in readelf has the virtual address 0x1f04 within the library. Indeed, we see in the output it was loaded to address 0x12ff04. And since myglob is at address 0x200c within the library, we'd expect it to now be at address 0x13000c.

So, let's ask GDB:

(gdb) p &myglob
$1 = (int *) 0x13000c

Excellent! But what about the code of ml_func which refers to myglob? Let's ask GDB again:

(gdb) set disassembly-flavor intel
(gdb) disas ml_func
Dump of assembler code for function ml_func:
   0x0012e46c <+0>:     push   ebp
   0x0012e46d <+1>:     mov    ebp,esp
   0x0012e46f <+3>:     mov    eax,ds:0x13000c
   0x0012e474 <+8>:     add    eax,DWORD PTR [ebp+0x8]
   0x0012e477 <+11>:    mov    ds:0x13000c,eax
   0x0012e47c <+16>:    mov    eax,ds:0x13000c
   0x0012e481 <+21>:    add    eax,DWORD PTR [ebp+0xc]
   0x0012e484 <+24>:    pop    ebp
   0x0012e485 <+25>:    ret
End of assembler dump.

As expected, the real address of myglob was placed in all the mov instructions referring to it, just as the relocation entries specified.

Relocating function calls

So far this article demonstrated relocation of data references - using the global variable myglob as an example. Another thing that needs to be relocated is code references - in other words, function calls. This section is a brief guide on how this gets done. The pace is much faster than in the rest of this article, since I can now assume the reader understands what relocation is all about.

Without further ado, let's get to it. I've modified the code of the shared library to be the following:

int myglob = 42;

int ml_util_func(int a)
{
    return a + 1;
}

int ml_func(int a, int b)
{
    int c = b + ml_util_func(a);
    myglob += c;
    return b + myglob;
}

ml_util_func was added and it's being used by ml_func. Here's the disassembly of ml_func in the linked shared library:

000004a7 <ml_func>:
 4a7:   55                      push   ebp
 4a8:   89 e5                   mov    ebp,esp
 4aa:   83 ec 14                sub    esp,0x14
 4ad:   8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
 4b0:   89 04 24                mov    DWORD PTR [esp],eax
 4b3:   e8 fc ff ff ff          call   4b4 <ml_func+0xd>
 4b8:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 4bb:   89 45 fc                mov    DWORD PTR [ebp-0x4],eax
 4be:   a1 00 00 00 00          mov    eax,ds:0x0
 4c3:   03 45 fc                add    eax,DWORD PTR [ebp-0x4]
 4c6:   a3 00 00 00 00          mov    ds:0x0,eax
 4cb:   a1 00 00 00 00          mov    eax,ds:0x0
 4d0:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 4d3:   c9                      leave
 4d4:   c3                      ret

What's interesting here is the instruction at address 0x4b3 - it's the call to ml_util_func. Let's dissect it:

e8 is the opcode for call. The argument of this call is the offset relative to the next instruction. In the disassembly above, this argument is 0xfffffffc, or simply -4. Since the next instruction is at 0x4b8, the call currently points at 0x4b4, which is its own argument. This clearly isn't right - but let's not forget about relocation. Here's what the relocation section of the shared library looks like now:

$ readelf -r libmlreloc.so

Relocation section '.rel.dyn' at offset 0x324 contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00002008  00000008 R_386_RELATIVE
000004b4  00000502 R_386_PC32        0000049c   ml_util_func
000004bf  00000401 R_386_32          0000200c   myglob
000004c7  00000401 R_386_32          0000200c   myglob
000004cc  00000401 R_386_32          0000200c   myglob
[...] skipping stuff

If we compare it to the previous invocation of readelf -r, we'll notice a new entry added for ml_util_func. This entry points at address 0x4b4 which is the argument of the call instruction, and its type is R_386_PC32. This relocation type is more complicated than R_386_32, but not by much.

It means the following: take the value at the offset specified in the entry, add the address of the symbol to it, subtract the address of the offset itself, and place it back into the word at the offset. Recall that this relocation is done at load-time, when the final load addresses of the symbol and the relocated offset itself are already known. These final addresses participate in the computation.

What does this do? Basically, it's a relative relocation, taking its location into account and thus suitable for arguments of instructions with relative addressing (which the e8 call is). I promise it will become clearer once we get to the real numbers.

I'm now going to build the driver code and run it under GDB again, to see this relocation in action. Here's the GDB session, followed by explanations:

$ gdb -q driver
Reading symbols from driver...done.
(gdb) b driver.c:31
Breakpoint 1 at 0x804869e: file driver.c, line 31.
(gdb) r
Starting program: driver
[...] skipping output
name=./libmlreloc.so (6 segments) address=0x12e000
        header  0: address=  0x12e000
            type=1, flags=0x5
        header  1: address=  0x12ff04
            type=1, flags=0x6
        header  2: address=  0x12ff18
            type=2, flags=0x6
        header  3: address=  0x12e0f4
            type=4, flags=0x4
        header  4: address=  0x12e000
            type=1685382481, flags=0x6
        header  5: address=  0x12ff04
            type=1685382482, flags=0x4

[...] skipping output
Breakpoint 1, main (argc=1, argv=0xbffff3d4) at driver.c:31
31    }
(gdb) set disassembly-flavor intel
(gdb) disas ml_util_func
Dump of assembler code for function ml_util_func:
   0x0012e49c <+0>:     push   ebp
   0x0012e49d <+1>:     mov    ebp,esp
   0x0012e49f <+3>:     mov    eax,DWORD PTR [ebp+0x8]
   0x0012e4a2 <+6>:     add    eax,0x1
   0x0012e4a5 <+9>:     pop    ebp
   0x0012e4a6 <+10>:    ret
End of assembler dump.
(gdb) disas /r ml_func
Dump of assembler code for function ml_func:
   0x0012e4a7 <+0>:     55                 push   ebp
   0x0012e4a8 <+1>:     89 e5              mov    ebp,esp
   0x0012e4aa <+3>:     83 ec 14           sub    esp,0x14
   0x0012e4ad <+6>:     8b 45 08           mov    eax,DWORD PTR [ebp+0x8]
   0x0012e4b0 <+9>:     89 04 24           mov    DWORD PTR [esp],eax
   0x0012e4b3 <+12>:    e8 e4 ff ff ff     call   0x12e49c <ml_util_func>
   0x0012e4b8 <+17>:    03 45 0c           add    eax,DWORD PTR [ebp+0xc]
   0x0012e4bb <+20>:    89 45 fc           mov    DWORD PTR [ebp-0x4],eax
   0x0012e4be <+23>:    a1 0c 00 13 00     mov    eax,ds:0x13000c
   0x0012e4c3 <+28>:    03 45 fc           add    eax,DWORD PTR [ebp-0x4]
   0x0012e4c6 <+31>:    a3 0c 00 13 00     mov    ds:0x13000c,eax
   0x0012e4cb <+36>:    a1 0c 00 13 00     mov    eax,ds:0x13000c
   0x0012e4d0 <+41>:    03 45 0c           add    eax,DWORD PTR [ebp+0xc]
   0x0012e4d3 <+44>:    c9                 leave
   0x0012e4d4 <+45>:    c3                 ret
End of assembler dump.
(gdb)

The important parts here are:

1. In the printout from driver we see that the first segment (the code segment) of libmlreloc.so has been mapped to 0x12e000 [11]
2. ml_util_func was loaded to address 0x0012e49c
3. The address of the relocated offset is 0x0012e4b4
4. The call in ml_func to ml_util_func was patched to place 0xffffffe4 in the argument (I disassembled ml_func with the /r flag to show raw hex in addition to disassembly), which is interpreted as the correct offset to ml_util_func.

Obviously we're most interested in how (4) was done. Again, it's time for some math. Interpreting the R_386_PC32 relocation entry mentioned above, we have:

Take the value at the offset specified in the entry (0xfffffffc), add the address of the symbol to it (0x0012e49c), subtract the address of the offset itself (0x0012e4b4), and place it back into the word at the offset. Everything is done assuming 32-bit two's complement arithmetic, of course. The result is 0xffffffe4, as expected.

Conclusion

Load-time relocation is one of the methods used in Linux (and other OSes) to resolve internal data and code references in shared libraries when loading them into memory. These days, position independent code (PIC) is a more popular approach, and some modern systems (such as x86-64) no longer support load-time relocation.

Still, I decided to write an article on load-time relocation for two reasons. First, load-time relocation has a couple of advantages over PIC on some systems, especially in terms of performance. Second, load-time relocation is IMHO simpler to understand without prior knowledge, which will make PIC easier to explain in the future. (Update 03.11.2011: the article about PIC was published)

Regardless of the motivation, I hope this article has helped to shed some light on the magic going behind the scenes of linking and loading shared libraries in a modern OS.