| 1 | ++++ |
| 2 | +title = "Understanding ELFs, part 3" |
| 3 | +date = 2025-01-30 |
| 4 | +authors = ["InnocentZero"] |
| 5 | ++++ |
| 6 | + |
| 7 | +## On relocations, loading binaries, and more |
| 8 | + |
| 9 | +We need relocations because of one simple fact: the existence of shared libraries. |
| 10 | + |
| 11 | +A natural question is why we need shared libraries at all. They avoid keeping duplicate |
| 12 | +copies of the same pages in memory, which was critical in the days when memory was scarce. |
| 13 | +They also separate the library from the binary, so the library can be updated without |
| 14 | +rebuilding the binary. |
| 15 | + |
| 16 | +Shared libraries are handled using _relocation sections_. These contain the information needed |
| 17 | +to relocate a symbol within the binary's context. A relocation section usually links to an |
| 18 | +additional section, the one in which the relocation is going to be applied. |
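To make "the info needed to do the relocation" concrete, here is a minimal sketch that decodes a single relocation entry. It assumes the 64-bit little-endian `Elf64_Rela` layout (`r_offset`, `r_info`, `r_addend`); the function name `parse_rela` and the sample values are made up for illustration.

```python
import struct

# Elf64_Rela layout: r_offset (u64), r_info (u64), r_addend (i64), little-endian.
RELA_FORMAT = "<QQq"
RELA_SIZE = struct.calcsize(RELA_FORMAT)  # 24 bytes


def parse_rela(raw: bytes) -> dict:
    """Decode one Elf64_Rela entry into its fields."""
    r_offset, r_info, r_addend = struct.unpack(RELA_FORMAT, raw[:RELA_SIZE])
    return {
        "offset": r_offset,           # where the patch is applied
        "sym_index": r_info >> 32,    # index into the associated symbol table
        "type": r_info & 0xFFFFFFFF,  # relocation type, e.g. 7 = R_X86_64_JUMP_SLOT
        "addend": r_addend,           # constant used when computing the new value
    }


# A hand-made entry (hypothetical values): patch 0x4018, symbol #3, type 7.
example_entry = struct.pack(RELA_FORMAT, 0x4018, (3 << 32) | 7, 0)
print(parse_rela(example_entry))
```

On a real binary, `readelf -r` prints the same fields for every entry of every relocation section.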
| 19 | + |
| 20 | +There are two ways in which object files may be linked: statically and dynamically. |
| 21 | + |
| 22 | +Static linking is fairly straightforward: the linker takes in all the object files and archive |
| 23 | +files (such as `libc.a`) and creates a single self-contained binary containing all the required |
| 24 | +functionality. This happens at build time, as the last step after compilation. |
| 25 | + |
| 26 | +Dynamic linking is more involved. It defers the linking step from build time to runtime. The |
| 27 | +binary records its choice of runtime linker (also referred to as an _interpreter_), the dynamic |
| 28 | +symbols it needs, and how to obtain them. |
| 29 | + |
| 30 | + |
| 31 | +## Loading an ELF into memory |
| 32 | + |
| 33 | +The system first executes the file's "interpreter" before handing over execution to the binary. |
| 34 | +The interpreter's path is stored in the `.interp` section, which the `PT_INTERP` segment |
| 35 | +points to. It can be read using `readelf -p .interp example`. |
| 36 | + |
| 37 | +``` |
| 38 | +$ readelf -p .interp example |
| 39 | + |
| 40 | +String dump of section '.interp': |
| 41 | + [ 0] /lib64/ld-linux-x86-64.so.2 |
| 42 | +``` |
| 43 | + |
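The same lookup can be done by hand. The sketch below walks the program headers of a little-endian ELF64 image and returns the string that the `PT_INTERP` entry points to. It is only an illustration under those assumptions (no 32-bit or big-endian support, no bounds checking), and `find_interpreter` is a hypothetical helper name.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def find_interpreter(data: bytes):
    """Return the interpreter path of a little-endian ELF64 image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF header fields: e_phoff at 0x20, e_phentsize and e_phnum at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset; p_filesz is at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # no PT_INTERP entry, as in a statically linked binary


if __name__ == "__main__":
    # Print the interpreter of every file given on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, find_interpreter(f.read()))
```

Run against the `example` binary above, this should print the same path that `readelf` shows.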
| 44 | +On Linux, the kernel maps the binary and its interpreter into memory, then starts the interpreter. |
| 45 | + |
| 46 | +The interpreter sets up the environment using the `.dynamic` section of the binary. This can be |
| 47 | +seen using `readelf -d executable`. |
| 48 | + |
| 49 | +From this table, the interpreter recursively visits every **NEEDED** dynamic library and loads |
| 50 | +it into memory. For each dependency, the following steps are executed: |
| 51 | + |
| 52 | +- The ELF is mapped into memory. |
| 53 | +- Relocations are performed: absolute addresses are patched and references to other |
| 54 | + object files are resolved. |
| 55 | +- Its dynamic table is parsed and its dependencies are loaded. |
| 56 | +- `_dl_init` is run, which executes all the functions from `INIT` and `INIT_ARRAY` for the |
| 57 | + just-loaded libraries. |
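The recursive walk over dependencies can be modelled in a few lines. This is a toy sketch, not how any real loader is written: the dependency table is hypothetical, "loading" just records a name, and the traversal order of a real loader may differ.

```python
def load_order(root: str, needed: dict) -> list:
    """Visit `root` and, recursively, everything it needs; each object once."""
    loaded = []

    def visit(name: str) -> None:
        if name in loaded:       # already mapped, do not load it twice
            return
        loaded.append(name)      # stands in for "map the ELF into memory"
        for dep in needed.get(name, []):
            visit(dep)           # follow its NEEDED entries

    visit(root)
    return loaded


# Hypothetical dependency table, as the NEEDED entries might describe it.
needed = {
    "example": ["libfoo.so", "libc.so.6"],
    "libfoo.so": ["libc.so.6"],
    "libc.so.6": [],
}
print(load_order("example", needed))
# prints ['example', 'libfoo.so', 'libc.so.6']
```

The point to notice is the `loaded` check: `libc.so.6` is needed twice but mapped only once.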
| 58 | + |
| 59 | +Control is now handed over to `_start` in the ELF binary, which receives the pointer to |
| 60 | +`_dl_fini` in `rdx`. `_start` prepares the stack with a few arguments and calls `__libc_start_main`. |
| 61 | + |
| 62 | +`__libc_start_main` receives function pointers to `main`, `init`, `fini`, and `rtld_fini` (the |
| 63 | +last one is the same as `_dl_fini`). |
| 64 | + |
| 65 | +This function does a number of things, such as setting up thread-local storage. Here we only |
| 66 | +care about the following: |
| 67 | + |
| 68 | +- A call to `__cxa_atexit`, which registers `_dl_fini` as a destructor to run once the program is done. |
| 69 | + |
| 70 | +- A call to `call_init`, which runs the constructors in the `INIT` and `INIT_ARRAY` dynamic table |
| 71 | + entries. Note that `_dl_init` handled the entries of the shared libraries themselves, whereas |
| 72 | + this handles those of the binary. |
| 73 | + |
| 74 | +- Control is then handed over to `main`. |
| 75 | + |
| 76 | +- Once `main` returns, `exit` is called. It only transfers control to |
| 77 | + `__run_exit_handlers`. |
| 78 | + |
| 79 | +- This runs all the functions registered in `__exit_funcs`, which also contains `_dl_fini`. |
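The ordering described in the list above can be replayed with a toy model. Every name here is a stand-in for the libc function it is named after, and the bodies are invented for illustration. The model assumes only that exit handlers run in reverse order of registration, which is how `atexit`-style handlers behave.

```python
exit_funcs = []  # stands in for __exit_funcs
trace = []       # records the order in which things run


def cxa_atexit(func):
    """Register a handler to run at exit (toy stand-in for __cxa_atexit)."""
    exit_funcs.append(func)


def run_exit_handlers():
    """Run registered handlers, most recently registered first."""
    while exit_funcs:
        exit_funcs.pop()()


def libc_start_main(main, init, rtld_fini):
    """Toy model of the steps described above, not the real signature."""
    cxa_atexit(rtld_fini)  # _dl_fini is registered before main runs
    init()                 # constructors of the binary itself
    status = main()
    run_exit_handlers()    # what exit() hands control to
    return status


def main():
    cxa_atexit(lambda: trace.append("user atexit handler"))
    trace.append("main")
    return 0


libc_start_main(
    main,
    init=lambda: trace.append("init"),
    rtld_fini=lambda: trace.append("_dl_fini"),
)
print(trace)
# prints ['init', 'main', 'user atexit handler', '_dl_fini']
```

Because `_dl_fini` is registered first, it runs last, after any handler the program registers itself.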