| 1 | +# Dynamic Linking |
| 2 | + |
| 3 | +## Build dynamically linked shecc and programs |
| 4 | + |
| 5 | +Build the dynamically linked version of shecc. Note that shecc does not currently support dynamic linking for the RISC-V architecture, so the example below targets Arm: |
| 6 | + |
| 7 | +```shell |
| 8 | +$ make ARCH=arm DYNLINK=1 |
| 9 | +``` |
| 10 | + |
| 11 | +Next, you can use shecc to build dynamically linked programs by adding the `--dynlink` flag: |
| 12 | + |
| 13 | +```shell |
| 14 | +# Use the stage 0 compiler |
| 15 | +$ out/shecc --dynlink -o <output> <input.c> |
| 16 | +# Use the stage 1 or stage 2 compiler |
| 17 | +$ qemu-arm -L <LD_PREFIX> out/shecc-stage2.elf --dynlink -o <output> <input.c> |
| 18 | + |
| 19 | +# Execute the compiled program |
| 20 | +$ qemu-arm -L <LD_PREFIX> <output> |
| 21 | +``` |
| 22 | + |
| 23 | +When executing a dynamically linked program, set the ELF interpreter prefix (`<LD_PREFIX>`) so that `ld.so` can be found and invoked. If you installed the Arm GNU toolchain via `apt`, the prefix is generally `/usr/arm-linux-gnueabihf`. If you installed the toolchain manually, locate and specify the correct path yourself. |
| 24 | + |
| 25 | +## Stack frame layout |
| 26 | + |
| 27 | +In dynamic linking mode, the stack frame layout for each function can be illustrated as follows: |
| 28 | + |
| 29 | +``` |
| 30 | +High Address |
| 31 | ++------------------+ |
| 32 | +| incoming args | |
| 33 | ++------------------+ <- sp + total_size |
| 34 | +| saved lr | |
| 35 | ++------------------+ <- sp + total_size - 4 |
| 36 | +| local variables | |
| 37 | ++------------------+ <- sp + 20 |
| 38 | +| saved r12 (ip) | |
| 39 | ++------------------+ <- sp + 16 |
| 40 | +| outgoing args | |
| 41 | ++------------------+ <- sp (MUST be aligned to 8 bytes) |
| 42 | +Low Address |
| 43 | +``` |
| 44 | + |
| 45 | +* `total_size`: the sum of the sizes of the following elements: |
| 46 | + * `outgoing args`: fixed at 16 bytes |
| 47 | + * `saved r12`: fixed at 4 bytes |
| 48 | + * All local variables |
| 49 | + * `saved lr`: fixed at 4 bytes |
| 50 | + |
| 51 | + |
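| 52 | +Currently, a function takes at most 8 arguments, and at most four of them are passed on the stack when calling an external function. Each frame therefore reserves an additional 20 bytes: 16 bytes for outgoing arguments and 4 bytes for register `r12`. |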
| 53 | + |
| 54 | +For the Arm architecture, when the callee is an external function, the caller uses the first 16 bytes (starting at `sp`) to pass the extra arguments on the stack, as the calling convention requires. |
| 55 | + |
| 56 | +In addition, because external functions may modify register `r12`, which holds the pointer to the global stack, the caller saves its value at `[sp + 16]` before the call and restores it after the external function returns. |
| 57 | + |
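The layout above can be turned into a small size calculation. The following is a minimal sketch, not shecc's actual code: the helper names (`frame_total_size`, `locals_offset`, `saved_lr_offset`) are hypothetical, and rounding `total_size` up to a multiple of 8 is an assumption about how the 8-byte alignment of `sp` is maintained.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-size areas taken from the stack frame diagram. */
enum {
    OUTGOING_ARGS_SIZE = 16, /* up to four stacked arguments, 4 bytes each */
    SAVED_R12_SIZE = 4,
    SAVED_LR_SIZE = 4,
    STACK_ALIGNMENT = 8
};

/* Round n up to the next multiple of STACK_ALIGNMENT (a power of two). */
static uint32_t align_up(uint32_t n)
{
    return (n + STACK_ALIGNMENT - 1) & ~(uint32_t) (STACK_ALIGNMENT - 1);
}

/* Hypothetical helper: frame size for a function whose locals occupy
 * locals_size bytes. The rounding step is an assumption; it keeps sp
 * 8-byte aligned if sp was aligned on entry. */
uint32_t frame_total_size(uint32_t locals_size)
{
    uint32_t raw = OUTGOING_ARGS_SIZE + SAVED_R12_SIZE + locals_size +
                   SAVED_LR_SIZE;
    return align_up(raw);
}

/* Offset of the first local variable from sp (sp + 20 in the diagram). */
uint32_t locals_offset(void)
{
    return OUTGOING_ARGS_SIZE + SAVED_R12_SIZE;
}

/* Offset of the saved lr slot from sp (sp + total_size - 4). */
uint32_t saved_lr_offset(uint32_t total_size)
{
    assert(total_size >= locals_offset() + SAVED_LR_SIZE);
    return total_size - SAVED_LR_SIZE;
}
```

With no locals the frame is 24 bytes under these assumptions; with 4 bytes of locals the raw size of 28 is rounded up to 32.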
| 58 | +## Function argument handling |
| 59 | + |
| 60 | +### Arm (32-bit) |
| 61 | + |
| 62 | +If the callee is an internal function, meaning that its implementation is compiled by shecc, the caller puts all arguments directly into registers `r0` - `r7`. |
| 63 | + |
| 64 | +Otherwise, when the callee is an external function, the caller performs the following operations to comply with the Procedure Call Standard for the Arm Architecture (AAPCS): |
| 65 | + |
| 66 | +* The first four arguments are put into `r0` - `r3`. |
| 67 | +* The remaining arguments are passed on the stack. They are stored starting from the last argument, so the fifth argument ends up at the lowest address and the last argument at the highest address. |
| 68 | +* The stack pointer is aligned to 8 bytes, because external functions may access 8-byte objects, which require 8-byte alignment. |
| 69 | + |
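The two cases above can be summarized as a mapping from argument index to location. This is an illustrative sketch only: `arg_location_t` and `locate_arg` are hypothetical names, and the stack offsets assume 4-byte arguments placed in the outgoing-argument area that starts at `sp`.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_ARGS = 8, AAPCS_REG_ARGS = 4, WORD_SIZE = 4 };

/* Where one argument lives at the call site. */
typedef struct {
    bool in_register;
    int reg;          /* register number (r0..r7), or -1 if on the stack */
    int stack_offset; /* byte offset from sp, or -1 if in a register */
} arg_location_t;

/* Hypothetical helper: locate argument `index` (0-based). */
arg_location_t locate_arg(int index, bool callee_is_external)
{
    assert(index >= 0 && index < MAX_ARGS);

    /* Internal callee: every argument goes to r<index>. */
    arg_location_t loc = {true, index, -1};

    /* External callee: arguments beyond the fourth go on the stack,
     * fifth argument at the lowest address. */
    if (callee_is_external && index >= AAPCS_REG_ARGS) {
        loc.in_register = false;
        loc.reg = -1;
        loc.stack_offset = (index - AAPCS_REG_ARGS) * WORD_SIZE;
    }
    return loc;
}
```

For example, the seventh argument (index 6) is in `r6` for an internal callee, but at `[sp + 8]` for an external one.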
| 70 | +### RISC-V (32-bit) |
| 71 | + |
| 72 | +(Currently not supported) |
| 73 | + |
| 74 | +## Runtime execution flow |
| 75 | + |
| 76 | +1. Program starts at ELF entry point. |
| 77 | +2. Dynamic linker (`ld.so`) is invoked. |
| 78 | + * For the Arm architecture, the dynamic linker is `/lib/ld-linux-armhf.so.3`. |
| 79 | +3. Linker loads shared libraries such as `libc.so`. |
| 80 | +4. Linker resolves symbols and fills global offset table (GOT). |
| 81 | +5. Control transfers to the program. |
| 82 | +6. The program first calls `__libc_start_main`. |
| 83 | +7. `__libc_start_main` calls the *main wrapper*, which sets up a global stack for all global variables (excluding read-only variables) and initializes them. |
| 84 | +8. The *main wrapper* executes. |
| 85 | +9. Once the *main wrapper* has finished its setup, it places `argc` and `argv` in the appropriate registers, then jumps to the `main` function to continue execution. |
| 86 | +10. After the `main` function returns, `__libc_start_main` implicitly calls `exit(3)` to terminate the program. |
| 87 | + |
| 88 | +## Dynamic sections |
| 89 | + |
| 90 | +When using dynamic linking, the following sections are generated for compiled programs: |
| 91 | + |
| 92 | +1. `.interp` - Path to dynamic linker |
| 93 | +2. `.dynsym` - Dynamic symbol table |
| 94 | +3. `.dynstr` - Dynamic string table |
| 95 | +4. `.rel.plt` - PLT relocations |
| 96 | +5. `.plt` - Procedure Linkage Table |
| 97 | +6. `.got` - Global Offset Table |
| 98 | +7. `.dynamic` - Dynamic linking information |
| 99 | + |
| 100 | +### PLT explanation for Arm32 |
| 101 | + |
| 102 | +The first entry contains the following instructions, which invoke the resolver to perform relocation: |
| 103 | + |
| 104 | +``` |
| 105 | +push {lr} @ (str lr, [sp, #-4]!) |
| 106 | +movw sl, #:lower16:(&GOT[2]) |
| 107 | +movt sl, #:upper16:(&GOT[2]) |
| 108 | +mov lr, sl |
| 109 | +ldr pc, [lr] |
| 110 | +``` |
| 111 | + |
| 112 | +1. Push register `lr` onto stack. |
| 113 | +2. Set register `sl` to the address of `GOT[2]`. |
| 114 | +3. Move the value of `sl` to `lr`. |
| 115 | +4. Load the value located at `[lr]` into the program counter (`pc`). |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +The remaining entries correspond to all external functions, with each entry including the following instructions: |
| 120 | + |
| 121 | +``` |
| 122 | +movw ip, #:lower16:(&GOT[x]) |
| 123 | +movt ip, #:upper16:(&GOT[x]) |
| 124 | +ldr pc, [ip] |
| 125 | +``` |
| 126 | + |
| 127 | +1. Set register `ip` to the address of `GOT[x]`. |
| 128 | +2. Load the value of `GOT[x]` into register `pc`. That is, set `pc` to the address of the callee. |
| 129 | + |
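The `movw`/`movt` pair in each PLT entry builds a 32-bit GOT address from two 16-bit halves. The sketch below shows that arithmetic only. The function names and the example base address are hypothetical, and the 4-byte GOT entry size is an assumption that holds for 32-bit Arm.

```c
#include <assert.h>
#include <stdint.h>

enum { GOT_ENTRY_SIZE = 4 }; /* one 32-bit address per entry on Arm32 */

/* Address of GOT[x], given the address of GOT[0]. */
uint32_t got_entry_addr(uint32_t got_base, uint32_t x)
{
    assert((got_base & 3) == 0); /* entries are word-aligned */
    return got_base + x * GOT_ENTRY_SIZE;
}

/* Immediate used by `movw reg, #:lower16:addr`. */
uint16_t lower16(uint32_t addr)
{
    return (uint16_t) (addr & 0xFFFFu);
}

/* Immediate used by `movt reg, #:upper16:addr`. */
uint16_t upper16(uint32_t addr)
{
    return (uint16_t) (addr >> 16);
}

/* Register value after movw (sets the low half, clears the high half)
 * followed by movt (sets the high half, keeps the low half). */
uint32_t movw_movt(uint16_t lo, uint16_t hi)
{
    return ((uint32_t) hi << 16) | lo;
}
```

With a made-up GOT base of `0x00012000`, `GOT[3]` is at `0x0001200C`, so the entry would use `movw ip, #0x200C` followed by `movt ip, #0x0001`.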