|
1 | 1 | --- |
2 | 2 | title: Further information on implementation |
3 | | -weight: 6 |
| 3 | +weight: 7 |
4 | 4 |
|
5 | 5 | ### FIXED, DO NOT MODIFY |
6 | 6 | layout: learningpathall |
7 | 7 | --- |
8 | 8 |
|
9 | 9 | To select the most appropriate version of a function, each call to a versioned function is routed through an indirect function (ifunc) resolver, which the called symbol points to. |
10 | 10 |
|
11 | | -The compiler generates a resolver based on the function versions declared in the translation unit. A typical resolver implementation uses a runtime library to detect the presence of the architectural features on which the function versions depend and returns a pointer to the correct version. |
| 11 | +The compiler generates a resolver based on the function versions declared in the translation unit. A typical resolver implementation uses a runtime library to detect the presence of the architectural features on which the function versions depend and returns a pointer to the correct version. Features implied by the command line are not exempt from runtime detection. |
12 | 12 |
|
13 | 13 | The resolution of the called symbol is delayed until runtime, when the dynamic loader runs the resolver and updates the procedure linkage table (PLT) with a pointer to the chosen implementation. The resolver function is run only once and its returned value remains unchanged for the lifetime of the process. |
14 | 14 |
|
15 | 15 | Relocations handle references to the called symbol so that they use the cached PLT entry. |
16 | 16 |
|
17 | | -#### Differences between GCC 14 and LLVM 19 implementations |
| 17 | +#### Feature detection at runtime |
| 18 | + |
| 19 | +Some architectural features depend on others, as indicated by the [dependencies table](https://arm-software.github.io/acle/main/acle.html#dependencies). Dependencies are detected transitively, and the depended-on features are not exempt from runtime detection even if the command line implies them. For example, `rcpc3` depends on `rcpc2`, which depends on `rcpc`. All three are detected in the following example. |
| 20 | + |
| 21 | +Use a text editor to create a file named `rcpc.c` with the code below: |
| 22 | + |
| 23 | +```c |
| 24 | +__attribute__((target_clones("rcpc3", "default"))) int f(void) { return 0; } |
| 25 | +``` |
| 26 | +
|
| 27 | +{{% notice Note %}} |
| 28 | +The depended-on features (`rcpc2`, `rcpc`) *need not* be specified in the attribute, but they *may* be; there is no functional difference:
| 29 | +```c |
| 30 | +__attribute__((target_clones("rcpc3+rcpc2+rcpc", "default"))) int f(void) { return 0; }
| 31 | +``` |
| 32 | +{{% /notice %}} |
| 33 | + |
| 34 | +To compile with Clang, run: |
| 35 | + |
| 36 | +```console |
| 37 | +clang --target=aarch64-linux-gnu -march=armv8-a+rcpc -O3 --rtlib=compiler-rt -S -o - rcpc.c |
| 38 | +``` |
| 39 | + |
| 40 | +Here is the generated resolver function containing the runtime detection of features: |
| 41 | + |
| 42 | +```output |
| 43 | + .section .text.f.resolver,"axG",@progbits,f.resolver,comdat |
| 44 | + .weak f.resolver |
| 45 | + .p2align 2 |
| 46 | + .type f.resolver,@function |
| 47 | +f.resolver: |
| 48 | + str x30, [sp, #-16]! |
| 49 | + bl __init_cpu_features_resolver |
| 50 | + adrp x8, __aarch64_cpu_features |
| 51 | + mov x9, #12582912 |
| 52 | + adrp x10, f.default |
| 53 | + add x10, x10, :lo12:f.default |
| 54 | + ldr x8, [x8, :lo12:__aarch64_cpu_features] |
| 55 | + movk x9, #1024, lsl #48 |
| 56 | + bics xzr, x9, x8 |
| 57 | + adrp x8, f._Mrcpc3 |
| 58 | + add x8, x8, :lo12:f._Mrcpc3 |
| 59 | + csel x0, x8, x10, eq |
| 60 | + ldr x30, [sp], #16 |
| 61 | + ret |
| 62 | +``` |
| 63 | +{{% notice Note %}} |
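| 64 | +The immediates `#12582912` (bits 22 and 23) and `#1024, lsl #48` (bit 58) together build a three-bit mask in `x9`. The `bics` instruction checks that every bit of this mask is also set in `__aarch64_cpu_features`, so `f._Mrcpc3` is selected only when all three features are detected. |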
| 65 | +{{% /notice %}} |
| 66 | + |
| 67 | +#### Differences between GCC 14 and LLVM 20 implementations |
18 | 68 |
|
19 | 69 | - The attribute `target_version` in GCC is only supported for C++, not for C. |
20 | 70 | - Support for the features listed in the [mapping table](https://arm-software.github.io/acle/main/acle.html#mapping) differs between the two compilers. |
21 | | -- GCC can statically resolve calls to versioned functions, whereas LLVM cannot. |
| 71 | +- LLVM supports mixing `target_version` with `target_clones`, whereas GCC does not yet support this. |
22 | 72 |
|
23 | 73 | #### Resolver emission with LLVM |
24 | 74 |
|
@@ -47,12 +97,10 @@ The compilation of `file1.c` yields normal code generation since no version of ` |
47 | 97 |
|
48 | 98 | When compiling `file2.c`, a resolver is emitted for `func1` because its default definition is present. GCC does not currently support multiversioning for this example, as it only generates a resolver when a function is called. |
49 | 99 |
|
50 | | -#### Static resolution with GCC |
51 | | - |
52 | | -The GCC compiler optimizes calls to versioned functions when they can be statically resolved. |
| 100 | +#### Static resolution of calls |
53 | 101 |
|
54 | | -Such calls would otherwise be routed through the resolver, but instead they become direct which allows them to be inlined. |
| 102 | +Normally the called symbol is resolved at runtime (dynamically). However, it may be possible to determine at compile time (statically) which function version to call. |
55 | 103 |
|
56 | | -This might be possible whenever a function is compiled with a sufficiently high set of architecture features (so including `target`/`target_version`/`target_clones` attributes, and command line options). |
| 104 | +This may be possible when the caller is compiled with a sufficiently large set of architecture features. These features can be enabled explicitly, through the `target` attribute (as an optimization hint) or the multiversioning attributes `target_version`/`target_clones`. They can also be enabled implicitly, through command-line options. See [the example](/learning-paths/cross-platform/function-multiversioning/changes-from-past-releases). |
57 | 105 |
|
58 | | -LLVM is not yet able to perform this optimization. |
| 106 | +When a call to a versioned function can be statically resolved, the compiler turns it into a direct call. As a result, the selected version may be inlined into the call site. |