Commit 76c7a8b

Merge pull request #1705 from labrinea/fmv-updates-in-llvm20
[FMV] Updates for the LLVM 20 release.
2 parents 35bce95 + 9455fb9 commit 76c7a8b

5 files changed

+192
-14
lines changed

content/learning-paths/cross-platform/function-multiversioning/_index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ prerequisites:
1717
- Familiarity with indirect functions (ifuncs).
1818
- Basic knowledge of loop vectorization.
1919
- Familiarity with Arm assembly.
20-
- A LLVM 19 compiler with runtime library support or GCC 14.
20+
- An LLVM 20 compiler with runtime library support or GCC 14.
2121

2222
author: Alexandros Lamprineas
2323

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
---
2+
title: Changes from released compilers
3+
weight: 8
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
#### ACLE specification changes
10+
11+
- The set of supported features has changed as indicated by the [ACLE Q3 change log](https://arm-software.github.io/acle/main/acle.html#changes-between-acle-q2-2024-and-acle-q3-2024) and the [ACLE Q4 change log](https://arm-software.github.io/acle/main/acle.html#changes-between-acle-q3-2024-and-acle-q4-2024).
12+
- The runtime detection of features has changed. Depended-on features are now detected as indicated by the [dependencies table](https://arm-software.github.io/acle/main/acle.html#dependencies).
13+
- The most appropriate version of a function is determined as indicated by the new [selection rules](https://arm-software.github.io/acle/main/acle.html#selection). Previously, the most specific version (the one with most features) was favored over any other version.
14+
- A new predefined macro `__FUNCTION_MULTI_VERSIONING_SUPPORT_LEVEL` has been added to indicate which ACLE version is implemented by the compiler.
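
As a minimal sketch of how the new macro might be queried, the helper below reports the support level when the compiler defines it. The helper name `fmv_support_level` is hypothetical, and the code assumes the macro expands to an integer constant in the same format as `__ARM_ACLE`. On compilers that do not define the macro, the helper returns 0.

```c
/* Hypothetical helper: reports the FMV support level advertised by the
 * compiler, or 0 when the macro is not defined.
 * Assumption: the macro expands to an integer constant. */
static long fmv_support_level(void) {
#ifdef __FUNCTION_MULTI_VERSIONING_SUPPORT_LEVEL
    return (long)__FUNCTION_MULTI_VERSIONING_SUPPORT_LEVEL;
#else
    return 0;
#endif
}
```

Code that relies on the newer selection rules could compare the returned value against the ACLE release it was written for, and fall back to a single default implementation otherwise.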
15+
16+
#### Semantic changes between LLVM 19 and LLVM 20
17+
18+
With LLVM 19, at least one version other than the default is needed to trigger function multiversioning. With LLVM 20, a header file declaration such as:
19+
20+
```c
21+
__attribute__((target_version("default"))) void f(void);
22+
```
23+
24+
guarantees that there will be a mangled version `f.default`. In contrast, LLVM 19 generates an unmangled symbol here, because function multiversioning is not triggered when this code is compiled in the absence of other versions.
25+
26+
#### Static resolution in LLVM 20
27+
28+
LLVM can optimize calls to versioned functions when they can be statically resolved. For example:
29+
30+
```c
31+
__attribute__((target_version("mops"))) int f(void);
32+
__attribute__((target_version("sve2"))) int f(void);
33+
__attribute__((target_version("sve"))) int f(void);
34+
__attribute__((target_version("default"))) int f(void) { return 0; }
35+
36+
__attribute__((target_version("mops+sve2"))) int caller(void) {
37+
return f(); // f._Mmops is called directly
38+
}
39+
__attribute__((target_version("mops"))) int caller(void) {
40+
return f(); // f._Mmops is called directly
41+
}
42+
__attribute__((target_version("sve"))) int caller(void) {
43+
return f(); // cannot be optimized since SVE2 may be available on target
44+
}
45+
__attribute__((target_version("default"))) int caller(void) {
46+
return f(); // f.default is called directly
47+
}
48+
```

content/learning-paths/cross-platform/function-multiversioning/implementation-details.md

Lines changed: 58 additions & 10 deletions
@@ -1,24 +1,74 @@
11
---
22
title: Further information on implementation
3-
weight: 6
3+
weight: 7
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

99
To select the most appropriate version of a function, each call to a versioned function is routed through an indirect function resolver, which the called symbol (an ifunc) points to.
1010

11-
The compiler generates a resolver based on the function versions declared in the translation unit. A typical resolver implementation uses a runtime library to detect the presence of the architectural features on which the function versions depend and returns a pointer to the correct version.
11+
The compiler generates a resolver based on the function versions declared in the translation unit. A typical resolver implementation uses a runtime library to detect the presence of the architectural features on which the function versions depend and returns a pointer to the correct version. Features implied by the command line are not exempt from runtime detection.
1212

1313
The resolution of the called symbol is delayed until runtime, when the dynamic loader runs the resolver and updates the procedure linkage table (PLT) with a pointer to the chosen implementation. The resolver function is run only once and its returned value remains unchanged for the lifetime of the process.
1414

1515
Relocations handle references to the called symbol, which return the cached PLT entry.
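
The resolve-once behavior described above can be modeled in portable C. This is only an analogy, not how ifuncs are implemented: a real ifunc is resolved by the dynamic loader, whereas this sketch caches a function pointer on first call. All names here (`f_default`, `f_fast`, `has_feature`, `resolve_f`, `resolver_runs`) are hypothetical.

```c
#include <stdbool.h>

typedef int (*f_ptr)(void);

/* Two hypothetical versions of the same function. */
static int f_default(void) { return 0; }
static int f_fast(void) { return 1; }

/* Hypothetical feature probe; a real resolver would query the runtime library. */
static bool has_feature(void) { return false; }

/* Counts how many times the resolver runs, for illustration only. */
static int resolver_runs = 0;

/* Plays the role of the resolver: picks a version based on detected features. */
static f_ptr resolve_f(void) {
    resolver_runs++;
    return has_feature() ? f_fast : f_default;
}

/* Plays the role of the PLT entry: resolves on first use, then reuses the result. */
static int f(void) {
    static f_ptr cached = 0;
    if (!cached)
        cached = resolve_f();
    return cached();
}
```

However many times `f` is called, `resolve_f` runs once and every later call reuses the cached pointer, which mirrors the selection staying fixed for the lifetime of the process.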
1616

17-
#### Differences between GCC 14 and LLVM 19 implementations
17+
#### Feature detection at runtime
18+
19+
Some architectural features depend on others, as indicated by the [dependencies table](https://arm-software.github.io/acle/main/acle.html#dependencies). The depended-on features are detected transitively, and they are not exempt from runtime detection even if they are implied by the command line. For example, `rcpc3` depends on `rcpc2`, which depends on `rcpc`. All three are detected in the following example.
20+
21+
Use a text editor to create a file named `rcpc.c` with the code below:
22+
23+
```c
24+
__attribute__((target_clones("rcpc3", "default"))) int f(void) { return 0; }
25+
```
26+
27+
{{% notice Note %}}
28+
The depended-on features (rcpc2, rcpc) *need not* be specified in the attribute, but they *may* well be (there is no functional difference):
29+
```c
30+
__attribute__((target_clones("rcpc3+rcpc2+rcpc", "default")))
31+
```
32+
{{% /notice %}}
33+
34+
To compile with Clang, run:
35+
36+
```console
37+
clang --target=aarch64-linux-gnu -march=armv8-a+rcpc -O3 --rtlib=compiler-rt -S -o - rcpc.c
38+
```
39+
40+
Here is the generated resolver function containing the runtime detection of features:
41+
42+
```output
43+
.section .text.f.resolver,"axG",@progbits,f.resolver,comdat
44+
.weak f.resolver
45+
.p2align 2
46+
.type f.resolver,@function
47+
f.resolver:
48+
str x30, [sp, #-16]!
49+
bl __init_cpu_features_resolver
50+
adrp x8, __aarch64_cpu_features
51+
mov x9, #12582912
52+
adrp x10, f.default
53+
add x10, x10, :lo12:f.default
54+
ldr x8, [x8, :lo12:__aarch64_cpu_features]
55+
movk x9, #1024, lsl #48
56+
bics xzr, x9, x8
57+
adrp x8, f._Mrcpc3
58+
add x8, x8, :lo12:f._Mrcpc3
59+
csel x0, x8, x10, eq
60+
ldr x30, [sp], #16
61+
ret
62+
```
63+
{{% notice Note %}}
64+
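The immediate value `#12582912` (`0xC00000`) in this assembly, combined with the value inserted by the `movk` instruction, forms the bitmask that the resolver tests against `__aarch64_cpu_features` to detect `rcpc3` and the features it depends on.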
65+
{{% /notice %}}
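
The mask arithmetic in the resolver can be reproduced in C as a rough sketch. The function names `resolver_mask` and `selects_rcpc3` are hypothetical. The bit positions follow directly from the immediates in the listing (bits 22, 23 and 58). Which feature each bit represents is defined by the runtime library and is not asserted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rebuilds the value held in x9 after
 *   mov  x9, #12582912
 *   movk x9, #1024, lsl #48
 */
static uint64_t resolver_mask(void) {
    uint64_t low = UINT64_C(12582912);     /* 0xC00000: bits 22 and 23 */
    uint64_t high = UINT64_C(1024) << 48;  /* 0x400 shifted left by 48: bit 58 */
    return low | high;
}

/* Mirrors `bics xzr, x9, x8` followed by `csel ..., eq`:
 * the rcpc3 version is chosen only if every mask bit is set in `features`. */
static bool selects_rcpc3(uint64_t features) {
    return (resolver_mask() & ~features) == 0;
}
```

If any one of the three bits is missing from the feature word, the comparison fails and the resolver returns `f.default` instead.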
66+
67+
#### Differences between GCC 14 and LLVM 20 implementations
1868

1969
- The attribute `target_version` in GCC is only supported for C++, not for C.
2070
- The set of features as indicated by the [mapping table](https://arm-software.github.io/acle/main/acle.html#mapping) differs in support between the two compilers.
21-
- GCC can statically resolve calls to versioned functions, whereas LLVM cannot.
71+
- LLVM supports mixing `target_version` with `target_clones` whereas GCC does not yet support this.
2272

2373
#### Resolver emission with LLVM
2474

@@ -47,12 +97,10 @@ The compilation of `file1.c` yields normal code generation since no version of `
4797

4898
When compiling `file2.c` a resolver is emitted for `func1` due to the presence of its default definition. GCC does not currently support multiversioning for this example as it only generates a resolver when a function is called.
4999

50-
#### Static resolution with GCC
51-
52-
The GCC compiler optimizes calls to versioned functions when they can be statically resolved.
100+
#### Static resolution of calls
53101

54-
Such calls would otherwise be routed through the resolver, but instead they become direct which allows them to be inlined.
102+
Normally the called symbol is resolved at runtime (dynamically); however, it may be possible to determine which function version to call at compile time (statically).
55103

56-
This might be possible whenever a function is compiled with a sufficiently high set of architecture features (so including `target`/`target_version`/`target_clones` attributes, and command line options).
104+
This may be possible when the caller is compiled with a sufficiently high set of architecture features, either explicitly (by using the `target` attribute as an optimization hint, or the multiversioning attributes `target_version`/`target_clones`) or implicitly (through command line options). See [the example](/learning-paths/cross-platform/function-multiversioning/changes-from-past-releases).
57105

58-
LLVM is not yet able to perform this optimization.
106+
When a call to a versioned function can be statically resolved, the compiler turns it into a direct call. As a result, the versioned function may be inlined into the call site.

content/learning-paths/cross-platform/function-multiversioning/semantics.md

Lines changed: 2 additions & 3 deletions
@@ -18,9 +18,8 @@ A hardware platform is able to support multiple architectural features from the
1818

1919
Function multiversioning provides a convenient way to select the most appropriate version of a function at runtime. The selection is permanent for the lifetime of the process and works as follows:
2020

21-
1. Select the most specific version (the one with most features), else
22-
2. Select the version with the highest priority, as indicated by the [mapping table](https://arm-software.github.io/acle/main/acle.html#mapping), else
23-
3. Select a default version if no other versions are suitable.
21+
1. Select the version with the highest priority, as indicated by the [selection rules](https://arm-software.github.io/acle/main/acle.html#selection), else
22+
2. Select a default version if no other versions are suitable.
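
The two steps above can be sketched as a toy model in C. This is not the compiler's algorithm: the `struct version` layout, the numeric priorities and the `select_version` helper are all hypothetical, and the real priority ordering is whatever the ACLE selection rules define.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one function version. */
struct version {
    const char *name;   /* label for the version, for illustration only */
    uint64_t required;  /* bitmask of features the version needs */
    unsigned priority;  /* hypothetical priority; a larger value wins */
};

/* Returns the highest-priority version whose required features are all
 * available, or `fallback` (the default version) if none is suitable. */
static const struct version *select_version(const struct version *v, size_t n,
                                            uint64_t available,
                                            const struct version *fallback) {
    const struct version *best = fallback;
    unsigned best_priority = 0;
    for (size_t i = 0; i < n; i++) {
        int usable = (v[i].required & ~available) == 0;
        if (usable && (best == fallback || v[i].priority > best_priority)) {
            best = &v[i];
            best_priority = v[i].priority;
        }
    }
    return best;
}
```

In this model, a version is considered only if all of its features are present, and the default is returned only when no other version qualifies.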
2423

2524
The `default` version is the version of the function that is generated without these attributes.
2625

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
---
2+
title: Compatibility with streaming mode
3+
weight: 6
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
Function multiversioning is compatible with Arm streaming mode as long as the same calling convention is used across all function versions.
10+
11+
Use a text editor to create a file named `streaming.c` with the code below:
12+
13+
```c
14+
__attribute__((target_clones("sve", "simd")))
15+
void ok_arm_streaming(void) __arm_streaming {}
16+
17+
__arm_locally_streaming __attribute__((target_version("sme2")))
18+
void ok_arm_streaming(void) __arm_streaming {}
19+
20+
__attribute__((target_version("default")))
21+
void ok_arm_streaming(void) __arm_streaming {}
22+
23+
24+
__attribute__((target_clones("sve", "simd")))
25+
void ok_arm_streaming_compatible(void) __arm_streaming_compatible {}
26+
27+
__arm_locally_streaming __attribute__((target_version("sme2")))
28+
void ok_arm_streaming_compatible(void) __arm_streaming_compatible {}
29+
30+
__attribute__((target_version("default")))
31+
void ok_arm_streaming_compatible(void) __arm_streaming_compatible {}
32+
33+
34+
__arm_locally_streaming __attribute__((target_clones("sve", "simd")))
35+
void ok_no_streaming(void) {}
36+
37+
__attribute__((target_version("sme2")))
38+
void ok_no_streaming(void) {}
39+
40+
__attribute__((target_version("default")))
41+
void ok_no_streaming(void) {}
42+
43+
44+
__attribute__((target_clones("sve", "simd")))
45+
void bad_mixed_streaming(void) {}
46+
47+
__attribute__((target_version("sme2")))
48+
void bad_mixed_streaming(void) __arm_streaming {} // expected-error: declaration has a different calling convention
49+
50+
__attribute__((target_version("default")))
51+
void bad_mixed_streaming(void) __arm_streaming_compatible {} // expected-error: declaration has a different calling convention
52+
53+
__arm_locally_streaming __attribute__((target_version("dotprod")))
54+
void bad_mixed_streaming(void) __arm_streaming {} // expected-error: declaration has a different calling convention
55+
56+
57+
void n_caller(void) {
58+
ok_arm_streaming();
59+
ok_arm_streaming_compatible();
60+
ok_no_streaming();
61+
bad_mixed_streaming();
62+
}
63+
64+
void s_caller(void) __arm_streaming {
65+
ok_arm_streaming();
66+
ok_arm_streaming_compatible();
67+
ok_no_streaming();
68+
bad_mixed_streaming();
69+
}
70+
71+
void sc_caller(void) __arm_streaming_compatible {
72+
ok_arm_streaming();
73+
ok_arm_streaming_compatible();
74+
ok_no_streaming();
75+
bad_mixed_streaming();
76+
}
77+
```
78+
79+
To compile with Clang, run:
80+
81+
```console
82+
clang --target=aarch64-linux-gnu -march=armv8-a+sme --rtlib=compiler-rt -c streaming.c
83+
```
