Commit d7d9a70

Changes from last revision:
* Fixed upper half of box in comment
* Renamed foo to sumPosEltsScaledByIndex in examples
* Added missing comma in resolver's emission section
1 parent ed19074 commit d7d9a70

3 files changed: 27 additions, 27 deletions

content/learning-paths/smartphones-and-mobile/function-multiversioning/_index.md

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ operatingsystems:
 - Android
 - macOS
 
-
-### FIXED, DO NOT MODIFY
+# ================================================================================
+# FIXED, DO NOT MODIFY
 # ================================================================================
 weight: 1 # _index.md always has weight of 1 to order correctly
 layout: "learningpathall" # All files under learning paths have this same wrapper

content/learning-paths/smartphones-and-mobile/function-multiversioning/examples.md

Lines changed: 24 additions & 24 deletions
@@ -10,12 +10,12 @@ layout: learningpathall
 
 #### Code Generation example
 
-In this example we have specified two versions of `foo` using the `target_clones` attribute (the order in which they are listed does not matter). At certain optimization levels compilers can decide to perform loop vectorization depending on the target's vector capabilities. Our intention is to enable the compiler to use SVE instructions in the specialized case, whilst restricting it to use only Armv8 instructions in the default one.
+In this example we have specified two versions of `sumPosEltsScaledByIndex` using the `target_clones` attribute (the order in which they are listed does not matter). At certain optimization levels compilers can decide to perform loop vectorization depending on the target's vector capabilities. Our intention is to enable the compiler to use SVE instructions in the specialized case, whilst restricting it to use only Armv8 instructions in the default one.
 
 loop.c
 ```c
 __attribute__((target_clones("sve", "default")))
-int foo(int *v, unsigned n) {
+int sumPosEltsScaledByIndex(int *v, unsigned n) {
 int s = 0;
 for (unsigned i = 0; i < n; ++i)
 if (v[i] > 0)
@@ -34,13 +34,13 @@ $ gcc -march=armv8-a -O3 -S -o - loop.c
 ```
 Note that when using the `clang` compiler, the option `--rtlib=compiler-rt` should be specified on the command line. This allows the compiler to generate runtime checks for detecting the presence of hardware features on your host target.
 
-Here is the generated compiler output for the SVE version of `foo` (using `clang`):
+Here is the generated compiler output for the SVE version of `sumPosEltsScaledByIndex` (using `clang`):
 ```
 .text
-.globl foo._Msve
+.globl sumPosEltsScaledByIndex._Msve
 .p2align 2
-.type foo._Msve,@function
-foo._Msve:
+.type sumPosEltsScaledByIndex._Msve,@function
+sumPosEltsScaledByIndex._Msve:
 cbz w1, .LBB0_3
 mov w9, w1
 cnth x8
@@ -96,7 +96,7 @@ foo._Msve:
 ret
 ```
 
-This is the default version of `foo`:
+This is the default version of `sumPosEltsScaledByIndex`:
 ```
 .section .rodata.cst16,"aM",@progbits,16
 .p2align 4, 0x0
@@ -106,10 +106,10 @@ This is the default version of `foo`:
 .word 2
 .word 3
 .text
-.globl foo.default
+.globl sumPosEltsScaledByIndex.default
 .p2align 2
-.type foo.default,@function
-foo.default:
+.type sumPosEltsScaledByIndex.default,@function
+sumPosEltsScaledByIndex.default:
 cbz w1, .LBB2_3
 cmp w1, #8
 mov w9, w1
@@ -164,35 +164,35 @@ foo.default:
 ret
 ```
 
-Any calls to `foo` are routed through `foo.resolver`. This is the function which contains the runtime checks for feature detection. More on this later.
+Any calls to `sumPosEltsScaledByIndex` are routed through `sumPosEltsScaledByIndex.resolver`. This is the function which contains the runtime checks for feature detection. More on this later.
 ```
-.section .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
-.weak foo.resolver
+.section .text.sumPosEltsScaledByIndex.resolver,"axG",@progbits,sumPosEltsScaledByIndex.resolver,comdat
+.weak sumPosEltsScaledByIndex.resolver
 .p2align 2
-.type foo.resolver,@function
-foo.resolver:
+.type sumPosEltsScaledByIndex.resolver,@function
+sumPosEltsScaledByIndex.resolver:
 str x30, [sp, #-16]!
 bl __init_cpu_features_resolver
 adrp x8, __aarch64_cpu_features+3
-adrp x9, foo._Msve
-add x9, x9, :lo12:foo._Msve
+adrp x9, sumPosEltsScaledByIndex._Msve
+add x9, x9, :lo12:sumPosEltsScaledByIndex._Msve
 ldrb w8, [x8, :lo12:__aarch64_cpu_features+3]
 tst w8, #0x40
-adrp x8, foo.default
-add x8, x8, :lo12:foo.default
+adrp x8, sumPosEltsScaledByIndex.default
+add x8, x8, :lo12:sumPosEltsScaledByIndex.default
 csel x0, x8, x9, eq
 ldr x30, [sp], #16
 ret
 ```
 
-The called symbol `foo` is an indirect function (IFUNC) which points to the resolver.
+The called symbol `sumPosEltsScaledByIndex` is an indirect function (IFUNC) which points to the resolver.
 ```
-.weak foo
-.type foo,@gnu_indirect_function
-.set foo, foo.resolver
+.weak sumPosEltsScaledByIndex
+.type sumPosEltsScaledByIndex,@gnu_indirect_function
+.set sumPosEltsScaledByIndex, sumPosEltsScaledByIndex.resolver
 ```
 
-The names `foo._Msve` and `foo.default` correspond to the function versions of `foo`. See the [Arm C Language Extensions](https://arm-software.github.io/acle/main/acle.html#name-mangling) document for more details on the name mangling rules.
+The names `sumPosEltsScaledByIndex._Msve` and `sumPosEltsScaledByIndex.default` correspond to the function versions of `sumPosEltsScaledByIndex`. See the [Arm C Language Extensions](https://arm-software.github.io/acle/main/acle.html#name-mangling) document for more details on the name mangling rules.
 
 #### Runtime example with use of ACLE intrinsics
 

content/learning-paths/smartphones-and-mobile/function-multiversioning/implementation-details.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ In order to select the most appropriate version of a function, each call to a ve
 
 #### Resolver emission with LLVM
 
-When using the LLVM compiler the resolver is emitted in the translation unit which contains the definition of the default version. To correctly generate a resolver the compiler must be aware of all the versions of a function. Therefore, the user must declare every function version in the TU where the default version resides. For example:
+When using the LLVM compiler, the resolver is emitted in the translation unit which contains the definition of the default version. To correctly generate a resolver the compiler must be aware of all the versions of a function. Therefore, the user must declare every function version in the TU where the default version resides. For example:
 
 file1.c
 ```c
