Commit 01f0471

committed
expand on the feedback by Tamar
1 parent c9321f3 commit 01f0471

1 file changed: +3 -1 lines changed


content/learning-paths/cross-platform/loop-reflowing/autovectorization-and-restrict.md

Lines changed: 3 additions & 1 deletion
@@ -80,6 +80,8 @@ addvec:

As you can see, the compiler has autovectorized this algorithm and the output is identical to the hand-written function! Strictly speaking, you don't even need `restrict` in such a trivial loop, as it will be autovectorized anyway at certain optimization levels (`-O2` for clang, `-O3` for gcc). However, using `restrict` simplifies the code and generates SIMD code similar to the hand-written version in `addvec_neon.c`.

-This is just a trivial example though and not all loops can be autovectorized that easily by the compiler.
+The required optimization level differs between compilers because each compiler decides in its own way whether to autovectorize a loop. For each candidate loop, the compiler weighs the estimated performance gain against a cost model, which depends on many parameters, including the optimization level in the compilation flags. The cost model estimates how much the vectorized code grows in size and whether the performance gain outweighs that growth. Based on this estimate, the compiler either emits the vectorized code or falls back to a 'safer' scalar implementation. This decision is not set in stone and is constantly re-evaluated during compiler development.
+
+This analysis is beyond the scope of this learning path; this was just one trivial example to demonstrate how autovectorization can be triggered by a flag.

You will see some more advanced examples in the next sections.
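
To make the cost-model discussion in the added paragraph concrete, here is a minimal sketch of the kind of loop being described. The function name `addvec_sketch`, its signature, and the helper `addvec_sketch_selfcheck` are assumptions made for illustration and are not taken from the learning path's own sources. The comments mention compiler report options (`-fopt-info-vec` for gcc, `-Rpass=loop-vectorize` for clang) that can show whether a given compiler version chose to vectorize the loop; the result may differ between compilers and releases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function, not the learning path's addvec.
 * `restrict` promises the compiler that a, b and c never overlap,
 * so it does not have to guard the vector loop against aliasing. */
void addvec_sketch(float *restrict c, const float *restrict a,
                   const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Hypothetical self-check: the result must be the same whether the
 * compiler picked the vectorized or the scalar implementation.
 *
 * To see which one was picked, one option is to ask for a report:
 *   gcc   -O3 -fopt-info-vec         -c file.c
 *   clang -O2 -Rpass=loop-vectorize  -c file.c
 */
int addvec_sketch_selfcheck(void)
{
    float a[8], b[8], c[8];

    for (size_t i = 0; i < 8; i++) {
        a[i] = (float)i;
        b[i] = 10.0f;
    }

    addvec_sketch(c, a, b, 8);

    for (size_t i = 0; i < 8; i++) {
        assert(c[i] == a[i] + 10.0f);
    }
    return 1;
}
```

Compiling the same file at different optimization levels and comparing the reports is one way to observe the decision the commit text describes, without reading the generated assembly.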
