Closed
Flang can't vectorize the loop in s115 of TSVC while Clang can vectorize the loop written in C.
- Fortran
! Fortran version
      subroutine s115 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s115 ')
      do 10 j = 1,n
        do 20 i = j+1, n
          a(i) = a(i) - aa(i,j) * a(j)
   20   continue
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
$ flang-new -v -O3 -flang-experimental-integer-overflow s115.f -S -Rpass=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s115.f
- C
// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];
int s115() {
  init( "s115 ");
  for (int j = 0; j < LEN2; j++) {
    for (int i = j+1; i < LEN2; i++) {
      a[i] -= aa[j][i] * a[j];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, 0.);
  return 0;
}
$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
10 | for (int i = j+1; i < LEN2; i++) {
| ^
If j+1 overflows, the accesses to a(i) and a(j) may overlap, so vectorization is prevented.
IIRC, compilers don't have to consider that case.
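
To illustrate the overflow concern, here is a minimal C sketch. It is not what Flang or LLVM actually does internally; `wrapping_succ` and `inner_loop_may_touch_j` are hypothetical names, and a 32-bit default integer with two's-complement wraparound is assumed. It shows that the inner loop's index range can only include `j` when `j+1` wraps, which is the one case where `a(i)` and `a(j)` could refer to the same element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: j + 1 with two's-complement wraparound.
   Written without signed overflow, which is undefined behavior in C. */
int32_t wrapping_succ(int32_t j) {
    return (j == INT32_MAX) ? INT32_MIN : j + 1;
}

/* Returns true if the inner loop "do i = j+1, n" could visit i == j,
   i.e. if a(i) could alias a(j), when the lower bound j+1 may wrap. */
bool inner_loop_may_touch_j(int32_t j, int32_t n) {
    assert(n >= 0);                    /* array extent is non-negative */
    int32_t lower = wrapping_succ(j);  /* lower bound of the inner loop */
    return lower <= j && j <= n;       /* is j inside [lower, n]? */
}
```

Under these assumptions, `inner_loop_may_touch_j(1, 100)` is false, and the function only returns true for `j == n == INT32_MAX`, where the lower bound wraps to `INT32_MIN`. If the compiler may assume that `j+1` never overflows, then `i > j` holds throughout the inner loop and `a(j)` is loop-invariant there.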