Description
When running 649.fotonik3d_s from SPEC CPU 2017 on a Grace (AArch64) machine, the LLVM build fell further behind the GCC build as the number of threads increased. The compiler options were `-Ofast -mcpu=neoverse-v2`.
Our investigation suggests that one cause is the `read` statement in the serial (non-parallelized) initialization phase, which is slower with Flang than with gfortran. This phase does not speed up with more threads, so its cost makes up a growing share of the total runtime as the parallel part shrinks.
The `read` statement executes approximately 54 million times inside a nested loop, so its per-call cost has a significant impact.
We verified the difference in `read` performance with the test program `test.f90`, measured with LLVM 20.1.0 built with `-DCMAKE_BUILD_TYPE=Release`.
To minimize the influence of network file systems such as NFS, the input file `test.dat` was placed on the machine's local file system.
`test.dat` can be generated with the following command:

```sh
python3 -c 'for i in range(54000000): print(i,i,i)' > test.dat
```
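For quicker experiments it can help to generate a smaller file in the same format. The following is a minimal sketch that is not part of the original report; the helper name `write_test_data` and its `n` parameter are hypothetical choices for illustration. It writes the same `i i i` lines as the one-liner above. Note that `test.f90` reads exactly `num` lines, so `num` in the program must be lowered to match a smaller file.

```python
def write_test_data(path: str, n: int = 54_000_000) -> None:
    """Hypothetical helper (not from the original report).

    Writes n lines of the form "i i i" for i = 0..n-1, the same
    output as: python3 -c 'for i in range(n): print(i,i,i)'
    """
    # A 1 MiB buffer keeps the loop from issuing many tiny write calls.
    with open(path, "w", buffering=1 << 20) as f:
        for i in range(n):
            f.write(f"{i} {i} {i}\n")
```

For example, `write_test_data("test.dat", 1_000_000)` writes one million lines, while calling it with the default `n` reproduces the full 54-million-line input.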
- test.f90

```fortran
program main
  implicit none
  integer, parameter :: num = 54000000
  integer, dimension(num) :: a, b, c
  integer :: t1, t2, count, countmax
  integer :: ii, ios

  open(unit=9, file='test.dat', status='old', iostat=ios)
  if (ios /= 0) stop 'cannot open test.dat'

  ! Time only the list-directed reads (one line of three integers per call)
  call system_clock(t1, count, countmax)
  do ii = 1, num
    read(9, *) a(ii), b(ii), c(ii)
  end do
  call system_clock(t2)

  print *, "init time : ", real(t2 - t1)/count, "sec"
  close(9)
end program main
```

- Compilation commands

```sh
$ flang-new -O3 -ffast-math -mcpu=neoverse-v2 test.f90
$ gfortran -O3 -ffast-math -mcpu=neoverse-v2 test.f90
```
- Measurement results

| Compiler | Version | Time [s] |
|---|---|---|
| GCC | 14.2.0 | 22.496 |
| LLVM (Release build) | 20.1.0 | 42.341 |
In this test, the `read` statement takes nearly twice as long with Flang as with gfortran.
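A rough per-call breakdown of the measured times, assuming the timed loop is dominated by the 54 million `read` calls (each reading one line of three integers):

$$
\frac{42.341\ \text{s}}{22.496\ \text{s}} \approx 1.88,
\qquad
\frac{42.341\ \text{s}}{5.4 \times 10^{7}} \approx 784\ \text{ns},
\qquad
\frac{22.496\ \text{s}}{5.4 \times 10^{7}} \approx 417\ \text{ns}
$$

That is, Flang spends roughly 367 ns more per `read`, or about 19.8 s in total over the whole loop.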