As mentioned in the last meeting, I had the chance to run a few NUFFT scaling tests on a very large node (AMD EPYC 9965, 384 physical cores in total, AVX512 support).
I compiled the latest versions of Finufft and ducc from source and ran type 1 and type 2 double-precision transforms on a 256**3 grid with varying nonuniform point densities (1 and 10) and thread counts (32, 64, 128).
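To give a feeling for the problem sizes involved, here is a minimal sketch. It assumes that "density" means the number of nonuniform points per uniform grid point, and that each point carries three double-precision coordinates plus one complex double strength; the helper names `problem_size` and `point_data_bytes` are made up for illustration and are not part of either library.

```python
def problem_size(grid=(256, 256, 256), density=1.0):
    """Return (uniform grid points, nonuniform points).

    Assumption: density = nonuniform points per uniform grid point.
    """
    n_grid = 1
    for n in grid:
        n_grid *= n
    n_points = int(round(density * n_grid))
    return n_grid, n_points


def point_data_bytes(n_points, dims=3):
    """Bytes needed for the nonuniform point data in double precision.

    Assumption: 8 bytes per coordinate and 16 bytes per complex strength.
    """
    return n_points * (dims * 8 + 16)


if __name__ == "__main__":
    for density in (1, 10):
        n_grid, n_points = problem_size(density=density)
        gib = point_data_bytes(n_points) / 2**30
        print(f"density {density}: {n_points} points, "
              f"{gib:.2f} GiB of point data")
```

Under these assumptions the uniform grid itself (256**3 complex doubles) takes 256 MiB before any oversampling, so at density 10 the point data dominates the memory traffic.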
Here are a few results (Finufft was compiled with ducc FFTs):
Density 1:
Density 10:
I also tried Finufft with the FFTW backend, but it seems that FFTW goes a little berserk when confronted with many threads:
Some lessons learned:
- FFTW has problems with large thread counts (at least when FFTW_ESTIMATE is used).
- Scaling with number of threads is far from optimal in both libraries (this is also a reason why I didn't try to scale to all 384 cores). Not sure if this will get significantly better for larger problems ...
- The scaling of Finufft's planning stage could be improved.
- Performance at epsilon=1e-6 is not much different from the A100 GPU results reported by @jipolanco, which I'm quite happy about :-)
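For anyone who wants to quantify "far from optimal" on their own runs, a small sketch of the usual strong-scaling metrics follows. The function name `scaling_table` is hypothetical, and the timings in the example are placeholders chosen for easy arithmetic, not measurements from the tests above.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency from wall-clock timings.

    timings: dict mapping thread count -> wall time in seconds.
    Returns a list of (threads, speedup, efficiency) tuples, all relative
    to the smallest thread count present in the input.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        ideal = threads / base_threads  # perfect scaling would reach this
        rows.append((threads, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Placeholder timings in seconds, NOT results from this post.
    example = {32: 8.0, 64: 5.0, 128: 4.0}
    for threads, speedup, efficiency in scaling_table(example):
        print(f"{threads:4d} threads: speedup {speedup:.2f}, "
              f"efficiency {efficiency:.0%}")
```

An efficiency near 100% means doubling the threads roughly halves the run time; values well below that at 64 or 128 threads would match the impression described in the list.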