Commit d969d6e

CMake: AMReX_FASTMATH (#4545)
## Summary

Add a new general compiler flag for fast-math optimizations.
https://en.wikipedia.org/wiki/Floating-point_arithmetic#%22Fast_math%22_optimization

**This PR does not yet change any defaults (see details below).**

Close #1769

## Proposed Breaking Change

By default, AMReX turns fast-math OFF for all backends (serial, CPU/OpenMP, SYCL GPU, HIP GPU) except:
- CUDA GPUs (both GNUmake & CMake)
- a single, legacy [Cray compiler in GNUmake](https://github.com/AMReX-Codes/amrex/blob/25.07/Tools/GNUMake/comps/cray.mak)

I propose to amend this PR to:
- remove the `AMReX_CUDA_FASTMATH` flag (default: ON)
  - alternatively: fade it out and set its default value to `AMReX_FASTMATH`
- turn `AMReX_FASTMATH` to `OFF` by default for all backends. Unlike turning it `ON` by default, this means users of most backends will not be surprised by a sudden change in numerical results.

### Breaking:

- Users that support fast-math, which might be many, will see a sudden drop in performance on CUDA GPUs.

### Recommended migration:

- If you have already benchmarked correctness on CUDA GPUs, I recommend intentionally setting `-DAMReX_FASTMATH=ON` in your projects for all backends, to get the performance benefits on CPUs and on Intel and AMD GPUs as well.

## Additional background

As an HPC team, we should strive to make our numerics as robust as possible under fast-math optimizations. Without those, your compiler will barely auto-vectorize, will not perform simple optimizations like replacing `x / 2.` with `x * 0.5`, and will skip many other optimizations that rely on aggressive re-ordering. Fast-math floating-point math is an important aspect to consider and support for significant performance improvements on all modern architectures.

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [x] changes answers in the test suite to more than roundoff level
- [x] are likely to significantly affect the results of downstream AMReX users
- [x] include documentation in the code and/or rst files, if appropriate
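As an illustration of the recommended migration, a downstream project that builds AMReX as a subdirectory could pin the option explicitly instead of relying on the backend-dependent default. This is only a sketch: the project name `my_app`, the source file `main.cpp`, and the path `external/amrex` are hypothetical, and it assumes the `AMReX::amrex` target alias is available in a subdirectory build. Only the `AMReX_FASTMATH` option name comes from this commit.

```cmake
cmake_minimum_required(VERSION 3.24)
project(my_app LANGUAGES CXX)  # hypothetical downstream project

# Pin fast-math explicitly so the result does not depend on the
# backend-specific default (OFF on CPU, ON for CUDA in this commit).
# A cache entry set before AMReX's option() call takes precedence over
# the default passed to option().
set(AMReX_FASTMATH ON CACHE BOOL "Enable fast-math optimizations")

# Hypothetical location of an AMReX checkout that contains this commit.
add_subdirectory(external/amrex)

add_executable(my_app main.cpp)  # hypothetical source file
target_link_libraries(my_app PRIVATE AMReX::amrex)  # assumed alias target
```

Passing `-DAMReX_FASTMATH=ON` on the `cmake` command line has the same effect and overrides the value set here only if it is given on a fresh build directory or the cache entry is updated.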
1 parent 37cf3e9 commit d969d6e

7 files changed, 49 additions and 2 deletions

Docs/sphinx_documentation/source/BuildingAMReX.rst

Lines changed: 2 additions & 0 deletions
@@ -443,6 +443,8 @@ The list of available options is reported in the :ref:`table <tab:cmakevar>` bel
 +------------------------------+-------------------------------------------------+-------------------------+-----------------------+
 | AMReX_BUILD_SHARED_LIBS      | Build as shared C++ library                     | NO (unless xSDK)        | YES, NO               |
 +------------------------------+-------------------------------------------------+-------------------------+-----------------------+
+| AMReX_FASTMATH               | Enable fast-math optimizations                  | NO (CPU), YES (CUDA)    | YES, NO               |
++------------------------------+-------------------------------------------------+-------------------------+-----------------------+
 | AMReX_FORTRAN                | Enable Fortran language                         | NO                      | YES, NO               |
 +------------------------------+-------------------------------------------------+-------------------------+-----------------------+
 | AMReX_PRECISION              | Set the precision of reals                      | DOUBLE                  | DOUBLE, SINGLE        |

Tools/CMake/AMReXCUDAOptions.cmake

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ if(DEFINED ENV{AMREX_CUDA_ARCH})
 endif()
 set(AMReX_CUDA_ARCH ${AMReX_CUDA_ARCH_DEFAULT} CACHE STRING "CUDA architecture (Use 'Auto' for automatic detection)")
 
-option(AMReX_CUDA_FASTMATH "Enable CUDA fastmath" ON)
+option(AMReX_CUDA_FASTMATH "Enable CUDA fastmath" ON) # Note: inconsistent with AMReX_FASTMATH defaults
 cuda_print_option( AMReX_CUDA_FASTMATH )
 
 set(AMReX_CUDA_MAXREGCOUNT "255" CACHE STRING

Tools/CMake/AMReXConfig.cmake.in

Lines changed: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ set(AMReX_HIP @AMReX_HIP@)
 set(AMReX_GPU_BACKEND @AMReX_GPU_BACKEND@)
 set(AMReX_GPU_RDC @AMReX_GPU_RDC@)
 set(AMReX_PRECISION @AMReX_PRECISION@)
+set(AMReX_FASTMATH @AMReX_FASTMATH@)
 set(AMReX_FORTRAN @AMReX_FORTRAN@)
 
 # Actual components selection
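Because the package config file now records `AMReX_FASTMATH`, a project that consumes an installed AMReX could inspect the setting after `find_package`. The following is a minimal sketch under that assumption; the project name `fastmath_check` is hypothetical, and installs that predate this commit leave the variable undefined, which `if()` treats as false.

```cmake
cmake_minimum_required(VERSION 3.24)
project(fastmath_check LANGUAGES CXX)  # hypothetical consumer project

# Locate an installed AMReX via its package config file.
find_package(AMReX CONFIG REQUIRED)

# AMReX_FASTMATH is set by AMReXConfig.cmake for installs that include
# this commit; for older installs it is undefined and evaluates to false.
if(AMReX_FASTMATH)
    message(STATUS "AMReX was configured with fast-math enabled")
else()
    message(STATUS "AMReX was configured without fast-math (or predates the option)")
endif()
```

A project with strict reproducibility requirements could replace the second `message(STATUS ...)` or the first with `message(FATAL_ERROR ...)` to reject an unexpected configuration.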

Tools/CMake/AMReXOptions.cmake

Lines changed: 12 additions & 0 deletions
@@ -253,6 +253,18 @@ cmake_dependent_option( AMReX_GPU_RDC "Enable Relocatable Device Code" ${_GPU_RD
 unset(_GPU_RDC_default)
 print_option(AMReX_GPU_RDC)
 
+#
+# Fast Math ================================================================
+#
+set(_FASTMATH_default OFF)
+if(AMReX_GPU_BACKEND STREQUAL CUDA) # note: historic settings
+# if(NOT AMReX_GPU_BACKEND STREQUAL NONE) # note: this would be more consistent for GPUs
+   set(_FASTMATH_default ON)
+endif()
+option(AMReX_FASTMATH "Enable fast-math optimizations" ${_FASTMATH_default})
+print_option(AMReX_FASTMATH)
+unset(_FASTMATH_default)
+
 #
 # Parallel backends ========================================================
 #

Tools/CMake/AMReXParallelBackends.cmake

Lines changed: 10 additions & 0 deletions
@@ -205,6 +205,11 @@ if (AMReX_SYCL)
    include(AMReXSYCL)
    foreach(D IN LISTS AMReX_SPACEDIM)
       target_link_libraries(amrex_${D}d PUBLIC SYCL)
+
+      # fast math
+      if(AMReX_FASTMATH)
+         target_compile_options(amrex_${D}d PUBLIC -ffast-math)
+      endif()
    endforeach()
 endif ()
 
@@ -350,6 +355,11 @@ if (AMReX_HIP)
    foreach(D IN LISTS AMReX_SPACEDIM)
       target_compile_options(amrex_${D}d PUBLIC $<$<COMPILE_LANGUAGE:CXX>:-m64>)
 
+      # fast math
+      if(AMReX_FASTMATH)
+         target_compile_options(amrex_${D}d PUBLIC -ffast-math)
+      endif()
+
       # ROCm 4.5: use unsafe floating point atomics, otherwise atomicAdd is much slower
       #
       target_compile_options(amrex_${D}d PUBLIC $<$<COMPILE_LANGUAGE:CXX>:-munsafe-fp-atomics>)

Tools/CMake/AMReX_Config.cmake

Lines changed: 22 additions & 1 deletion
@@ -72,7 +72,6 @@ function (configure_amrex AMREX_TARGET)
    )
 
    unset(_condition)
-   unset(_cxx_msvc)
 
    #
    # Setup OpenMP
@@ -146,11 +145,33 @@ function (configure_amrex AMREX_TARGET)
       endif()
    endif()
 
+   # fast math
+   if (AMReX_FASTMATH)
+      # GPU specific backends set in AMReXParallelBackends.cmake
+      if (AMReX_GPU_BACKEND STREQUAL NONE)
+         # See https://cmake.org/cmake/help/v4.1/variable/CMAKE_LANG_COMPILER_ID.html#variable:CMAKE_%3CLANG%3E_COMPILER_ID
+         target_compile_options(${AMREX_TARGET} PUBLIC
+            $<$<CXX_COMPILER_ID:AppleClang,Clang,CrayClang,GNU,IBMClang,IntelLLVM,XLClang>:-ffast-math>
+            $<${_cxx_msvc}:"/fp:fast"> # MSVC
+         )
+         if (CMAKE_Fortran_COMPILER_LOADED)
+            target_compile_options(${AMREX_TARGET} PUBLIC
+               $<$<Fortran_COMPILER_ID:AppleClang,Clang,CrayClang,GNU,IBMClang,IntelLLVM,XLClang>:-ffast-math>
+            )
+         endif ()
+      endif()
+   endif()
+
    #
    # Setup third-party profilers
    #
    set_amrex_profilers(${AMREX_TARGET})
 
+   #
+   # clean up helpers
+   #
+   unset(_cxx_msvc)
+
 endfunction ()
 
 #
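The compiler-dependent flag selection in this hunk can be tried in isolation. Below is a minimal standalone sketch of the same technique, not AMReX's actual configuration: the option `DEMO_FASTMATH` and the target `demo_flags` are hypothetical names, and the compiler list is shortened for brevity.

```cmake
cmake_minimum_required(VERSION 3.15)  # comma-separated compiler ID lists need 3.15+
project(fastmath_genex LANGUAGES CXX)

# Hypothetical stand-in for AMReX_FASTMATH.
option(DEMO_FASTMATH "Enable fast-math optimizations" OFF)

# Interface target that only carries compile options to its consumers.
add_library(demo_flags INTERFACE)

if(DEMO_FASTMATH)
    target_compile_options(demo_flags INTERFACE
        # GCC-style drivers accept -ffast-math
        $<$<CXX_COMPILER_ID:GNU,Clang,AppleClang>:-ffast-math>
        # MSVC spells the equivalent switch /fp:fast
        $<$<CXX_COMPILER_ID:MSVC>:/fp:fast>
    )
endif()
```

Each generator expression expands to its flag only when the active C++ compiler matches one of the listed IDs, and to an empty string otherwise, so unsupported compilers receive no extra option.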

Tools/CMake/AMReX_Config_ND.H.in

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@
 #cmakedefine AMREX_USE_FLATTEN_FOR
 #cmakedefine AMREX_BOUND_CHECK
 #cmakedefine AMREX_EXPORT_DYNAMIC
+#cmakedefine AMREX_FASTMATH
 #cmakedefine BL_FORT_USE_UNDERSCORE
 #cmakedefine BL_FORT_USE_LOWERCASE
 #cmakedefine BL_FORT_USE_UPPERCASE
