Commit 353c971

Merge pull request #511 from flatironinstitute/makeducc
add FFT=DUCC option to makefile (and associated install docs, which were also cleaned up)
2 parents b3c2be7 + f41ae29 commit 353c971

7 files changed: +320 -246 lines changed

CHANGELOG

Lines changed: 29 additions & 26 deletions
@@ -1,22 +1,25 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).

-V 2.3.0beta (7/24/24)
+V 2.3.0-rc1 (8/6/24)

-* python build modernized to pyproject.toml (both CPU and GPU).
-PRs 507 (Anden, Lu, Barbone)
-* switchable FFT: either FFTW or DUCC0 (latter need no plan stage; also it is
+* Switched C++ standards from C++14 to C++17, allowing various templating
+improvements (Barbone).
+* Python build modernized to pyproject.toml (for both CPU and GPU).
+PR 507 (Anden, Lu, Barbone). Compiles from source for the local build.
+* Switchable FFT: either FFTW or DUCC0 (latter needs no plan stage; also it is
 used to exploit sparsity pattern to achieve FFT speedups 1-3x in 2D and 3D).
-PR463, Martin Reinecke.
+PR463, Martin Reinecke. Both CMake and makefile includes this DUCC0 option
+(makefile PR511 by Barnett; CMake by Barbone).
 * ES kernel rescaled to max value 1, reduced poly degrees for upsampfac=1.25,
 cleaner Horner coefficient generation PR499 (fixes fp32 overflow issue #454).
 * Major manual acceleration of spread/interp kernels via XSIMD header-only lib,
 kernel evaluation, templating by ns with AVX-width-dependent decisions.
 Up to 80% faster, dep on compiler. (Marco Barbone with help from Libin Lu).
-PRs 459, 471, 502.
-NOTE: introduces new dependency (XSIMD), added to cMake and makefile.
+A large chunk of work: PRs 459, 471, 502.
+NOTE: introduces new dependency (XSIMD), added to CMake and makefile.
 * Exploiting even/odd symmetry for 10% faster xsimd-accel kernel poly eval
-Libin Lu based on idea of Martin Reinecke (PR477,492,493).
+(Libin Lu based on idea of Martin Reinecke; PR477,492,493).
 * new test/finufft3dkernel_test checks kerevalmeth=0 and 1 agree to tolerance
 PR 473 (M Barbone).
 * new perftest/compare_spreads.jl compares two spreadinterp libs (A Barnett).
@@ -47,24 +50,24 @@ V 2.3.0beta (7/24/24)
 any 32-bit integers to 64-bit when calling cufinufft(f)_setpts. Note that
 internally, 32-bit integers are still used, so calling cufinufft with more
 than 2e9 points will fail. This restriction may be lifted in the future.
-* cmake build system revamped completely, more modern practices.
-It auto selects compiler flags based on the supported ones on all operating systems.
-Added support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
-* cmake support for both ducc0 and fftw
-* cmake adding nvcc and msvc optimization flags
-* cmake supports sphinx
-* updated install docs
-* cuFINUFFT binsize is now a function of the shared memory available where
-possible.
-* cuFINUFFT GM 1D sorts using thrust::sort instead of bin-sort.
-* cuFINUFFT using the new normalized Horner coefficients and added support
-for 1.25.
-* cuFINUFFT new compile flags for extra-vectorization, flushing single
-precision denormals to 0 and using fma where possible.
-* cuFINUFFT using intrinsics in foldrescale and other places to increase
-performance
-* cuFINUFFT using SM90 float2 vector atomicAdd where supported
-* cuFINUFFT making default binsize = 0
+* CMake build system revamped completely, using more modern practices (Barbone).
+It now auto-selects compiler flags based on those supported on all OSes, and
+has support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
+* CMake added nvcc and msvc optimization flags.
+* sphinx local doc build also using CMake. (Barbone)
+* updated install docs, including for DUCC0 FFT and new python build.
+* updated install docs (Barnett)
+* Major acceleration effort for the GPU library cufinufft (M Barbone, PR488):
+- binsize is now a function of the shared memory available where possible.
+- GM 1D sorts using thrust::sort instead of bin-sort.
+- uses the new normalized Horner coefficients and added support for
+upsampfac=1.25 on GPU, for first time.
+- new compile flags for extra-vectorization, flushing single
+precision denormals to 0 and using fma where possible.
+- using intrinsics (eg FMA) in foldrescale and other places to increase
+performance
+- using SM90 float2 vector atomicAdd where supported
+- make default binsize = 0

 V 2.2.0 (12/12/23)

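The changelog entry above on exploiting even/odd symmetry in the kernel polynomial evaluation relies on a general identity: writing p(z) = E(z^2) + z*O(z^2), a single pass yields both p(z) and p(-z). The Python sketch below shows only that generic identity and is not FINUFFT's XSIMD implementation; the function name `eval_pair` and the lowest-degree-first coefficient ordering are assumptions made for this illustration.

```python
def eval_pair(coeffs, z):
    """Return (p(z), p(-z)) for a polynomial p.

    coeffs are ordered lowest degree first (an assumption of this sketch).
    """
    z2 = z * z
    even = 0.0
    odd = 0.0
    # Horner's rule in z^2 over the even-index coefficients: E(z^2)
    for c in reversed(coeffs[0::2]):
        even = even * z2 + c
    # Horner's rule in z^2 over the odd-index coefficients: O(z^2)
    for c in reversed(coeffs[1::2]):
        odd = odd * z2 + c
    # p(z) = E + z*O and p(-z) = E - z*O share the same E and O
    return even + z * odd, even - z * odd


if __name__ == "__main__":
    # p(z) = 1 + 2 z + 3 z^2 + 4 z^3, evaluated at z = 2 and z = -2
    print(eval_pair([1.0, 2.0, 3.0, 4.0], 2.0))
```

The two Horner recurrences each run over roughly half the coefficients, so the pair of values costs about as much as one ordinary evaluation.
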
cmake/setupCPM.cmake

Lines changed: 11 additions & 8 deletions
@@ -1,18 +1,21 @@
 # USING CPM TO HANDLE DEPENDENCIES
 if(CPM_SOURCE_CACHE)
-set(CPM_DOWNLOAD_LOCATION "${CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
+set(CPM_DOWNLOAD_LOCATION
+"${CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
 elseif(DEFINED ENV{CPM_SOURCE_CACHE})
-set(CPM_DOWNLOAD_LOCATION "$ENV{CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
+set(CPM_DOWNLOAD_LOCATION
+"$ENV{CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
 else()
-set(CPM_DOWNLOAD_LOCATION "${CMAKE_BINARY_DIR}/cmake/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
+set(CPM_DOWNLOAD_LOCATION
+"${CMAKE_BINARY_DIR}/cmake/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
 endif()

 if(NOT (EXISTS ${CPM_DOWNLOAD_LOCATION}))
-message(STATUS "Downloading CPM.cmake to ${CPM_DOWNLOAD_LOCATION}")
-file(DOWNLOAD
-https://github.com/cpm-cmake/CPM.cmake/releases/download/v${CPM_DOWNLOAD_VERSION}/CPM.cmake
-${CPM_DOWNLOAD_LOCATION}
-)
+message(STATUS "Downloading CPM.cmake to ${CPM_DOWNLOAD_LOCATION}")
+file(
+DOWNLOAD
+https://github.com/cpm-cmake/CPM.cmake/releases/download/v${CPM_DOWNLOAD_VERSION}/CPM.cmake
+${CPM_DOWNLOAD_LOCATION})
 endif()

 include(${CPM_DOWNLOAD_LOCATION})

cmake/setupXSIMD.cmake

Lines changed: 27 additions & 19 deletions
@@ -1,20 +1,28 @@
-CPMAddPackage(
-NAME xtl
-GIT_REPOSITORY "https://github.com/xtensor-stack/xtl.git"
-GIT_TAG ${XTL_VERSION}
-EXCLUDE_FROM_ALL YES
-GIT_SHALLOW YES
-OPTIONS "XTL_DISABLE_EXCEPTIONS YES"
-)
-
-CPMAddPackage(
-NAME xsimd
-GIT_REPOSITORY "https://github.com/xtensor-stack/xsimd.git"
-GIT_TAG ${XSIMD_VERSION}
-EXCLUDE_FROM_ALL YES
-GIT_SHALLOW YES
-OPTIONS
-"XSIMD_SKIP_INSTALL YES"
-"XSIMD_ENABLE_XTL_COMPLEX YES"
-)
+cpmaddpackage(
+NAME
+xtl
+GIT_REPOSITORY
+"https://github.com/xtensor-stack/xtl.git"
+GIT_TAG
+${XTL_VERSION}
+EXCLUDE_FROM_ALL
+YES
+GIT_SHALLOW
+YES
+OPTIONS
+"XTL_DISABLE_EXCEPTIONS YES")

+cpmaddpackage(
+NAME
+xsimd
+GIT_REPOSITORY
+"https://github.com/xtensor-stack/xsimd.git"
+GIT_TAG
+${XSIMD_VERSION}
+EXCLUDE_FROM_ALL
+YES
+GIT_SHALLOW
+YES
+OPTIONS
+"XSIMD_SKIP_INSTALL YES"
+"XSIMD_ENABLE_XTL_COMPLEX YES")

docs/devnotes.rst

Lines changed: 6 additions & 4 deletions
@@ -27,11 +27,11 @@ Developer notes

 * The kernel function in spreadinterp is evaluated via piecewise-polynomial approximation (Horner's rule). The code for this is auto-generated in MATLAB, for all upsampling factors. There are two versions supported:

-- 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB `gen_all_horner_C_code.m`
+- 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB ``gen_all_horner_C_code.m``

-- post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (`nc` or number of coefficients) for each width `w`. Run from MATLAB `gen_ker_horner_loop_cpp_code.m`
+- post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (``nc`` or number of coefficients) for each width ``w``. Run from MATLAB ``gen_ker_horner_loop_cpp_code.m``

-See `devel/README` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, `devel/get_degree_and_beta.m`, which must match the C++ `setup_spreader()` function.
+See ``devel/README`` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, ``devel/get_degree_and_beta.m``, which must match the C++ ``setup_spreader()`` function.

 * Continuous Integration (CI). See files for this in ``.github/workflows/``. It currently tests the default ``makefile`` settings in linux, and three other ``make.inc.*`` files covering OSX and Windows (MinGW). CI does not test build the variant OMP=OFF. The dev should test these locally. Likewise, the Julia wrapper is separate and thus not tested in CI. We have added ``JenkinsFile`` for the GPU CI via python wrappers.

@@ -49,7 +49,9 @@ Developer notes

 * The cufinufft Python wheels are generated using Docker based on the manylinux2014 image. For instructions, see ``tools/cufinufft/distribution_helper.sh``. These are binary wheels that are built using CUDA 11 (or optionally CUDA 12, but these are not distributed on PyPI) and bundled with the necessary libraries.

-* Testing cufinufft (for FI, mostly)
+* CMake compiling on linux at Flatiron Institute (Rusty cluster): We have had a report that if you want to use LLVM, you need to ``module load llvm/16.0.3`` otherwise the default ``llvm/14.0.6`` does not find ``OpenMP_CXX``.
+
+* Testing cufinufft (for FI, mostly):

 .. code-block:: sh

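The developer notes in the diff above describe evaluating the spreading kernel by piecewise-polynomial approximation using Horner's rule. As a rough illustration of that idea only (this is not the generated FINUFFT code), the Python sketch below evaluates one polynomial piece per unit-width subinterval of a width-``w`` support. The function names, the coefficient layout (lowest degree first, one row per subinterval), the local variable ``z`` in [-1, 1), and the stand-in function x^2 are all assumptions made for the example.

```python
import math


def horner(coeffs, z):
    """Evaluate a polynomial at z; coeffs are ordered lowest degree first."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc


def eval_piecewise(pieces, x, w):
    """Evaluate a piecewise polynomial on [-w/2, w/2) built from w unit pieces.

    pieces[j] holds the coefficients of the j-th subinterval, written in a
    local variable z in [-1, 1) centered on that subinterval.
    """
    j = min(int(math.floor(x + w / 2)), w - 1)  # subinterval containing x
    center = -w / 2 + j + 0.5                   # midpoint of that subinterval
    z = 2.0 * (x - center)                      # map x to the local coordinate
    return horner(pieces[j], z)


# Stand-in example: represent f(x) = x^2 exactly on [-1, 1) with w = 2 pieces.
# With x = c + z/2, x^2 = c^2 + c*z + z^2/4, so each piece is [c^2, c, 0.25].
w = 2
centers = [-0.5, 0.5]
pieces = [[c * c, c, 0.25] for c in centers]

if __name__ == "__main__":
    print(eval_piecewise(pieces, 0.3, w))
```

In a real kernel the rows of coefficients would come from a fit to the kernel on each subinterval rather than from an exact expansion as here.
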