Skip to content

Commit 46fd0b6

Browse files
committed
docs: migration guide from nfft3 (two new tutorial C codes)
1 parent 50e797a commit 46fd0b6

File tree

5 files changed

+190
-1
lines changed

5 files changed

+190
-1
lines changed

CHANGELOG

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
11
List of features / changes made / release notes, in reverse chronological order.
22
If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
33

4+
* New doc page: migration guide from NFFT3 (2d1 case only).
45
* New foldrescale, removes [-3pi,3pi) restriction on NU points, and slight
56
speedup at large tols. Deprecates both opts.chkbnds and error code
6-
FINUFFT_ERR_SPREAD_PTS_OUT_RANGE. PR #440 (Marco Barbone + Martin Reinecke)
7+
FINUFFT_ERR_SPREAD_PTS_OUT_RANGE. Also inlined kernel eval code, increases
8+
compile of spreadinterp.cpp to 10s. PR #440 (Marco Barbone + Martin Reinecke)
79
* CPU plan stage allows any # threads, warns if > omp_get_max_threads(); or
810
if single-threaded fixes nthr=1 and warns opts.nthreads>1 attempt.
911
Sort now respects spread_opts.sort_threads not nthreads. Supercedes PR 431.

docs/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ Documentation contents
3535
python_gpu
3636
julia
3737
changelog
38+
nfft_migr
3839
cufinufft_migration
3940
devnotes
4041
related

docs/nfft_migr.rst

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
.. _nfft_migr:
2+
3+
Migration guide from NFFT3
4+
==========================
5+
6+
Here we outline how the C/C++ user can replace NUFFT calls to the popular library
7+
`Chemnitz NFFT3 library <https://www-user.tu-chemnitz.de/~potts/nfft/>`_ with
8+
FINUFFT CPU calls to achieve the same effect, possibly with more performance
9+
or less RAM usage.
10+
See [KKP] in the :ref:`references<refs>` for more about NFFT3, and [FIN] for
11+
some performance and RAM comparisons performed using the codes available in 2018.
12+
We use the `NFFT source on GitHub <https://github.com/NFFT/nfft>`_, version 3.5.4alpha.
13+
So far we only discuss:
14+
15+
* the adjoint NFFT (a.k.a. type 1)
16+
17+
Also of interest (but not yet demonstrated below) is:
18+
19+
* the forward NFFT transform (a.k.a. type 2)
20+
* the nonuniform to nonuniform NNFFT (a.k.a. type 3)
21+
22+
.. note:: The NFFT3 library can do more things---real-valued data, sphere, rotation group, hyperbolic cross, inverse transforms---none of which FINUFFT can yet do directly (although our three transforms can be used as components in such tasks). We do not address those here.
23+
24+
Migrating a 2D adjoint transform (type 1) in C from NFFT3 to FINUFFT
25+
--------------------------------------------------------------------
26+
27+
We need to start with the simplest example of using NFFT3 on "user data" generated
28+
using plain, transparent, C commands (rather than relying on NFFT3-supplied
29+
data-generation, direct transform, and printing utilities as in the
30+
NFFT example :file:`examples/nfft/simple_test.c`, or even its simplest
31+
version
32+
at https://www-user.tu-chemnitz.de/~potts/nfft/download/nfft_simple_test.tar.gz ).
33+
We choose 2D since it is the simplest rectangular
34+
case that illustrates how to get the transposed
35+
coordinates (or mode array) ordering correct.
36+
After installing NFFT3 one should be able to compile and run the following:
37+
38+
.. literalinclude:: ../tutorial/nfft2d1_test.c
39+
:language: c
40+
41+
This is a basic example, running single-threaded, at the highest precision
42+
(using ``nfft_init_guru`` would allow more control.)
43+
It demonstrates: i) the NFFT3 user must write their data into arrays allocated
44+
by ``nfft_plan``,
45+
ii) the single nonuniform point coordinate array
46+
is interleaved ($x_1, y_1, x_2, y_2, \dots, x_M, y_M$),
47+
iii) the output mode ordering is C (row-major) rather than Fortran (column-major;
48+
this affects how to convert frequency indices into the output array index),
49+
and iv) there is an extra factor of $2\pi$ in the exponent relative
50+
to the FINUFFT definition, because NFFT3 assumes a 1-periodic input domain.
51+
The code is found in our :file:`tutorial/nfft2d1_test.c`. Running the executable gives:
52+
53+
::
54+
55+
2D type 1 (NFFT3) done in 0.589 s: f_hat[-17,33]=86.0632804289+-350.023846367i, rel err 9.93e-14
56+
57+
To show how to migrate this, we write a self-contained code that generates exactly
58+
the same "user data" (same random seed), then uses FINUFFT to do the transform
59+
to achieve exactly the same ``f_hat`` output array (in row-major C ordering).
60+
This entails scaling and swapping the nonequispaced coordinates just before sending
61+
to FINUFFT. Here is the corresponding C code (compare to the above):
62+
63+
.. literalinclude:: ../tutorial/migrate2d1_test.c
64+
:language: c
65+
66+
The fact that NFFT3 uses row-major mode arrays whereas FINUFFT uses column-major has
67+
been handled here by swapping the input $x$ and $y$ coordinates and array sizes in the
68+
FINUFFT call. (Equivalently, this could have been achieved by transposing the ``f_hat``
69+
output array. We recommend the former route since it saves memory.) Running the
70+
executable gives:
71+
72+
::
73+
74+
2D type 1 (FINUFFT) in 0.0787 s: f_hat[-17,33]=86.0632804289+-350.023846367i, rel err 9.58e-14
75+
76+
Comparing to the above, we see the same answer to all shown digits, a similar error for this tested output entry, plus a 7.5$\times$ speed-up. (Both use a single thread, tested on the same AMD 5700U laptop.) The user may of course now set a coarser (larger) value for ``tol`` and see a further speed-up.
77+
78+
We believe that the above gives the essentials of how to convert your code from using NFFT3 to FINUFFT. Please read our documentation, especially the guru interface if multiple related transforms are required, then post a GitHub Issue if you are still stuck.

tutorial/migrate2d1_test.c

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
/* Minimal example of 2D "adjoint" (aka type 1) NUFFT using FINUFFT
2+
to match NFFT3's conventions, and the same user-given data as
3+
for nfft2d1_test.c. Single-threaded, timer. Barnett 5/18/24
4+
To compile (assuming FINUFFT include and lib in path):
5+
gcc migrate2d1_test.c -o migrate2d1_test -lfinufft -lfftw3 -lm
6+
*/
7+
#include <stdlib.h>
8+
#include <stdio.h>
9+
#include <math.h>
10+
#include <complex.h>
11+
#include <time.h>
12+
#include <finufft.h>
13+
14+
int main(void) {
15+
int N[2] = {300,200}; // N0, N1 output shape in nfft3 sense
16+
int M = 500000; // num. nonuniform input points
17+
double tol = 1e-13; // user must choose (unlike nfft3's simple call)
18+
19+
// user allocates all external arrays (and no internal ones)
20+
double* x = (double *)malloc(sizeof(double)*M); // x (0th) coords only here
21+
double* y = (double *)malloc(sizeof(double)*M); // y (1st) coords need separate ptr
22+
double complex* f = (double complex*)malloc(sizeof(double complex)*M);
23+
double complex* f_hat = (double complex*)malloc(sizeof(double complex)*N[0]*N[1]); // output
24+
25+
// start with exactly the same "user data" as in nfft2d1_test.c...
26+
srand(0); // fix seed
27+
for (int j=0; j<M; ++j) { // nonequispaced pts, and strengths f...
28+
x[j] = (double)rand()/RAND_MAX - 0.5; // x unif rand in [-1/2,1/2)
29+
y[j] = (double)rand()/RAND_MAX - 0.5; // y "
30+
f[j] = 2*((double)rand()/RAND_MAX)-1 + I*(2*((double)rand()/RAND_MAX)-1);
31+
}
32+
33+
clock_t before = clock();
34+
35+
// do transform, includes precompute, writing to f_hat...
36+
for (int j=0; j<M; ++j) { // change user coords so finufft same as nfft3
37+
x[j] *= 2*M_PI; y[j] *= 2*M_PI; // scales from 1-periodic to 2pi-periodic
38+
}
39+
finufft_opts opts; // opts struct
40+
finufft_default_opts(&opts); // set default opts (must start with this)
41+
opts.nthreads = 1; // enforce single-thread
42+
int ier = finufft2d1(M,y,x,f,+1,tol,N[1],N[0],f_hat,&opts); // both x,y and N0,N1 swapped!
43+
44+
double secs = (clock()-before)/(double)CLOCKS_PER_SEC;
45+
46+
// now test that f_hat is as it would have been if original data were sent to nfft3...
47+
int kx=-17, ky=33; // check one output f_hat(kx,ky) vs direct computation
48+
int kxout = kx + N[0]/2;
49+
int kyout = ky + N[1]/2;
50+
int i = kyout + kxout*N[1]; // the output index (nfft3 convention, not FINUFFT's)
51+
double complex f_hat_test = 0.0 + 0.0*I;
52+
for (int j=0; j<M; ++j) // since x,y were mult by 2pi, no such factor here...
53+
f_hat_test += f[j] * cexp(I*((double)kx*x[j]+(double)ky*y[j]));
54+
double err = cabs(f_hat[i] - f_hat_test) / cabs(f_hat_test);
55+
printf("2D type 1 (FINUFFT) in %.3g s: f_hat[%d,%d]=%.12g+%.12gi, rel err %.3g\n",
56+
secs,kx,ky,creal(f_hat[i]),cimag(f_hat[i]),err);
57+
58+
free(x); free(y); free(f); free(f_hat); // user deallocates own I/O arrays
59+
return ier;
60+
}

tutorial/nfft2d1_test.c

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
/* Minimal example of 2D "adjoint" (aka type 1) NUFFT using NFFT3 library,
2+
with user-given data, single-threaded, plus timer. Barnett 5/17/24
3+
To compile (assuming nfft3 installed):
4+
gcc nfft2d1_test.c -o nfft2d1_test -lnfft3 -lfftw3 -lm
5+
*/
6+
#include <stdlib.h>
7+
#include <stdio.h>
8+
#include <math.h>
9+
#include <complex.h>
10+
#include <time.h>
11+
#include "nfft3.h"
12+
13+
int main(void) {
14+
int N[2] = {300,200}; // N1, N2 output mode numbers
15+
int M = 500000; // num. nonuniform input points
16+
int dim=2;
17+
nfft_plan p;
18+
nfft_init(&p, dim, N, M); // allocates user I/O arrays too
19+
// make some "user data" (we must use arrays that nfft allocated)...
20+
srand(0); // fix seed
21+
for (int j=0; j<M; ++j) { // nonequispaced pts, and strengths f...
22+
p.x[2*j] = (double)rand()/RAND_MAX - 0.5; // x unif rand in [-1/2,1/2)
23+
p.x[2*j+1] = (double)rand()/RAND_MAX - 0.5; // y "
24+
p.f[j] = 2*((double)rand()/RAND_MAX)-1 + I*(2*((double)rand()/RAND_MAX)-1);
25+
}
26+
27+
clock_t before = clock();
28+
29+
if(p.flags & PRE_ONE_PSI) // precompute psi, the entries of the matrix B
30+
nfft_precompute_one_psi(&p);
31+
nfft_adjoint(&p); // do transform, write to p.f_hat
32+
33+
double secs = (clock()-before)/(double)CLOCKS_PER_SEC;
34+
35+
int kx=-17, ky=33; // check one output f_hat(kx,ky) vs direct computation
36+
int kxout = kx + N[0]/2;
37+
int kyout = ky + N[1]/2;
38+
int i = kyout + kxout*N[1]; // output index: array ordered x slow, y fast
39+
double complex f_hat_test = 0.0 + 0.0*I;
40+
for (int j=0; j<M; ++j) // 2pi fac; p.x array is x interleaved with y...
41+
f_hat_test += p.f[j] * cexp(2*M_PI*I*((double)kx*p.x[2*j]+(double)ky*p.x[2*j+1]));
42+
double err = cabs(p.f_hat[i] - f_hat_test) / cabs(f_hat_test);
43+
printf("2D type 1 (NFFT3) done in %.3g s: f_hat[%d,%d]=%.12g+%.12gi, rel err %.3g\n",
44+
secs,kx,ky,creal(p.f_hat[i]),cimag(p.f_hat[i]),err);
45+
46+
nfft_finalize(&p); // free the plan
47+
return 0;
48+
}

0 commit comments

Comments
 (0)