Skip to content

Commit d4d08a4

Browse files
mreineckDiamonDinoiaahbarnettjanden
authored
Add the capability to do adjoint transforms (#633)
* first version; adjoint still completely untested * add Python interface; fix a few bugs * add tests; add overlooked file * use assertions that are more helpful in cse of errors * test commit to see if tests pass with incresed tolerance for adjoint type 3 * increase overall requested plan accuracy for adjoint type 3 transforms. Perhaps 1e-6 is just a bit too close to the machine epsilon for single precision * be more clever about memory consumption * document new parameters for execute_internal() * update CHANGELOG * more comments * small tweak * fix typos * add explanations for obscure tricks * added an example * create FFTW plans for adjoint transforms * better variable names and debug prints * add docstring for execute_adjoint * comments * corrected doc comments in example/guru2d1_adjoint.cpp * add execute_adjoint to C/C++ guru doc strings * execute_adjoint fully described in docs/c.rst * execute_adjoint added to docs/cex.rst * matlab mwrap interface add execute_adjoint * actually add matlab execute_adjoint * exec_adj into matlab docs and overview.src * exec_adj matlab docs and example/guru1d1_adjoint.m * exec_adj to matlab.rst and Contents.m * mention additional benefits of the change * add Fortran adjoint interface and example * add adjoint to Fortran docs * execute_adjoint in Fortran, with example. No docs yet * add Martin's guru1d2_adjoint.f to makefile * negate iflags in Martin's guru1d2_adjoint{f}.f * both fort adj examples in doc page * simplified guru1d2_adjoint{f}.f * make octave runs adjoint example * basic C++ adjointness for t1,2; in makefile and make test (not CTest yet). t3 to do * Update CMakeLists.txt * adjointness t3 test added, reduced prob sizes a bit; added docs trouble paragraph on adjointness * Simplify py tests for adjoint * bump up adjointness double allowederr to 1e-12 due to macos-13 fail --------- Co-authored-by: Marco Barbone <[email protected]> Co-authored-by: ahbarnett <[email protected]> Co-authored-by: Marco Barbone <[email protected]> Co-authored-by: Joakim Andén <[email protected]>
1 parent dd265a1 commit d4d08a4

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

42 files changed

+2009
-295
lines changed

CHANGELOG

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,12 @@ If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
33

44
Master (working towards v2.5.0), 7/8/25
55

6+
* Added functionality for adjoint execution of FINUFFT plans (Reinecke #633,
7+
addresses #566 and #571).
8+
Work arrays are now only allocated during plan execution, reducing overall
9+
memory consumption.
10+
A single plan can now safely be executed by several threads concurrently.
11+
612
V 2.4.1 7/8/25
713

814
* Update Python cufinufft unit tests to use complex dtypes (Andén, #705).

docs/c.rst

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ with the word "many" in the function name) perform ``ntr`` transforms with the s
5353

5454
.. note::
5555

56-
The motivations for the vectorized interface (and guru interface, see below) are as follows. 1) It is more efficient to bin-sort the nonuniform points only once if there are not to change between transforms. 2) For small problems, certain start-up costs cause repeated calls to the simple interface to be slower than necessary. In particular, we note that FFTW takes around 0.1 ms per thread to look up stored wisdom, which for small problems (of order 10000 or less input and output data) can, sadly, dominate the runtime.
56+
The motivations for the vectorized interface (and guru interface, see below) include the following. 1) It is more efficient to bin-sort the nonuniform points only once if there are not to change between transforms. 2) For small problems, certain start-up costs cause repeated calls to the simple interface to be slower than necessary. In particular, we note that FFTW takes around 0.1 ms per thread to look up stored wisdom, which for small problems (of order 10000 or less input and output data) can, sadly, dominate the runtime.
5757

5858

5959
1D transforms
@@ -77,13 +77,19 @@ with the word "many" in the function name) perform ``ntr`` transforms with the s
7777
Guru plan interface
7878
-------------------
7979

80-
This provides more flexibility than the simple or vectorized interfaces.
80+
This provides more flexibility than either simple or vectorized interfaces.
8181
Any transform requires (at least)
82-
calling the following four functions in order. However, within this
83-
sequence one may insert repeated ``execute`` calls, or another ``setpts``
84-
followed by more ``execute`` calls, as long as the transform sizes (and number of transforms ``ntr``) are
82+
calling four of the following five functions in order. However, within this
83+
sequence one may insert repeated ``execute`` and/or ``execute_adjoint`` calls,
84+
or another ``setpts``
85+
followed by more ``execute`` and/or ``execute_adjoint`` calls, as long as the transform sizes (and number of transforms ``ntr``) are
8586
consistent with those that have been set in the ``plan`` and in ``setpts``.
86-
Keep in mind that ``setpts`` retains *pointers* to the user's list of nonuniform points, rather than copying these points; thus the user must not change their nonuniform point arrays until after any ``execute`` calls that use them.
87+
Keep in mind that ``setpts`` retains *pointers* to the user's list of nonuniform points, rather than copying these points; thus the user must not change their nonuniform point arrays until after any ``execute`` or ``execute_adjoint`` calls that use them.
88+
89+
The goal of the ``execute_adjoint`` feature (fully supported in v2.5.0)
90+
is to allow the
91+
common use-case of transform and adjoint transform pairs to be accessible
92+
via a single plan stage and a single setpts call.
8793

8894
.. note::
8995

docs/cex.rst

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -236,6 +236,8 @@ previous wisdom which would be significant when doing many small transforms.
236236
You may also send in a new
237237
set of stacked strength data (for type 1 and 3, or coefficients for type 2),
238238
reusing the existing FFTW plan and sorted points.
239+
Finally, you may execute *adjoints* of the planned transforms without
240+
re-planning, making forward-adjoint transform pairs very convenient.
239241
Now we redo the above 2D type 1 C++ example with the guru interface.
240242

241243
One first makes a plan giving transform parameters, but no data:
@@ -254,6 +256,7 @@ One first makes a plan giving transform parameters, but no data:
254256
// step 3: do the planned transform to the c strength data, output to F...
255257
finufft_execute(plan, &c[0], &F[0]);
256258
// ... you could now send in new points, and/or do transforms with new c data
259+
// ... or even adjoint transforms with the same points but now mapping F to c.
257260
// ...
258261
// step 4: when done, free the memory used by the plan...
259262
finufft_destroy(plan);
@@ -264,14 +267,15 @@ is that the ``int64_t`` type (aka ``long long int``)
264267
is needed since the Fourier coefficient dimensions are passed as an array.
265268

266269
.. warning::
267-
You must not change the nonuniform point arrays (here ``x``, ``y``) between passing them to ``finufft_setpts`` and performing ``finufft_execute``. The latter call expects these arrays to be unchanged. We chose this style of interface since it saves RAM and time (by avoiding unnecessary duplication), allowing the largest possible problems to be solved.
270+
You must not change the nonuniform point arrays (here ``x``, ``y``) between passing them to ``finufft_setpts`` and performing ``finufft_execute`` or ``finufft_execute_adjoint``. The last two calls expect these arrays to be unchanged. We chose this style of interface since it saves RAM and time (by avoiding unnecessary duplication), allowing the largest possible problems to be solved.
268271

269272
.. warning::
270273
You must destroy a plan before making a new plan using the same
271274
plan object, otherwise a memory leak results.
272275

273-
The complete code with a math test is in ``examples/guru2d1.cpp``, and for
274-
more examples see ``examples/guru1d1*.c*``
276+
The complete code with a math test is in ``examples/guru2d1.cpp``,
277+
the demo of an adjoint execution is in ``examples/guru2d1_adjoint.cpp``,
278+
and for more examples see ``examples/guru1d1*.c*``
275279

276280
Using the guru interface to perform a vectorized transform (multiple 1D type 1
277281
transforms each with the same nonuniform points) is demonstrated in

docs/cguru.doc

Lines changed: 47 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,10 @@
77

88
Make a plan to perform one or more general transforms.
99

10-
Under the hood, for type 1 and 2, this does FFTW planning and kernel Fourier
11-
transform precomputation. For type 3, this does very little, since the FFT
12-
sizes are not yet known.
10+
Under the hood, for type 1 and 2, this chooses spread/interp kernel
11+
parameters, precomputes the kernel Fourier transform, and (for FFTW), plans
12+
a pair of FFTs. For type 3, only the kernel parameters are chosen, since
13+
the FFT sizes are not yet known.
1314

1415
Inputs:
1516
type type of transform (1,2, or 3)
@@ -128,6 +129,49 @@
128129
if ntr>1, being the "slowest" (outer) dimension.
129130

130131

132+
::
133+
134+
int finufft_execute_adjoint(finufft_plan plan, complex<double>* c, complex<double>* f)
135+
int finufftf_execute_adjoint(finufftf_plan plan, complex<float>* c, complex<float>* f)
136+
137+
Perform one or more NUFFT transforms using previously entered nonuniform
138+
points and the *adjoint* of the existing planned transform. The point is to
139+
enable transforms and their adjoints to be accessible via a single plan.
140+
Recall that the adjoint of a type 1 is a type 2 of opposite isign, and
141+
vice versa. The adjoint of a type 3 is a type 3 of opposite isign and
142+
flipped input and output. To summarize, this operation maps
143+
adjoint of type 1: f -> c
144+
adjoint of type 2: c -> f
145+
adjoint of type 3: f -> c
146+
147+
Inputs:
148+
plan plan object
149+
150+
Input/Outputs:
151+
c If adjoints of types 1 and 3, the output values at the
152+
nonuniform point sources (size M*ntr complex array).
153+
If adjoint of type 2, the input strengths at the nonuniform
154+
point targets (size M*ntr complex array).
155+
f If adjoint of type 1, the input Fourier mode coefficients (size
156+
N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
157+
dim = 1, 2, or 3 respectively).
158+
If adjoint of type 2, the output Fourier mode coefficients (size
159+
N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
160+
dim = 1, 2, or 3 respectively).
161+
If adjoint of type 3, the input values at the nonuniform
162+
frequency sources (size N*ntr complex array).
163+
164+
Outputs:
165+
return value 0: success, 1: success but warning, >1: error (see error.rst)
166+
167+
Notes:
168+
* The contents of the arrays x, y, z, s, t, u must not have changed since
169+
the finufft_setpts call that read them. The adjoint execution rereads them
170+
(this way of doing business saves RAM).
171+
* f and c are contiguous Fortran-style arrays with the transform number,
172+
if ntr>1, being the "slowest" (outer) dimension.
173+
174+
131175
::
132176

133177
int finufft_destroy(finufft_plan plan)

docs/cguru.docsrc

Lines changed: 44 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,10 @@ int @G_makeplan(int type, int dim, int64_t* nmodes, int iflag, int ntr, double e
22

33
Make a plan to perform one or more general transforms.
44

5-
Under the hood, for type 1 and 2, this does FFTW planning and kernel Fourier
6-
transform precomputation. For type 3, this does very little, since the FFT
7-
sizes are not yet known.
5+
Under the hood, for type 1 and 2, this chooses spread/interp kernel
6+
parameters, precomputes the kernel Fourier transform, and (for FFTW), plans
7+
a pair of FFTs. For type 3, only the kernel parameters are chosen, since
8+
the FFT sizes are not yet known.
89

910
Inputs:
1011
type type of transform (1,2, or 3)
@@ -114,6 +115,46 @@ int @G_execute(finufft_plan plan, complex<double>* c, complex<double>* f)
114115
if ntr>1, being the "slowest" (outer) dimension.
115116

116117

118+
int @G_execute_adjoint(finufft_plan plan, complex<double>* c, complex<double>* f)
119+
120+
Perform one or more NUFFT transforms using previously entered nonuniform
121+
points and the *adjoint* of the existing planned transform. The point is to
122+
enable transforms and their adjoints to be accessible via a single plan.
123+
Recall that the adjoint of a type 1 is a type 2 of opposite isign, and
124+
vice versa. The adjoint of a type 3 is a type 3 of opposite isign and
125+
flipped input and output. To summarize, this operation maps
126+
adjoint of type 1: f -> c
127+
adjoint of type 2: c -> f
128+
adjoint of type 3: f -> c
129+
130+
Inputs:
131+
plan plan object
132+
133+
Input/Outputs:
134+
c If adjoints of types 1 and 3, the output values at the
135+
nonuniform point sources (size M*ntr complex array).
136+
If adjoint of type 2, the input strengths at the nonuniform
137+
point targets (size M*ntr complex array).
138+
f If adjoint of type 1, the input Fourier mode coefficients (size
139+
N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
140+
dim = 1, 2, or 3 respectively).
141+
If adjoint of type 2, the output Fourier mode coefficients (size
142+
N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
143+
dim = 1, 2, or 3 respectively).
144+
If adjoint of type 3, the input values at the nonuniform
145+
frequency sources (size N*ntr complex array).
146+
147+
Outputs:
148+
@r
149+
150+
Notes:
151+
* The contents of the arrays x, y, z, s, t, u must not have changed since
152+
the finufft_setpts call that read them. The adjoint execution rereads them
153+
(this way of doing business saves RAM).
154+
* f and c are contiguous Fortran-style arrays with the transform number,
155+
if ntr>1, being the "slowest" (outer) dimension.
156+
157+
117158
int @G_destroy(finufft_plan plan)
118159

119160
Deallocate a plan object. This must be used upon clean-up, or before reusing

docs/fortran.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,7 @@ These routines and arguments are, in double-precision:
161161
call finufft_makeplan(type,dim,n_modes,iflag,ntrans,tol,plan,opts,ier)
162162
call finufft_setpts(plan,M,xj,yj,zj,Nk,sk,yk,uk,ier)
163163
call finufft_execute(plan,cj,fk,ier)
164+
call finufft_execute_adjoint(plan,cj,fk,ier)
164165
call finufft_destroy(plan,ier)
165166
166167
The single-precision (ie, ``real*4`` and ``complex*8``)
@@ -178,6 +179,8 @@ Each has a math test to check the correctness of some or all outputs::
178179

179180
simple1d1.f - 1D type 1, simple interface, default and various opts
180181
guru1d1.f - 1D type 1, guru interface, default and various opts
182+
guru1d1_adjoint.f - adjoint of 1D type 1, guru interface, default opts
183+
guru1d2_adjoint.f - adjoint of 1D type 2, guru interface, default and various opts
181184
nufft1d_demo.f - 1D types 1,2,3, minimally changed from CMCL demo codes
182185
nufft2d_demo.f - 2D "
183186
nufft3d_demo.f - 3D "

docs/makefile.doc

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
make[1]: Entering directory '/home/marco/repos/finufft'
1+
make[1]: Entering directory '/home/alex/numerics/finufft'
22
Makefile for FINUFFT CPU library. Please specify your task:
33
make lib - build the main library (in lib/ and lib-static/)
44
make examples - compile and run all codes in examples/
@@ -23,4 +23,4 @@ Make options:
2323
You must at least 'make objclean' before changing such options!
2424

2525
Also see docs/install.rst and docs/README
26-
make[1]: Leaving directory '/home/marco/repos/finufft'
26+
make[1]: Leaving directory '/home/alex/numerics/finufft'

docs/matlab.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -54,12 +54,14 @@ interface. For smaller transform sizes the acceleration factor of this vectorize
5454

5555
If you want yet more control, consider using the "guru" interface.
5656
This can be faster than fresh calls to the simple or vectorized interfaces
57-
for the same number of transforms, for reasons such as this:
57+
for the same number of transforms, since
5858
the nonuniform points can be changed between transforms, without forcing
5959
FFTW to look up a previously stored plan.
6060
Usually, such an acceleration is only important when doing
6161
repeated small transforms, where "small" means each transform takes of
6262
order 0.01 sec or less.
63+
The guru interface is also very convenient for applying forward-adjoint
64+
transform pairs, common in imaging or optimization applications.
6365
Here we use the guru interface to repeat the first demo above:
6466

6567
.. code-block:: matlab
@@ -74,12 +76,12 @@ Here we use the guru interface to repeat the first demo above:
7476
c = randn(M,1)+1i*randn(M,1); % iid random complex data (row or col vec)
7577
f = plan.execute(c); % do the transform (0.008 sec, ie, faster)
7678
% ...one could now change the points with setpts, and/or do new transforms
77-
% with new c data...
79+
% ...with new c data, and/or do adjoint transforms with new data...
7880
delete(plan); % don't forget to clean up
7981
8082
.. warning::
8183

82-
If an existing array is passed to ``setpts``, then this array must not be altered before ``execute`` is called! This is because, in order to save RAM (allowing larger problems to be solved), internally FINUFFT stores only *pointers* to ``x`` (etc), rather than unnecessarily duplicating this data. This is not true if an *expression* such as ``-x`` or ``2*pi*rand(M,1)`` is passed to ``setpts``, since in those cases the ``plan`` object does make internal copies, as per MATLAB's usual shallow-copy argument passing.
84+
If an existing array is passed to ``setpts``, then this array must not be altered before ``execute`` or ``execute_adjoint`` is called! This is because, in order to save RAM (allowing larger problems to be solved), internally FINUFFT stores only *pointers* to ``x`` (etc), rather than unnecessarily duplicating this data. This is not true if an *expression* such as ``-x`` or ``2*pi*rand(M,1)`` is passed to ``setpts``, since in those cases the ``plan`` object does make internal copies, as per MATLAB's usual shallow-copy argument passing.
8385

8486
Finally, we demo a 2D type 1 transform using the simple interface. Let's
8587
request a rectangular Fourier mode array of 1000 modes in the x direction but 500 in the

docs/matlabhelp.doc

Lines changed: 38 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -461,8 +461,7 @@
461461

462462
FINUFFT_PLAN is a class which wraps the guru interface to FINUFFT.
463463

464-
Full documentation is given in ../finufft-manual.pdf and online at
465-
http://finufft.readthedocs.io
464+
Full documentation is given online at http://finufft.readthedocs.io
466465
Also see examples in the matlab/examples and matlab/test directories.
467466

468467
PROPERTIES
@@ -478,6 +477,7 @@
478477
finufft_plan - create guru plan object for one/many general nonuniform FFTs.
479478
setpts - process nonuniform points for general FINUFFT transform(s).
480479
execute - execute single or many-vector FINUFFT transforms in a plan.
480+
execute_adjoint - execute adjoint of planned transform(s).
481481

482482
General notes:
483483
* use delete(plan) to remove a plan after use.
@@ -605,10 +605,43 @@
605605
plan stage using opts.floatprec, otherwise an error is raised.
606606

607607

608-
4) To deallocate (delete) a nonuniform FFT plan, use delete(plan)
608+
4) EXECUTE_ADJOINT execute adjoint of planned transform(s).
609609

610-
This deallocates all stored FFTW plans, nonuniform point sorting arrays,
611-
kernel Fourier transforms arrays, etc.
610+
result = plan.execute_adjoint(data_in);
611+
612+
Perform the adjoint of the planned transform(s) that plan.execute would
613+
perform (see above documentation for EXECUTE). This is convenient in the
614+
common case of needing forward-adjoint transform pairs for the same set of
615+
nonuniform points.
616+
The adjoint of a type 1 is a type 2 of opposite isign, and vice versa.
617+
The adjoint of a type 3 is a type 3 of opposite isign and flipped input
618+
and output.
619+
620+
Inputs:
621+
plan finufft_plan object
622+
data_in strengths (adjoint type 2 and 3) or Fourier coefficients
623+
(adjoint type 1) vector, matrix, or array of appropriate size.
624+
For adjoint type 1, in 1D this is length-ms, in 2D size (ms,mt),
625+
or in 3D size (ms,mt,mu), or each of these with an extra last
626+
dimension ntrans if ntrans>1. For adjoint types 2 and 3, it is
627+
a column vector of length M (for type 2, the length of xj),
628+
or nk (for type 3, the length of s). If ntrans>1 its is a stack
629+
of such objects, ie, it has an extra last dimension ntrans.
630+
Outputs:
631+
result strengths (adjoint of type 1 or 3) or Fourier coefficients
632+
(adjoint of type 2) vector, matrix, or array of appropriate size.
633+
For adjoint of type 1 and 3, this is either a length-M vector
634+
(where M is the length of xj), or an (M,ntrans) matrix when
635+
ntrans>1. For adjoint of type 2, in 1D this is
636+
length-ms, in 2D size (ms,mt), or in 3D size (ms,mt,mu), or
637+
each of these with an extra last dimension ntrans if ntrans>1.
638+
639+
Notes:
640+
* The precision (double/single) of all inputs must match that chosen at the
641+
plan stage using opts.floatprec, otherwise an error is raised.
612642

613643

644+
5) To deallocate (delete) a nonuniform FFT plan, use delete(plan)
614645

646+
This deallocates all stored FFTW plans, nonuniform point sorting arrays,
647+
kernel Fourier transforms arrays, etc.

0 commit comments

Comments
 (0)