diff --git a/CHANGELOG b/CHANGELOG (6 additions, 0 deletions)
diff --git a/docs/c.rst b/docs/c.rst (12 additions, 6 deletions)
diff --git a/docs/cex.rst b/docs/cex.rst (7 additions, 3 deletions)
diff --git a/docs/cguru.doc b/docs/cguru.doc (47 additions, 3 deletions)
diff --git a/docs/cguru.docsrc b/docs/cguru.docsrc (44 additions, 3 deletions)
diff --git a/docs/fortran.rst b/docs/fortran.rst (3 additions, 0 deletions)
diff --git a/docs/makefile.doc b/docs/makefile.doc (2 additions, 2 deletions)
diff --git a/docs/matlab.rst b/docs/matlab.rst (5 additions, 3 deletions)
diff --git a/docs/matlabhelp.doc b/docs/matlabhelp.doc (38 additions, 5 deletions)
@@ -3,6 +3,12 @@ If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
 
 Master (working towards v2.5.0), 7/8/25
 
+* Added functionality for adjoint execution of FINUFFT plans (Reinecke #633,
+  addresses #566 and #571).
+  Work arrays are now only allocated during plan execution, reducing overall
+  memory consumption.
+  A single plan can now safely be executed by several threads concurrently.
+
 V 2.4.1 7/8/25
 
 * Update Python cufinufft unit tests to use complex dtypes (Andén, #705).
 
@@ -53,7 +53,7 @@ with the word "many" in the function name) perform ``ntr`` transforms with the s
 
 .. note::
 
-  The motivations for the vectorized interface (and guru interface, see below) are as follows. 1) It is more efficient to bin-sort the nonuniform points only once if there are not to change between transforms. 2) For small problems, certain start-up costs cause repeated calls to the simple interface to be slower than necessary.  In particular, we note that FFTW takes around 0.1 ms per thread to look up stored wisdom, which for small problems (of order 10000 or less input and output data) can, sadly, dominate the runtime.
+  The motivations for the vectorized interface (and guru interface, see below) include the following. 1) It is more efficient to bin-sort the nonuniform points only once if they are not to change between transforms. 2) For small problems, certain start-up costs cause repeated calls to the simple interface to be slower than necessary. In particular, we note that FFTW takes around 0.1 ms per thread to look up stored wisdom, which for small problems (of order 10000 or less input and output data) can, sadly, dominate the runtime.
 
 
 1D transforms
@@ -77,13 +77,19 @@ with the word "many" in the function name) perform ``ntr`` transforms with the s
 Guru plan interface
 -------------------
 
-This provides more flexibility than the simple or vectorized interfaces.
+This provides more flexibility than either the simple or the vectorized interface.
 Any transform requires (at least)
-calling the following four functions in order. However, within this
-sequence one may insert repeated ``execute`` calls, or another ``setpts``
-followed by more ``execute`` calls, as long as the transform sizes (and number of transforms ``ntr``) are
+calling four of the following five functions in order (the third being either
+``execute`` or ``execute_adjoint``). However, within this
+sequence one may insert repeated ``execute`` and/or ``execute_adjoint`` calls,
+or another ``setpts`` followed by more ``execute`` and/or ``execute_adjoint``
+calls, as long as the transform sizes (and number of transforms ``ntr``) are
 consistent with those that have been set in the ``plan`` and in ``setpts``.
-Keep in mind that ``setpts`` retains *pointers* to the user's list of nonuniform points, rather than copying these points; thus the user must not change their nonuniform point arrays until after any ``execute`` calls that use them.
+Keep in mind that ``setpts`` retains *pointers* to the user's list of nonuniform points, rather than copying these points; thus the user must not change their nonuniform point arrays until after any ``execute`` or ``execute_adjoint`` calls that use them.
+
+The goal of the ``execute_adjoint`` feature (fully supported in v2.5.0)
+is to make the common use case of a transform and its adjoint
+accessible via a single plan stage and a single ``setpts`` call.
 
 .. note::
 
 
@@ -236,6 +236,8 @@ previous wisdom which would be significant when doing many small transforms.
 You may also send in a new
 set of stacked strength data (for type 1 and 3, or coefficients for type 2),
 reusing the existing FFTW plan and sorted points.
+Finally, you may execute *adjoints* of the planned transforms without
+re-planning, making forward-adjoint transform pairs very convenient.
 Now we redo the above 2D type 1 C++ example with the guru interface.
 
 One first makes a plan giving transform parameters, but no data:
@@ -254,6 +256,7 @@ One first makes a plan giving transform parameters, but no data:
   // step 3: do the planned transform to the c strength data, output to F...
   finufft_execute(plan, &c[0], &F[0]);
   // ... you could now send in new points, and/or do transforms with new c data
+  // ... or even adjoint transforms with the same points but now mapping F to c.
   // ...
   // step 4: when done, free the memory used by the plan...
   finufft_destroy(plan);
@@ -264,14 +267,15 @@ is that the ``int64_t`` type (aka ``long long int``)
 is needed since the Fourier coefficient dimensions are passed as an array.
 
 .. warning::
-  You must not change the nonuniform point arrays (here ``x``, ``y``) between passing them to ``finufft_setpts`` and performing ``finufft_execute``. The latter call expects these arrays to be unchanged. We chose this style of interface since it saves RAM and time (by avoiding unnecessary duplication), allowing the largest possible problems to be solved.
+  You must not change the nonuniform point arrays (here ``x``, ``y``) between passing them to ``finufft_setpts`` and performing ``finufft_execute`` or ``finufft_execute_adjoint``. The last two calls expect these arrays to be unchanged. We chose this style of interface since it saves RAM and time (by avoiding unnecessary duplication), allowing the largest possible problems to be solved.
 
 .. warning::
   You must destroy a plan before making a new plan using the same
   plan object, otherwise a memory leak results.
 
-The complete code with a math test is in ``examples/guru2d1.cpp``, and for
-more examples see ``examples/guru1d1*.c*``
+The complete code with a math test is in ``examples/guru2d1.cpp``,
+a demo of adjoint execution is in ``examples/guru2d1_adjoint.cpp``,
+and for more examples see ``examples/guru1d1*.c*``.
 
 Using the guru interface to perform a vectorized transform (multiple 1D type 1
 transforms each with the same nonuniform points) is demonstrated in
 
@@ -7,9 +7,10 @@
 
    Make a plan to perform one or more general transforms.
 
-   Under the hood, for type 1 and 2, this does FFTW planning and kernel Fourier
-   transform precomputation. For type 3, this does very little, since the FFT
-   sizes are not yet known.
+   Under the hood, for types 1 and 2, this chooses spread/interp kernel
+   parameters, precomputes the kernel Fourier transform, and (for FFTW) plans
+   a pair of FFTs. For type 3, only the kernel parameters are chosen, since
+   the FFT sizes are not yet known.
 
  Inputs:
       type   type of transform (1,2, or 3)
@@ -128,6 +129,49 @@
        if ntr>1, being the "slowest" (outer) dimension.
 
 
+::
+
+ int finufft_execute_adjoint(finufft_plan plan, complex<double>* c, complex<double>* f)
+ int finufftf_execute_adjoint(finufftf_plan plan, complex<float>* c, complex<float>* f)
+
+   Perform one or more NUFFT transforms using previously entered nonuniform
+   points and the *adjoint* of the existing planned transform. The point is to
+   enable transforms and their adjoints to be accessible via a single plan.
+   Recall that the adjoint of a type 1 is a type 2 of opposite isign, and
+   vice versa. The adjoint of a type 3 is a type 3 of opposite isign and
+   flipped input and output. To summarize, this operation maps
+     adjoint of type 1: f -> c
+     adjoint of type 2: c -> f
+     adjoint of type 3: f -> c
+
+   Inputs:
+        plan   plan object
+
+   Input/Outputs:
+        c      If adjoint of type 1 or 3, the output values at the
+               nonuniform point sources (size M*ntr complex array).
+               If adjoint of type 2, the input strengths at the nonuniform
+               point targets (size M*ntr complex array).
+        f      If adjoint of type 1, the input Fourier mode coefficients (size
+               N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
+               dim = 1, 2, or 3 respectively).
+               If adjoint of type 2, the output Fourier mode coefficients (size
+               N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
+               dim = 1, 2, or 3 respectively).
+               If adjoint of type 3, the input values at the nonuniform
+               frequency sources (size N*ntr complex array).
+
+   Outputs:
+     return value  0: success, 1: success but warning, >1: error (see error.rst)
+
+   Notes:
+     * The contents of the arrays x, y, z, s, t, u must not have changed since
+       the finufft_setpts call that read them. The adjoint execution rereads them
+       (this way of doing business saves RAM).
+     * f and c are contiguous Fortran-style arrays with the transform number,
+       if ntr>1, being the "slowest" (outer) dimension.
+
+
 ::
 
  int finufft_destroy(finufft_plan plan)
 
@@ -2,9 +2,10 @@ int @G_makeplan(int type, int dim, int64_t* nmodes, int iflag, int ntr, double e
 
   Make a plan to perform one or more general transforms.
 
-  Under the hood, for type 1 and 2, this does FFTW planning and kernel Fourier
-  transform precomputation. For type 3, this does very little, since the FFT
-  sizes are not yet known.
+  Under the hood, for types 1 and 2, this chooses spread/interp kernel
+  parameters, precomputes the kernel Fourier transform, and (for FFTW) plans
+  a pair of FFTs. For type 3, only the kernel parameters are chosen, since
+  the FFT sizes are not yet known.
 
 Inputs:
      type   type of transform (1,2, or 3)
@@ -114,6 +115,46 @@ int @G_execute(finufft_plan plan, complex<double>* c, complex<double>* f)
       if ntr>1, being the "slowest" (outer) dimension.
 
 
+int @G_execute_adjoint(finufft_plan plan, complex<double>* c, complex<double>* f)
+
+  Perform one or more NUFFT transforms using previously entered nonuniform
+  points and the *adjoint* of the existing planned transform. The point is to
+  enable transforms and their adjoints to be accessible via a single plan.
+  Recall that the adjoint of a type 1 is a type 2 of opposite isign, and
+  vice versa. The adjoint of a type 3 is a type 3 of opposite isign and
+  flipped input and output. To summarize, this operation maps
+    adjoint of type 1: f -> c
+    adjoint of type 2: c -> f
+    adjoint of type 3: f -> c
+
+  Inputs:
+       plan   plan object
+
+  Input/Outputs:
+       c      If adjoint of type 1 or 3, the output values at the
+              nonuniform point sources (size M*ntr complex array).
+              If adjoint of type 2, the input strengths at the nonuniform
+              point targets (size M*ntr complex array).
+       f      If adjoint of type 1, the input Fourier mode coefficients (size
+              N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
+              dim = 1, 2, or 3 respectively).
+              If adjoint of type 2, the output Fourier mode coefficients (size
+              N1*ntr or N1*N2*ntr or N1*N2*N3*ntr complex array, when
+              dim = 1, 2, or 3 respectively).
+              If adjoint of type 3, the input values at the nonuniform
+              frequency sources (size N*ntr complex array).
+
+  Outputs:
+@r
+
+  Notes:
+    * The contents of the arrays x, y, z, s, t, u must not have changed since
+      the finufft_setpts call that read them. The adjoint execution rereads them
+      (this way of doing business saves RAM).
+    * f and c are contiguous Fortran-style arrays with the transform number,
+      if ntr>1, being the "slowest" (outer) dimension.
+
+
 int @G_destroy(finufft_plan plan)
 
   Deallocate a plan object. This must be used upon clean-up, or before reusing
 
@@ -161,6 +161,7 @@ These routines and arguments are, in double-precision:
       call finufft_makeplan(type,dim,n_modes,iflag,ntrans,tol,plan,opts,ier)
      call finufft_setpts(plan,M,xj,yj,zj,Nk,sk,tk,uk,ier)
       call finufft_execute(plan,cj,fk,ier)
+      call finufft_execute_adjoint(plan,cj,fk,ier)
       call finufft_destroy(plan,ier)
 
 The single-precision (ie, ``real*4`` and ``complex*8``)
@@ -178,6 +179,8 @@ Each has a math test to check the correctness of some or all outputs::
 
   simple1d1.f        - 1D type 1, simple interface, default and various opts
   guru1d1.f          - 1D type 1, guru interface, default and various opts
+  guru1d1_adjoint.f  - adjoint of 1D type 1, guru interface, default opts
+  guru1d2_adjoint.f  - adjoint of 1D type 2, guru interface, default and various opts
   nufft1d_demo.f     - 1D types 1,2,3, minimally changed from CMCL demo codes
   nufft2d_demo.f     - 2D "
   nufft3d_demo.f     - 3D "
 
@@ -1,4 +1,4 @@
-make[1]: Entering directory '/home/marco/repos/finufft'
+make[1]: Entering directory '/home/alex/numerics/finufft'
 Makefile for FINUFFT CPU library. Please specify your task:
  make lib - build the main library (in lib/ and lib-static/)
  make examples - compile and run all codes in examples/
@@ -23,4 +23,4 @@ Make options:
  You must at least 'make objclean' before changing such options!
 
 Also see docs/install.rst and docs/README
-make[1]: Leaving directory '/home/marco/repos/finufft'
+make[1]: Leaving directory '/home/alex/numerics/finufft'
@@ -54,12 +54,14 @@ interface. For smaller transform sizes the acceleration factor of this vectorize
 
 If you want yet more control, consider using the "guru" interface.
 This can be faster than fresh calls to the simple or vectorized interfaces
-for the same number of transforms, for reasons such as this:
+for the same number of transforms, since
 the nonuniform points can be changed between transforms, without forcing
 FFTW to look up a previously stored plan.
 Usually, such an acceleration is only important when doing
 repeated small transforms, where "small" means each transform takes of
 order 0.01 sec or less.
+The guru interface is also very convenient for applying forward-adjoint
+transform pairs, which are common in imaging and optimization applications.
 Here we use the guru interface to repeat the first demo above:
 
 .. code-block:: matlab
@@ -74,12 +76,12 @@ Here we use the guru interface to repeat the first demo above:
   c = randn(M,1)+1i*randn(M,1);       % iid random complex data (row or col vec)
   f = plan.execute(c);                % do the transform (0.008 sec, ie, faster)
   % ...one could now change the points with setpts, and/or do new transforms
-  % with new c data...
+  % ...with new c data, and/or do adjoint transforms (f to c) via plan.execute_adjoint...
   delete(plan);                       % don't forget to clean up
 
 .. warning::
 
-   If an existing array is passed to ``setpts``, then this array must not be altered before ``execute`` is called! This is because, in order to save RAM (allowing larger problems to be solved), internally FINUFFT stores only *pointers* to ``x`` (etc), rather than unnecessarily duplicating this data. This is not true if an *expression* such as ``-x`` or ``2*pi*rand(M,1)`` is passed to ``setpts``, since in those cases the ``plan`` object does make internal copies, as per MATLAB's usual shallow-copy argument passing.
+   If an existing array is passed to ``setpts``, then this array must not be altered before ``execute`` or ``execute_adjoint`` is called! This is because, in order to save RAM (allowing larger problems to be solved), internally FINUFFT stores only *pointers* to ``x`` (etc), rather than unnecessarily duplicating this data. This is not true if an *expression* such as ``-x`` or ``2*pi*rand(M,1)`` is passed to ``setpts``, since in those cases the ``plan`` object does make internal copies, as per MATLAB's usual shallow-copy argument passing.
 
 Finally, we demo a 2D type 1 transform using the simple interface. Let's
 request a rectangular Fourier mode array of 1000 modes in the x direction but 500 in the
 
@@ -461,8 +461,7 @@
 
  FINUFFT_PLAN   is a class which wraps the guru interface to FINUFFT.
 
-  Full documentation is given in ../finufft-manual.pdf and online at
-  http://finufft.readthedocs.io
+  Full documentation is given online at http://finufft.readthedocs.io
   Also see examples in the matlab/examples and matlab/test directories.
 
  PROPERTIES
@@ -478,6 +477,7 @@
    finufft_plan - create guru plan object for one/many general nonuniform FFTs.
    setpts       - process nonuniform points for general FINUFFT transform(s).
    execute      - execute single or many-vector FINUFFT transforms in a plan.
+   execute_adjoint - execute adjoint of planned transform(s).
 
  General notes:
   * use delete(plan) to remove a plan after use.
@@ -605,10 +605,43 @@
     plan stage using opts.floatprec, otherwise an error is raised.
 
 
- 4) To deallocate (delete) a nonuniform FFT plan, use delete(plan)
+ 4) EXECUTE_ADJOINT   execute adjoint of planned transform(s).
 
- This deallocates all stored FFTW plans, nonuniform point sorting arrays,
-  kernel Fourier transforms arrays, etc.
+ result = plan.execute_adjoint(data_in);
+
+  Perform the adjoint of the planned transform(s) that plan.execute would
+  perform (see above documentation for EXECUTE). This is convenient in the
+  common case of needing forward-adjoint transform pairs for the same set of
+  nonuniform points.
+  The adjoint of a type 1 is a type 2 of opposite isign, and vice versa.
+  The adjoint of a type 3 is a type 3 of opposite isign and flipped input
+  and output.
+
+ Inputs:
+     plan     finufft_plan object
+     data_in  strengths (adjoint type 2 and 3) or Fourier coefficients
+              (adjoint type 1) vector, matrix, or array of appropriate size.
+              For adjoint type 1, in 1D this is length-ms, in 2D size (ms,mt),
+              or in 3D size (ms,mt,mu), or each of these with an extra last
+              dimension ntrans if ntrans>1. For adjoint types 2 and 3, it is
+              a column vector of length M (for type 2, the length of xj),
+              or nk (for type 3, the length of s). If ntrans>1 it is a stack
+              of such objects, ie, it has an extra last dimension ntrans.
+ Outputs:
+     result   strengths (adjoint of type 1 or 3) or Fourier coefficients
+              (adjoint of type 2) vector, matrix, or array of appropriate size.
+              For adjoint of type 1 and 3, this is either a length-M vector
+              (where M is the length of xj), or an (M,ntrans) matrix when
+              ntrans>1. For adjoint of type 2, in 1D this is
+              length-ms, in 2D size (ms,mt), or in 3D size (ms,mt,mu), or
+              each of these with an extra last dimension ntrans if ntrans>1.
+
+ Notes:
+  * The precision (double/single) of all inputs must match that chosen at the
+    plan stage using opts.floatprec, otherwise an error is raised.
 
 
+ 5) To deallocate (delete) a nonuniform FFT plan, use delete(plan)
 
+ This deallocates all stored FFTW plans, nonuniform point sorting arrays,
+  kernel Fourier transform arrays, etc.