rework readme

MikaelSlevinsky · MikaelSlevinsky · commit 89d3e97d79a8 · 2019-09-07T15:18:16.000-05:00
diff --git a/README.md b/README.md
@@ -4,71 +4,100 @@
 
 `FastTransforms.jl` allows the user to conveniently work with orthogonal polynomials with degrees well into the millions.
 
-Transforms include conversion between Jacobi polynomial expansions, with Chebyshev, Legendre, and ultraspherical polynomial transforms as special cases. For the signal processor, all three types of nonuniform fast Fourier transforms available. As well, spherical harmonic transforms and transforms between orthogonal polynomials on the triangle allow for the efficient simulation of partial differential equations of evolution.
+This package provides a Julia wrapper for the [C library](https://github.com/MikaelSlevinsky/FastTransforms) of the same name. Additionally, all three types of nonuniform fast Fourier transforms are available, as well as the Padua transform.
 
-Algorithms include methods based on asymptotic formulae to relate the transforms to a small number of fast Fourier transforms, matrix factorizations based on the Hadamard product, hierarchical matrix decompositions à la Fast Multipole Method, and the butterfly algorithm.
+## Installation
 
-## The Chebyshev—Legendre Transform
-
-The Chebyshev—Legendre transform allows the fast conversion of Chebyshev expansion coefficients to Legendre expansion coefficients and back.
+The build script, which works on macOS and Linux systems with x86_64 processors, downloads precompiled binaries of the latest version of [FastTransforms](https://github.com/MikaelSlevinsky/FastTransforms). This library depends on `FFTW`, `MPFR`, and `OpenBLAS` (on Linux). Therefore, installation may be as straightforward as:
 
 ```julia
-julia> Pkg.add("FastTransforms")
+julia> if Sys.isapple()
+           run(`brew install gcc@8 fftw mpfr`)
+       elseif Sys.islinux()
+           run(`apt-get install gcc-8 libblas-dev libopenblas-base libfftw3-dev libmpfr-dev`)
+       end
+
+pkg> build FastTransforms
 
 julia> using FastTransforms
 
-julia> c = rand(10001);
+```
+
+## Fast orthogonal polynomial transforms
+
+The 26 orthogonal polynomial transforms are listed in `FastTransforms.kind2string.(0:25)`. Univariate transforms may be planned with the standard normalization or with orthonormalization. For multivariate transforms, the standard normalization may be too severe for floating-point computations, so it is omitted. Here are two examples:
+
+### The Chebyshev--Legendre transform
+
+```julia
+julia> c = rand(8192);
 
 julia> leg2cheb(c);
 
 julia> cheb2leg(c);
 
-julia> norm(cheb2leg(leg2cheb(c))-c)
-5.564168202018823e-13
+julia> norm(cheb2leg(leg2cheb(c; normcheb=true); normcheb=true)-c)/norm(c)
+1.1866591414786334e-14
+
 ```
 
-The implementation separates pre-computation into a type of plan. This type is constructed with either `plan_leg2cheb` or `plan_cheb2leg`. Let's see how much faster it is if we pre-compute.
+The implementation separates pre-computation into an `FTPlan`. This type is constructed with either `plan_leg2cheb` or `plan_cheb2leg`. Let's see how much faster it is if we pre-compute.
 
 ```julia
 julia> p1 = plan_leg2cheb(c);
 
 julia> p2 = plan_cheb2leg(c);
 
 julia> @time leg2cheb(c);
-  0.082615 seconds (11.94 k allocations: 31.214 MiB, 6.75% gc time)
+  0.433938 seconds (9 allocations: 64.641 KiB)
 
 julia> @time p1*c;
-  0.004297 seconds (6 allocations: 78.422 KiB)
+  0.007927 seconds (79 allocations: 68.563 KiB)
 
 julia> @time cheb2leg(c);
-  0.110388 seconds (11.94 k allocations: 31.214 MiB, 8.16% gc time)
+  0.423865 seconds (9 allocations: 64.641 KiB)
 
 julia> @time p2*c;
-  0.004500 seconds (6 allocations: 78.422 KiB)
+  0.009164 seconds (89 allocations: 69.672 KiB)
+
+```
+
+Furthermore, for orthogonal polynomial connection problems that are degree-preserving, we should expect to be able to apply the transforms in-place:
+
+```julia
+julia> lmul!(p1, c);
+
+julia> lmul!(p2, c);
+
+julia> ldiv!(p1, c);
+
+julia> ldiv!(p2, c);
+
 ```
 
-## The Chebyshev—Jacobi Transform
+### The spherical harmonic transform
 
-The Chebyshev—Jacobi transform allows the fast conversion of Chebyshev expansion coefficients to Jacobi expansion coefficients and back.
+Let `F` be an array of spherical harmonic expansion coefficients with columns arranged by increasing order in absolute value, alternating between negative and positive orders. Then `sph2fourier` converts the representation into a bivariate Fourier series, and `fourier2sph` converts it back. Once the expansion is a bivariate Fourier series on the sphere, `plan_sph_synthesis` converts the coefficients to function samples on an equiangular grid that does not sample the poles, and `plan_sph_analysis` converts them back.
 
 ```julia
-julia> c = rand(10001);
+julia> F = sphrandn(Float64, 1024, 2047); # convenience method
+
+julia> P = plan_sph2fourier(F);
+
+julia> PS = plan_sph_synthesis(F);
 
-julia> @time norm(icjt(cjt(c, 0.1, -0.2), 0.1, -0.2) - c, Inf)
-  0.258390 seconds (431 allocations: 6.278 MB)
-1.4830359162942841e-12
+julia> PA = plan_sph_analysis(F);
 
-julia> p1 = plan_cjt(c, 0.1, -0.2);
+julia> G = PS*(P*F);
 
-julia> p2 = plan_icjt(c, 0.1, -0.2);
+julia> H = P\(PA*G);
 
-julia> @time norm(p2*(p1*c) - c, Inf)
-  0.244842 seconds (17 allocations: 469.344 KB)
-1.4830359162942841e-12
+julia> norm(F-H)/norm(F)
+2.1541073345177038e-15
 
 ```
 
-Composition of transforms allows the Jacobi—Jacobi transform, computed via `jjt`. The remainder in Hahn's asymptotic expansion is valid for the half-open square `(α,β) ∈ (-1/2,1/2]^2`. Therefore, the fast transform works best when the parameters are inside. If the parameters `(α,β)` are not exceptionally beyond the square, then increment/decrement operators are used with linear complexity (and linear conditioning) in the degree.
+Due to the structure of the spherical harmonic connection problem, these transforms may also be performed in-place with `lmul!` and `ldiv!`.
 
 ## Nonuniform fast Fourier transforms
 
@@ -124,41 +153,12 @@ julia> N = div((n+1)*(n+2), 2);
 
 julia> v = rand(N); # The length of v is the number of Padua points
 
-julia> @time norm(ipaduatransform(paduatransform(v)) - v)
-  0.006571 seconds (846 allocations: 1.746 MiB)
-3.123637691861415e-14
+julia> @time norm(ipaduatransform(paduatransform(v)) - v)/norm(v)
+  0.007373 seconds (543 allocations: 1.733 MiB)
+3.925164683252905e-16
 
 ```
 
-## The Spherical Harmonic Transform
-
-Let `F` be a matrix of spherical harmonic expansion coefficients with columns arranged by increasing order in absolute value, alternating between negative and positive orders. Then `sph2fourier` converts the representation into a bivariate Fourier series, and `fourier2sph` converts it back.
-
-```julia
-julia> F = sphrandn(Float64, 256, 256);
-
-julia> G = sph2fourier(F);
-
-julia> H = fourier2sph(G);
-
-julia> norm(F-H)
-4.950645831278297e-14
-
-julia> F = sphrandn(Float64, 1024, 1024);
-
-julia> G = sph2fourier(F; sketch = :none);
-Pre-computing...100%|███████████████████████████████████████████| Time: 0:00:04
-
-julia> H = fourier2sph(G; sketch = :none);
-Pre-computing...100%|███████████████████████████████████████████| Time: 0:00:04
-
-julia> norm(F-H)
-1.1510623098225283e-12
-
-```
-
-As with other fast transforms, `plan_sph2fourier` saves effort by caching the pre-computation. Be warned that for dimensions larger than `1,000`, this is no small feat!
-
 # References:
 
    [1]  B. Alpert and V. Rokhlin. <a href="http://dx.doi.org/10.1137/0912009">A fast algorithm for the evaluation of Legendre expansions</a>, *SIAM J. Sci. Stat. Comput.*, **12**:158—179, 1991.
@@ -171,7 +171,7 @@ As with other fast transforms, `plan_sph2fourier` saves effort by caching the pr
 
    [5]  R. M. Slevinsky. <a href="https://doi.org/10.1093/imanum/drw070">On the use of Hahn's asymptotic formula and stabilized recurrence for a fast, simple, and stable Chebyshev—Jacobi transform</a>, *IMA J. Numer. Anal.*, **38**:102—124, 2018.
 
-   [6]  R. M. Slevinsky. <a href="https://doi.org/10.1016/j.acha.2017.11.001">Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series</a>, in press at *Appl. Comput. Harmon. Anal.*, 2017.
+   [6]  R. M. Slevinsky. <a href="https://doi.org/10.1016/j.acha.2017.11.001">Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series</a>, *Appl. Comput. Harmon. Anal.*, **47**:585—606, 2019.
 
    [7]  R. M. Slevinsky, <a href="https://arxiv.org/abs/1711.07866">Conquering the pre-computation in two-dimensional harmonic polynomial transforms</a>, arXiv:1711.07866, 2017.
 
diff --git a/examples/sphere.jl b/examples/sphere.jl
@@ -26,6 +26,13 @@
 # For the storage pattern of the arrays, please consult the documentation.
 #############
 
+function threshold!(A::AbstractArray, ϵ)
+    for i in eachindex(A)
+        if abs(A[i]) < ϵ A[i] = 0 end
+    end
+    A
+end
+
 using FastTransforms
 
 # The colatitudinal grid (mod π):
@@ -51,19 +58,19 @@ P4 = x -> (35*x^4-30*x^2+3)/8
 # On the tensor product grid, our function samples are:
 F = [(P4(z(θ,φ)⋅y) - P4(x⋅y))/(z(θ,φ)⋅y - x⋅y) for θ in θ, φ in φ]
 
-P = plan_sph2fourier(Float64, N)
-PA = plan_sph_analysis(Float64, N, M)
+P = plan_sph2fourier(F)
+PA = plan_sph_analysis(F)
 
 # Its spherical harmonic coefficients demonstrate that it is degree-3:
 V = PA*F
-U3 = P\V
+U3 = threshold!(P\V, 400*eps())
 
 # Similarly, on the tensor product grid, the Legendre polynomial P₄(z⋅y) is:
 F = [P4(z(θ,φ)⋅y) for θ in θ, φ in φ]
 
 # Its spherical harmonic coefficients demonstrate that it is exact-degree-4:
 V = PA*F
-U4 = P\V
+U4 = threshold!(P\V, 3*eps())
 
 nrm1 = norm(U4);
 
@@ -73,7 +80,7 @@ F = [P4(z(θ,φ)⋅x) for θ in θ, φ in φ]
 # It only has one nonnegligible spherical harmonic coefficient.
 # Can you spot it?
 V = PA*F
-U4 = P\V
+U4 = threshold!(P\V, 3*eps())
 
 # That nonnegligible coefficient should be approximately √(2π/(4+1/2)),
 # since the convention in this library is to orthonormalize.
diff --git a/src/libfasttransforms.jl b/src/libfasttransforms.jl
@@ -1,18 +1,17 @@
 # This file shows how to call `libfasttransforms` from Julia.
+# The library is normally obtained by downloading precompiled binaries, assuming
+# that its dependencies are already installed. However, it may also be built from source.
 
-# Step 1: In this repository, `git clone -b v0.2.1 https://github.com/MikaelSlevinsky/FastTransforms.git deps/FastTransforms`
+# Step 1: In this repository,
+# `git clone -b v0.2.6 https://github.com/MikaelSlevinsky/FastTransforms.git deps/FastTransforms`
 
-# Step 2: use a version of gcc that supports OpenMP: on OS X, this means using a
-# version of `gcc` from Homebrew, `brew install gcc`; on linux, `gcc-4.6` and up should work.
-# `export CC=gcc-"the-right-version"`.
+# Step 2: Get the dependencies. On macOS, run `brew install gcc@8 fftw mpfr`.
+# On Linux, run `apt-get install gcc-8 libblas-dev libopenblas-base libfftw3-dev libmpfr-dev`.
 
-# Step 3: get the remaining dependencies: On OS X, either `brew install openblas`
-# or change the Make.inc to use `BLAS=APPLEBLAS` instead of `BLAS=OPENBLAS`.
-# Furthermore, `brew install fftw mpfr`. For linux, see the `Travis.yml` file.
-# For Windows, see the `Appveyor.yml` file.
+# Step 3: Build the library. On macOS, run `make CC=gcc-8 FT_USE_APPLEBLAS=1`.
+# On Linux, run `make CC=gcc-8`.
 
-# Step 4: run `make` and check the tests by running `./test_drivers 3 3 0`.
-# All the errors should be roughly on the order of machine precision.
+# Step 4: Move `libfasttransforms.dylib` out of the folder so that it is found by ↓.
 
 const libfasttransforms = joinpath(dirname(@__DIR__), "deps", "libfasttransforms")
 
@@ -220,15 +219,15 @@ for f in (:leg2cheb, :cheb2leg, :ultra2ultra, :jac2jac,
     plan_f = Symbol("plan_", f)
     @eval begin
         $plan_f(x::AbstractArray{T}, y...; z...) where T = $plan_f(T, size(x, 1), y...; z...)
-        $f(x::AbstractArray{T}, y...; z...) where T = $plan_f(x, y...; z...)*x
+        $f(x::AbstractArray, y...; z...) = $plan_f(x, y...; z...)*x
     end
 end
 
 for (f, plan_f) in ((:fourier2sph, :plan_sph2fourier), (:fourier2sphv, :plan_sphv2fourier),
                     (:cxf2disk2, :plan_disk2cxf), (:cheb2tri, :plan_tri2cheb),
                     (:cheb2tet, :plan_tet2cheb))
     @eval begin
-        $f(x::AbstractArray{T}, y...; z...) where T = $plan_f(x, y...; z...)\x
+        $f(x::AbstractArray, y...; z...) = $plan_f(x, y...; z...)\x
     end
 end
 
@@ -435,6 +434,7 @@ for (fJ, fC, fE, K) in ((:plan_sph_synthesis, :ft_plan_sph_synthesis, :ft_execut
                     (:plan_tri_synthesis, :ft_plan_tri_synthesis, :ft_execute_tri_synthesis, TRIANGLESYNTHESIS),
                     (:plan_tri_analysis, :ft_plan_tri_analysis, :ft_execute_tri_analysis, TRIANGLEANALYSIS))
     @eval begin
+        $fJ(x::Matrix{T}) where T = $fJ(T, size(x, 1), size(x, 2))
         function $fJ(::Type{Float64}, n::Integer, m::Integer)
             plan = ccall(($(string(fC)), libfasttransforms), Ptr{ft_plan_struct}, (Cint, Cint), n, m)
             return FTPlan{Float64, 2, $K}(plan, n, m)
@@ -449,6 +449,8 @@ for (fJ, fC, fE, K) in ((:plan_sph_synthesis, :ft_plan_sph_synthesis, :ft_execut
     end
 end
 
+plan_tet_synthesis(x::Array{T, 3}) where T = plan_tet_synthesis(T, size(x, 1), size(x, 2), size(x, 3))
+
 function plan_tet_synthesis(::Type{Float64}, n::Integer, l::Integer, m::Integer)
     plan = ccall((:ft_plan_tet_synthesis, libfasttransforms), Ptr{ft_plan_struct}, (Cint, Cint, Cint), n, l, m)
     return FTPlan{Float64, 3, TETRAHEDRONSYNTHESIS}(plan, n, l, m)
@@ -462,6 +464,8 @@ function lmul!(p::FTPlan{Float64, 3, TETRAHEDRONSYNTHESIS}, x::Array{Float64, 3}
     return x
 end
 
+plan_tet_analysis(x::Array{T, 3}) where T = plan_tet_analysis(T, size(x, 1), size(x, 2), size(x, 3))
+
 function plan_tet_analysis(::Type{Float64}, n::Integer, l::Integer, m::Integer)
     plan = ccall((:ft_plan_tet_analysis, libfasttransforms), Ptr{ft_plan_struct}, (Cint, Cint, Cint), n, l, m)
     return FTPlan{Float64, 3, TETRAHEDRONANALYSIS}(plan, n, l, m)