# Planar Flow on a 2D Banana Distribution

This example demonstrates learning a synthetic 2D banana distribution with a planar normalizing flow [^RM2015] by maximizing the Evidence Lower BOund (ELBO).

The two required ingredients are:

- A log-density function `logp` for the target distribution.
- A parametrised invertible transformation (the planar flow) applied to a simple base distribution.

## Target Distribution

The banana target used here is defined in `example/targets/banana.jl` (see source for details):

```julia
using Random, Distributions
Random.seed!(123)

target = Banana(2, 1.0, 10.0)   # (dimension, nonlinearity, scale)
logp = Base.Fix1(logpdf, target)
```

You can visualise its contour and samples (figure shipped as `banana.png`).
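A minimal plotting sketch, assuming the `Banana` target supports `rand` and `logpdf` as used above (the axis ranges are illustrative and may need adjusting for the chosen scale):

```julia
using Plots

# evaluate the target density on a grid and overlay samples from the target
xs = range(-20, 20; length=200)
ys = range(-20, 20; length=200)
contour(xs, ys, (x, y) -> exp(logp([x, y])); xlabel="x₁", ylabel="x₂")
samples = rand(target, 500)
scatter!(samples[1, :], samples[2, :]; ms=2, alpha=0.4, label="samples")
```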

## Planar Flow

A planar flow of length N applies a sequence of planar layers to a base distribution q₀:

```math
T_{n,\theta_n}(x) = x + u_n \tanh(w_n^T x + b_n), \qquad n = 1,\ldots,N.
```

The parameters θₙ = (uₙ, wₙ, bₙ) of each layer are learned; `Bijectors.jl` provides this transformation as `PlanarLayer`.
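Composing the layers and pushing x₀ ∼ q₀ through them, the log-density of the flow follows from the change-of-variables formula applied layer by layer:

```math
\log q_N(x_N) = \log q_0(x_0) - \sum_{n=1}^{N} \log \left| \det \frac{\partial T_{n,\theta_n}(x_{n-1})}{\partial x_{n-1}} \right|,
\qquad x_n = T_{n,\theta_n}(x_{n-1}).
```

For planar layers the log-Jacobian determinant has a closed form (via the matrix determinant lemma), which keeps each training iteration cheap.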

```julia
using Bijectors
using Functors        # provides @leaf
using LinearAlgebra   # provides I, used for the isotropic base covariance

function create_planar_flow(n_layers::Int, q₀)
    d = length(q₀)
    Ls = [PlanarLayer(d) for _ in 1:n_layers]
    ts = reduce(∘, Ls)  # alternatively: FunctionChains.fchain(Ls)
    return transformed(q₀, ts)
end

@leaf MvNormal  # prevent updating base distribution parameters
q₀ = MvNormal(zeros(2), I)
flow = create_planar_flow(10, q₀)
flow_untrained = deepcopy(flow)  # keep a copy for comparison
```
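As a quick check (a hypothetical snippet, reusing the definitions above), the untrained flow is already a valid `Bijectors.TransformedDistribution` and can be sampled:

```julia
# each column of the returned matrix is one 2-dimensional sample
x = rand(flow, 5)
size(x)  # (2, 5)
```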

If you compose *many* layers (e.g. more than about 30), you can reduce compilation time by chaining them with `FunctionChains.jl` instead of `∘`:

```julia
# uncomment the following lines to use FunctionChains
# using FunctionChains
# ts = fchain([PlanarLayer(d) for _ in 1:n_layers])
```

See [this comment](https://github.com/TuringLang/NormalizingFlows.jl/blob/8f4371d48228adf368d851e221af076ff929f1cf/src/NormalizingFlows.jl#L52)
for a discussion of why compilation time can become a concern.

## Training the Flow

We maximize the ELBO (here using the minibatch estimator `elbo_batch`) with the generic `train_flow` interface.
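Concretely, with T denoting the composition of all N planar layers (parameters θ collectively), the objective and its Monte Carlo estimate are

```math
\mathrm{ELBO}(\theta)
= \mathbb{E}_{x_0 \sim q_0}\Big[\log p\big(T(x_0)\big) - \log q_0(x_0) + \log \big|\det J_{T}(x_0)\big|\Big]
\approx \frac{1}{M}\sum_{m=1}^{M}\Big[\log p\big(T(x_0^{(m)})\big) - \log q_0(x_0^{(m)}) + \log \big|\det J_{T}(x_0^{(m)})\big|\Big],
```

where the x₀ draws come from the base distribution q₀ and M is `sample_per_iter`. Maximizing this objective minimizes the reverse KL divergence from the flow to the target, up to the target's unknown normalizing constant.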

```julia
using NormalizingFlows
using ADTypes, Optimisers
using Mooncake

sample_per_iter = 32
adtype = ADTypes.AutoMooncake(; config=Mooncake.Config())  # try AutoZygote() / AutoForwardDiff() / etc.
# optional: callback to record the batch size and the AD backend used at each iteration
cb(iter, opt_stats, re, θ) = (sample_per_iter=sample_per_iter, ad=adtype)
# optional: stopping criterion that halts training once the gradient norm falls below 1e-3
checkconv(iter, stat, re, θ, st) = stat.gradient_norm < 1e-3

flow_trained, stats, _ = train_flow(
    elbo_batch,
    flow,
    logp,
    sample_per_iter;
    max_iters = 20_000,
    optimiser = Optimisers.Adam(1e-2),
    ADbackend = adtype,
    callback = cb,
    hasconverged = checkconv,
    show_progress = false,
)

losses = map(x -> x.loss, stats)
```

Plot the losses (negative ELBO):

```julia
using Plots
plot(losses; xlabel = "iteration", ylabel = "negative ELBO", label = "", lw = 2)
```

## Evaluating the Trained Flow

The trained flow is a `Bijectors.TransformedDistribution`, so we can call `rand` to draw i.i.d. samples and `logpdf` to evaluate the log-density of the flow.
See the [documentation of `Bijectors.jl`](https://turinglang.org/Bijectors.jl/dev/distributions/) for details.

```julia
n_samples = 1_000
samples_trained = rand(flow_trained, n_samples)
samples_untrained = rand(flow_untrained, n_samples)
samples_true = rand(target, n_samples)
```
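The learned log-density can also be queried directly; for instance, a small sketch reusing the samples above (note that `logpdf` has to invert each layer, so it is more expensive than sampling):

```julia
# flow log-density at the first five trained-flow samples, compared with the target
logq_vals = [logpdf(flow_trained, samples_trained[:, i]) for i in 1:5]
logp_vals = [logp(samples_trained[:, i]) for i in 1:5]
```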

Simple visual comparison:

```julia
using Plots
scatter(samples_true[1, :], samples_true[2, :]; label="Target", ms=2, alpha=0.5)
scatter!(samples_untrained[1, :], samples_untrained[2, :]; label="Untrained", ms=2, alpha=0.5)
scatter!(samples_trained[1, :], samples_trained[2, :]; label="Trained", ms=2, alpha=0.5)
plot!(title = "Planar Flow: Before vs After Training", xlabel = "x₁", ylabel = "x₂", legend = :topleft)
```

## Notes

- Use `elbo` instead of `elbo_batch` for a single-sample estimator.
- Switch AD backends by changing `adtype` (see `ADTypes.jl` and the sketch below).
- Marking the base distribution with `@leaf` prevents its parameters from being updated during training.
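For example, a sketch of switching the AD backend (assuming the corresponding AD package is installed and loaded):

```julia
using ADTypes
# using ForwardDiff; adtype = ADTypes.AutoForwardDiff()
# using Zygote;      adtype = ADTypes.AutoZygote()
# then re-run train_flow with the new `adtype`
```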

## Reference

[^RM2015]: Rezende, D. & Mohamed, S. (2015). Variational Inference with Normalizing Flows. ICML.