Commit 48829ad: update doc
1 parent 0b9e656 commit 48829ad

9 files changed, +181 -122 lines changed

docs/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -5,5 +5,6 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
 NormalizingFlows = "50e4474d-9f12-44b7-af7a-91ab30ff6256"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
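
The new LiveServer dependency is typically used to live-preview the documentation while editing. A minimal sketch, assuming the standard Documenter `docs/` layout (the watch-and-rebuild behavior and default port are LiveServer's):

```julia
# Run from the repository root with the docs environment active:
#   julia --project=docs
using LiveServer

# servedocs() watches docs/src, re-runs docs/make.jl on each change,
# and serves the rebuilt site locally (http://localhost:8000 by default).
servedocs()
```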

docs/make.jl

Lines changed: 4 additions & 3 deletions
@@ -7,14 +7,15 @@ DocMeta.setdocmeta!(
 
 makedocs(;
     modules=[NormalizingFlows],
-    repo="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
     sitename="NormalizingFlows.jl",
-    format=Documenter.HTML(),
+    format=Documenter.HTML(;
+        repolink="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
+    ),
     pages=[
         "Home" => "index.md",
         "API" => "api.md",
         "Example" => "example.md",
         "Customize your own flow layer" => "customized_layer.md",
     ],
     checkdocs=:exports,
-)
+)
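
For reference, the `makedocs` call after this change reads as follows. This is assembled from the hunk above; in Documenter 1.x the repository link moves onto the HTML format object, replacing the deprecated top-level `repo` keyword:

```julia
using Documenter, NormalizingFlows

makedocs(;
    modules=[NormalizingFlows],
    sitename="NormalizingFlows.jl",
    # the repository link now lives on the format object, not on makedocs itself
    format=Documenter.HTML(;
        repolink="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
    ),
    pages=[
        "Home" => "index.md",
        "API" => "api.md",
        "Example" => "example.md",
        "Customize your own flow layer" => "customized_layer.md",
    ],
    checkdocs=:exports,
)
```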

docs/src/api.md

Lines changed: 45 additions & 34 deletions
@@ -1,62 +1,74 @@
-## API
+# API
 
 ```@index
 ```
 
-
 ## Main Function
 
 ```@docs
 NormalizingFlows.train_flow
 ```
 
-The flow object can be constructed by `transformed` function in `Bijectors.jl` package.
-For example of Gaussian VI, we can construct the flow as follows:
-```@julia
+The flow object can be constructed by `transformed` function in `Bijectors.jl`.
+For example, for Gaussian VI, we can construct the flow as follows:
+
+```julia
 using Distributions, Bijectors
-T= Float32
+T = Float32
 @leaf MvNormal # to prevent params in q₀ from being optimized
 q₀ = MvNormal(zeros(T, 2), ones(T, 2))
 flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T,2)) ∘ Bijectors.Scale(ones(T, 2)))
 ```
-To train the Gaussian VI targeting at distirbution $p$ via ELBO maiximization, we can run
-```@julia
-using NormalizingFlows
+
+To train the Gaussian VI targeting distribution `p` via ELBO maximization, run:
+
+```julia
+using NormalizingFlows, Optimisers
 
 sample_per_iter = 10
 flow_trained, stats, _ = train_flow(
     elbo,
     flow,
     logp,
     sample_per_iter;
-    max_iters=2_000,
-    optimiser=Optimisers.ADAM(0.01 * one(T)),
+    max_iters = 2_000,
+    optimiser = Optimisers.ADAM(0.01 * one(T)),
 )
 ```
-## Variational Objectives
-We have implemented two variational objectives, namely, ELBO and the log-likelihood objective.
-Users can also define their own objective functions, and pass it to the [`train_flow`](@ref) function.
-`train_flow` will optimize the flow parameters by maximizing `vo`.
-The objective function should take the following general form:
-```julia
-vo(rng, flow, args...)
+
+## Coupling-based flows (default constructors)
+
+These helpers construct commonly used coupling-based flows with sensible defaults.
+
+```@docs
+NormalizingFlows.realnvp
+NormalizingFlows.nsf
+NormalizingFlows.RealNVP_layer
+NormalizingFlows.NSF_layer
+NormalizingFlows.AffineCoupling
+NormalizingFlows.NeuralSplineCoupling
+NormalizingFlows.create_flow
 ```
-where `rng` is the random number generator, `flow` is the flow object, and `args...` are the
-additional arguments that users can pass to the objective function.
 
-#### Evidence Lower Bound (ELBO)
-By maximizing the ELBO, it is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
-```math
+## Variational Objectives
+
+We provide ELBO (reverse KL) and expected log-likelihood (forward KL). You can also
+supply your own objective with the signature `vo(rng, flow, args...)`.
+
+### Evidence Lower Bound (ELBO)
+
+Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:
+
+```math
 \begin{aligned}
 &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
 & = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ
 T_1(Z_0)\right)-\log q_0(X)+\sum_{n=1}^N \log J_n\left(F_n \circ \cdots \circ
-F_1(X)\right)\right] \quad \text{(ELBO)}
+F_1(X)\right)\right] \quad \text{(ELBO)}
 \end{aligned}
 ```
-Reverse KL minimization is typically used for **Bayesian computation**,
-where one only has access to the log-(unnormalized)density of the target distribution $p$ (e.g., a Bayesian posterior distribution),
-and hope to generate approximate samples from it.
+
+Reverse KL minimization is typically used for Bayesian computation when only `logp` is available.
 
 ```@docs
 NormalizingFlows.elbo
@@ -66,24 +78,23 @@ NormalizingFlows.elbo
 NormalizingFlows.elbo_batch
 ```
 
-#### Log-likelihood
+### Log-likelihood
+
+Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:
 
-By maximizing the log-likelihood, it is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
-```math
+```math
 \begin{aligned}
 & \min_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Forward KL)} \\
 & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
 \end{aligned}
 ```
-Forward KL minimization is typically used for **generative modeling**,
-where one is given a set of samples from the target distribution $p$ (e.g., images)
-and aims to learn the density or a generative process that outputs high quality samples.
+
+Forward KL minimization is typically used for generative modeling when samples from `p` are given.
 
 ```@docs
 NormalizingFlows.loglikelihood
 ```
 
-
 ## Training Loop
 
 ```@docs
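
Putting the two snippets on this page together, a runnable end-to-end sketch of the Gaussian VI example might look like the following. Note that `@leaf` comes from Functors.jl (not imported in the snippet above), and the standard normal target here is a stand-in for a real `logp`:

```julia
using Distributions, Bijectors, Optimisers, Random, LinearAlgebra
using Functors: @leaf   # @leaf is defined in Functors.jl
using NormalizingFlows

T = Float32
Random.seed!(123)

# Stand-in target: a 2D standard normal; replace `logp` with your own density.
p = MvNormal(zeros(T, 2), I)
logp = Base.Fix1(logpdf, p)

# Mean-field Gaussian flow: shift-and-scale bijector over a fixed base.
@leaf MvNormal   # keep q₀'s own parameters out of the optimizer
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo, flow, logp, sample_per_iter;
    max_iters = 2_000,
    optimiser = Optimisers.ADAM(0.01 * one(T)),
)
```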

docs/src/example.md

Lines changed: 22 additions & 19 deletions
@@ -1,38 +1,40 @@
 ## Example: Using Planar Flow
 
-Here we provide a minimal demonstration of learning a synthetic 2d banana distribution
-using *planar flows* (Renzende *et al.* 2015) by maximizing the [Evidence Lower Bound (ELBO)](@ref).
+Here we provide a minimal demonstration of learning a synthetic 2D banana distribution
+using planar flows (Rezende and Mohamed, 2015) by maximizing the ELBO.
 To complete this task, the two key inputs are:
 - the log-density function of the target distribution,
 - the planar flow.
 
-#### The Target Distribution
+- the log-density function of the target distribution
+- the planar flow
+
+### The Target Distribution
+
+The `Banana` object is defined in `example/targets/banana.jl` (see the source for details).
 
-The `Banana` object is defined in `example/targets/banana.jl`, see the [source code](https://github.com/zuhengxu/NormalizingFlows.jl/blob/main/example/targets/banana.jl) for details.
 ```julia
 p = Banana(2, 1.0f-1, 100.0f0)
 logp = Base.Fix1(logpdf, p)
 ```
-Visualize the contour of the log-density and the sample scatters of the target distribution:
-![Banana](banana.png)
 
+Visualize the contour of the log-density and the sample scatters of the target distribution:
 
+![Banana](banana.png)
 
+### The Planar Flow
 
-#### The Planar Flow
+The planar flow is defined by repeatedly applying a sequence of invertible
+transformations to a base distribution $q_0$. The building blocks for a planar flow
+of length $N$ are the following invertible transformations, called planar layers:
 
-The planar flow is defined by repeated applying a sequence of invertible
-transformations to a base distribution $q_0$. The building blocks for a planar flow
-of length $N$ are the following invertible transformations, called *planar layers*:
 ```math
-\text{planar layers}:
-T_{n, \theta_n}(x)=x+u_n \cdot \tanh \left(w_n^T x+b_n\right), \quad n=1, \ldots, N,
+T_{n, \theta_n}(x)=x+u_n \cdot \tanh \left(w_n^T x+b_n\right), \quad n=1, \ldots, N.
 ```
-where $\theta_n = (u_n, w_n, b_n), n=1, \dots, N$ are the parameters to be learned.
-Thankfully, [`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl)
-provides a nice framework to define a normalizing flow.
-Here we used the `PlanarLayer()` from `Bijectors.jl` to construct a
-20-layer planar flow, of which the base distribution is a 2d standard Gaussian distribution.
+
+Here $\theta_n = (u_n, w_n, b_n), n=1, \dots, N$ are the parameters to be learned.
+[`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl) provides `PlanarLayer()`.
+Below is a 20-layer planar flow on a 2D standard Gaussian base distribution.
 
 ```julia
 using Bijectors, FunctionChains

@@ -51,8 +53,9 @@ q₀ = MvNormal(zeros(Float32, 2), I)
 flow = create_planar_flow(20, q₀)
 flow_untrained = deepcopy(flow) # keep a copy of the untrained flow for comparison
 ```
-*Notice that here the flow layers are chained together using `fchain` function from [`FunctionChains.jl`](https://github.com/oschulz/FunctionChains.jl).
-Alternatively, one can do*
+
+Notice: Using `fchain` (FunctionChains.jl) reduces compilation time versus chaining with `∘` for many layers.
+
 ```julia
 ts = reduce(∘, [f32(PlanarLayer(d)) for i in 1:20])
 ```
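
The hunks above call `create_planar_flow` without showing its definition. A minimal sketch consistent with the surrounding text (20 `PlanarLayer`s chained by `fchain`); the function body is reconstructed here, not taken from the repository:

```julia
using Bijectors, FunctionChains, Distributions, LinearAlgebra
using Flux: f32

# Reconstructed helper: chain `n_layers` planar layers with `fchain`
# and wrap them around the base distribution `q₀`.
function create_planar_flow(n_layers::Int, q₀)
    d = length(q₀)
    layers = fchain([f32(PlanarLayer(d)) for _ in 1:n_layers])
    return Bijectors.transformed(q₀, layers)
end

q₀ = MvNormal(zeros(Float32, 2), I)
flow = create_planar_flow(20, q₀)
```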

src/NormalizingFlows.jl

Lines changed: 5 additions & 5 deletions
@@ -21,11 +21,11 @@ export train_flow, elbo, elbo_batch, loglikelihood
 
 Train the given normalizing flow `flow` by calling `optimize`.
 
-# Arguments
-- `rng::AbstractRNG`: random number generator
-- `vo`: variational objective
-- `flow`: normalizing flow to be trained, we recommend to define flow as `<:Bijectors.TransformedDistribution`
-- `args...`: additional arguments for `vo`
+Arguments
+- `rng::AbstractRNG`: random number generator (default: `Random.default_rng()`)
+- `vo`: objective with signature `vo(rng, flow, args...)`
+- `flow`: a `Bijectors.TransformedDistribution` (recommended)
+- `args...`: additional arguments passed to `vo`
 
 # Keyword Arguments
 - `max_iters::Int=1000`: maximum number of iterations
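
Given the documented signature `vo(rng, flow, args...)`, a custom objective can be passed in place of `elbo`. A naive Monte Carlo sketch for illustration only (the package's own `elbo` is the optimized path, and a real objective must be differentiable with respect to the flow parameters for gradient-based training):

```julia
using Random, Distributions, Statistics

# Hypothetical custom objective matching vo(rng, flow, args...):
# a plain Monte Carlo estimate of the ELBO.
function my_elbo(rng::AbstractRNG, flow, logp, n_samples::Int)
    xs = rand(rng, flow, n_samples)  # columns are draws from q_θ
    return mean(logp(x) - logpdf(flow, x) for x in eachcol(xs))
end

# Drop-in replacement for `elbo`:
# flow_trained, stats, _ = train_flow(my_elbo, flow, logp, 10; max_iters = 1_000)
```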

src/flows/neuralspline.jl

Lines changed: 37 additions & 12 deletions
@@ -121,19 +121,23 @@ end
 
 
 """
-    NSF_layer(dims, hdims; paramtype = Float64)
-Default constructor of single layer of Neural Spline Flow (NSF)
-which is a composition of 2 neural spline coupling transformations with complementary masks.
-The masking strategy is odd-even masking.
-# Arguments
+    NSF_layer(dims, hdims, K, B; paramtype = Float64)
+
+Default constructor of a single layer of Neural Spline Flow (NSF), which is a
+composition of two neural spline coupling transformations with complementary
+odd–even masks.
+
+Arguments
 - `dims::Int`: dimension of the problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
-- `K::Int`: number of knots
-- `B::AbstractFloat`: bound of the knots
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
-# Returns
-- A `Bijectors.Bijector` representing the NSF layer.
+- `hdims::AbstractVector{Int}`: hidden sizes of the MLP used to parameterize the spline
+- `K::Int`: number of knots for the rational quadratic spline
+- `B::AbstractFloat`: boundary for the spline domain
+
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
+
+Returns
+- A `Bijectors.Bijector` representing the NSF layer
 """
 function NSF_layer(
     dims::T1, # dimension of problem

@@ -152,6 +156,27 @@ function NSF_layer(
     return reduce(∘, (nsf1, nsf2))
 end
 
+"""
+    nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
+
+Default constructor of Neural Spline Flow (NSF), which composes `nlayers` NSF_layer
+blocks with odd-even masking.
+
+Arguments
+- `q0::Distribution{Multivariate,Continuous}`: base distribution (e.g., `MvNormal(zeros(d), I)`).
+- `hdims::AbstractVector{Int}`: hidden layer sizes of the coupling networks.
+- `K::Int`: number of spline knots.
+- `B::AbstractFloat`: boundary range for spline knots.
+- `nlayers::Int`: number of NSF_layer blocks.
+
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type (e.g., `Float32` for GPU-friendly).
+
+Returns
+- `Bijectors.MultivariateTransformed` representing the NSF flow.
+
+Use the shorthand `nsf(q0)` to construct a default configuration.
+"""
 function nsf(
     q0::Distribution{Multivariate,Continuous},
     hdims::AbstractVector{Int}, # dimension of hidden units for s and t
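
A usage sketch for the constructor documented above; argument values are illustrative, following the docstring signature `nsf(q0, hdims, K, B, nlayers; paramtype = Float64)`:

```julia
using Distributions, LinearAlgebra
using NormalizingFlows

# 4-dimensional base distribution; Float32 parameters throughout.
q0 = MvNormal(zeros(Float32, 4), I)

# 3 NSF layers, [32, 32] hidden sizes, 10 spline knots on [-8, 8].
flow = NormalizingFlows.nsf(q0, [32, 32], 10, 8.0f0, 3; paramtype = Float32)

# Or the documented default configuration:
flow_default = NormalizingFlows.nsf(q0)
```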

src/flows/realnvp.jl

Lines changed: 29 additions & 25 deletions
@@ -1,9 +1,12 @@
 """
-Default constructor of Affine Coupling flow layer
+Affine coupling layer used in RealNVP.
 
-following the general architecture as Eq(3) in [^AD2025]
+Implements two subnetworks `s` (scale, exponentiated) and `t` (shift) applied to
+one partition of the input, conditioned on the complementary partition. The
+scale network uses `tanh` on its output before exponentiation to improve
+stability during training.
 
-[^AD2025]: Agrawal, J., & Domke, J. (2025). Disentangling impact of capacity, objective, batchsize, estimators, and step-size on flow VI. In *AISTATS*
+See also: Dinh et al., 2016 (RealNVP).
 """
 struct AffineCoupling <: Bijectors.Bijector
     dim::Int

@@ -119,19 +122,18 @@ end
 """
     RealNVP_layer(dims, hdims; paramtype = Float64)
 
-Default constructor of single layer of realnvp flow,
-which is a composition of 2 affine coupling transformations with complementary masks.
-The masking strategy is odd-even masking.
+Construct a single RealNVP layer using two affine coupling bijections with
+odd–even masks.
 
-# Arguments
-- `dims::Int`: dimension of the problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
+Arguments
+- `dims::Int`: dimensionality of the target distribution
+- `hdims::AbstractVector{Int}`: hidden sizes for the `s` and `t` MLPs
 
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
 
-# Returns
-- A `Bijectors.Bijector` representing the RealNVP layer.
+Returns
+- A `Bijectors.Bijector` representing the RealNVP layer
 """
 function RealNVP_layer(
     dims::Int, # dimension of problem

@@ -149,22 +151,24 @@ function RealNVP_layer(
 end
 
 """
-    realnvp(q0, dims, hdims, nlayers; paramtype = Float64)
+    realnvp(q0, hdims, nlayers; paramtype = Float64)
+    realnvp(q0; paramtype = Float64)
 
-Default constructor of RealNVP flow, which is a composition of `nlayers` RealNVP_layer.
-# Arguments
-- `q0::Distribution{Multivariate,Continuous}`: reference distribution, e.g. `MvNormal(zeros(dims), I)`
-- `dims::Int`: dimension of problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
-- `nlayers::Int`: number of RealNVP_layer
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
+Construct a RealNVP flow by stacking `nlayers` RealNVP_layer blocks with
+odd–even masking. The no-argument variant uses 10 layers with `[32, 32]`
+hidden sizes per coupling network.
 
-# Returns
-- A `Bijectors.MultivariateTransformed` representing the RealNVP flow.
+Arguments
+- `q0::Distribution{Multivariate,Continuous}`: base distribution (e.g. `MvNormal(zeros(d), I)`)
+- `hdims::AbstractVector{Int}`: hidden sizes for the `s` and `t` MLPs
+- `nlayers::Int`: number of RealNVP layers
 
-"""
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
 
+Returns
+- `Bijectors.MultivariateTransformed` representing the RealNVP flow
+"""
 function realnvp(
     q0::Distribution{Multivariate,Continuous},
     hdims::AbstractVector{Int}, # dimension of hidden units for s and t
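
A usage sketch for the two documented methods; argument values are illustrative:

```julia
using Distributions, LinearAlgebra
using NormalizingFlows

q0 = MvNormal(zeros(2), I)

# Explicit configuration: 5 layers, [32, 32] hidden sizes for the s and t networks.
flow = NormalizingFlows.realnvp(q0, [32, 32], 5; paramtype = Float64)

# Documented default: 10 layers with [32, 32] hidden sizes per coupling network.
flow_default = NormalizingFlows.realnvp(q0)
```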
