Commit 48829ad: update doc
1 parent 0b9e656 commit 48829ad

9 files changed, +181 -122 lines changed

docs/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -5,5 +5,6 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
 NormalizingFlows = "50e4474d-9f12-44b7-af7a-91ab30ff6256"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
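
The new LiveServer dependency is typically used to live-preview the documentation while editing. A minimal sketch, assuming the standard Documenter `docs/` layout (the watch-and-rebuild behavior and default port are LiveServer's):

```julia
# Run from the repository root with the docs environment active:
#   julia --project=docs
using LiveServer

# servedocs() watches docs/src, re-runs docs/make.jl on each change,
# and serves the rebuilt site locally (http://localhost:8000 by default).
servedocs()
```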

docs/make.jl

Lines changed: 4 additions & 3 deletions
@@ -7,14 +7,15 @@ DocMeta.setdocmeta!(
 
 makedocs(;
     modules=[NormalizingFlows],
-    repo="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
     sitename="NormalizingFlows.jl",
-    format=Documenter.HTML(),
+    format=Documenter.HTML(;
+        repolink="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
+    ),
     pages=[
         "Home" => "index.md",
         "API" => "api.md",
         "Example" => "example.md",
         "Customize your own flow layer" => "customized_layer.md",
     ],
     checkdocs=:exports,
-)
+)
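
For reference, the `makedocs` call after this change reads as follows. This is assembled from the hunk above; in Documenter 1.x the repository link moves onto the HTML format object, replacing the deprecated top-level `repo` keyword:

```julia
using Documenter, NormalizingFlows

makedocs(;
    modules=[NormalizingFlows],
    sitename="NormalizingFlows.jl",
    # the repository link now lives on the format object, not on makedocs itself
    format=Documenter.HTML(;
        repolink="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
    ),
    pages=[
        "Home" => "index.md",
        "API" => "api.md",
        "Example" => "example.md",
        "Customize your own flow layer" => "customized_layer.md",
    ],
    checkdocs=:exports,
)
```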

docs/src/api.md

Lines changed: 45 additions & 34 deletions
@@ -1,62 +1,74 @@
-## API
+# API
 
 ```@index
 ```
 
-
 ## Main Function
 
 ```@docs
 NormalizingFlows.train_flow
 ```
 
-The flow object can be constructed by `transformed` function in `Bijectors.jl` package.
-For example of Gaussian VI, we can construct the flow as follows:
-```@julia
+The flow object can be constructed by `transformed` function in `Bijectors.jl`.
+For example, for Gaussian VI, we can construct the flow as follows:
+
+```julia
 using Distributions, Bijectors
-T= Float32
+T = Float32
 @leaf MvNormal # to prevent params in q₀ from being optimized
 q₀ = MvNormal(zeros(T, 2), ones(T, 2))
 flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T,2)) ∘ Bijectors.Scale(ones(T, 2)))
 ```
-To train the Gaussian VI targeting at distirbution $p$ via ELBO maiximization, we can run
-```@julia
-using NormalizingFlows
+
+To train the Gaussian VI targeting distribution `p` via ELBO maximization, run:
+
+```julia
+using NormalizingFlows, Optimisers
 
 sample_per_iter = 10
 flow_trained, stats, _ = train_flow(
     elbo,
     flow,
     logp,
     sample_per_iter;
-    max_iters=2_000,
-    optimiser=Optimisers.ADAM(0.01 * one(T)),
+    max_iters = 2_000,
+    optimiser = Optimisers.ADAM(0.01 * one(T)),
 )
 ```
-## Variational Objectives
-We have implemented two variational objectives, namely, ELBO and the log-likelihood objective.
-Users can also define their own objective functions, and pass it to the [`train_flow`](@ref) function.
-`train_flow` will optimize the flow parameters by maximizing `vo`.
-The objective function should take the following general form:
-```julia
-vo(rng, flow, args...)
+
+## Coupling-based flows (default constructors)
+
+These helpers construct commonly used coupling-based flows with sensible defaults.
+
+```@docs
+NormalizingFlows.realnvp
+NormalizingFlows.nsf
+NormalizingFlows.RealNVP_layer
+NormalizingFlows.NSF_layer
+NormalizingFlows.AffineCoupling
+NormalizingFlows.NeuralSplineCoupling
+NormalizingFlows.create_flow
 ```
-where `rng` is the random number generator, `flow` is the flow object, and `args...` are the
-additional arguments that users can pass to the objective function.
 
-#### Evidence Lower Bound (ELBO)
-By maximizing the ELBO, it is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
-```math
+## Variational Objectives
+
+We provide ELBO (reverse KL) and expected log-likelihood (forward KL). You can also
+supply your own objective with the signature `vo(rng, flow, args...)`.
+
+### Evidence Lower Bound (ELBO)
+
+Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:
+
+```math
 \begin{aligned}
 &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
 & = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ
 T_1(Z_0)\right)-\log q_0(X)+\sum_{n=1}^N \log J_n\left(F_n \circ \cdots \circ
-F_1(X)\right)\right] \quad \text{(ELBO)}
+F_1(X)\right)\right] \quad \text{(ELBO)}
 \end{aligned}
 ```
-Reverse KL minimization is typically used for **Bayesian computation**,
-where one only has access to the log-(unnormalized)density of the target distribution $p$ (e.g., a Bayesian posterior distribution),
-and hope to generate approximate samples from it.
+
+Reverse KL minimization is typically used for Bayesian computation when only `logp` is available.
 
 ```@docs
 NormalizingFlows.elbo
@@ -66,24 +78,23 @@ NormalizingFlows.elbo
 NormalizingFlows.elbo_batch
 ```
 
-#### Log-likelihood
+### Log-likelihood
+
+Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:
 
-By maximizing the log-likelihood, it is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
-```math
+```math
 \begin{aligned}
 & \min_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Forward KL)} \\
 & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
 \end{aligned}
 ```
-Forward KL minimization is typically used for **generative modeling**,
-where one is given a set of samples from the target distribution $p$ (e.g., images)
-and aims to learn the density or a generative process that outputs high quality samples.
+
+Forward KL minimization is typically used for generative modeling when samples from `p` are given.
 
 ```@docs
 NormalizingFlows.loglikelihood
 ```
 
-
 ## Training Loop
 
 ```@docs
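
Putting the two snippets on this page together, a runnable end-to-end sketch of the Gaussian VI example might look like the following. Note that `@leaf` comes from Functors.jl (not imported in the snippet above), and the standard normal target here is a stand-in for a real `logp`:

```julia
using Distributions, Bijectors, Optimisers, Random, LinearAlgebra
using Functors: @leaf   # @leaf is defined in Functors.jl
using NormalizingFlows

T = Float32
Random.seed!(123)

# Stand-in target: a 2D standard normal; replace `logp` with your own density.
p = MvNormal(zeros(T, 2), I)
logp = Base.Fix1(logpdf, p)

# Mean-field Gaussian flow: shift-and-scale bijector over a fixed base.
@leaf MvNormal   # keep q₀'s own parameters out of the optimizer
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo, flow, logp, sample_per_iter;
    max_iters = 2_000,
    optimiser = Optimisers.ADAM(0.01 * one(T)),
)
```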

docs/src/example.md

Lines changed: 22 additions & 19 deletions
@@ -1,38 +1,40 @@
 ## Example: Using Planar Flow
 
-Here we provide a minimal demonstration of learning a synthetic 2d banana distribution
-using *planar flows* (Renzende *et al.* 2015) by maximizing the [Evidence Lower Bound (ELBO)](@ref).
+Here we provide a minimal demonstration of learning a synthetic 2D banana distribution
+using planar flows (Rezende and Mohamed, 2015) by maximizing the ELBO.
 To complete this task, the two key inputs are:
 - the log-density function of the target distribution,
 - the planar flow.
 
-#### The Target Distribution
+- the log-density function of the target distribution
+- the planar flow
+
+### The Target Distribution
+
+The `Banana` object is defined in `example/targets/banana.jl` (see the source for details).
 
-The `Banana` object is defined in `example/targets/banana.jl`, see the [source code](https://github.com/zuhengxu/NormalizingFlows.jl/blob/main/example/targets/banana.jl) for details.
 ```julia
 p = Banana(2, 1.0f-1, 100.0f0)
 logp = Base.Fix1(logpdf, p)
 ```
-Visualize the contour of the log-density and the sample scatters of the target distribution:
-![Banana](banana.png)
 
+Visualize the contour of the log-density and the sample scatters of the target distribution:
 
+![Banana](banana.png)
 
+### The Planar Flow
 
-#### The Planar Flow
+The planar flow is defined by repeatedly applying a sequence of invertible
+transformations to a base distribution $q_0$. The building blocks for a planar flow
+of length $N$ are the following invertible transformations, called planar layers:
 
-The planar flow is defined by repeated applying a sequence of invertible
-transformations to a base distribution $q_0$. The building blocks for a planar flow
-of length $N$ are the following invertible transformations, called *planar layers*:
 ```math
-\text{planar layers}:
-T_{n, \theta_n}(x)=x+u_n \cdot \tanh \left(w_n^T x+b_n\right), \quad n=1, \ldots, N,
+T_{n, \theta_n}(x)=x+u_n \cdot \tanh \left(w_n^T x+b_n\right), \quad n=1, \ldots, N.
 ```
-where $\theta_n = (u_n, w_n, b_n), n=1, \dots, N$ are the parameters to be learned.
-Thankfully, [`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl)
-provides a nice framework to define a normalizing flow.
-Here we used the `PlanarLayer()` from `Bijectors.jl` to construct a
-20-layer planar flow, of which the base distribution is a 2d standard Gaussian distribution.
+
+Here $\theta_n = (u_n, w_n, b_n), n=1, \dots, N$ are the parameters to be learned.
+[`Bijectors.jl`](https://github.com/TuringLang/Bijectors.jl) provides `PlanarLayer()`.
+Below is a 20-layer planar flow on a 2D standard Gaussian base distribution.
 
 ```julia
 using Bijectors, FunctionChains

@@ -51,8 +53,9 @@ q₀ = MvNormal(zeros(Float32, 2), I)
 flow = create_planar_flow(20, q₀)
 flow_untrained = deepcopy(flow) # keep a copy of the untrained flow for comparison
 ```
-*Notice that here the flow layers are chained together using `fchain` function from [`FunctionChains.jl`](https://github.com/oschulz/FunctionChains.jl).
-Alternatively, one can do*
+
+Notice: Using `fchain` (FunctionChains.jl) reduces compilation time versus chaining with `∘` for many layers.
+
 ```julia
 ts = reduce(∘, [f32(PlanarLayer(d)) for i in 1:20])
 ```
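
The hunks above call `create_planar_flow` without showing its definition. A minimal sketch consistent with the surrounding text (20 `PlanarLayer`s chained by `fchain`); the function body is reconstructed here, not taken from the repository:

```julia
using Bijectors, FunctionChains, Distributions, LinearAlgebra
using Flux: f32

# Reconstructed helper: chain `n_layers` planar layers with `fchain`
# and wrap them around the base distribution `q₀`.
function create_planar_flow(n_layers::Int, q₀)
    d = length(q₀)
    layers = fchain([f32(PlanarLayer(d)) for _ in 1:n_layers])
    return Bijectors.transformed(q₀, layers)
end

q₀ = MvNormal(zeros(Float32, 2), I)
flow = create_planar_flow(20, q₀)
```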

src/NormalizingFlows.jl

Lines changed: 5 additions & 5 deletions
@@ -21,11 +21,11 @@ export train_flow, elbo, elbo_batch, loglikelihood
 
 Train the given normalizing flow `flow` by calling `optimize`.
 
-# Arguments
-- `rng::AbstractRNG`: random number generator
-- `vo`: variational objective
-- `flow`: normalizing flow to be trained, we recommend to define flow as `<:Bijectors.TransformedDistribution`
-- `args...`: additional arguments for `vo`
+Arguments
+- `rng::AbstractRNG`: random number generator (default: `Random.default_rng()`)
+- `vo`: objective with signature `vo(rng, flow, args...)`
+- `flow`: a `Bijectors.TransformedDistribution` (recommended)
+- `args...`: additional arguments passed to `vo`
 
 # Keyword Arguments
 - `max_iters::Int=1000`: maximum number of iterations
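
Given the documented signature `vo(rng, flow, args...)`, a custom objective can be passed in place of `elbo`. A naive Monte Carlo sketch for illustration only (the package's own `elbo` is the optimized path, and a real objective must be differentiable with respect to the flow parameters for gradient-based training):

```julia
using Random, Distributions, Statistics

# Hypothetical custom objective matching vo(rng, flow, args...):
# a plain Monte Carlo estimate of the ELBO.
function my_elbo(rng::AbstractRNG, flow, logp, n_samples::Int)
    xs = rand(rng, flow, n_samples)  # columns are draws from q_θ
    return mean(logp(x) - logpdf(flow, x) for x in eachcol(xs))
end

# Drop-in replacement for `elbo`:
# flow_trained, stats, _ = train_flow(my_elbo, flow, logp, 10; max_iters = 1_000)
```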

src/flows/neuralspline.jl

Lines changed: 37 additions & 12 deletions
@@ -121,19 +121,23 @@ end
 
 
 """
-    NSF_layer(dims, hdims; paramtype = Float64)
-Default constructor of single layer of Neural Spline Flow (NSF)
-which is a composition of 2 neural spline coupling transformations with complementary masks.
-The masking strategy is odd-even masking.
-# Arguments
+    NSF_layer(dims, hdims, K, B; paramtype = Float64)
+
+Default constructor of a single layer of Neural Spline Flow (NSF), which is a
+composition of two neural spline coupling transformations with complementary
+odd–even masks.
+
+Arguments
 - `dims::Int`: dimension of the problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
-- `K::Int`: number of knots
-- `B::AbstractFloat`: bound of the knots
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
-# Returns
-- A `Bijectors.Bijector` representing the NSF layer.
+- `hdims::AbstractVector{Int}`: hidden sizes of the MLP used to parameterize the spline
+- `K::Int`: number of knots for the rational quadratic spline
+- `B::AbstractFloat`: boundary for the spline domain
+
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
+
+Returns
+- A `Bijectors.Bijector` representing the NSF layer
 """
 function NSF_layer(
     dims::T1, # dimension of problem

@@ -152,6 +156,27 @@ function NSF_layer(
     return reduce(∘, (nsf1, nsf2))
 end
 
+"""
+    nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
+
+Default constructor of Neural Spline Flow (NSF), which composes `nlayers` NSF_layer
+blocks with odd-even masking.
+
+Arguments
+- `q0::Distribution{Multivariate,Continuous}`: base distribution (e.g., `MvNormal(zeros(d), I)`).
+- `hdims::AbstractVector{Int}`: hidden layer sizes of the coupling networks.
+- `K::Int`: number of spline knots.
+- `B::AbstractFloat`: boundary range for spline knots.
+- `nlayers::Int`: number of NSF_layer blocks.
+
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type (e.g., `Float32` for GPU-friendly).
+
+Returns
+- `Bijectors.MultivariateTransformed` representing the NSF flow.
+
+Use the shorthand `nsf(q0)` to construct a default configuration.
+"""
 function nsf(
     q0::Distribution{Multivariate,Continuous},
     hdims::AbstractVector{Int}, # dimension of hidden units for s and t
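
A usage sketch for the constructor documented above; argument values are illustrative, following the docstring signature `nsf(q0, hdims, K, B, nlayers; paramtype = Float64)`:

```julia
using Distributions, LinearAlgebra
using NormalizingFlows

# 4-dimensional base distribution; Float32 parameters throughout.
q0 = MvNormal(zeros(Float32, 4), I)

# 3 NSF layers, [32, 32] hidden sizes, 10 spline knots on [-8, 8].
flow = NormalizingFlows.nsf(q0, [32, 32], 10, 8.0f0, 3; paramtype = Float32)

# Or the documented default configuration:
flow_default = NormalizingFlows.nsf(q0)
```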

src/flows/realnvp.jl

Lines changed: 29 additions & 25 deletions
@@ -1,9 +1,12 @@
 """
-Default constructor of Affine Coupling flow layer
+Affine coupling layer used in RealNVP.
 
-following the general architecture as Eq(3) in [^AD2025]
+Implements two subnetworks `s` (scale, exponentiated) and `t` (shift) applied to
+one partition of the input, conditioned on the complementary partition. The
+scale network uses `tanh` on its output before exponentiation to improve
+stability during training.
 
-[^AD2025]: Agrawal, J., & Domke, J. (2025). Disentangling impact of capacity, objective, batchsize, estimators, and step-size on flow VI. In *AISTATS*
+See also: Dinh et al., 2016 (RealNVP).
 """
 struct AffineCoupling <: Bijectors.Bijector
     dim::Int

@@ -119,19 +122,18 @@ end
 """
     RealNVP_layer(dims, hdims; paramtype = Float64)
 
-Default constructor of single layer of realnvp flow,
-which is a composition of 2 affine coupling transformations with complementary masks.
-The masking strategy is odd-even masking.
+Construct a single RealNVP layer using two affine coupling bijections with
+odd–even masks.
 
-# Arguments
-- `dims::Int`: dimension of the problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
+Arguments
+- `dims::Int`: dimensionality of the target distribution
+- `hdims::AbstractVector{Int}`: hidden sizes for the `s` and `t` MLPs
 
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
 
-# Returns
-- A `Bijectors.Bijector` representing the RealNVP layer.
+Returns
+- A `Bijectors.Bijector` representing the RealNVP layer
 """
 function RealNVP_layer(
     dims::Int, # dimension of problem

@@ -149,22 +151,24 @@ function RealNVP_layer(
 end
 
 """
-    realnvp(q0, dims, hdims, nlayers; paramtype = Float64)
+    realnvp(q0, hdims, nlayers; paramtype = Float64)
+    realnvp(q0; paramtype = Float64)
 
-Default constructor of RealNVP flow, which is a composition of `nlayers` RealNVP_layer.
-# Arguments
-- `q0::Distribution{Multivariate,Continuous}`: reference distribution, e.g. `MvNormal(zeros(dims), I)`
-- `dims::Int`: dimension of problem
-- `hdims::AbstractVector{Int}`: dimension of hidden units for s and t
-- `nlayers::Int`: number of RealNVP_layer
-# Keyword Arguments
-- `paramtype::Type{T} = Float64`: type of the parameters, defaults to `Float64`
+Construct a RealNVP flow by stacking `nlayers` RealNVP_layer blocks with
+odd–even masking. The no-argument variant uses 10 layers with `[32, 32]`
+hidden sizes per coupling network.
 
-# Returns
-- A `Bijectors.MultivariateTransformed` representing the RealNVP flow.
+Arguments
+- `q0::Distribution{Multivariate,Continuous}`: base distribution (e.g. `MvNormal(zeros(d), I)`)
+- `hdims::AbstractVector{Int}`: hidden sizes for the `s` and `t` MLPs
+- `nlayers::Int`: number of RealNVP layers
 
-"""
+Keyword Arguments
+- `paramtype::Type{T} = Float64`: parameter element type
 
+Returns
+- `Bijectors.MultivariateTransformed` representing the RealNVP flow
+"""
 function realnvp(
     q0::Distribution{Multivariate,Continuous},
     hdims::AbstractVector{Int}, # dimension of hidden units for s and t
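
A usage sketch for the two documented methods; argument values are illustrative:

```julia
using Distributions, LinearAlgebra
using NormalizingFlows

q0 = MvNormal(zeros(2), I)

# Explicit configuration: 5 layers, [32, 32] hidden sizes for the s and t networks.
flow = NormalizingFlows.realnvp(q0, [32, 32], 5; paramtype = Float64)

# Documented default: 10 layers with [32, 32] hidden sizes per coupling network.
flow_default = NormalizingFlows.realnvp(q0)
```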
