
Commit 7102906

jeremiahpslewis, cscherrer, mschauer, gdalle authored
Doc adjustments (#169)
* Tweaks
* Further tweaks
* Update docs/src/affine.md
  Co-authored-by: Chad Scherrer <[email protected]>
* Update docs/src/affine.md
* Update docs/src/affine.md
* Update docs/src/affine.md
* Update docs/src/affine.md
  Co-authored-by: Chad Scherrer <[email protected]>
* Update docs/src/affine.md
* Update docs/src/affine.md
  Co-authored-by: Chad Scherrer <[email protected]>
* Update docs/src/affine.md
  Co-authored-by: Moritz Schauer <[email protected]>
* Update docs/src/affine.md
  Co-authored-by: Chad Scherrer <[email protected]>

Co-authored-by: Moritz Schauer <[email protected]>
Co-authored-by: Guillaume Dalle <[email protected]>
1 parent 9f10340 commit 7102906


docs/src/affine.md

Lines changed: 21 additions & 17 deletions
@@ -1,27 +1,31 @@
# Affine Transformations

-It's very common for measures to be parameterized by `μ` and `σ`, for example as in `Normal(μ=3, σ=4)` or `StudentT(ν=1, μ=3, σ=4)`. In this context, `μ` and `σ` do not always refer to the mean and standard deviation (the `StudentT` above is equivalent to a Cauchy, so both are undefined).
+It's very common for measures to use parameters `μ` and `σ`, for example as in `Normal(μ=3, σ=4)` or `StudentT(ν=1, μ=3, σ=4)`. In this context, `μ` and `σ` need not always refer to the mean and standard deviation (the `StudentT` measure specified above is equivalent to a [Cauchy](https://en.wikipedia.org/wiki/Cauchy_distribution) measure, so both mean and standard deviation are undefined).

-Rather, `μ` is a "location parameter", and `σ` is a "scale parameter". Together these determine an affine transformation
+In general, `μ` is a "location parameter", and `σ` is a "scale parameter". Together these parameters determine an affine transformation.

```math
f(z) = σ z + μ
```

-Here are below, we'll use ``z`` to represent an "un-transformed" variable, typically coming from a measure like `Normal()` with no location or scale parameters.
+Starting with the above definition, we'll use ``z`` to represent an "un-transformed" variable, typically coming from a measure which has neither a location nor a scale parameter, for example `Normal()`.

-Affine transforms are often incorrectly referred to as "linear". Linearity requires ``f(ax + by) = a f(x) + b f(y)`` for scalars ``a`` and ``b``, which only holds for the above ``f`` if ``μ=0``.
+Affine transformations are often ambiguously referred to as "linear transformations". In fact, an affine transformation is ["the composition of two functions: a translation and a linear map"](https://en.wikipedia.org/wiki/Affine_transformation#Representation) in the stricter algebraic sense: for a function `f` to be linear requires
+``f(ax + by) == a f(x) + b f(y)``
+for scalars ``a`` and ``b``. For an affine function
+``f(z) = σ * z + μ``, where the linear map is given by ``σ`` and the translation by ``μ``,
+linearity holds only if the translation component ``μ`` is equal to zero.
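
As a quick numerical check of this last point, here is a small sketch in plain Julia (illustrative only, not code from MeasureTheory.jl): with a nonzero translation ``μ`` the affine map fails the linearity test, while the pure scaling map passes it.

```julia
# Sketch: an affine map f(z) = σz + μ is linear only when μ == 0.
σ, μ = 2.0, 3.0
f(z) = σ * z + μ      # affine: scaling plus translation
g(z) = σ * z          # purely linear: scaling only

a, b, x, y = 2.0, 3.0, 2.0, 4.0

f(a * x + b * y) ≈ a * f(x) + b * f(y)   # false, because μ ≠ 0
g(a * x + b * y) ≈ a * g(x) + b * g(y)   # true
```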


## Cholesky-based parameterizations

-If the "un-transformed" `z` is a scalar, things are relatively simple. But it's important our approach handle the multivariate case as well.
+If the "un-transformed" `z` is univariate, things are relatively simple. But it's important that our approach handle the multivariate case as well.

-In the literature, it's common for a multivariate normal distribution to be parameterized by a mean `μ` and covariance matrix `Σ`. This is mathematically convenient, but can be very awkward from a computational perspective.
+In the literature, it's common for a multivariate normal distribution to be parameterized by a mean `μ` and covariance matrix `Σ`. This is mathematically convenient, but leads to an ``O(n^3)`` [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition), which becomes a significant bottleneck to compute as ``n`` gets large.

While MeasureTheory.jl includes (or will include) a parameterization using `Σ`, we prefer to work in terms of its Cholesky decomposition ``σ``.

-Using "``σ``" for this may seem strange at first, so we should explain the notation. Let ``σ`` be a lower-triangular matrix satisfying
+To see the relationship between our ``σ`` parameterization and the likely more familiar ``Σ`` parameterization, let ``σ`` be a lower-triangular matrix satisfying

```math
σ σᵗ = Σ
@@ -33,23 +37,23 @@ Then given a (multivariate) standard normal ``z``, the covariance matrix of ``σ
𝕍[σ z + μ] = Σ
```

-Comparing to the one dimensional case where
+The one-dimensional case where we have

```math
𝕍[σ z + μ] = σ²
```

-shows that the lower Cholesky factor of the covariance generalizes the concept of standard deviation, justifying the notation.
+shows that the lower Cholesky factor of the covariance generalizes the concept of standard deviation, completing the link between ``σ`` and ``Σ``.
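
To make the notation concrete, the relationship can be checked numerically. The following is an illustrative sketch using only Julia's standard libraries (not MeasureTheory.jl code): it builds a covariance matrix ``Σ``, takes its lower Cholesky factor, and confirms that ``σ z + μ`` applied to standard normal draws has (approximately) covariance ``Σ``.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(42)
A = randn(3, 3)
Σ = Symmetric(A * A' + I)     # an arbitrary positive-definite covariance matrix
μ = [1.0, -2.0, 0.5]

σ = cholesky(Σ).L             # lower-triangular factor with σ σᵗ = Σ
σ * σ' ≈ Σ                    # true

# Transform standard normal draws via z ↦ σz + μ and check the sample covariance.
Z = randn(3, 100_000)
X = σ * Z .+ μ
isapprox(cov(X; dims = 2), Σ; rtol = 0.05)   # true, up to Monte Carlo error
```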


## The "Cholesky precision" parameterization

-The ``(μ,σ)`` parameterization is especially convenient for random sampling. Any `z ~ Normal()` determines an `x ~ Normal(μ,σ)` through
+The ``(μ,σ)`` parameterization is especially convenient for random sampling. Any `z ~ Normal()` determines an `x ~ Normal(μ,σ)` through the affine transformation

```math
x = σ z + μ
```

-On the other hand, the log-density computation is not quite so simple. Starting with an ``x``, we need to find ``z`` using
+The log-density computation for a `Normal` with parameters ``μ``, ``σ`` does not follow as directly. Starting with an ``x``, we need to find ``z`` using

```math
z = σ⁻¹ (x - μ)
@@ -63,19 +67,19 @@ logdensity(d::Normal{(:μ,:σ)}, x) = logdensity(d.σ \ (x - d.μ)) - logdet(d.

Here the `- logdet(σ)` is the "log absolute Jacobian", required to account for the stretching of the space.
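
The role of that Jacobian term can be verified directly. Below is an illustrative sketch in plain Julia (not the package's implementation): the standard-normal log-density evaluated at ``z = σ⁻¹(x - μ)``, minus the log-determinant of ``σ``, agrees with the textbook multivariate normal log-density with covariance ``Σ = σσᵗ``.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 3
A = randn(n, n)
Σ = Symmetric(A * A' + I)     # covariance matrix
σ = cholesky(Σ).L             # lower Cholesky factor
μ = randn(n)
x = randn(n)

# Log-density of a standard (zero-mean, identity-covariance) normal at z
stdnormal_logpdf(z) = -0.5 * dot(z, z) - 0.5 * length(z) * log(2π)

# Change-of-variables form: solve for z, then subtract the log Jacobian
lhs = stdnormal_logpdf(σ \ (x - μ)) - logdet(σ)

# Textbook multivariate normal log-density with mean μ and covariance Σ
rhs = -0.5 * dot(x - μ, Σ \ (x - μ)) - 0.5 * logdet(Σ) - 0.5 * n * log(2π)

lhs ≈ rhs   # true
```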

-The above requires solving a linear system, which adds some overhead. Even with the convenience of a lower triangular system, it's still not quite a efficient as a multiplication.
+The above requires solving a linear system, which adds some overhead. Even with the convenience of a lower triangular system, it's still not quite as efficient as multiplication.

-In addition to the covariance ``Σ``, it's also common to parameterize a multivariate normal by its _precision matrix_, ``Ω = Σ⁻¹``. Similarly to our use of ``σ``, we'll use ``ω`` for the lower Cholesky factor of ``Ω``.
+In addition to the covariance ``Σ``, it's also common to parameterize a multivariate normal by its _precision matrix_, defined as the inverse of the covariance matrix, ``Ω = Σ⁻¹``. Similar to our use of ``σ`` for the lower Cholesky factor of ``Σ``, we'll use ``ω`` for the lower Cholesky factor of ``Ω``.

-This allows a more efficient log-density,
+This parameterization enables more efficient calculation of the log-density using only multiplication and addition,

```julia
logdensity(d::Normal{(:μ,:ω)}, x) = logdensity(d.ω * (x - d.μ)) + logdet(d.ω)
```
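
This, too, can be sanity-checked numerically. In the sketch below (plain Julia, illustrative only), ``ω`` is taken concretely as ``σ⁻¹``, one choice consistent with the formula above; whatever factor the package stores internally, the point is that the ``(μ,ω)`` form needs only a multiplication and a `logdet`, yet matches the solve-based ``(μ,σ)`` form.

```julia
using LinearAlgebra, Random

Random.seed!(2)
n = 3
A = randn(n, n)
Σ = Symmetric(A * A' + I)
σ = cholesky(Σ).L
ω = inv(σ)                    # assumed here: ω = σ⁻¹, so ω * (x - μ) == σ \ (x - μ)
μ = randn(n)
x = randn(n)

stdnormal_logpdf(z) = -0.5 * dot(z, z) - 0.5 * length(z) * log(2π)

lp_solve    = stdnormal_logpdf(σ \ (x - μ)) - logdet(σ)   # (μ, σ): triangular solve
lp_multiply = stdnormal_logpdf(ω * (x - μ)) + logdet(ω)   # (μ, ω): multiplication only

lp_solve ≈ lp_multiply   # true
```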

## `AffineTransform`

-Transforms like ``z → σ z + μ`` and ``z → ω \ z + μ`` can be represented using an `AffineTransform`. For example,
+Transforms like ``z → σ z + μ`` and ``z → ω \ z + μ`` can be specified in MeasureTheory.jl using an `AffineTransform`. For example,

```julia
julia> f = AffineTransform((μ=3., σ=2.))
@@ -85,9 +89,9 @@ julia> f(1.0)
5.0
```

-In the scalar case this is relatively simple to invert. But if `σ` is a matrix, this would require matrix inversion. Adding to this complication is that lower triangular matrices are not closed under matrix inversion.
+In the univariate case this is relatively simple to invert. But if `σ` is a matrix, matrix inversion becomes necessary. This is not always possible, as lower triangular matrices are not closed under matrix inversion, so an inverse of the same form is not guaranteed to exist.
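
For intuition in the univariate case mentioned above, here is a tiny sketch in plain Julia (not the package's `AffineTransform` or its `inv` method): inverting ``x = σ z + μ`` amounts to subtracting the shift and then dividing by the scale.

```julia
# Sketch of inverting a scalar affine transform by hand.
σ, μ = 2.0, 3.0
f(z) = σ * z + μ               # forward transform
f_inv(x) = (x - μ) / σ         # inverse: subtract the shift, divide by the scale

f(1.0)            # 5.0
f_inv(f(1.0))     # 1.0
```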

-Our multiple parameterizations make it convenient to deal with these issues. The inverse transform of a ``(μ,σ)`` transform will be in terms of ``(μ,ω)``, and vice-versa. So
+With multiple parameterizations of a given family of measures, we can work around these issues. The inverse transform of a ``(μ,σ)`` transform will be in terms of ``(μ,ω)``, and vice-versa. So

```julia
julia> f⁻¹ = inv(f)
