|
1 | 1 | # Affine Transformations
|
2 | 2 |
|
3 |
| -It's very common for measures to be parameterized by `μ` and `σ`, for example as in `Normal(μ=3,σ=4)` or `StudentT(ν=1,μ=3,σ=4)`. In this context, `μ` and `σ` do not always refer to the mean and standard deviation (the `StudentT` above is equivalent to a Cauchy, so both are undefined). |
| 3 | +It's very common for measures to be parameterized by `μ` and `σ`, for example as in `Normal(μ=3, σ=4)` or `StudentT(ν=1, μ=3, σ=4)`. In this context, `μ` and `σ` do not always refer to the mean and standard deviation (the `StudentT` above is equivalent to a Cauchy, so both are undefined). |
4 | 4 |
|
5 |
| -Rather, `μ` is a "location parameter", and `σ` is a "scale parameter". Together, these determine a transform |
| 5 | +Rather, `μ` is a "location parameter", and `σ` is a "scale parameter". Together these determine an affine transformation |
6 | 6 |
|
7 | 7 | ```math
|
8 |
| -x → σx + μ |
| 8 | +f(z) = σ z + μ |
| 9 | +``` |
| 10 | + |
| 11 | +Here and below, we'll use ``z`` to represent an "un-transformed" variable, typically coming from a measure like `Normal()` with no location or scale parameters. |
| 12 | + |
| 13 | +Affine transforms are often incorrectly referred to as "linear". Linearity requires ``f(ax + by) = a f(x) + b f(y)`` for scalars ``a`` and ``b``, which only holds for the above ``f`` if ``μ=0``. |
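
As a quick numerical check of this point, the following sketch compares both sides of the linearity condition for an affine map with ``μ ≠ 0`` and for one with ``μ = 0``. The `affine` helper is a hypothetical stand-in written only for this illustration; it is not part of MeasureTheory.jl. The scalars are chosen with ``a + b ≠ 1``, since an affine map does preserve combinations whose weights sum to one.

```julia
# Hypothetical helper (not a MeasureTheory.jl function): the map f(z) = σ z + μ.
affine(μ, σ) = z -> σ * z + μ

f = affine(3.0, 2.0)   # μ = 3, σ = 2: affine but not linear
g = affine(0.0, 2.0)   # μ = 0, σ = 2: linear

# Scalars and points for the test; note a + b ≠ 1.
a, b = 2.0, 3.0
x, y = 0.5, 4.0

# Compare f(ax + by) with a f(x) + b f(y) for each map.
f_is_linear = f(a * x + b * y) == a * f(x) + b * f(y)
g_is_linear = g(a * x + b * y) == a * g(x) + b * g(y)

println("μ = 3 satisfies the linearity condition: ", f_is_linear)
println("μ = 0 satisfies the linearity condition: ", g_is_linear)
```

With these values the two sides differ by ``(a + b - 1) μ``, which vanishes only when ``μ = 0``.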
| 14 | + |
| 15 | + |
| 16 | +## Cholesky-based parameterizations |
| 17 | + |
| 18 | +If the "un-transformed" `z` is a scalar, things are relatively simple. But it's important our approach handle the multivariate case as well. |
| 19 | + |
| 20 | +In the literature, it's common for a multivariate normal distribution to be parameterized by a mean `μ` and covariance matrix `Σ`. This is mathematically convenient, but can be very awkward from a computational perspective. |
| 21 | + |
| 22 | +While MeasureTheory.jl includes (or will include) a parameterization using `Σ`, we prefer to work in terms of its lower Cholesky factor ``σ``. |
| 23 | + |
| 24 | +Using "``σ``" for this may seem strange at first, so we should explain the notation. Let ``σ`` be a lower-triangular matrix satisfying |
| 25 | + |
| 26 | +```math |
| 27 | +σ σᵗ = Σ |
| 28 | +``` |
| 29 | + |
| 30 | +Then given a (multivariate) standard normal ``z``, the covariance matrix of ``σ z + μ`` is |
| 31 | + |
| 32 | +```math |
| 33 | +𝕍[σ z + μ] = Σ |
| 34 | +``` |
| 35 | + |
| 36 | +This is similar to the one-dimensional case, where |
| 37 | + |
| 38 | +```math |
| 39 | +𝕍[σ z + μ] = σ² , |
| 40 | +``` |
| 41 | + |
| 42 | +and so the lower Cholesky factor of the covariance generalizes the concept of standard deviation, justifying the notation. |
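
The relationship between ``σ`` and ``Σ`` can be checked numerically. The sketch below uses only Julia's standard `LinearAlgebra` library, not MeasureTheory.jl itself, and the particular matrix `Σ` and vector `μ` are arbitrary values assumed for illustration.

```julia
using LinearAlgebra

# An arbitrary symmetric positive-definite covariance matrix (assumed for this example).
Σ = [4.0 2.0;
     2.0 3.0]
μ = [1.0, -1.0]

# Lower-triangular Cholesky factor, playing the role of σ in the text.
σ = cholesky(Σ).L

# σ is lower triangular and satisfies σ σᵗ = Σ (up to floating-point error).
@assert istril(σ)
@assert σ * σ' ≈ Σ

# A standard normal z has identity covariance, so σ z + μ has covariance σ I σᵗ = Σ.
# Drawing from the transformed measure needs only a matrix-vector product.
z = randn(2)
x = σ * z + μ
```

Working with ``σ`` directly means a sample costs one triangular matrix-vector product, with no factorization of ``Σ`` at sampling time.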
| 43 | + |
| 44 | +## `Affine` and `AffineTransform` |
| 45 | + |
| 46 | + |
| 47 | + |
| 48 | +unif = ∫(x -> 0 < x < 1, Lebesgue(ℝ))
| 49 | +f = AffineTransform((μ=3, σ=2))
| 50 | +g = AffineTransform((μ=3, ω=2))
| 51 | + |
| 52 | +So for example, the implementation of `StudentT(ν=1, μ=3, σ=4)` is equivalent to |
| 53 | + |
| 54 | +```julia |
| 55 | +StudentT(nt::NamedTuple{(:ν,:μ,:σ)}) = Affine((μ=nt.μ, σ=nt.σ), StudentT((ν=nt.ν,)))
9 | 56 | ```
|
10 | 57 |
|