
Commit 7af6ad4

Set up clean documentation (#168)
* Include MeasureBase docs in MeasureTheory
* Clean up README
* Remove support in docs
* Remove useless MeasureBase import
* Version bump
1 parent 725dc79 commit 7af6ad4

11 files changed: +368 −223 lines

.gitignore

Lines changed: 1 addition & 1 deletion

```diff
@@ -2,7 +2,7 @@
 *.jl.cov
 *.jl.mem
 .DS_Store
-/Manifest.toml
+**/Manifest.toml
 /docs/build/
 /dev/
 /.history/
```

Project.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 name = "MeasureTheory"
 uuid = "eadaa1a4-d27c-401d-8699-e962e1bbc33b"
 authors = ["Chad Scherrer <[email protected]> and contributors"]
-version = "0.13.1"
+version = "0.13.2"
 
 [deps]
 Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
```

README.md

Lines changed: 8 additions & 181 deletions

````diff
@@ -5,201 +5,28 @@
 [![Build Status](https://github.com/cscherrer/MeasureTheory.jl/workflows/CI/badge.svg)](https://github.com/cscherrer/MeasureTheory.jl/actions)
 [![Coverage](https://codecov.io/gh/cscherrer/MeasureTheory.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/cscherrer/MeasureTheory.jl)
 
-Check out our [JuliaCon submission](https://github.com/cscherrer/MeasureTheory.jl/blob/paper/paper/paper.pdf)
-
 `MeasureTheory.jl` is a package for building and reasoning about measures.
 
-# Why?
+## Why?
 
-A distribution (as in Distributions.jl) is also called a _probability measure_, and carries with it the constraint of adding (or integrating) to one. Statistical work usually requires this "at the end of the day", but enforcing it at each step of a computation can have considerable overhead.
+A distribution (as provided by `Distributions.jl`) is also called a _probability measure_, and carries with it the constraint of adding (or integrating) to one. Statistical work usually requires this "at the end of the day", but enforcing it at each step of a computation can have considerable overhead. For instance, Bayesian modeling often requires working with unnormalized posterior densities or improper priors.
 
 As a generalization of the concept of volume, measures also have applications outside of probability theory.
 
-# Goals
-
-## Distributions.jl Compatibility
-
-Distributions.jl is wildly popular, and is large enough that replacing it all at once would be a major undertaking.
-
-Instead, we should aim to make any Distribution easily usable as a Measure. We'll most likely implement this using an `IsMeasure` trait.
-
-## Absolute Continuity
-
-For two measures μ, ν on a set X, we say μ is _absolutely continuous_ with respect to ν if ν(A)=0 implies μ(A)=0 for every measurable subset A of X.
-
-The following are equivalent:
-1. μ ≪ ν
-2. μ is absolutely continuous wrt ν
-3. There exists a function f such that μ = ∫f dν
-
-So we'll need a `≪` operator. Note that `≪` is not antisymmetric; it's common for both `μ ≪ ν` and `ν ≪ μ` to hold.
-
-If `μ ≪ ν` and `ν ≪ μ`, we say μ and ν are _equivalent_ and write `μ ≃ ν`. (This is often written as `μ ~ ν`, but we reserve `~` for random variables following a distribution, as is common in the literature and probabilistic programming languages.)
-
-If we collapse the equivalence classes (under ≃), `≪` becomes a partial order.
+## Getting started
 
-_We need ≃ and ≪ to be fast_. If the support of a measure can be determined statically from its type, we can define ≃ and ≪ as generated functions.
-
-## Radon-Nikodym Derivatives
-
-One of the equivalent conditions above was "There exists a function f such that μ = ∫f dν". In this case, `f` is called a _Radon-Nikodym derivative_, or (less formally) a _density_. In this case we often write `f = dμ/dν`.
-
-For any measures μ and ν with μ ≪ ν, we should be able to represent this.
-
-## Integration
-
-More generally, we'll need to be able to represent change of measure as above, `∫f dν`. We'll need an `Integral` type
+To install `MeasureTheory.jl`, open the Julia Pkg REPL (by typing `]` in the standard REPL) and run
 
 ```julia
-struct Integral{F,M}
-    f::F
-    μ::M
-end
+pkg> add MeasureTheory
 ```
 
-Then we'll have a function `∫`. For cases where μ = ∫f dν, `∫(f, ν)` will just return `μ` (we can do this based on the types). For unknown cases (which will be most of them), we'll return `∫(f, ν) = Integral(f, ν)`.
-
-## Measure Combinators
-
-It should be very easy to build new measures from existing ones. This can be done using, for example,
-
-- restriction
-- product measure
-- superposition
-- pushforward
-
-There's also function spaces, but this gets much trickier. We'll need to determine a good way to reason about this.
-
-## More???
-
-This is very much a work in progress. If there are things you think we should have as goals, please add an issue with the `goals` label.
-
-
-------------------
-# Old Stuff
-
-**WARNING: The current README is very developer-oriented. Casual use will be much simpler**
-
-For an example, let's walk through the construction of `src/probability/Normal`.
-
-First, we have
-
-```julia
-@measure Normal
-```
-
-this is just a little helper function, and is equivalent to
-
-# TODO: Clean up
-```julia
-quote
-    #= /home/chad/git/Measures.jl/src/Measures.jl:55 =#
-    struct Normal{var"#10#P", var"#11#X"} <: Measures.AbstractMeasure{var"#11#X"}
-        #= /home/chad/git/Measures.jl/src/Measures.jl:56 =#
-        par::var"#10#P"
-    end
-    #= /home/chad/git/Measures.jl/src/Measures.jl:59 =#
-    function Normal(var"#13#nt"::Measures.NamedTuple)
-        #= /home/chad/git/Measures.jl/src/Measures.jl:59 =#
-        #= /home/chad/git/Measures.jl/src/Measures.jl:60 =#
-        var"#12#P" = Measures.typeof(var"#13#nt")
-        #= /home/chad/git/Measures.jl/src/Measures.jl:61 =#
-        return Normal{var"#12#P", Measures.eltype(Normal{var"#12#P"})}
-    end
-    #= /home/chad/git/Measures.jl/src/Measures.jl:64 =#
-    Normal(; var"#14#kwargs"...) = begin
-        #= /home/chad/git/Measures.jl/src/Measures.jl:64 =#
-        Normal((; var"#14#kwargs"...))
-    end
-    #= /home/chad/git/Measures.jl/src/Measures.jl:66 =#
-    (var"#8#basemeasure"(var"#15#μ"::Normal{var"#16#P", var"#17#X"}) where {var"#16#P", var"#17#X"}) = begin
-        #= /home/chad/git/Measures.jl/src/Measures.jl:66 =#
-        Lebesgue{var"#17#X"}
-    end
-    #= /home/chad/git/Measures.jl/src/Measures.jl:68 =#
-    (var"#9#≪"(::Normal{var"#19#P", var"#20#X"}, ::Lebesgue{var"#20#X"}) where {var"#19#P", var"#20#X"}) = begin
-        #= /home/chad/git/Measures.jl/src/Measures.jl:68 =#
-        true
-    end
-end
-```
-
-Next we have
-
-```julia
-Normal(μ::Real, σ::Real) = Normal(μ=μ, σ=σ)
-```
+To get an idea of the possibilities offered by this package, go to the [documentation](https://cscherrer.github.io/MeasureTheory.jl/stable).
 
-This defines a default. If we just give two numbers as arguments (but no names), we'll assume this parameterization.
-
-Next we need to define an `eltype` function. This takes a constructor (here `Normal`) and a parameter, and tells us the space for which this defines a measure. Let's define this in terms of the types of the parameters,
-
-```julia
-eltype(::Type{Normal}, ::Type{NamedTuple{(:μ, :σ), Tuple{A, B}}}) where {A,B} = promote_type(A,B)
-```
-
-That's still kind of boring, so let's build the density. For this, we need to implement the trait
-
-```julia
-@trait Density{M,X} where {X = domain{M}} begin
-    basemeasure :: [M] => Measure{X}
-    logdensity :: [M, X] => Real
-end
-```
-
-A density doesn't exist by itself, but is defined relative to some _base measure_. For a normal distribution this is just Lebesgue measure on the real numbers. That, together with the usual Gaussian log-density, gives us
-
-```julia
-@implement Density{Normal{X,P},X} where {X, P <: NamedTuple{(:μ, :σ)}} begin
-    basemeasure(d) = Lebesgue(X)
-    logdensity(d, x) = - (log(2) + log(π)) / 2 - log(d.par.σ) - (x - d.par.μ)^2 / (2 * d.par.σ^2)
-end
-```
-
-Now we can compute the log-density:
-
-```julia
-julia> logdensity(Normal(0.0, 0.5), 1.0)
--2.2257913526447273
-```
-
-And just to check that our default is working,
-
-```julia
-julia> logdensity(Normal(μ=0.0, σ=0.5), 1.0)
--2.2257913526447273
-```
-
-What about other parameterizations? Sure, no problem. Here's a way to write this for mean `μ` (as before), but using the _precision_ (inverse of the variance) instead of standard deviation:
-
-```julia
-eltype(::Type{Normal}, ::Type{NamedTuple{(:μ, :τ), Tuple{A, B}}}) where {A,B} = promote_type(A,B)
-
-@implement Density{Normal{X,P},X} where {X, P <: NamedTuple{(:μ, :τ)}} begin
-    basemeasure(d) = Lebesgue(X)
-    logdensity(d, x) = - (log(2) + log(π) - log(d.par.τ) + d.par.τ * (x - d.par.μ)^2) / 2
-end
-```
-
-And another check:
-
-```julia
-julia> logdensity(Normal(μ=0.0, τ=4.0), 1.0)
--2.2257913526447273
-```
-
-We can combine measures in a few ways, for now just _scaling_ and _superposition_:
-
-```julia
-julia> 2.0*Lebesgue(Float64) + Normal(0.0,1.0)
-SuperpositionMeasure{Float64,2}((MeasureTheory.WeightedMeasure{Float64,Float64}(2.0, Lebesgue{Float64}()), Normal{NamedTuple{(:μ, :σ),Tuple{Float64,Float64}},Float64}((μ = 0.0, σ = 1.0))))
-```
-
----
-
-For an easy way to find expressions for the common log-densities, see [this gist](https://gist.github.com/cscherrer/47f0fc7597b4ffc11186d54cc4d8e577)
+To know more about the underlying theory and its applications to probabilistic programming, check out our [JuliaCon 2021 submission](https://arxiv.org/abs/2110.00602).
 
 ## Support
+
 [<img src=https://user-images.githubusercontent.com/1184449/140397787-9b7e3eb7-49cd-4c63-8f3c-e5cdc41e393d.png width="49%">](https://informativeprior.com/) [<img src=https://planting.space/sponsor/PlantingSpace-sponsor-3.png width=49%>](https://planting.space)
 
 ## Stargazers over time
````
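The removed README examples above quote `-2.2257913526447273` for both the `(μ, σ)` and `(μ, τ)` parameterizations. That's easy to confirm in plain Julia, with no packages; the helper names below are ours, for illustration only:

```julia
# Check that the two Gaussian log-density expressions from the old README
# agree: σ = 0.5 corresponds to precision τ = 1/σ² = 4.
logpdf_σ(μ, σ, x) = -(log(2) + log(π)) / 2 - log(σ) - (x - μ)^2 / (2 * σ^2)
logpdf_τ(μ, τ, x) = -(log(2) + log(π) - log(τ) + τ * (x - μ)^2) / 2

logpdf_σ(0.0, 0.5, 1.0)   # -2.2257913526447273
logpdf_τ(0.0, 4.0, 1.0)   # same value
```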

docs/make.jl

Lines changed: 14 additions & 5 deletions

```diff
@@ -1,15 +1,24 @@
-using MeasureTheory
 using Documenter
+using MeasureTheory
 
+DocMeta.setdocmeta!(MeasureBase, :DocTestSetup, :(using MeasureBase); recursive=true)
+DocMeta.setdocmeta!(MeasureTheory, :DocTestSetup, :(using MeasureTheory); recursive=true)
 
 pages = [
-    "Introduction" => "intro.md"
-    "Home" => "index.md"
-    "Adding a New Measure" => "adding.md"
+    "Home" => "index.md",
+    "Tutorials" => [
+        "Adding a new measure" => "adding.md",
+        "Affine transformations" => "affine.md",
+    ],
+    "API Reference" => [
+        "MeasureBase" => "api_measurebase.md",
+        "MeasureTheory" => "api_measuretheory.md",
+        "Index" => "api_index.md",
+    ],
 ]
 
 makedocs(;
-    modules=[MeasureTheory],
+    modules=[MeasureBase, MeasureTheory],
     authors="Chad Scherrer <[email protected]> and contributors",
     repo="https://github.com/cscherrer/MeasureTheory.jl/blob/{commit}{path}#L{line}",
     sitename="MeasureTheory.jl",
```

docs/src/affine.md

Lines changed: 114 additions & 0 deletions

````diff
@@ -0,0 +1,114 @@
+# Affine Transformations
+
+It's very common for measures to be parameterized by `μ` and `σ`, for example as in `Normal(μ=3, σ=4)` or `StudentT(ν=1, μ=3, σ=4)`. In this context, `μ` and `σ` do not always refer to the mean and standard deviation (the `StudentT` above is equivalent to a Cauchy, so both are undefined).
+
+Rather, `μ` is a "location parameter", and `σ` is a "scale parameter". Together these determine an affine transformation
+
+```math
+f(z) = σ z + μ
+```
+
+Here and below, we'll use ``z`` to represent an "un-transformed" variable, typically coming from a measure like `Normal()` with no location or scale parameters.
+
+Affine transforms are often incorrectly referred to as "linear". Linearity requires ``f(ax + by) = a f(x) + b f(y)`` for scalars ``a`` and ``b``, which only holds for the above ``f`` if ``μ = 0``.
+
+## Cholesky-based parameterizations
+
+If the "un-transformed" `z` is a scalar, things are relatively simple. But it's important that our approach handle the multivariate case as well.
+
+In the literature, it's common for a multivariate normal distribution to be parameterized by a mean `μ` and covariance matrix `Σ`. This is mathematically convenient, but can be very awkward from a computational perspective.
+
+While MeasureTheory.jl includes (or will include) a parameterization using `Σ`, we prefer to work in terms of its Cholesky decomposition ``σ``.
+
+Using "``σ``" for this may seem strange at first, so we should explain the notation. Let ``σ`` be a lower-triangular matrix satisfying
+
+```math
+σ σᵗ = Σ
+```
+
+Then given a (multivariate) standard normal ``z``, the covariance matrix of ``σ z + μ`` is
+
+```math
+𝕍[σ z + μ] = Σ
+```
+
+Comparing to the one-dimensional case, where
+
+```math
+𝕍[σ z + μ] = σ²
+```
+
+this shows that the lower Cholesky factor of the covariance generalizes the concept of standard deviation, justifying the notation.
````
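The relationship above is easy to check numerically. A small sketch in plain Julia (using only the `LinearAlgebra` standard library; the example matrix is our own, not from the docs):

```julia
using LinearAlgebra

# "σ" as the lower Cholesky factor of the covariance Σ, per the notation above.
σ = LowerTriangular([1.0 0.0; 0.5 2.0])
Σ = σ * σ'          # covariance of x = σz + μ for standard normal z

# Factoring Σ recovers σ, so the notation is consistent.
cholesky(Σ).L ≈ σ   # true
```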
````diff
+## The "Cholesky precision" parameterization
+
+The ``(μ,σ)`` parameterization is especially convenient for random sampling. Any `z ~ Normal()` determines an `x ~ Normal(μ,σ)` through
+
+```math
+x = σ z + μ
+```
+
+On the other hand, the log-density computation is not quite so simple. Starting with an ``x``, we need to find ``z`` using
+
+```math
+z = σ⁻¹ (x - μ)
+```
+
+so the log-density is
+
+```julia
+logdensity(d::Normal{(:μ,:σ)}, x) = logdensity(d.σ \ (x - d.μ)) - logdet(d.σ)
+```
+
+Here the `- logdet(σ)` is the "log absolute Jacobian", required to account for the stretching of the space.
+
+The above requires solving a linear system, which adds some overhead. Even with the convenience of a lower-triangular system, it's still not quite as efficient as a multiplication.
+
+In addition to the covariance ``Σ``, it's also common to parameterize a multivariate normal by its _precision matrix_, ``Ω = Σ⁻¹``. Similarly to our use of ``σ``, we'll use ``ω`` for the lower Cholesky factor of ``Ω``.
+
+This allows a more efficient log-density,
+
+```julia
+logdensity(d::Normal{(:μ,:ω)}, x) = logdensity(d.ω * (x - d.μ)) + logdet(d.ω)
+```
+
````
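In the scalar case the two forms are easy to compare directly. A hedged sketch in plain Julia: `stdnormal_logpdf` below stands in for the log-density of the standard normal base measure and is not MeasureTheory API.

```julia
# Scalar check: with ω = σ⁻¹, the (μ,σ) and (μ,ω) log-density forms agree.
stdnormal_logpdf(z) = -(log(2π) + z^2) / 2

μ, σ = 3.0, 2.0
ω = inv(σ)       # scalar "Cholesky precision"
x = 4.5

ld_σ = stdnormal_logpdf(σ \ (x - μ)) - log(σ)   # scalar logdet(σ) is log(σ)
ld_ω = stdnormal_logpdf(ω * (x - μ)) + log(ω)   # scalar logdet(ω) is log(ω)

ld_σ ≈ ld_ω   # true
```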
````diff
+## `AffineTransform`
+
+Transforms like ``z → σ z + μ`` and ``z → ω \ z + μ`` can be represented using an `AffineTransform`. For example,
+
+```julia
+julia> f = AffineTransform((μ=3.0, σ=2.0))
+AffineTransform{(:μ, :σ), Tuple{Float64, Float64}}((μ = 3.0, σ = 2.0))
+
+julia> f(1.0)
+5.0
+```
+
+In the scalar case this is relatively simple to invert. But if `σ` is a matrix, this would require matrix inversion. Adding to this complication is the fact that lower-triangular matrices are not closed under matrix inversion.
+
+Our multiple parameterizations make it convenient to deal with these issues. The inverse transform of a ``(μ,σ)`` transform will be in terms of ``(μ,ω)``, and vice versa. So
+
+```julia
+julia> f⁻¹ = inv(f)
+AffineTransform{(:μ, :ω), Tuple{Float64, Float64}}((μ = -1.5, ω = 2.0))
+
+julia> f(f⁻¹(4))
+4.0
+
+julia> f⁻¹(f(4))
+4.0
+```
+
+## `Affine`
+
+Of particular interest (the whole point of all of this, really) is to have a natural way to work with affine transformations of measures. In accordance with the principle of "common things should have shorter names", we call this `Affine`.
+
+The structure of `Affine` is relatively simple:
+
+```julia
+struct Affine{N,M,T} <: AbstractMeasure
+    f::AffineTransform{N,T}
+    parent::M
+end
+```
````
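To make the `(μ,σ)` ↔ `(μ,ω)` round trip above concrete, here is a minimal scalar-only sketch. `ScalarAffine` is a hypothetical stand-in for `AffineTransform`, not the package's implementation:

```julia
# Scalar-only model of AffineTransform: a (μ,σ) transform applies z ↦ σz + μ,
# a (μ,ω) transform applies z ↦ ω \ z + μ, and inv swaps the parameterizations.
struct ScalarAffine{P<:NamedTuple}
    par::P
end

(f::ScalarAffine{<:NamedTuple{(:μ, :σ)}})(z) = f.par.σ * z + f.par.μ
(f::ScalarAffine{<:NamedTuple{(:μ, :ω)}})(z) = f.par.ω \ z + f.par.μ

# Inverting a (μ,σ) transform yields a (μ,ω) transform, and vice versa,
# so applying the inverse later never needs a division (or linear solve).
Base.inv(f::ScalarAffine{<:NamedTuple{(:μ, :σ)}}) =
    ScalarAffine((μ = -f.par.μ / f.par.σ, ω = f.par.σ))
Base.inv(f::ScalarAffine{<:NamedTuple{(:μ, :ω)}}) =
    ScalarAffine((μ = -f.par.ω * f.par.μ, σ = f.par.ω))

f = ScalarAffine((μ = 3.0, σ = 2.0))
g = inv(f)    # parameters (μ = -1.5, ω = 2.0), matching the REPL session above
f(g(4.0))     # 4.0
```

The design point this illustrates: the inverse is computed once, at `inv` time, and stored in whichever parameterization makes later application cheap.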

docs/src/api_index.md

Lines changed: 4 additions & 0 deletions

````diff
@@ -0,0 +1,4 @@
+# Index
+
+```@index
+```
````

docs/src/api_measurebase.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -0,0 +1,5 @@
+# MeasureBase API
+
+```@autodocs
+Modules = [MeasureBase]
+```
````

docs/src/api_measuretheory.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -0,0 +1,5 @@
+# MeasureTheory API
+
+```@autodocs
+Modules = [MeasureTheory]
+```
````
