
Commit 27ba552

Merge branch 'localchanges' into gp-module

2 parents 6b46f03 + 042cfae
File tree

10 files changed: +4635 -776 lines changed

docs/source/gp.rst

Lines changed: 198 additions & 0 deletions

@@ -0,0 +1,198 @@

===========================
Gaussian Processes in PyMC3
===========================

GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

   f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

   \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
     \begin{bmatrix} k(x, x')  & k(x, x_*) \\
                     k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix}
   \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)`, as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

   f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
   k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.

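
As a quick illustration, the conditional mean and covariance can be computed
directly from these formulas. The following is a minimal NumPy sketch (not
part of the PyMC3 API), assuming :code:`m` and :code:`k` are callables that
return arrays::

    import numpy as np

    def gp_conditional(x, x_star, f, m, k):
        K = k(x, x)            # covariance among the observed points
        K_s = k(x_star, x)     # cross-covariance, new vs. observed points
        # solve K^{-1} [f(x) - m(x)] rather than forming the inverse
        alpha = np.linalg.solve(K, f - m(x))
        mu = m(x_star) + K_s.dot(alpha)
        cov = k(x_star, x_star) - K_s.dot(np.linalg.solve(K, K_s.T))
        return mu, cov
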
.. note::

   For more information on GPs, check out the book `Gaussian Processes for
   Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
   Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
   by D. MacKay.

=============================
Overview of GP usage in PyMC3
=============================

PyMC3 is a great environment for working with fully Bayesian Gaussian Process
models. The GP functionality of PyMC3 is meant to have a clear syntax and be
highly composable. It includes many predefined covariance functions (or
kernels), mean functions, and several GP implementations. GPs are treated as
distributions that can be used within larger or hierarchical models, not just
as standalone regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not
given their inputs yet. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    lengthscales = [2, 5]
    cov_func = pm.gp.cov.ExpQuad(3, lengthscales, active_dims=[1, 2])

Here the :code:`lengthscales` parameter has two elements, allowing the second
and third dimensions to have different lengthscales. The reason we have to
specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, is because :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.

Covariance functions in PyMC3 closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones, for example:

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a covariance
  function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)

Once defined, a covariance function is evaluated by calling
:code:`cov_func(x, x)` (and a mean function by calling :code:`mean_func(x)`).
For more information on mean and covariance functions in PyMC3, check out the
tutorial on covariance functions.

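
For instance, a composite covariance function can be built up and evaluated on
concrete inputs as follows. This is a small sketch; the kernel choices and
parameter values are illustrative, and it assumes the result of
:code:`cov_func(X, X)` is a Theano tensor that can be converted to a NumPy
array with :code:`.eval()`::

    import numpy as np
    import pymc3 as pm

    # composite kernel: scaled Matern 3/2 plus a short-lengthscale ExpQuad
    eta = 2.0
    cov_func = eta**2 * pm.gp.cov.Matern32(1, 5.0) + pm.gp.cov.ExpQuad(1, 0.1)

    X = np.linspace(0, 10, 50)[:, None]  # inputs must be a column matrix
    K = cov_func(X, X).eval()            # 50 x 50 covariance matrix
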
:code:`gp.*` implementations
============================

PyMC3 includes several GP implementations, including marginal and latent
variable models and also some fast approximations. They all follow a similar
usage pattern: first, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
the GP object to actually construct the PyMC3 random variable that represents
the function prior.

Using :code:`gp.Latent` as an example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.

GPs in PyMC3 are additive, which means we can write::

    gp1 = pm.gp.Latent(mean_func1, cov_func1)
    gp2 = pm.gp.Latent(mean_func2, cov_func2)
    gp3 = gp1 + gp2

Calling the `prior` method will create a PyMC3 random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp3.prior("f", n_points, X)

:code:`f` is a random variable that can be used within a PyMC3 model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over. The second
argument is :code:`n_points`, the number of points that the GP acts on, which
is the number of rows of :code:`X`. The third argument is the inputs to the
function that the prior is over, :code:`X`. The inputs are usually known and
present in the data, but they can also be PyMC3 random variables.

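
Put together, a latent GP prior might be used inside a model like this. This
is a minimal sketch; the priors on :code:`eta` and :code:`ell`, the Bernoulli
likelihood, and the simulated data are illustrative choices, not prescribed by
the API::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 30)[:, None]          # 30 input points, one column
    y_obs = np.random.binomial(1, 0.5, size=30)  # stand-in binary data

    with pm.Model() as model:
        ell = pm.Gamma("ell", alpha=2, beta=1)   # lengthscale prior
        eta = pm.HalfCauchy("eta", beta=3)       # amplitude prior
        cov_func = eta**2 * pm.gp.cov.ExpQuad(1, ell)

        gp = pm.gp.Latent(pm.gp.mean.Zero(), cov_func)
        f = gp.prior("f", 30, X)                 # 30 = number of rows of X

        # f behaves like any other random variable, e.g. in a likelihood
        y = pm.Bernoulli("y", p=pm.math.sigmoid(f), observed=y_obs)
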
.. note::

   The :code:`n_points` argument is required because of how Theano and PyMC3
   handle the shape information of distributions. For :code:`prior` or
   :code:`marginal_likelihood`, it is the number of rows in the inputs,
   :code:`X`. For :code:`conditional`, it is the number of rows in the new
   inputs, :code:`X_new`.

The :code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. It can be called on any or all of the component GPs,
:code:`gp1`, :code:`gp2`, or :code:`gp3`. To construct the conditional
distribution for :code:`gp3`, we write::

    f_star = gp3.conditional("f_star", n_newpoints, X_star)

To construct the conditional distribution of one of the component GPs,
:code:`gp1` or :code:`gp2`, we also need to include the original inputs
:code:`X` as an argument::

    f1_star = gp1.conditional("f1_star", n_newpoints, X_star, X=X)

The :code:`gp3` object keeps track of the inputs it used when :code:`prior`
was called. Since the prior method of :code:`gp1` wasn't called, it needs to
be provided with the inputs :code:`X`. In the same fashion as the prior,
:code:`f_star` and :code:`f1_star` are random variables that can now be used
like any other random variable in PyMC3.
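
For example, after fitting the model, draws of the function at the new points
can be generated from the conditional. The following sketch assumes a
:code:`trace` returned by a previous :code:`pm.sample` call inside
:code:`model`::

    X_star = np.linspace(0, 15, 100)[:, None]

    with model:
        f_star = gp3.conditional("f_star", 100, X_star)
        samples = pm.sample_ppc(trace, vars=[f_star], samples=500)
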

.. note::

   `gp.Latent` has a `prior` and a `conditional`, but not a `marginal_likelihood`,
   since that method doesn't make sense in this case. Other GP implementations
   have `marginal_likelihood`, but not a `prior`, such as `gp.Marginal` and
   `gp.MarginalSparse`.

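
For contrast with :code:`gp.Latent`, a sketch of the :code:`gp.Marginal`
pattern is shown below. The names :code:`mean_func`, :code:`lengthscale`,
:code:`n_points`, :code:`X`, and :code:`y` are placeholders, and the
assumption that :code:`marginal_likelihood` additionally takes the observed
data :code:`y` and a :code:`noise` level is illustrative, not authoritative::

    with pm.Model() as model:
        cov_func = pm.gp.cov.ExpQuad(1, lengthscale)
        gp = pm.gp.Marginal(mean_func, cov_func)
        sigma = pm.HalfNormal("sigma", sd=1)
        y_ = gp.marginal_likelihood("y", n_points, X, y=y, noise=sigma)
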
There are example notebooks demonstrating the usage of GP functionality in
PyMC3 in more detail, including examples of each of the different GP
implementations.

docs/source/notebooks/GP-Latent.ipynb

Lines changed: 586 additions & 0 deletions

docs/source/notebooks/GP-Marginal.ipynb

Lines changed: 471 additions & 0 deletions
