
Commit fc5c238

Merge pull request #2444 from bwengals/gp-module

GP functionality 2

2 parents e94a89a + 7ae8cc3, commit fc5c238

17 files changed: +7406 -1959 lines

docs/source/examples.rst

Lines changed: 6 additions & 2 deletions

@@ -48,8 +48,12 @@ Gaussian Processes
 ==================
 
 .. toctree::
-    notebooks/GP-introduction.ipynb
-    notebooks/GP-covariances.ipynb
+    notebooks/GP-Marginal.ipynb
+    notebooks/GP-Covariances.ipynb
+    notebooks/GP-Latent.ipynb
+    notebooks/GP-SparseApprox.ipynb
+    notebooks/GP-TProcess.ipynb
+    notebooks/GP-MaunaLoa.ipynb
     notebooks/GP-slice-sampling.ipynb
     notebooks/GP-smoothing.ipynb
 

docs/source/gp.rst

Lines changed: 261 additions & 0 deletions

@@ -0,0 +1,261 @@
===========================
Gaussian Processes in PyMC3
===========================

GP Basics
=========
Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

   f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.
The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

   \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
     \begin{bmatrix} k(x, x)   & k(x, x_*)   \\
                     k(x_*, x) & k(x_*, x_*) \\ \end{bmatrix}
   \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)`, as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

   f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) \,,\,
     k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.
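As a quick check on these formulas, the following sketch (plain NumPy, not
part of the PyMC3 API; the kernel, inputs, and observed values are made up
for illustration) computes the conditional mean and covariance directly,
assuming a zero mean function and an exponentiated quadratic kernel::

    import numpy as np

    def exp_quad(x1, x2, ls=1.0):
        # k(x, x') = exp(-(x - x')^2 / (2 ls^2))
        return np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ls**2)

    x = np.array([1.0, 2.0, 3.0])    # observed inputs
    f = np.array([0.5, -0.2, 0.8])   # observed function values
    x_star = np.array([1.5, 2.5])    # prediction points

    K = exp_quad(x, x) + 1e-8 * np.eye(len(x))  # jitter for stability
    K_star = exp_quad(x_star, x)                # k(x_*, x)
    K_star_star = exp_quad(x_star, x_star)      # k(x_*, x_*)

    # conditional mean and covariance, with m(x) = 0
    mu_cond = np.dot(K_star, np.linalg.solve(K, f))
    cov_cond = K_star_star - np.dot(K_star, np.linalg.solve(K, K_star.T))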
.. note::

   For more information on GPs, check out the book `Gaussian Processes for
   Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
   Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
   by D. Mackay.

PyMC3 is a great environment for working with fully Bayesian Gaussian process
models. GPs in PyMC3 have a clear syntax and are highly composable. Many
predefined covariance functions (or kernels) and mean functions are included,
along with several GP implementations. GPs are treated as distributions that
can be used within larger or hierarchical models, not just as standalone
regression models.
Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not yet
given their inputs. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.
For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    ls = [2, 5]  # lengthscales for the two active dimensions
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])
Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the
second and third dimensions to have different lengthscales. The reason we have
to specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, is because :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.
Covariance functions in PyMC3 closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones. For example:

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a covariance
  function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)
After the covariance function is defined, it is a function that is evaluated
by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since PyMC3 is
built on top of Theano, it is relatively easy to define and experiment with
non-standard covariance and mean functions. For more information, check out
the tutorial on covariance functions.
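As a concrete sketch of these rules (the hyperparameter values below are
placeholders for illustration, not recommendations), one might build a scaled
sum of kernels and evaluate the resulting covariance matrix::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 50)[:, None]  # 50 inputs, one column

    # a scaled Matern kernel plus a periodic component
    eta = 2.0
    cov_func = (eta**2 * pm.gp.cov.Matern32(input_dim=1, ls=1.5)
                + pm.gp.cov.Periodic(input_dim=1, period=3.0, ls=1.0))

    # evaluating the kernel returns a Theano tensor; .eval() gives
    # the actual 50 x 50 covariance matrix
    K = cov_func(X, X).eval()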
:code:`gp.*` implementations
============================
PyMC3 includes several GP implementations, including marginal and latent
variable models and also some fast approximations. Their usage follows a
similar pattern: First, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
the GP object to actually construct the PyMC3 random variable that represents
the function prior.
Using :code:`gp.Latent` for the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.
.. note::

   The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
   Instead they have a :code:`marginal_likelihood` method that is used similarly,
   but with additional required arguments, such as the observed data and noise,
   depending on the implementation. See the notebooks for examples. The
   :code:`conditional` method works similarly.
Calling the `prior` method will create a PyMC3 random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC3 model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over. The second
argument is the inputs to the function that the prior is over, :code:`X`. The
inputs are usually known and present in the data, but they can also be PyMC3
random variables. If the inputs are a Theano tensor or a PyMC3 random
variable, the :code:`shape` needs to be given.
Usually at this point, inference is performed on the model. The
:code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. To construct the conditional distribution we write::

    f_star = gp.conditional("f_star", X_star)
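Putting the pieces together, a minimal end-to-end sketch of the
:code:`gp.Latent` workflow might look like the following (the data,
likelihood, and hyperparameter choices are placeholders for illustration)::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 30)[:, None]       # observed input locations
    y = np.random.normal(size=30)             # placeholder observations
    X_star = np.linspace(0, 12, 50)[:, None]  # prediction locations

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2, beta=1)
        cov_func = pm.gp.cov.ExpQuad(input_dim=1, ls=ls)
        gp = pm.gp.Latent(cov_func=cov_func)

        # prior over the latent function values at the inputs X
        f = gp.prior("f", X)

        # likelihood linking the latent function to the data
        sigma = pm.HalfNormal("sigma", sd=1)
        y_obs = pm.Normal("y_obs", mu=f, sd=sigma, observed=y)

        trace = pm.sample(500)

    with model:
        # predictive distribution of the latent function at new points
        f_star = gp.conditional("f_star", X_star)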
Additive GPs
============

The GP implementation in PyMC3 is constructed so that it is easy to define
additive GPs and sample from individual GP components. We can write::

    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
    gp3 = gp1 + gp2

The GP objects have to have the same type; a :code:`gp.Marginal` cannot
be added to a :code:`gp.Latent`.
Consider two independent GP distributed functions, :math:`f_1(x) \sim
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

.. math::

   \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
                   \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
                     m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
     \begin{bmatrix}
       K_1       & K_1^*    & 0         & 0        & K_1                 & K_1^*    \\
       K_1^{*^T} & K_1^{**} & 0         & 0        & K_1^{*^T}           & K_1^{**} \\
       0         & 0        & K_2       & K_2^*    & K_2                 & K_2^{*}  \\
       0         & 0        & K_2^{*^T} & K_2^{**} & K_2^{*^T}           & K_2^{**} \\
       K_1       & K_1^{*}  & K_2       & K_2^{*}  & K_1 + K_2           & K_1^{*} + K_2^{*} \\
       K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**}
     \end{bmatrix}
   \right) \,.
Using the joint distribution to obtain the conditional distribution of :math:`f_1^*`
with the contribution due to :math:`f_1 + f_2` factored out, we get

.. math::

   f_1^* \mid f_1 + f_2 \sim \text{N}\left(
     m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
     K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.

These equations show how to break down GP models into individual components to see how each
contributes to the data. For more information, check out `David Duvenaud's PhD
thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.
The GP objects in PyMC3 keep track of these marginals automatically. The
following code sketch shows how to define the conditional distribution of
:math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for
other implementations. The first block fits the GP prior. We denote
:math:`f_1 + f_2` as just :math:`f` for brevity::

    with pm.Model() as model:
        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
        gp2 = pm.gp.Marginal(mean_func2, cov_func2)

        # gp represents f1 + f2.
        gp = gp1 + gp2

        f = gp.marginal_likelihood("f", X, y, noise)

        trace = pm.sample(1000)
To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
also need to include the additional arguments, :code:`X`, :code:`y`, and
:code:`noise`::

    with model:
        # conditional distributions of f1 and f2
        f1_star = gp1.conditional("f1_star", X_star,
                                  given={"X": X, "y": y, "noise": noise, "gp": gp})
        f2_star = gp2.conditional("f2_star", X_star,
                                  given={"X": X, "y": y, "noise": noise, "gp": gp})

        # conditional of f1 + f2, `given` not required
        f_star = gp.conditional("f_star", X_star)
This second block produces the conditional distributions. Notice that extra
arguments are required for conditionals of :math:`f_1` and :math:`f_2`, but
not :math:`f`. This is because those arguments were cached when
:code:`.marginal_likelihood` was called on :code:`gp`.
.. note::

   When constructing conditionals, the additional arguments :code:`X`, :code:`y`,
   :code:`noise` and :code:`gp` must be provided as a dict called `given`!
Since the marginal likelihood methods of :code:`gp1` and :code:`gp2` weren't
called, their conditionals need to be provided with the required inputs. In
the same fashion as the prior, :code:`f_star`, :code:`f1_star` and
:code:`f2_star` are random variables that can now be used like any other
random variable in PyMC3.

Check the notebooks for detailed demonstrations of the usage of GP functionality
in PyMC3.

docs/source/notebooks/GP-Covariances.ipynb

Lines changed: 1015 additions & 0 deletions
Large diffs are not rendered by default.
