
Commit 1058be1

Merge branch 'localchanges' into gp-module

2 parents c900d34 + 842a8b8

File tree: 11 files changed, +1900 −1743 lines

docs/source/gp.rst

Lines changed: 129 additions & 57 deletions
@@ -51,16 +51,12 @@ distribution is
 Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
 by D. Mackay.
 
-=============================
-Overview of GP usage In PyMC3
-=============================
-
 PyMC3 is a great environment for working with fully Bayesian Gaussian Process
-models. The GP functionality of PyMC3 is meant to have a clear syntax and be
-highly composable. It includes many predefined covariance functions (or
-kernels), mean functions, and several GP implementations. GPs are treated as
-distributions that can be used within larger or hierarchical models, not just
-as standalone regression models.
+models. GPs in PyMC3 have a clear syntax and are highly composable, and many
+predefined covariance functions (or kernels), mean functions, and several GP
+implementations are included. GPs are treated as distributions that can be
+used within larger or hierarchical models, not just as standalone regression
+models.
 
 Mean and covariance functions
 =============================
@@ -78,10 +74,10 @@ For example, to construct an exponentiated quadratic covariance function that
 operates on the second and third column of a three column matrix representing
 three predictor variables::
 
-    lengthscales = [2, 5]
-    cov_func = pm.gp.cov.ExpQuad(input_dim=3, lengthscales, active_dims=[1, 2])
+    ls = [2, 5]  # the lengthscales
+    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])
 
-Here the :code:`lengthscales` parameter is two dimensional, allowing the second
+Here the :code:`ls` parameter is two dimensional, allowing the second
 and third dimension to have a different lengthscale. The reason we have to
 specify :code:`input_dim`, the total number of columns of :code:`X`, and
 :code:`active_dims`, which of those columns or dimensions the covariance
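As a quick illustration of the new signature above, a minimal sketch of
constructing and evaluating such a covariance on synthetic data (the array
:code:`X` here is hypothetical, not from the docs)::

    import numpy as np
    import pymc3 as pm

    X = np.random.randn(100, 3)  # three predictor columns
    ls = [2, 5]                  # one lengthscale per active dimension
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

    # covariance functions return Theano expressions; .eval() materializes one
    K = cov_func(X, X).eval()    # a (100, 100) covariance matrix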
@@ -102,15 +98,17 @@ which allows users to combine covariance functions into new ones, for example:
 
     cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)
 
-- The product (or sum) of a covariance function with a scalar is a covariance function::
+- The product (or sum) of a covariance function with a scalar is a covariance
+  function::
 
     cov_func = eta**2 * pm.gp.cov.Matern32(...)
 
 After the covariance function is defined, it is now a function that is
-evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). For more
-information on mean and covariance functions in PyMC3, check out the tutorial
-on covariance functions.
+evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since
+PyMC3 is built on top of Theano, it is relatively easy to define and experiment
+with non-standard covariance and mean functions. For more information check out
+the tutorial on covariance functions.
 
 
 :code:`gp.*` implementations
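To make the composability concrete, a short sketch (the kernel hyperparameter
values here are made up for illustration)::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 50)[:, None]

    # scaling by a scalar and adding kernels both yield valid covariance functions
    eta = 2.0
    cov_func = eta**2 * pm.gp.cov.Matern32(1, ls=2) + pm.gp.cov.ExpQuad(1, ls=0.5)

    K = cov_func(X, X).eval()  # evaluate the resulting (50, 50) matrix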
@@ -133,66 +131,140 @@ is::
 The first argument is the mean function and the second is the covariance
 function. We've made the GP object, but we haven't made clear which function
 it is to be a prior for, what the inputs are, or what parameters it will be
 conditioned on.
 
-GPs in PyMC3 are additive, which means we can write::
+.. note::
 
-    gp1 = pm.gp.Latent(mean_func1, cov_func1)
-    gp2 = pm.gp.Latent(mean_func2, cov_func2)
-    gp3 = gp1 + gp2
-
+  The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
+  Instead they have a :code:`marginal_likelihood` method that is used similarly,
+  but has additional required arguments, such as the observed data, noise,
+  or other, depending on the implementation. See the notebooks for examples.
+  The :code:`conditional` method works similarly.
 
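To make the note concrete, a hedged sketch of the two entry points (the mean
and covariance functions, the data :code:`X` and :code:`y`, and the noise term
are all placeholders)::

    with pm.Model() as latent_model:
        # gp.Latent exposes a prior over the unobserved function values
        gp_latent = pm.gp.Latent(mean_func, cov_func)
        f = gp_latent.prior("f", X)

    with pm.Model() as marginal_model:
        # gp.Marginal instead exposes the marginal likelihood, which also
        # needs the observed targets y and a noise term
        gp_marginal = pm.gp.Marginal(mean_func, cov_func)
        y_ = gp_marginal.marginal_likelihood("y", X, y, noise)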
 Calling the `prior` method will create a PyMC3 random variable that represents
-the latent function :math:$f(x) = \mathbf{f}$::
+the latent function :math:`f(x) = \mathbf{f}`::
 
-    f = gp3.prior("f", n_points, X)
+    f = gp.prior("f", X)
 
 :code:`f` is a random variable that can be used within a PyMC3 model like any
 other type of random variable. The first argument is the name of the random
-variable representing the function we are placing the prior over. The second
-argument is :code:`n_points`, which is number of points that the GP acts on ---
-the total number of dimensions. The third argument is the inputs to the
-function that the prior is over, :code:`X`. The inputs are usually known and
-present in the data, but they can also be PyMC3 random variables.
+variable representing the function we are placing the prior over.
+The second argument is the inputs to the function that the prior is over,
+:code:`X`. The inputs are usually known and present in the data, but they can
+also be PyMC3 random variables. If the inputs are a Theano tensor or a
+PyMC3 random variable, the :code:`n_points` argument is required.
 
 .. note::
 
-  The :code:`n_points` argument is required because of how Theano and PyMC3
-  handle the shape information of distributions. For :code:`prior` or
-  :code:`marginal_likelihood`, it is the number of rows in the inputs,
-  :code:`X`. For :code:`conditional`, it is the number of rows in the new
-  inputs, :code:`X_new`.
+  The :code:`n_points` argument is sometimes required due to how Theano and
+  PyMC3 handle the shape information of distributions. It needs to be specified
+  when :code:`X` or :code:`X_new` is a Theano tensor or a random variable.
+  For :code:`prior` or :code:`marginal_likelihood`, it is the number of rows
+  of the inputs, :code:`X`. For :code:`conditional`, it is the number of
+  rows in the new inputs, :code:`X_new`.
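A sketch of the case the note describes, where the inputs are themselves a
random variable so their shape is not known to Theano (the names, shapes, and
the keyword form of :code:`n_points` are assumptions here, and :code:`cov_func`
is a placeholder)::

    with pm.Model() as model:
        # noisy inputs: X is a random variable centered on measured locations
        X_mu = np.linspace(0, 1, 20)[:, None]
        X = pm.Normal("X", mu=X_mu, sd=0.1, shape=(20, 1))

        gp = pm.gp.Latent(cov_func=cov_func)
        f = gp.prior("f", X, n_points=20)  # n_points: the number of rows of X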

+Usually at this point, inference is performed on the model. The
+:code:`conditional` method creates the conditional, or predictive,
+distribution over the latent function at arbitrary :math:`x_*` input points,
+:math:`f(x_*)`. To construct the conditional distribution we write::
 
-The :code:`conditional` method creates the conditional, or predictive,
-distribution over the latent function at arbitrary :math:$x_*$ input points,
-:math:$f(x_*)$. It can be called on any or all of the component GPs,
-:code:`gp1`, :code:`gp2`, or :code:`gp3`. To construct the conditional
-distribution for :code:`gp3`, we write::
+    f_star = gp.conditional("f_star", X_star)
 
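Downstream, :code:`f_star` behaves like any other random variable; for example,
predictive draws can be taken with this era's :code:`sample_ppc` (a sketch,
assuming a :code:`trace` returned by :code:`pm.sample`)::

    with model:
        pred_samples = pm.sample_ppc(trace, vars=[f_star], samples=500)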

-    f_star = gp3.conditional("f_star", n_newpoints, X_star)
+Additive GPs
+============
 
-To construct the conditional distribution of one of the component GPs,
-:code:`gp1` or :code:`gp2`, we also need to include the original inputs
-:code:`X` as an argument::
+The GP implementation in PyMC3 is constructed so that it is easy to define
+additive GPs and sample from individual GP components. We can write::
 
-    f1_star = gp1.conditional("f1_start", n_newpoints, X_star, X=X)
+    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+    gp3 = gp1 + gp2
 
-The :code:`gp3` object keeps track of the inputs it used when :code:`prior` was
-set. Since the prior method of :code:`gp1` wasn't called, it needs to be
-provided with the inputs :code:`X`. In the same fashion as the prior,
-:code:`f_star` and :code:`f1_star` are random variables that can now be used
-like any other random variable in PyMC3.
+The GP objects have to have the same type; :code:`gp.Marginal` cannot
+be added to :code:`gp.Latent`.
 
-.. note::
+Consider two independent GP distributed functions, :math:`f_1(x) \sim
+\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
+\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
+:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is
 
-  `gp.Latent` has a `prior` and a `conditional`, but not a `marginal_likelihood`,
-  since that method doesn't make sense in this case. Other GP implementations
-  have `marginal_likelihood`, but not a `prior`, such as `gp.Marginal` and
-  `gp.MarginalSparse`.
+.. math::
+
+  \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
+  \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
+  \text{N}\left(
+    \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
+    m_1 + m_2 \\ m_1^* + m_2^* \end{bmatrix} \,,\,
+    \begin{bmatrix}
+      K_1 & K_1^* & 0 & 0 & K_1 & K_1^* \\
+      K_1^{*^T} & K_1^{**} & 0 & 0 & K_1^* & K_1^{**} \\
+      0 & 0 & K_2 & K_2^* & K_2 & K_2^{*} \\
+      0 & 0 & K_2^{*^T} & K_2^{**} & K_2^{*} & K_2^{**} \\
+      K_1 & K_1^{*} & K_2 & K_2^{*} & K_1 + K_2 & K_1^{*} + K_2^{*} \\
+      K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T} + K_2^{*^T} & K_1^{**} + K_2^{**}
+    \end{bmatrix}
+  \right) \,.
+
+Using the joint distribution to obtain the conditional distribution of :math:`f_1^*`
+with the contribution due to :math:`f_1 + f_2` factored out, we get
 
-There are examples demonstrating in more detail the usage
-of GP functionality in PyMC3, including examples demonstrating the usage of the
-different GP implementations. Link to other GPs here in a note
+.. math::
+
+  f_1^* \mid f_1 + f_2 \sim \text{N}\left(
+  m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
+  K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.
+
+So one can break down GP models into individual components to examine how each
+contributes to the data. For more information, check out `David Duvenaud's PhD
+thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.
+
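The decomposition formula above can be checked numerically; a small NumPy
sketch with zero mean functions (:code:`cov1` and :code:`cov2` are placeholder
covariance functions, and :code:`f` stands in for an observed draw of
:math:`f_1 + f_2`)::

    import numpy as np

    X = np.random.randn(30, 1)
    Xs = np.random.randn(10, 1)

    K1 = cov1(X, X).eval()
    K1s = cov1(X, Xs).eval()
    K1ss = cov1(Xs, Xs).eval()
    K2 = cov2(X, X).eval()

    f = np.random.randn(30)
    # conditional mean and covariance of f1* given f1 + f2, with m1 = m2 = 0
    mu = K1s.T @ np.linalg.solve(K1 + K2, f)
    cov = K1ss - K1s.T @ np.linalg.solve(K1 + K2, K1s)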
+The GP objects in PyMC3 keep track of these marginals automatically. The
+following code sketch shows how to define the conditional distribution of
+:math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for
+other implementations. The first block fits the GP prior. We denote
+:math:`f_1 + f_2` as just :math:`f` for brevity::
+
+    with pm.Model() as model:
+        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+        gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+
+        # gp represents f1 + f2.
+        gp = gp1 + gp2
+
+        f = gp.marginal_likelihood("f", X, y, noise)
+
+        trace = pm.sample(1000)
+
+The second block produces conditional distributions, including :math:`f_2^*
+\mid f_1 + f_2`. Notice that extra arguments are required for conditionals of
+:math:`f_1` and :math:`f_2`, but not :math:`f`. This is because those arguments
+are cached when calling :code:`.marginal_likelihood`.
+
+To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
+also need to include the additional arguments, :code:`X`, :code:`y`, and
+:code:`noise`::
+
+    with model:
+        # conditional distributions of f1 and f2
+        f1_star = gp1.conditional("f1_star", X_star, X=X, y=y, noise=noise)
+        f2_star = gp2.conditional("f2_star", X_star, X=X, y=y, noise=noise)
+
+        # conditional of f1 + f2, additional kwargs not required
+        f_star = gp.conditional("f_star", X_star)
+
+.. note::
+
+  The additional arguments :code:`X`, :code:`y`, and :code:`noise` must be
+  provided as **keyword arguments**!
+
+The :code:`gp` object keeps track of the inputs it used when :code:`marginal_likelihood`
+or :code:`prior` was called. Since the marginal likelihood methods of :code:`gp1` and
+:code:`gp2` weren't called, their conditionals need to be provided with the required
+inputs. In the same fashion as the prior, :code:`f_star`, :code:`f1_star` and
+:code:`f2_star` are random variables that can now be used like any other random
+variable in PyMC3.
+
+Check the notebooks for detailed demonstrations of the usage of GP functionality
+in PyMC3.
 
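To inspect each component's contribution to the data itself, the same
conditionals can also be evaluated at the observed inputs (a sketch continuing
the model above; the :code:`sample_ppc` call reflects this era's API and is an
assumption)::

    with model:
        f1_at_X = gp1.conditional("f1_at_X", X, X=X, y=y, noise=noise)
        f2_at_X = gp2.conditional("f2_at_X", X, X=X, y=y, noise=noise)

        # posterior draws of each additive component
        comp_samples = pm.sample_ppc(trace, vars=[f1_at_X, f2_at_X], samples=200)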

docs/source/notebooks/GP-Covariances.ipynb

Lines changed: 1015 additions & 0 deletions
Large diffs are not rendered by default.

docs/source/notebooks/GP-Latent.ipynb

Lines changed: 6 additions & 12 deletions
@@ -19,7 +19,7 @@
 "source": [
 "## `.prior`\n",
 "\n",
-"Concretely, what this means is that with some data set of finite size, the `prior` method places a multivariate normal prior distribution on the vector of function values, $\\mathbf{f}$,\n",
+"With some data set of finite size, the `prior` method places a multivariate normal prior distribution on the vector of function values, $\\mathbf{f}$,\n",
 "\n",
 "$$\n",
 "\\mathbf{f} \\sim \\text{MvNormal}(\\mathbf{m}_{x},\\, \\mathbf{K}_{xx}) \\,,\n",
@@ -42,10 +42,10 @@
 "    gp = pm.gp.Latent(cov_func=cov_func)\n",
 "    \n",
 "    # Place a GP prior over the function f.\n",
-"    f = gp.prior(\"f\", n_points=10, X=X)\n",
+"    f = gp.prior(\"f\", X=X)\n",
 "```\n",
 "\n",
-"By default, PyMC3 reparameterizes the prior on `f` by rotating it with the Cholesky factor. This helps to reduce covariances in the posterior of the transformed GP, `v`. The reparameterized model is,\n",
+"By default, PyMC3 reparameterizes the prior on `f` under the hood by rotating it with the Cholesky factor of its covariance matrix. This helps to reduce covariances in the posterior of the transformed random variable, `v`. The reparameterized model is,\n",
 "\n",
 "$$\n",
 "\\begin{aligned}\n",
@@ -64,29 +64,23 @@
 "source": [
 "## `.conditional`\n",
 "\n",
-"The conditional method implements the \"predictive\" distribution for function values that were not part of the original data set. This distribution is,\n",
+"The conditional method implements the predictive distribution for function values that were not part of the original data set. This distribution is,\n",
 "\n",
 "$$\n",
 "\\mathbf{f}_* \\mid \\mathbf{f} \\sim \\text{MvNormal} \\left(\n",
 "\\mathbf{m}_* + \\mathbf{K}_{*x}\\mathbf{K}_{xx}^{-1} \\mathbf{f} ,\\,\n",
 "\\mathbf{K}_{**} - \\mathbf{K}_{*x}\\mathbf{K}_{xx}^{-1}\\mathbf{K}_{x*} \\right)\n",
 "$$\n",
 "\n",
-"Using the same `gp` object we defined above, this is specified as,\n",
+"Using the same `gp` object we defined above, we can construct a random variable with this\n",
+"distribution by,\n",
 "\n",
 "```python\n",
 "# vector of new X points we want to predict the function at\n",
 "X_star = np.linspace(0, 2, 100)[:, None]\n",
 "\n",
 "with latent_gp_model:\n",
 "    f_star = gp.conditional(\"f_star\", X_star, n_points=100)\n",
-"```\n",
-"\n",
-"If `gp` is part of a sum of GP objects, it can be conditioned on different components of that sum using the optional keyword argument `given`,\n",
-"\n",
-"```python\n",
-"    f_star_diff = gp.conditional(\"f_star_diff\", n_points=100, X_star, \n",
-"                                 gp=a_different_gp)\n",
 "```"
 ]
 },
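The Cholesky rotation described in the `.prior` section above can be sketched
in a few lines of NumPy (illustrative only; PyMC3 applies this
reparameterization internally, and `m_x` here is a placeholder mean vector):

```python
import numpy as np

K = cov_func(X, X).eval()                          # prior covariance at X
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(X)))  # jitter for stability

v = np.random.randn(len(X))  # v ~ Normal(0, I)
f = m_x + L @ v              # f then has the MvNormal(m_x, K) prior
```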

docs/source/notebooks/GP-Marginal.ipynb

Lines changed: 41 additions & 86 deletions
Large diffs are not rendered by default.
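Since this diff isn't rendered, here is a hedged sketch of the `gp.Marginal`
workflow the notebook covers (the data, priors, and parameter values are made
up for illustration, not taken from the notebook):

```python
import numpy as np
import pymc3 as pm

X = np.linspace(0, 10, 100)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(100)

with pm.Model() as marginal_gp_model:
    ls = pm.Gamma("ls", alpha=2, beta=1)
    cov_func = pm.gp.cov.ExpQuad(1, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov_func)

    sigma = pm.HalfCauchy("sigma", beta=5)
    # the GP's marginal likelihood, with the noise integrated out
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

    trace = pm.sample(1000)
```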
