@@ -51,16 +51,12 @@ distribution is
Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
by D. MacKay.

- =============================
- Overview of GP usage In PyMC3
- =============================
-
PyMC3 is a great environment for working with fully Bayesian Gaussian Process
- models. The GP functionality of PyMC3 is meant to have a clear syntax and be
- highly composable. It includes many predefined covariance functions (or
- kernels), mean functions, and several GP implementations. GPs are treated as
- distributions that can be used within larger or hierarchical models, not just
- as standalone regression models.
+ models. GPs in PyMC3 have a clear syntax and are highly composable, and many
+ predefined covariance functions (or kernels), mean functions, and several GP
+ implementations are included. GPs are treated as distributions that can be
+ used within larger or hierarchical models, not just as standalone regression
+ models.

Mean and covariance functions
=============================
@@ -78,10 +74,10 @@ For example, to construct an exponentiated quadratic covariance function that
operates on the second and third column of a three column matrix representing
three predictor variables::

-     lengthscales = [2, 5]
-     cov_func = pm.gp.cov.ExpQuad(input_dim=3, lengthscales, active_dims=[1, 2])
+     ls = [2, 5]  # the lengthscales
+     cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

- Here the :code:`lengthscales` parameter is two dimensional, allowing the second
+ Here the :code:`ls` parameter is two dimensional, allowing the second
and third dimension to have a different lengthscale. The reason we have to
specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
@@ -102,15 +98,17 @@ which allows users to combine covariance functions into new ones, for example:

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- - The product (or sum) of a covariance function with a scalar is a covariance function::
+ - The product (or sum) of a covariance function with a scalar is a covariance
+   function::


    cov_func = eta**2 * pm.gp.cov.Matern32(...)

After the covariance function is defined, it is now a function that is
- evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). For more
- information on mean and covariance functions in PyMC3, check out the tutorial
- on covariance functions.
+ evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since
+ PyMC3 is built on top of Theano, it is relatively easy to define and experiment
+ with non-standard covariance and mean functions. For more information, check out
+ the tutorial on covariance functions.
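+
+ For instance, a composed covariance function can be evaluated at concrete
+ inputs by evaluating the Theano tensor it returns. A minimal sketch (the
+ input array and hyperparameter values here are only illustrative)::
+
+     import numpy as np
+     import pymc3 as pm
+
+     X = np.linspace(0, 1, 5)[:, None]  # five one-dimensional inputs
+
+     # the product of a scalar and a covariance function is a covariance function
+     cov_func = 2**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=0.5)
+
+     # cov_func(X, X) is a Theano tensor; eval() returns the 5x5 matrix
+     K = cov_func(X, X).eval()
+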
:code:`gp.*` implementations
@@ -133,66 +131,140 @@ is::
The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
- conditioned on.
+ conditioned on.

- GPs in PyMC3 are additive, which means we can write::
+ .. note::

-     gp1 = pm.gp.Latent(mean_func1, cov_func1)
-     gp2 = pm.gp.Latent(mean_func2, cov_func2)
-     gp3 = gp1 + gp2
-
+     The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
+     Instead they have a :code:`marginal_likelihood` method that is used similarly,
+     but has additional required arguments, such as the observed data, noise,
+     or others, depending on the implementation. See the notebooks for examples.
+     The :code:`conditional` method works similarly.
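+
+     A minimal sketch of this pattern (here :code:`X`, :code:`y`, and
+     :code:`sigma` are placeholders for the observed inputs, the observed
+     targets, and a noise parameter)::
+
+         with pm.Model() as model:
+             gp = pm.gp.Marginal(mean_func, cov_func)
+             y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)
+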

Calling the `prior` method will create a PyMC3 random variable that represents
- the latent function :math:$f(x) = \mathbf{f}$::
+ the latent function :math:`f(x) = \mathbf{f}`::

-     f = gp3.prior("f", n_points, X)
+     f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC3 model like any
other type of random variable. The first argument is the name of the random
- variable representing the function we are placing the prior over. The second
- argument is :code:`n_points`, which is number of points that the GP acts on ---
- the total number of dimensions. The third argument is the inputs to the
- function that the prior is over, :code:`X`. The inputs are usually known and
- present in the data, but they can also be PyMC3 random variables.
+ variable representing the function we are placing the prior over.
+ The second argument is the inputs to the function that the prior is over,
+ :code:`X`. The inputs are usually known and present in the data, but they can
+ also be PyMC3 random variables. If the inputs are a Theano tensor or a
+ PyMC3 random variable, the :code:`n_points` argument is required.

.. note::

-     The :code:`n_points` argument is required because of how Theano and PyMC3
-     handle the shape information of distributions. For :code:`prior` or
-     :code:`marginal_likelihood`, it is the number of rows in the inputs,
-     :code:`X`. For :code:`conditional`, it is the number of rows in the new
-     inputs, :code:`X_new`.
+     The :code:`n_points` argument is sometimes required due to how Theano and
+     PyMC3 handle the shape information of distributions. It needs to be specified
+     when :code:`X` or :code:`X_new` is a Theano tensor or a random variable.
+     For :code:`prior` or :code:`marginal_likelihood`, it is the number of rows
+     of the inputs, :code:`X`. For :code:`conditional`, it is the number of
+     rows in the new inputs, :code:`X_new`.
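+
+ As a concrete illustration of :code:`prior`, here is a short sketch of a
+ latent GP model (the hyperparameter prior, the zero mean function, and the
+ input array :code:`X` are illustrative choices, not requirements)::
+
+     with pm.Model() as model:
+         # prior over the lengthscale of the covariance function
+         ell = pm.Gamma("ell", alpha=2.0, beta=1.0)
+
+         mean_func = pm.gp.mean.Zero()
+         cov_func = pm.gp.cov.ExpQuad(input_dim=1, ls=ell)
+
+         gp = pm.gp.Latent(mean_func, cov_func)
+         f = gp.prior("f", X)  # X is an (n, 1) array of inputs
+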

+ Usually at this point, inference is performed on the model. The
+ :code:`conditional` method creates the conditional, or predictive,
+ distribution over the latent function at arbitrary :math:`x_*` input points,
+ :math:`f(x_*)`. To construct the conditional distribution we write::

- The :code:`conditional` method creates the conditional, or predictive,
- distribution over the latent function at arbitrary :math:$x_*$ input points,
- :math:$f(x_*)$. It can be called on any or all of the component GPs,
- :code:`gp1`, :code:`gp2`, or :code:`gp3`. To construct the conditional
- distribution for :code:`gp3`, we write::
+     f_star = gp.conditional("f_star", X_star)

-     f_star = gp3.conditional("f_star", n_newpoints, X_star)
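+
+ After sampling, draws of :math:`f(x_*)` can be generated from this
+ conditional; a sketch, assuming a :code:`trace` returned by :code:`pm.sample`
+ (the number of samples is arbitrary)::
+
+     with model:
+         f_star_samples = pm.sample_ppc(trace, vars=[f_star], samples=100)
+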
+ Additive GPs
+ ============

- To construct the conditional distribution of one of the component GPs,
- :code:`gp1` or :code:`gp2`, we also need to include the original inputs
- :code:`X` as an argument::
+ The GP implementation in PyMC3 is constructed so that it is easy to define
+ additive GPs and sample from individual GP components. We can write::

-     f1_star = gp1.conditional("f1_start", n_newpoints, X_star, X=X)
+     gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+     gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+     gp3 = gp1 + gp2

- The :code:`gp3` object keeps track of the inputs it used when :code:`prior` was
- set. Since the prior method of :code:`gp1` wasn't called, it needs to be
- provided with the inputs :code:`X`. In the same fashion as the prior,
- :code:`f_star` and :code:`f1_star` are random variables that can now be used
- like any other random variable in PyMC3.
+ The GP objects have to have the same type; a :code:`gp.Marginal` cannot
+ be added to a :code:`gp.Latent`.

- .. note::
+ Consider two independent GP distributed functions, :math:`f_1(x) \sim
+ \mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
+ \mathcal{GP}\left(m_2(x),\, k_2(x, x')\right)`. The joint distribution of
+ :math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

-     `gp.Latent` has a `prior` and a `conditional`, but not a `marginal_likelihood`,
-     since that method doesn't make sense in this case. Other GP implementations
-     have `marginal_likelihood`, but not a `prior`, such as `gp.Marginal` and
-     `gp.MarginalSparse`.
+ .. math::
+
+     \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
+     \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
+     \text{N}\left(
+     \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
+     m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
+     \begin{bmatrix}
+     K_1 & K_1^* & 0 & 0 & K_1 & K_1^* \\
+     K_1^{*^T} & K_1^{**} & 0 & 0 & K_1^* & K_1^{**} \\
+     0 & 0 & K_2 & K_2^* & K_2 & K_2^{*} \\
+     0 & 0 & K_2^{*^T} & K_2^{**} & K_2^{*} & K_2^{**} \\
+     K_1 & K_1^{*} & K_2 & K_2^{*} & K_1 + K_2 & K_1^{*} + K_2^{*} \\
+     K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T} + K_2^{*^T} & K_1^{**} + K_2^{**}
+     \end{bmatrix}
+     \right) \,.
+
+ Using the joint distribution to obtain the conditional distribution of :math:`f_1^*`
+ with the contribution due to :math:`f_1 + f_2` factored out, we get

- There are examples demonstrating in more detail the usage
- of GP functionality in PyMC3, including examples demonstrating the usage of the
- different GP implementations. Link to other GPs here in a note
+ .. math::
+
+     f_1^* \mid f_1 + f_2 \sim \text{N}\left(
+     m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
+     K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.
+
+ So one can break down GP models into individual components to examine how each
+ contributes to the data. For more information, check out `David Duvenaud's PhD
+ thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.
+
+ The GP objects in PyMC3 keep track of these marginals automatically. The
+ following code sketch shows how to define the conditional distribution of
+ :math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for
+ other implementations. The first block fits the GP prior. We denote
+ :math:`f_1 + f_2` as just :math:`f` for brevity::
+
+     with pm.Model() as model:
+         gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+         gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+
+         # gp represents f1 + f2.
+         gp = gp1 + gp2
+
+         f = gp.marginal_likelihood("f", X, y, noise)
+
+         trace = pm.sample(1000)
+
+
+ The second block produces conditional distributions, including :math:`f_2^*
+ \mid f_1 + f_2`. Notice that extra arguments are required for conditionals of
+ :math:`f_1` and :math:`f_2`, but not :math:`f`. This is because those arguments
+ are cached when calling :code:`.marginal_likelihood`.
+
+ To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
+ also need to include the additional arguments, :code:`X`, :code:`y`, and
+ :code:`noise`::
+
+     with model:
+         # conditional distributions of f1 and f2
+         f1_star = gp1.conditional("f1_star", X_star, X=X, y=y, noise=noise)
+         f2_star = gp2.conditional("f2_star", X_star, X=X, y=y, noise=noise)
+
+         # conditional of f1 + f2, additional kwargs not required
+         f_star = gp.conditional("f_star", X_star)
+
+ .. note::
+
+     The additional arguments :code:`X`, :code:`y`, and :code:`noise` must be
+     provided as **keyword arguments**!
+
+ The :code:`gp` object keeps track of the inputs it used when :code:`marginal_likelihood`
+ or :code:`prior` was set. Since the marginal likelihood method of :code:`gp1` or
+ :code:`gp2` wasn't called, their conditionals need to be provided with the required
+ inputs. In the same fashion as the prior, :code:`f_star`, :code:`f1_star`, and
+ :code:`f2_star` are random variables that can now be used like any other random
+ variable in PyMC3.
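+
+ As with the single GP case, posterior samples of the component conditionals
+ can then be drawn; a sketch reusing the :code:`trace` from above (the
+ :code:`samples` value is arbitrary)::
+
+     with model:
+         samples = pm.sample_ppc(trace, vars=[f1_star, f2_star], samples=200)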
+
+ Check the notebooks for detailed demonstrations of the usage of GP functionality
+ in PyMC3.