===========================
Gaussian Processes in PyMC3
===========================

GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

    f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

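To make this concrete, a draw from a GP prior at a finite set of points can be
sketched in plain NumPy (independent of PyMC3; the exponentiated quadratic
kernel and its lengthscale of 0.1 are illustrative choices)::

    import numpy as np

    x = np.linspace(0, 1, 100)
    # exponentiated quadratic kernel matrix, k(x, x')
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
    # one draw of the function values; the jitter keeps K numerically PSD
    f = np.random.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)))
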
The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

    \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
    \text{N}\left(
      \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
      \begin{bmatrix} k(x, x')   & k(x, x_*)    \\
                      k(x_*, x)  & k(x_*, x_*') \\ \end{bmatrix}
    \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)`, which is :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

    f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
    k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.

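As a sanity check, the conditional mean and covariance can be computed directly
with NumPy. This sketch assumes a zero mean function, so :math:`m(x)` drops out,
and reuses the illustrative exponentiated quadratic kernel from above::

    import numpy as np

    def k(xa, xb, ls=0.1):
        # exponentiated quadratic kernel matrix between two sets of points
        return np.exp(-0.5 * ((xa[:, None] - xb[None, :]) / ls) ** 2)

    x = np.linspace(0, 1, 5)        # observed points
    f = np.sin(2 * np.pi * x)       # observed function values, with m(x) = 0
    x_s = np.linspace(0, 1, 50)     # new points, x_*

    K_inv = np.linalg.inv(k(x, x) + 1e-8 * np.eye(len(x)))   # k(x, x)^{-1}
    mu_s = k(x_s, x) @ K_inv @ f                              # conditional mean
    cov_s = k(x_s, x_s) - k(x_s, x) @ K_inv @ k(x, x_s)       # conditional cov
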
.. note::

    For more information on GPs, check out the book `Gaussian Processes for
    Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
    Williams, or `this introduction <https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/gpB.pdf>`_
    by D. MacKay.

=============================
Overview of GP usage in PyMC3
=============================

PyMC3 is a great environment for working with fully Bayesian Gaussian process
models. The GP functionality of PyMC3 is meant to have a clear syntax and be
highly composable. It includes many predefined covariance functions (or
kernels), mean functions, and several GP implementations. GPs are treated as
distributions that can be used within larger or hierarchical models, not just
as standalone regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not
given their inputs yet. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third column of a three column matrix representing
three predictor variables::

    lengthscales = [2, 5]
    cov_func = pm.gp.cov.ExpQuad(3, lengthscales, active_dims=[1, 2])

Here the :code:`lengthscales` parameter has two elements, allowing the second
and third dimension to have different lengthscales. The reason we have to
specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, is because :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.

Covariance functions in PyMC3 closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones, for example
(a complete sketch combining these rules follows the list):

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a covariance
  function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)

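Putting these rules together, a combined covariance function inside a model
might look like the following sketch (the priors, lengthscale names, and the
one-dimensional input are illustrative, not prescribed by PyMC3)::

    import pymc3 as pm

    with pm.Model() as model:
        eta = pm.HalfCauchy("eta", beta=2)
        ls_long = pm.Gamma("ls_long", alpha=2, beta=0.5)
        ls_short = pm.Gamma("ls_short", alpha=2, beta=2)
        # a scaled long-lengthscale trend plus shorter-lengthscale variation
        cov_func = (eta**2 * pm.gp.cov.ExpQuad(1, ls_long)
                    + pm.gp.cov.Matern32(1, ls_short))
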
After a covariance function is defined, it is evaluated by calling
:code:`cov_func(x, x)` (and a mean function by calling :code:`mean_func(x)`).
For more information on mean and covariance functions in PyMC3, check out the
tutorial on covariance functions.
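
For instance, since the result of :code:`cov_func(X, X)` is a symbolic Theano
tensor, one quick way to inspect the actual covariance matrix is to call its
:code:`eval` method (the random input matrix here is only for illustration)::

    import numpy as np
    import pymc3 as pm

    X = np.random.randn(100, 3)   # 100 inputs with three predictor columns
    cov_func = pm.gp.cov.ExpQuad(3, [2, 5], active_dims=[1, 2])
    K = cov_func(X, X).eval()     # a 100 x 100 NumPy covariance matrix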


:code:`gp.*` implementations
============================

PyMC3 includes several GP implementations, including marginal and latent
variable models and also some fast approximations. Their usage all follows a
similar pattern: First, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
the GP object to actually construct the PyMC3 random variable that represents
the function prior.

Using :code:`gp.Latent` for the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.

GPs in PyMC3 are additive, which means we can write::

    gp1 = pm.gp.Latent(mean_func1, cov_func1)
    gp2 = pm.gp.Latent(mean_func2, cov_func2)
    gp3 = gp1 + gp2

Calling the `prior` method will create a PyMC3 random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp3.prior("f", n_points, X)

:code:`f` is a random variable that can be used within a PyMC3 model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over. The second
argument is :code:`n_points`, the number of points the GP acts on, which is
the total dimension of the resulting multivariate normal. The third argument
is the inputs to the function that the prior is over, :code:`X`. The inputs
are usually known and present in the data, but they can also be PyMC3 random
variables.

.. note::

    The :code:`n_points` argument is required because of how Theano and PyMC3
    handle the shape information of distributions. For :code:`prior` or
    :code:`marginal_likelihood`, it is the number of rows in the inputs,
    :code:`X`. For :code:`conditional`, it is the number of rows in the new
    inputs, :code:`X_new`.

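Putting these pieces together, a complete (if artificial) model with a latent
GP prior over the log-odds of binary observations could look like the sketch
below. The data, priors, and link function are illustrative, and the
:code:`prior` call follows the :code:`n_points` signature described above::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 100)[:, None]   # 100 one-dimensional inputs
    y = np.random.binomial(1, 0.5, 100)    # stand-in binary observations

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2, beta=1)
        cov_func = pm.gp.cov.ExpQuad(1, ls)
        mean_func = pm.gp.mean.Zero()

        gp = pm.gp.Latent(mean_func, cov_func)
        f = gp.prior("f", X.shape[0], X)    # latent function values at X

        # squash f through the inverse logit to get probabilities
        p = pm.math.invlogit(f)
        y_ = pm.Bernoulli("y", p=p, observed=y)
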
The :code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary input points :math:`x_*`,
:math:`f(x_*)`. It can be called on any or all of the component GPs,
:code:`gp1`, :code:`gp2`, or :code:`gp3`. To construct the conditional
distribution for :code:`gp3`, we write::

    f_star = gp3.conditional("f_star", n_newpoints, X_star)

To construct the conditional distribution of one of the component GPs,
:code:`gp1` or :code:`gp2`, we also need to include the original inputs
:code:`X` as an argument::

    f1_star = gp1.conditional("f1_star", n_newpoints, X_star, X=X)

The :code:`gp3` object keeps track of the inputs it used when :code:`prior` was
called. Since the `prior` method of :code:`gp1` wasn't called, it needs to be
provided with the inputs :code:`X`. In the same fashion as the prior,
:code:`f_star` and :code:`f1_star` are random variables that can now be used
like any other random variable in PyMC3.

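Continuing the sketch above, predictions at new inputs can then be drawn by
sampling the conditional with :code:`sample_ppc`, assuming a :code:`trace`
obtained from :code:`pm.sample()` on the model::

    X_new = np.linspace(0, 15, 200)[:, None]

    with model:
        f_new = gp.conditional("f_new", X_new.shape[0], X_new)
        # draw posterior predictive samples of the function at X_new
        pred_samples = pm.sample_ppc(trace, vars=[f_new], samples=100)
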
.. note::

    `gp.Latent` has a `prior` and a `conditional` method, but not a
    `marginal_likelihood`, since that method doesn't make sense in this case.
    Other GP implementations, such as `gp.Marginal` and `gp.MarginalSparse`,
    have a `marginal_likelihood` method, but not a `prior`.

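For contrast with :code:`gp.Latent`, a minimal :code:`gp.Marginal` sketch for
regression with Gaussian noise might look like this. The data and the noise
prior are illustrative, and the :code:`n_points`-style signature follows the
note above::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 10, 50)[:, None]
    y = np.sin(X).ravel() + 0.1 * np.random.randn(50)

    with pm.Model() as marginal_model:
        ls = pm.Gamma("ls", alpha=2, beta=1)
        sigma = pm.HalfNormal("sigma", sd=1)    # observation noise
        gp = pm.gp.Marginal(cov_func=pm.gp.cov.ExpQuad(1, ls))
        # f is integrated out analytically, leaving the marginal likelihood of y
        y_ = gp.marginal_likelihood("y", X.shape[0], X, y=y, noise=sigma)
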
There are examples demonstrating the usage of GP functionality in PyMC3 in
more detail, including examples for each of the different GP implementations.