|
120 | 120 | "\n",
|
121 | 121 | "Here the `lengthscales` parameter is two dimensional, each dimension can have different lengthscales. The reason we have to specify `input_dim`, the total number of columns of `X`, and `active_dims`, which of those columns or dimensions the covariance function will act on, is because `cov_func` hasn't actually seen the input data yet. The `active_dims` argument is optional, and defaults to all columns of the matrix of inputs. \n",
|
122 | 122 | "\n",
|
123 |
| - "Covariance functions in PyMC3 closely follow the algebraic rules for kernels:\n", |
| 123 | + "Covariance functions in PyMC3 closely follow the algebraic rules for kernels, which allows users to combine covariance functions into new ones, for example:\n", |
124 | 124 | "\n",
|
125 | 125 | "- The sum two covariance functions is also a covariance function.\n",
|
126 | 126 | "\n",
|
|
137 | 137 | " \n",
|
138 | 138 | " cov_func = eta**2 * pm.gp.cov.Matern32(...)\n",
|
139 | 139 | " \n",
|
140 |
| - "- ...\n", |
141 |
| - "\n", |
142 |
| - "Like the `gp.*` objects, until the covariance functions are actually *evaluated* over a set of inputs, they are still a Python objects that aren't part of the model. To evaluate a covariance function and create an actual covariance matrix, call `cov_func(x, x)`, or `cov_func(x, x_new)`. " |
| 140 | + "For more information on combining covariance functions in PyMC3, check out the tutorial on covariance functions. Like the `gp.*` objects, until the covariance functions are actually *evaluated* over a set of inputs, they are still a Python objects that aren't part of the model. To evaluate a covariance function and create an actual covariance matrix, call `cov_func(x, x)`, or `cov_func(x, x_new)`. " |
143 | 141 | ]
|
144 | 142 | },
|
145 | 143 | {
|
|
148 | 146 | "source": [
|
149 | 147 | "# Example: `gp.Latent`\n",
|
150 | 148 | "\n",
|
151 |
| - "The following is an example showing how to specify a simple model with a GP prior, and then sample from the posterior using NUTS. We build an example data set to use using a multivariate normal and known covariance function to generate the data so we can verify that the inference we perform is correct." |
| 149 | + "The following is an example showing how to specify a simple model with a GP prior, then sample from the posterior using NUTS. We build an example data with a draw from a GP, so we can verify that the inference we perform is correct." |
152 | 150 | ]
|
153 | 151 | },
|
154 | 152 | {
|
155 | 153 | "cell_type": "code",
|
156 |
| - "execution_count": 1, |
| 154 | + "execution_count": 14, |
157 | 155 | "metadata": {
|
158 | 156 | "ExecuteTime": {
|
159 |
| - "end_time": "2017-08-04T21:00:54.024910Z", |
160 |
| - "start_time": "2017-08-04T21:00:53.091113Z" |
| 157 | + "end_time": "2017-08-05T00:30:53.357774Z", |
| 158 | + "start_time": "2017-08-05T00:30:53.348602Z" |
161 | 159 | },
|
162 | 160 | "collapsed": true
|
163 | 161 | },
|
|
437 | 435 | "source": [
|
438 | 436 | "# Example: `gp.Marginal`\n",
|
439 | 437 | "\n",
|
440 |
| - "There is a more efficient way to model the last example. Most GP introductions or tutorials describe the scenario we just covered -- regression with IID Gaussian noise. This is a special case, but is the most common GP model that people use. Here there is no need to explicitly include the unknown function values as latent variables because $\\mathbf{f}_x$ can be integrated out analytically. The product of the GP prior probability distribution with a normal likelihood is also normal, and is called the *marginal likelihood*. Including the prior on the hyperparameters of the covariance function, we can write the *marginal posterior* as\n", |
| 438 | + "There is a more efficient way to model the last example. Most GP introductions or tutorials describe the scenario we just covered -- regression with IID Gaussian noise. This is the most common GP model that people use, but it's really a special case. When the noise is Gaussian there is no need to explicitly include $\\mathbf{f}_x$ as latent variables because it can be integrated out analytically. \n", |
| 439 | + "\n", |
| 440 | + "As mentioned before, the product of the GP prior probability distribution with a normal likelihood is also normal. It's called the *marginal likelihood*. If we including the prior on the hyperparameters of the covariance function, we can write the *marginal posterior* as\n", |
441 | 441 | "\n",
|
442 | 442 | "$$\n",
|
443 | 443 | "p(y \\mid x, \\theta)p(\\theta) = \\int p(y \\mid f, x, \\theta) \\, p(f \\mid x, \\theta) \\,\n",
|
|
453 | 453 | " - \\frac{n}{2}\\log (2 \\pi) + \\log p(\\theta)\n",
|
454 | 454 | "$$\n",
|
455 | 455 | "\n",
|
456 |
| - "The first term penalizes lack of fit, the second term penalizes model complexity via the determinant of $K_{xx}$. The third term is just a constant. The final term is the log-prior of the covariance function hyperparameters. \n", |
| 456 | + "The first term penalizes lack of fit, the second term penalizes model complexity via the determinant of $K_{xx}$. The third term is just a constant. The final term on the right is the log-prior of the covariance function hyperparameters. \n", |
457 | 457 | "\n",
|
458 |
| - "We repeat the previous example using `gp.Marginal` instead. The code to specify this equivalent model is a little bit different that before. Notice that `gp.marginal_likelihood` subsumes both the GP prior and the Normal likelihood of the observed data, `y`. Also, since we are using the marginal likelihood, it is possible to use `find_MAP` to quickly get the value at the mode of the covariance function hyperparameters. " |
| 458 | + "The code to specify this equivalent model using `gp.Marginal` is a little bit different that before. The `gp.marginal_likelihood` subsumes both the GP prior and the Normal likelihood of the observed data, `y`. Also, since we are using the marginal likelihood, it is possible to use `find_MAP` to quickly get the value at the mode of the covariance function hyperparameters. " |
459 | 459 | ]
|
460 | 460 | },
|
461 | 461 | {
|
|