additive.tex: 2 additions & 2 deletions
@@ -153,7 +153,7 @@ \subsection{Weighting different orders of interaction}
On different datasets, the dominant order of interaction estimated by the additive model varies widely.
In some cases, the variance is concentrated almost entirely onto a single order of interaction.
-This may may be a side-effect of using the same lengthscales for all orders of interaction.; lengthscales appropriate for low-dimensional regression might not be appropriate for high-dimensional regression.
+This may be a side-effect of using the same lengthscales for all orders of interaction; lengthscales appropriate for low-dimensional regression might not be appropriate for high-dimensional regression.
A re-scaling of lengthscales which preserves relative average distances between datapoints might be expected to improve the model.
%An additive \gp{} with all of its variance coming from the 1st order is equivalent to a sum of one-dimensional functions.
%An additive \gp{} with all its variance coming from the $D$th order is equivalent to a \gp{} with an \seard{} kernel.
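The per-order additive kernels whose variances are weighted in this subsection can be computed efficiently with the Newton-Girard recursion over elementary symmetric polynomials of the one-dimensional base kernels. A minimal NumPy sketch (illustrative only; the function and variable names are my own, not the thesis code):

```python
import numpy as np

def additive_orders(base_kernels):
    """Combine D one-dimensional kernel matrices k_j into the additive kernel
    of each interaction order 1..D, using the Newton-Girard recursion over
    elementary symmetric polynomials e_d and power sums p_k."""
    D = len(base_kernels)
    # Power sums p_k = sum_j k_j ** k (elementwise powers of base kernel matrices).
    p = {k: sum(kj ** k for kj in base_kernels) for k in range(1, D + 1)}
    e = [np.ones_like(base_kernels[0])]          # e_0 = 1
    for d in range(1, D + 1):
        e.append(sum((-1) ** (k - 1) * e[d - k] * p[k]
                     for k in range(1, d + 1)) / d)
    return e[1:]                                  # orders 1..D

# Sanity check with D = 2: order 1 should be k1 + k2, order 2 should be k1 * k2.
x = np.linspace(0, 1, 5)
sq = (x[:, None] - x[None, :]) ** 2
k1, k2 = np.exp(-0.5 * sq / 0.5 ** 2), np.exp(-0.5 * sq / 1.5 ** 2)
orders = additive_orders([k1, k2])
print(np.allclose(orders[0], k1 + k2), np.allclose(orders[1], k1 * k2))
```

For D = 2 the recursion reduces to the familiar identities e1 = k1 + k2 and e2 = k1 * k2, which the final check confirms.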
@@ -219,7 +219,7 @@ \subsubsection{Evaluation of derivatives}
\label{eq:additive-derivatives}
\end{align}
%
-\Cref{eq:additive-derivatives} gives the terms that $k_j$ is multiplied by in the original polynomial, which are the terms required by the chain rule.
+\Cref{eq:additive-derivatives} gives all terms that $k_j$ is multiplied by in the original polynomial, which are exactly the terms required by the chain rule.
These derivatives allow gradient-based optimization of the base kernel parameters with respect to the marginal likelihood.
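The gradient-based optimization mentioned in that last line can be sketched in a few lines of NumPy. This toy uses a single one-dimensional SE kernel rather than the additive kernel, and all names and values are hypothetical; it checks the standard analytic gradient of the log marginal likelihood against a finite difference:

```python
import numpy as np

# Toy 1-D dataset (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)

def se_kernel(X, ell):
    """SE kernel matrix and its derivative w.r.t. the log-lengthscale."""
    d2 = (X[:, None] - X[None, :]) ** 2
    K = np.exp(-0.5 * d2 / ell ** 2)
    return K, K * d2 / ell ** 2          # dK / d(log ell)

def log_marginal_and_grad(log_ell, noise_var=0.01):
    ell = np.exp(log_ell)
    K, dK = se_kernel(X, ell)
    Ky = K + noise_var * np.eye(len(X))
    alpha = np.linalg.solve(Ky, y)
    _, logdet = np.linalg.slogdet(Ky)
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(X) * np.log(2 * np.pi)
    # Standard GP identity: d lml/d theta = 0.5 tr((alpha alpha^T - Ky^-1) dK/d theta)
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - np.linalg.inv(Ky)) @ dK)
    return lml, grad

# Check the analytic gradient against a central finite difference.
_, g = log_marginal_and_grad(0.0)
eps = 1e-6
g_fd = (log_marginal_and_grad(eps)[0] - log_marginal_and_grad(-eps)[0]) / (2 * eps)
print(abs(g - g_fd) < 1e-5)  # True
```

Any gradient-based optimizer (e.g. L-BFGS on the negative of `lml`) can then tune the lengthscale.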
grammar.tex: 1 addition & 1 deletion
@@ -676,7 +676,7 @@ \subsection{Structure recovery on synthetic data}
\Cref{tbl:synthetic} shows the results.
-For the highest signal-to-noise ratio, \procedurename{} usually recoveres the correct structure.
+For the highest signal-to-noise ratio, \procedurename{} usually recovers the correct structure.
The reported additional linear structure in the last row can be explained by the fact that functions sampled from \kSE{} kernels with long length-scales occasionally have near-linear trends.
As the noise increases, our method generally backs off to simpler structures rather than reporting spurious structure.
intro.tex: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ \chapter{Introduction}
%This thesis will be concerned with finding structure in functions.
%The types of structure examined in this thesis
-One can construct models of functions having many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
+One can construct models of functions that have many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
%To be able to learn a wide variety of structures, we would like to have an expressive language of models of functions.
%We would like to be able to represent simple kinds of functions, such as linear functions or polynomials.
%We would also like to have models of arbitrarily complex functions, specified in terms of high-level properties such as how smooth they are, whether they repeat over time, or which symmetries they have.
@@ -35,7 +35,7 @@ \chapter{Introduction}
%
%This chapter will introduce the basic properties of \gp{}s.
Chapter \ref{ch:kernels} will describe how to model these different types of structure using \gp{}s.
-This short chapter will introduce the basic properties of \gp{}s, and provide an outline of the thesis.
+This short chapter introduces the basic properties of \gp{}s, and provides an outline of the thesis.
%Chapter \ref{ch:grammar} will show how searching over many
kernels.tex: 10 additions & 10 deletions
@@ -146,15 +146,15 @@ \subsection{Combining properties through multiplication}
Here, we discuss a few examples:
\begin{itemize}
-\item {\bf Locally Periodic Functions.}
-In univariate data, multiplying a kernel by \kSE{} gives a way of converting global structure to local structure.
-For example, $\Per$ corresponds to exactly periodic structure, whereas $\Per\kerntimes\SE$ corresponds to locally periodic structure, as shown in the second column of \cref{fig:kernels_times}.
-
\item {\bf Polynomial Regression.}
By multiplying together $T$ linear kernels, we obtain a prior on polynomials of degree $T$.
%This class of functions also has a simple parametric form.
The first column of \cref{fig:kernels_times} shows a quadratic kernel.
+\item {\bf Locally Periodic Functions.}
+In univariate data, multiplying a kernel by \kSE{} gives a way of converting global structure to local structure.
+For example, $\Per$ corresponds to exactly periodic structure, whereas $\Per\kerntimes\SE$ corresponds to locally periodic structure, as shown in the second column of \cref{fig:kernels_times}.
+
\item {\bf Functions with Growing Amplitude.}
Multiplying by a linear kernel means that the marginal standard deviation of the function being modeled grows linearly away from the location given by kernel parameter $c$.
The third and fourth columns of \cref{fig:kernels_times} show two examples.
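The Per-versus-Per-times-SE contrast in the reordered bullet above can be checked numerically; a small sketch with hypothetical parameter values:

```python
import numpy as np

def se(x1, x2, ell=1.0):
    """Squared-exponential kernel: correlation decays with distance."""
    return np.exp(-0.5 * (x1 - x2) ** 2 / ell ** 2)

def per(x1, x2, period=1.0, ell=1.0):
    """Periodic kernel: correlation depends only on phase within the period."""
    return np.exp(-2 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2 / ell ** 2)

# Two points exactly five periods apart:
exact = per(0.0, 5.0)                  # Per alone: still perfectly correlated
local = per(0.0, 5.0) * se(0.0, 5.0)   # Per x SE: correlation has decayed
print(exact, local)
```

Per alone reports correlation 1 between points any whole number of periods apart, however distant, while the product decays towards zero over a few lengthscales, matching the global-to-local conversion described in the bullet.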
@@ -381,7 +381,7 @@ \subsection{Example: An additive model of concrete strength}
To illustrate how additive kernels give rise to interpretable models, we built an additive model of the strength of concrete as a function of the amount of seven different ingredients (cement, slag, fly ash, water, plasticizer, coarse aggregate and fine aggregate), and the age of the concrete \citep{yeh1998modeling}.
%We model measurements of the compressive strength of concrete, as a function of the concentration of 7 ingredients, plus the age of the concrete.
-Our simple model is a sum of 8 different one-dimensional functions, each depending on one of these variables:
+Our simple model is a sum of 8 different one-dimensional functions, each depending on only one of these quantities:
%
\begin{align}
f(\vx) & =
@@ -521,12 +521,12 @@ \subsubsection{Posterior covariance of additive components}
@@ -539,7 +539,7 @@ \subsubsection{Posterior covariance of additive components}
%Each plot shows the posterior correlations between the height of two functions, evaluated across the range of the data upon which they depend.
%Color indicates the amount of correlation between the function value of the two components.
Red indicates high correlation, teal indicates no correlation, and blue indicates negative correlation.
-Plots on the diagonal show posterior correlations within each function.
+Plots on the diagonal show posterior correlations between different values of the same function.
Correlations are evaluated over the same input ranges as in \cref{fig:interpretablefunctions}.
%Off-diagonal plots show posterior covariance between each pair of functions, as a function of both inputs.
%Negative correlation means that one function is high and the other low, but which one is uncertain.
@@ -549,7 +549,7 @@ \subsubsection{Posterior covariance of additive components}
\end{figure}
%
For example, \cref{fig:interpretableinteractions} shows the posterior correlation between all non-zero components of the concrete model.
-This figure shows that most of the correlation occurs within components, but there is also negative correlation between the ``cement'' and ``slag'' variables.
+This figure shows that most of the correlation occurs within components, but there is also negative correlation between the height of $f_1(\textnormal{cement})$ and $f_2(\textnormal{slag})$.
This reflects an ambiguity in the model about which one of these functions is high and the other low.
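This ambiguity already appears in the smallest possible additive model; a toy calculation (hypothetical numbers, unrelated to the concrete data):

```python
import numpy as np

# An additive GP f = f1 + f2 observed at a single point. The datum pins down
# the sum f1 + f2 but not the split, so the posterior over the two components
# is strongly negatively correlated.
k1, k2, noise = 1.0, 1.0, 0.01   # prior variances and noise variance (made up)
ky = k1 + k2 + noise             # Var(y) under the prior

# Gaussian conditioning on y (prior components independent):
cross = -k1 * k2 / ky            # Cov(f1, f2 | y)
var1 = k1 - k1 ** 2 / ky         # Var(f1 | y)
var2 = k2 - k2 ** 2 / ky
corr = cross / np.sqrt(var1 * var2)
print(round(corr, 4))  # -0.9901
```

The near-minus-one posterior correlation is exactly the "one function high, the other low" trade-off described in the text.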
0 commit comments