additive.tex (2 additions, 2 deletions)
@@ -154,7 +154,7 @@ \subsection{Weighting different orders of interaction}
 On different datasets, the dominant order of interaction estimated by the additive model varies widely.
 In some cases, the variance is concentrated almost entirely onto a single order of interaction.
 This may be a side-effect of using the same lengthscales for all orders of interaction; lengthscales appropriate for low-dimensional regression might not be appropriate for high-dimensional regression.
-A re-scaling of lengthscales which preserves relative average distances between datapoints might be expected to improve the model.
+%A re-scaling of lengthscales which enforces similar distances between datapoints might improve the model.
 %An additive \gp{} with all of its variance coming from the 1st order is equivalent to a sum of one-dimensional functions.
 %An additive \gp{} with all its variance coming from the $D$th order is equivalent to a \gp{} with an \seard{} kernel.
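The additive model discussed in this hunk assigns a separate variance to each order of interaction, while sharing one lengthscale per input dimension across all orders. As a point of reference, here is a minimal numpy sketch of such an additive kernel, computed with the Newton-Girard recursion over elementary symmetric polynomials; the function and variable names are illustrative, not taken from the thesis code:

```python
import numpy as np

def se_kernel_1d(x, xp, lengthscale):
    """One-dimensional squared-exponential base kernel."""
    return np.exp(-0.5 * ((x - xp) / lengthscale) ** 2)

def additive_kernel(x, xp, lengthscales, order_variances):
    """Additive kernel: a weighted sum over orders of interaction, with
    order_variances[d-1] weighting the d-th order.  The orders are computed
    with the Newton-Girard recursion over elementary symmetric polynomials
    of the D one-dimensional base kernels."""
    D = len(x)
    z = np.array([se_kernel_1d(x[d], xp[d], lengthscales[d]) for d in range(D)])
    p = np.array([np.sum(z ** k) for k in range(D + 1)])  # power sums of z; p[0] is unused
    e = np.zeros(D + 1)  # elementary symmetric polynomials e[0..D]
    e[0] = 1.0
    for n in range(1, D + 1):
        e[n] = sum((-1) ** (k - 1) * e[n - k] * p[k] for k in range(1, n + 1)) / n
    return float(np.dot(order_variances, e[1:]))
```

Putting all of `order_variances` on the first order recovers a sum of one-dimensional functions, and putting it all on the $D$th order recovers a product of the base kernels, matching the two commented-out sentences at the end of the hunk.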
intro.tex (7 additions, 4 deletions)
@@ -23,9 +23,12 @@ \chapter{Introduction}
 First, keeping all hypotheses that match the data helps to guard against over-fitting.
 Second, comparing how well a dataset is fit by different models gives a way of finding which sorts of structure are present in that data. % models having different types of structure is a way to find which sorts of structure are present in a dataset.
 
-%This thesis will be concerned with finding structure in functions.
+This thesis focuses on constructing models of functions.
 %The types of structure examined in this thesis
-One can construct models of functions that have many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
+Chapter \ref{ch:kernels} describes how to model functions having many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
+Chapters \ref{ch:grammar} and \ref{ch:description} show how such models can be automatically constructed from data, and then automatically described.
+Later chapters explore several extensions of these models.
+%will describe how to model functions having many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
 %To be able to learn a wide variety of structures, we would like to have an expressive language of models of functions.
 %We would like to be able to represent simple kinds of functions, such as linear functions or polynomials.
 %We would also like to have models of arbitrarily complex functions, specified in terms of high-level properties such as how smooth they are, whether they repeat over time, or which symmetries they have.
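The structures named in the added outline (additivity, symmetry, periodicity, changepoints, and combinations of these) arise from composing kernels by addition and multiplication. A small illustrative sketch in numpy, assuming squared-exponential and periodic base kernels with made-up parameter values:

```python
import numpy as np

def se(x, xp, ell=1.0):
    """Squared-exponential kernel: smooth functions."""
    return np.exp(-0.5 * ((x - xp) / ell) ** 2)

def periodic(x, xp, period=1.0, ell=1.0):
    """Periodic kernel: functions repeating with the given period."""
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp) / period) ** 2 / ell ** 2)

# Composite structures come from sums and products of simpler kernels:
def periodic_plus_trend(x, xp):
    return periodic(x, xp) + se(x, xp, ell=10.0)   # repeating pattern plus smooth trend

def locally_periodic(x, xp):
    return periodic(x, xp) * se(x, xp, ell=5.0)    # periodicity whose shape drifts slowly
```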
@@ -34,7 +37,7 @@ \chapter{Introduction}
 %All of these models of function can be constructed using Gaussian processes (\gp{}s).%, a tractable class of models of functions.
 %
 %This chapter will introduce the basic properties of \gp{}s.
-Chapter \ref{ch:kernels} will describe how to model these different types of structure using \gp{}s.
+%Chapter \ref{ch:kernels} will describe how to model these different types of structure using \gp{}s.
 This short chapter introduces the basic properties of \gp{}s, and provides an outline of the thesis.
 %Chapter \ref{ch:grammar} will show how searching over many
 
@@ -96,7 +99,7 @@ \section{Gaussian process models}
 \subsection{Model selection}
 
 The crucial property of \gp{}s that allows us to automatically construct models is that we can compute the \emph{marginal likelihood} of a dataset given a particular model, also known as the \emph{evidence} \citep{mackay1992bayesian}.
-The marginal likelihood allows one to compare models, automatically balancing between the capacity of a model and its fit to the data~\citep{rasmussen2001occam,mackay2003information}.
+The marginal likelihood allows one to compare models, balancing between the capacity of a model and its fit to the data~\citep{rasmussen2001occam,mackay2003information}.
 %discover the appropriate amount of detail to use, due to Bayesian Occam's razor
 %
 %Choosing a kernel, or kernel parameters, by maximizing the marginal likelihood will typically result in selecting the \emph{least} flexible model which still captures all the structure in the data.
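For concreteness, the marginal likelihood referred to in this hunk has a closed form for \gp{} regression with Gaussian noise. A minimal sketch, assuming a precomputed kernel matrix `K` over the training inputs (names here are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_marginal_likelihood(K, y, noise_var=1e-2):
    """log p(y | X) = -1/2 y^T (K + s^2 I)^{-1} y
                      - 1/2 log|K + s^2 I| - (n/2) log(2 pi).
    The data-fit term rewards models that explain y; the log-determinant
    term penalizes flexible models, giving the capacity/fit trade-off
    the text describes."""
    n = len(y)
    Ky = K + noise_var * np.eye(n)
    L, lower = cho_factor(Ky, lower=True)   # Cholesky factor of K + s^2 I
    alpha = cho_solve((L, lower), y)        # (K + s^2 I)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
```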
kernels.tex (4 additions, 4 deletions)
@@ -429,7 +429,7 @@ \subsection{Example: An additive model of concrete strength}
 
 \Cref{fig:interpretablefunctions} shows the marginal posterior distribution of each of the eight one-dimensional functions in the model. %\cref{eq:concrete}.
 The parameters controlling the variance of two of the functions, $f_6(\textnormal{coarse})$ and $f_7(\textnormal{fine})$, were set to zero, meaning that the marginal likelihood preferred a parsimonious model which did not depend on these inputs.
-This is an example of the automatic sparsity that arises by maximizing marginal likelihood in \gp{} models, and another example of automatic relevance determination (\ARD) \citep{neal1995bayesian}.
+This is an example of the automatic sparsity that arises by maximizing marginal likelihood in \gp{} models, and is another example of automatic relevance determination (\ARD) \citep{neal1995bayesian}.
 
 The ability to learn kernel parameters in this way is much more difficult when using non-probabilistic methods such as Support Vector Machines \citep{cortes1995support}, for which cross-validation is often the best method to select kernel parameters.
 
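The automatic sparsity described in this hunk can be reproduced by maximizing the marginal likelihood over a per-input signal variance. A self-contained sketch under the same Gaussian-noise assumption, using a first-order additive kernel; all names are illustrative, not the thesis code:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def additive_ard_kernel(X1, X2, lengthscales, signal_vars):
    """First-order additive SE kernel with one signal variance per input."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for d in range(X1.shape[1]):
        sqdist = (X1[:, d][:, None] - X2[:, d][None, :]) ** 2
        K += signal_vars[d] * np.exp(-0.5 * sqdist / lengthscales[d] ** 2)
    return K

def fit_ard_variances(X, y, lengthscales, noise_var=1e-2):
    """Fit per-input signal variances by maximizing the marginal likelihood.
    Variances of irrelevant inputs are typically driven toward zero,
    pruning those inputs from the model (the ARD effect described above)."""
    n = X.shape[0]
    def neg_lml(log_vars):
        K = additive_ard_kernel(X, X, lengthscales, np.exp(log_vars)) + noise_var * np.eye(n)
        L, low = cho_factor(K, lower=True)
        alpha = cho_solve((L, low), y)
        return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)
    result = minimize(neg_lml, x0=np.zeros(X.shape[1]))
    return np.exp(result.x)   # fitted per-input signal variances
```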
@@ -535,11 +535,11 @@ \subsubsection{Posterior covariance of additive components}
 \end{tabular}
 }
 \caption[Visualizing posterior correlations between components]
-{Posterior correlations between the heights of the one-dimensional functions in \cref{eq:concrete}, whose sum models concrete strength.
+{Posterior correlations between the heights of different one-dimensional functions in \cref{eq:concrete}, whose sum models concrete strength.
 %Each plot shows the posterior correlations between the height of two functions, evaluated across the range of the data upon which they depend.
 %Color indicates the amount of correlation between the function value of the two components.
 Red indicates high correlation, teal indicates no correlation, and blue indicates negative correlation.
-Plots on the diagonal show posterior correlations between different values of the same function.
+Plots on the diagonal show posterior correlations between different evaluations of the same function.
 Correlations are evaluated over the same input ranges as in \cref{fig:interpretablefunctions}.
 %Off-diagonal plots show posterior covariance between each pair of functions, as a function of both inputs.
 %Negative correlation means that one function is high and the other low, but which one is uncertain.
@@ -550,7 +550,7 @@ \subsubsection{Posterior covariance of additive components}
 %
 For example, \cref{fig:interpretableinteractions} shows the posterior correlation between all non-zero components of the concrete model.
 This figure shows that most of the correlation occurs within components, but there is also negative correlation between the height of $f_1(\textnormal{cement})$ and $f_2(\textnormal{slag})$.
-This reflects an ambiguity in the model about which one of these functions is high and the other low.
+%This reflects an ambiguity in the model about which one of these functions is high and the other low.
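The posterior correlations discussed in these hunks come from the closed-form cross-covariance between additive components. For a model $f = \sum_d f_d$ observed with Gaussian noise, a sketch of the computation; matrix names are mine, and this follows the standard Gaussian conditioning formula rather than any specific code from the thesis:

```python
import numpy as np

def component_cross_cov(Ki_star_X, Kj_X_star, Ki_star_star, K_noisy, same_component):
    """Posterior covariance between additive components f_i and f_j at test
    points, for f = sum_d f_d observed with Gaussian noise:
        Cov(f_i*, f_j*) = [i == j] K_i(X*, X*) - K_i(X*, X) K^{-1} K_j(X, X*),
    where K = sum_d K_d(X, X) + noise * I over the training inputs."""
    prior = Ki_star_star if same_component else np.zeros_like(Ki_star_star)
    return prior - Ki_star_X @ np.linalg.solve(K_noisy, Kj_X_star)
```

Since the components have independent priors, the off-diagonal ($i \neq j$) blocks have no prior term, so any structure there, such as the negative correlation between $f_1(\textnormal{cement})$ and $f_2(\textnormal{slag})$, is induced purely by conditioning on the data.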