Commit 8e9475b

committed
more minor corrections
1 parent ca09853 commit 8e9475b

File tree

7 files changed (+13, -10 lines)


additive.pdf

-106 Bytes
Binary file not shown.

additive.tex

Lines changed: 2 additions & 2 deletions
@@ -154,7 +154,7 @@ \subsection{Weighting different orders of interaction}
 On different datasets, the dominant order of interaction estimated by the additive model varies widely.
 In some cases, the variance is concentrated almost entirely onto a single order of interaction.
 This may be a side-effect of using the same lengthscales for all orders of interaction; lengthscales appropriate for low-dimensional regression might not be appropriate for high-dimensional regression.
-A re-scaling of lengthscales which preserves relative average distances between datapoints might be expected to improve the model.
+%A re-scaling of lengthscales which enforces similar distances between datapoints might improve the model.
 %An additive \gp{} with all of its variance coming from the 1st order is equivalent to a sum of one-dimensional functions.
 %An additive \gp{} with all its variance coming from the $D$th order is equivalent to a \gp{} with an \seard{} kernel.
 %
@@ -209,7 +209,7 @@ \subsection{Efficiently evaluating additive kernels}
 \subsubsection{Evaluation of derivatives}

 Conveniently, we can use the same trick to efficiently compute the necessary derivatives of the additive kernel with respect to the base kernels.
-This can be done by removing the base kernel of interest $k_j$ from each term of the polynomials:
+This can be done by removing the base kernel of interest, $k_j$, from each term of the polynomials:
 %
 \begin{align}
 \frac{\partial k_{add_n}}{\partial k_j} =
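For context on this hunk: the $n$th-order additive kernel is the $n$th elementary symmetric polynomial $e_n$ of the base kernel values $k_1, \ldots, k_D$, and the "same trick" referenced above is presumably the Newton-Girard recursion over power sums. A minimal sketch, assuming that setup (NumPy; the function names and the unweighted-order convention are illustrative, not the thesis code):

```python
import numpy as np

def additive_kernel_orders(base, max_order):
    """Elementary symmetric polynomials e_1..e_max_order of the base kernel
    values, via the Newton-Girard recursion.

    base: 1-D array of base kernel evaluations [k_1(x,x'), ..., k_D(x,x')].
    Returns [e_0, e_1, ..., e_max_order]; e_n is the nth-order additive
    kernel term (before any per-order weighting)."""
    p = [np.sum(base ** n) for n in range(max_order + 1)]  # power sums p_0..p_R
    e = [1.0]  # e_0 = 1 by convention
    for n in range(1, max_order + 1):
        # Newton-Girard: n*e_n = sum_{k=1}^{n} (-1)^(k-1) e_{n-k} p_k
        e.append(sum((-1) ** (k - 1) * e[n - k] * p[k]
                     for k in range(1, n + 1)) / n)
    return e

def additive_kernel_derivative(base, n, j):
    """d e_n / d k_j: drop base kernel j, then take e_{n-1} of the rest."""
    rest = np.delete(base, j)
    return additive_kernel_orders(rest, n - 1)[n - 1]
```

The derivative step mirrors the sentence in the diff: $\partial e_n / \partial k_j$ is the $(n{-}1)$th elementary symmetric polynomial of the remaining base kernels, i.e. the additive kernel one order lower with $k_j$ removed from every term.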

intro.pdf

473 Bytes
Binary file not shown.

intro.tex

Lines changed: 7 additions & 4 deletions
@@ -23,9 +23,12 @@ \chapter{Introduction}
 First, keeping all hypotheses that match the data helps to guard against over-fitting.
 Second, comparing how well a dataset is fit by different models gives a way of finding which sorts of structure are present in that data. % models having different types of structure is a way to find which sorts of structure are present in a dataset.

-%This thesis will be concerned with finding structure in functions.
+This thesis focuses on constructing models of functions.
 %The types of structure examined in this thesis
-One can construct models of functions that have many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
+Chapter \ref{ch:kernels} describes how to model functions having many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
+Chapters \ref{ch:grammar} and \ref{ch:description} show how such models can be automatically constructed from data, and then automatically described.
+Later chapters explore several extensions of these models.
+%will describe how to model functions having many different types of structure, such as additivity, symmetry, periodicity, changepoints, or combinations of these, using Gaussian processes (\gp{}s).
 %To be able to learn a wide variety of structures, we would like to have an expressive language of models of functions.
 %We would like to be able to represent simple kinds of functions, such as linear functions or polynomials.
 %We would also like to have models of arbitrarily complex functions, specified in terms of high-level properties such as how smooth they are, whether they repeat over time, or which symmetries they have.
@@ -34,7 +37,7 @@ \chapter{Introduction}
 %All of these models of function can be constructed using Gaussian processes (\gp{}s).%, a tractable class of models of functions.
 %
 %This chapter will introduce the basic properties of \gp{}s.
-Chapter \ref{ch:kernels} will describe how to model these different types of structure using \gp{}s.
+%Chapter \ref{ch:kernels} will describe how to model these different types of structure using \gp{}s.
 This short chapter introduces the basic properties of \gp{}s, and provides an outline of the thesis.
 %Chapter \ref{ch:grammar} will show how searching over many

@@ -96,7 +99,7 @@ \section{Gaussian process models}
 \subsection{Model selection}

 The crucial property of \gp{}s that allows us to automatically construct models is that we can compute the \emph{marginal likelihood} of a dataset given a particular model, also known as the \emph{evidence} \citep{mackay1992bayesian}.
-The marginal likelihood allows one to compare models, automatically balancing between the capacity of a model and its fit to the data~\citep{rasmussen2001occam,mackay2003information}.
+The marginal likelihood allows one to compare models, balancing between the capacity of a model and its fit to the data~\citep{rasmussen2001occam,mackay2003information}.
 %discover the appropriate amount of detail to use, due to Bayesian Occam's razor
 %
 %Choosing a kernel, or kernel parameters, by maximizing the marginal likelihood will typically result in selecting the \emph{least} flexible model which still captures all the structure in the data.
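The capacity/fit trade-off this hunk describes is explicit in the standard closed form for GP regression with Gaussian noise: $\log p(\mathbf{y} \mid X) = -\tfrac{1}{2}\mathbf{y}^\top K_y^{-1}\mathbf{y} - \tfrac{1}{2}\log|K_y| - \tfrac{n}{2}\log 2\pi$, with $K_y = K + \sigma_n^2 I$. A hedged sketch of that computation (the function names and the squared-exponential helper are mine, not from the thesis):

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_var):
    """GP log evidence, computed stably via a Cholesky factorization.

    The first term rewards fitting the data; the log-determinant term
    penalizes overly flexible kernels (Bayesian Occam's razor)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # Ky^-1 y
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))             # = -0.5 * log|Ky|
    return data_fit + complexity - 0.5 * n * np.log(2 * np.pi)

def se_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix for 1-D inputs X."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)
```

Comparing `log_marginal_likelihood(se_kernel(X, ls), y, noise)` across candidate lengthscales `ls` is the simplest instance of the model comparison the diff refers to.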

kernels.pdf

-86 Bytes
Binary file not shown.

kernels.tex

Lines changed: 4 additions & 4 deletions
@@ -429,7 +429,7 @@ \subsection{Example: An additive model of concrete strength}

 \Cref{fig:interpretable functions} shows the marginal posterior distribution of each of the eight one-dimensional functions in the model. %\cref{eq:concrete}.
 The parameters controlling the variance of two of the functions, $f_6(\textnormal{coarse})$ and $f_7(\textnormal{fine})$, were set to zero, meaning that the marginal likelihood preferred a parsimonious model which did not depend on these inputs.
-This is an example of the automatic sparsity that arises by maximizing marginal likelihood in \gp{} models, and another example of automatic relevance determination (\ARD) \citep{neal1995bayesian}.
+This is an example of the automatic sparsity that arises by maximizing marginal likelihood in \gp{} models, and is another example of automatic relevance determination (\ARD) \citep{neal1995bayesian}.

 The ability to learn kernel parameters in this way is much more difficult when using non-probabilistic methods such as Support Vector Machines \citep{cortes1995support}, for which cross-validation is often the best method to select kernel parameters.

@@ -535,11 +535,11 @@ \subsubsection{Posterior covariance of additive components}
 \end{tabular}
 }
 \caption[Visualizing posterior correlations between components]
-{Posterior correlations between the heights of the one-dimensional functions in \cref{eq:concrete}, whose sum models concrete strength.
+{Posterior correlations between the heights of different one-dimensional functions in \cref{eq:concrete}, whose sum models concrete strength.
 %Each plot shows the posterior correlations between the height of two functions, evaluated across the range of the data upon which they depend.
 %Color indicates the amount of correlation between the function value of the two components.
 Red indicates high correlation, teal indicates no correlation, and blue indicates negative correlation.
-Plots on the diagonal show posterior correlations between different values of the same function.
+Plots on the diagonal show posterior correlations between different evaluations of the same function.
 Correlations are evaluated over the same input ranges as in \cref{fig:interpretable functions}.
 %Off-diagonal plots show posterior covariance between each pair of functions, as a function of both inputs.
 %Negative correlation means that one function is high and the other low, but which one is uncertain.
@@ -550,7 +550,7 @@ \subsubsection{Posterior covariance of additive components}
 %
 For example, \cref{fig:interpretable interactions} shows the posterior correlation between all non-zero components of the concrete model.
 This figure shows that most of the correlation occurs within components, but there is also negative correlation between the height of $f_1(\textnormal{cement})$ and $f_2(\textnormal{slag})$.
-This reflects an ambiguity in the model about which one of these functions is high and the other low.
+%This reflects an ambiguity in the model about which one of these functions is high and the other low.



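The posterior correlations discussed in this hunk come from the joint Gaussian posterior over the additive components: for $f = \sum_i f_i$ with independent priors $f_i \sim \mathcal{GP}(0, k_i)$ and noisy observations $\mathbf{y}$, the cross-covariance at test points is $\operatorname{cov}(\mathbf{f}_i^\star, \mathbf{f}_j^\star \mid \mathbf{y}) = \delta_{ij} K_i^{\star\star} - K_i^\star (K + \sigma_n^2 I)^{-1} K_j^{\star\top}$, where $K = \sum_i K_i$. A sketch under that assumption (the helper and its signature are hypothetical, not the thesis code):

```python
import numpy as np

def component_posterior_cov(K_list, Ks_list, Kss_list, y, noise_var):
    """Posterior (cross-)covariances between additive GP components.

    K_list[i]:   K_i(X, X)     -- training covariances per component
    Ks_list[i]:  K_i(X*, X)    -- test/train cross-covariances
    Kss_list[i]: K_i(X*, X*)   -- test covariances
    Returns {(i, j): cov(f_i(X*), f_j(X*) | y)}."""
    Ky_inv = np.linalg.inv(sum(K_list) + noise_var * np.eye(len(y)))
    covs = {}
    for i, Ksi in enumerate(Ks_list):
        for j, Ksj in enumerate(Ks_list):
            prior = Kss_list[i] if i == j else 0.0  # components a priori independent
            covs[(i, j)] = prior - Ksi @ Ky_inv @ Ksj.T
    return covs
```

The off-diagonal blocks are typically negative, which is the effect the diff's (now commented-out) sentence described: the data pins down the sum, so uncertainty trades off between components.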

thesis.pdf

-1.38 KB
Binary file not shown.
