
Commit 53f80b8
More migration md -> mdx
1 parent: 860af3e

6 files changed: +65, -69 lines changed


.astro/types.d.ts

Lines changed: 3 additions & 3 deletions
@@ -332,13 +332,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".md"] };
-"laplace.md": {
-id: "laplace.md";
+"laplace.mdx": {
+id: "laplace.mdx";
 slug: "laplace";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "lda-gibbs.md": {
 id: "lda-gibbs.md";
 slug: "lda-gibbs";

src/components/layout/Header.astro

Lines changed: 28 additions & 28 deletions
@@ -23,34 +23,34 @@
 >Publications
 </a>

-<button
-id='toggleDarkMode'
-class='relative rounded-md border border-border p-1.5 transition-all hover:bg-border'
->
-<span class='sr-only'>Dark Theme</span>
-<svg
-xmlns='http://www.w3.org/2000/svg'
-width='32'
-height='32'
-viewBox='0 0 24 24'
-class='h-[1.2rem] w-[1.2rem] rotate-0 scale-100 transition-all dark:hidden dark:-rotate-90 dark:scale-0'
-><path
-fill='currentColor'
-d='M12 15q1.25 0 2.125-.875T15 12q0-1.25-.875-2.125T12 9q-1.25 0-2.125.875T9 12q0 1.25.875 2.125T12 15m0 1q-1.671 0-2.836-1.164T8 12q0-1.671 1.164-2.836T12 8q1.671 0 2.836 1.164T16 12q0 1.671-1.164 2.836T12 16m-7-3.5H1.5v-1H5zm17.5 0H19v-1h3.5zM11.5 5V1.5h1V5zm0 17.5V19h1v3.5zM6.746 7.404l-2.16-2.098l.695-.744l2.111 2.134zM18.72 19.438l-2.117-2.14l.652-.702l2.16 2.098zM16.596 6.746l2.098-2.16l.744.695l-2.134 2.111zM4.562 18.72l2.14-2.117l.663.652l-2.078 2.179zM12 12'
-></path></svg
->
-<svg
-xmlns='http://www.w3.org/2000/svg'
-width='32'
-height='32'
-viewBox='0 0 24 24'
-class='hidden h-[1.2rem] w-[1.2rem] rotate-90 scale-0 transition-all dark:block dark:rotate-0 dark:scale-100'
-><path
-fill='currentColor'
-d='M12.058 20q-3.334 0-5.667-2.333Q4.058 15.333 4.058 12q0-3.038 1.98-5.27Q8.02 4.5 10.942 4.097q.081 0 .159.006t.153.017q-.506.706-.801 1.57q-.295.865-.295 1.811q0 2.667 1.866 4.533q1.867 1.867 4.534 1.867q.952 0 1.813-.295q.862-.295 1.548-.801q.012.075.018.153q.005.078.005.158q-.384 2.923-2.615 4.904T12.057 20'
-></path></svg
->
-</button>
+<!-- <button -->
+<!-- id='toggleDarkMode' -->
+<!-- class='relative rounded-md border border-border p-1.5 transition-all hover:bg-border' -->
+<!-- > -->
+<!-- <span class='sr-only'>Dark Theme</span> -->
+<!-- <svg -->
+<!-- xmlns='http://www.w3.org/2000/svg' -->
+<!-- width='32' -->
+<!-- height='32' -->
+<!-- viewBox='0 0 24 24' -->
+<!-- class='h-[1.2rem] w-[1.2rem] rotate-0 scale-100 transition-all dark:hidden dark:-rotate-90 dark:scale-0' -->
+<!-- ><path -->
+<!-- fill='currentColor' -->
+<!-- d='M12 15q1.25 0 2.125-.875T15 12q0-1.25-.875-2.125T12 9q-1.25 0-2.125.875T9 12q0 1.25.875 2.125T12 15m0 1q-1.671 0-2.836-1.164T8 12q0-1.671 1.164-2.836T12 8q1.671 0 2.836 1.164T16 12q0 1.671-1.164 2.836T12 16m-7-3.5H1.5v-1H5zm17.5 0H19v-1h3.5zM11.5 5V1.5h1V5zm0 17.5V19h1v3.5zM6.746 7.404l-2.16-2.098l.695-.744l2.111 2.134zM18.72 19.438l-2.117-2.14l.652-.702l2.16 2.098zM16.596 6.746l2.098-2.16l.744.695l-2.134 2.111zM4.562 18.72l2.14-2.117l.663.652l-2.078 2.179zM12 12' -->
+<!-- ></path></svg -->
+<!-- > -->
+<!-- <svg -->
+<!-- xmlns='http://www.w3.org/2000/svg' -->
+<!-- width='32' -->
+<!-- height='32' -->
+<!-- viewBox='0 0 24 24' -->
+<!-- class='hidden h-[1.2rem] w-[1.2rem] rotate-90 scale-0 transition-all dark:block dark:rotate-0 dark:scale-100' -->
+<!-- ><path -->
+<!-- fill='currentColor' -->
+<!-- d='M12.058 20q-3.334 0-5.667-2.333Q4.058 15.333 4.058 12q0-3.038 1.98-5.27Q8.02 4.5 10.942 4.097q.081 0 .159.006t.153.017q-.506.706-.801 1.57q-.295.865-.295 1.811q0 2.667 1.866 4.533q1.867 1.867 4.534 1.867q.952 0 1.813-.295q.862-.295 1.548-.801q.012.075.018.153q.005.078.005.158q-.384 2.923-2.615 4.904T12.057 20' -->
+<!-- ></path></svg -->
+<!-- > -->
+<!-- </button> -->
 </div>
 </nav>
 </header>
Lines changed: 5 additions & 5 deletions
@@ -63,7 +63,7 @@ be the set of the parametric densities $p_\theta(x)$. We can treat $M$ as a smoo
 
 Let us assume that $\I$ is positive-definite everywhere, and each $\I_{ij}$ is smooth. Then we can use it as (the coordinates representation of) a Riemannian metric for $M$. This is because $\I$ is a covariant 2-tensor. (Recall the definition of a Riemannian metric.)
 
-**Proposition 2.** _The component functions $\I\_{ij}$ of $\I$ follows the covariant transformation rule._
+**Proposition 2.** _The component functions $\I_{ij}$ of $\I$ follows the covariant transformation rule._
 
 _Proof._ Let $\theta \mapsto \varphi$ be a change of coordinates and let $\ell(\varphi) := \log p_\varphi(x)$. The component function $\I_{ij}(\theta)$ in the "old" coordinates is expressed in terms of the "new" ones, as follows:
 
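For context (an editorial note, not part of the diff): the covariant transformation rule that Proposition 2 refers to, written out for the Fisher information, reads

$$
\I_{ij}(\theta) = \sum_{k,l} \frac{\partial \varphi^k}{\partial \theta^i} \frac{\partial \varphi^l}{\partial \theta^j} \, \I_{kl}(\varphi) ,
$$

which follows by applying the chain rule $\partial \ell/\partial \theta^i = \sum_k (\partial \varphi^k/\partial \theta^i) \, \partial \ell/\partial \varphi^k$ inside the expectation that defines $\I$.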

@@ -113,13 +113,13 @@ We call this map a **_Markov embedding_**. The name suggests that $f$ embeds $\R
 
 The result of Campbell (1986) characterizes the form of the Riemannian metric in $\R^n_{>0}$ that is invariant under any Markov embedding.
 
-**Lemma 3 (Campbell, 1986).** _Let $g$ be a Riemannian metric on $\R^n\_{>0}$ where $n \geq 2$. Suppose that every Markov embedding on $(\R^n\_{>0}, g)$ is an isometry. Then_
+**Lemma 3 (Campbell, 1986).** _Let $g$ be a Riemannian metric on $\R^n_{>0}$ where $n \geq 2$. Suppose that every Markov embedding on $(\R^n_{>0}, g)$ is an isometry. Then_
 
 $$
 g_{ij}(x) = A(\abs{x}) + \delta_{ij} \frac{\abs{x} B(\abs{x})}{x^i} ,
 $$
 
-_where $\abs{x} = \sum\_{i=1}^n x^i$, $\delta\_{ij}$ is the Kronecker delta, and $A, B \in C^\infty(\R\_{>0})$ satisfying $B > 0$ and $A + B > 0$._
+_where $\abs{x} = \sum_{i=1}^n x^i$, $\delta_{ij}$ is the Kronecker delta, and $A, B \in C^\infty(\R_{>0})$ satisfying $B > 0$ and $A + B > 0$._
 
 _Proof._ See Campbell (1986) and Amari (2016, Sec. 3.5).
 
@@ -133,7 +133,7 @@ The fact that the Fisher information is the unique invariant metric under suffic
 
 Let us, therefore, connect the result in Lemma 3 with the Fisher information on $\Delta^{n-1}$. We give the latter in the following lemma.
 
-**Lemma 4.** _The Fisher information of a Categorical distribution $p\_\theta(z)$ where $z$ takes values in $\Omega = \\{ 1, \dots, n \\}$ and $\theta = \\{ \theta^1, \dots, \theta^n \\} \in \Delta^{n-1}$ is given by_
+**Lemma 4.** _The Fisher information of a Categorical distribution $p_\theta(z)$ where $z$ takes values in $\Omega = \\{ 1, \dots, n \\}$ and $\theta = \\{ \theta^1, \dots, \theta^n \\} \in \Delta^{n-1}$ is given by_
 
 $$
 \I_{ij}(\theta) = \delta_{ij} \frac{1}{\theta^i} .
@@ -185,7 +185,7 @@ $$
 
 for any $x \in \R^n_{> 0}$. Therefore, this is the form of the invariant metric under sufficient statistics in $\Delta^{n-1} \subset \R^n_{>0}$, i.e. when $n=m$ in the Markov embedding.
 
-Let us therefore restrict $g$ to $\Delta^{n-1}$. For each $\theta \in \Delta^{n-1}$, the tangent space $T_\theta \Delta^{n-1}$ is orthogonal to the line $x^1 = x^2 = \dots = x^n$, which direction is given by the vector $\mathbf{1} = (1, \dots, 1) \in \R^n_{>0}$. This is a vector normal to $\Delta^{n-1}$, implying that any $v \in T_\theta \Delta^{n-1}$ satisfies $\inner{\mathbf{1}, v}\_g = 0$, i.e. $\sum_{i=1}^n v^i = 0$.
+Let us therefore restrict $g$ to $\Delta^{n-1}$. For each $\theta \in \Delta^{n-1}$, the tangent space $T_\theta \Delta^{n-1}$ is orthogonal to the line $x^1 = x^2 = \dots = x^n$, which direction is given by the vector $\mathbf{1} = (1, \dots, 1) \in \R^n_{>0}$. This is a vector normal to $\Delta^{n-1}$, implying that any $v \in T_\theta \Delta^{n-1}$ satisfies $\inner{\mathbf{1}, v}_g = 0$, i.e. $\sum_{i=1}^n v^i = 0$.
 
 Moreover, if $\theta \in \Delta^{n-1}$, then $\abs{\theta} = \sum_{i=1}^n \theta^i = 1$ by definition. Thus, $A(1)$ and $B(1)$ are constants. So, if $v, w \in T_\theta \Delta^{n-1}$, we have:
 
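For context (not part of the commit): the computation this truncated hunk leads into presumably plugs $\abs{\theta} = 1$ into Lemma 3 and uses $\sum_i v^i = \sum_i w^i = 0$, so the $A(1)$ term vanishes:

$$
\inner{v, w}_g = \sum_{i,j} v^i w^j \left( A(1) + \delta_{ij} \frac{B(1)}{\theta^i} \right) = A(1) \Big( \sum_i v^i \Big) \Big( \sum_j w^j \Big) + B(1) \sum_i \frac{v^i w^i}{\theta^i} = B(1) \sum_i \frac{v^i w^i}{\theta^i} ,
$$

i.e. the invariant metric restricted to $\Delta^{n-1}$ is, up to the constant $B(1)$, exactly the Fisher metric of Lemma 4.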

Lines changed: 21 additions & 24 deletions
@@ -4,9 +4,10 @@ description: 'The Laplace approximation (LA) is a simple yet powerful class of m
 publishDate: 2021-10-27 00:00
 tags: [bayes]
 ---
+import BlogImage from '@/components/BlogImage.astro';
 
 Let $f: X \times \Theta \to Y$ defined by $(x, \theta) \mapsto f_\theta(x)$ be a neural network, where $X \subseteq \R^n$, $\Theta \subseteq \R^d$, and $Y \subseteq \R^c$ be the input, parameter, and output spaces, respectively.
-Given a dataset $\D := \\{ (x_i, y_i) : x_i \in X, y_i \in Y \\}_{i=1}^m$, we define the likelihood $p(\D \mid \theta) := \prod\_{i=1}^m p(y_i \mid f\_\theta(x_i))$.
+Given a dataset $\D := \\{ (x_i, y_i) : x_i \in X, y_i \in Y \\}_{i=1}^m$, we define the likelihood $p(\D \mid \theta) := \prod_{i=1}^m p(y_i \mid f_\theta(x_i))$.
 Then, given a prior $p(\theta)$, we can obtain the posterior via an application of Bayes' rule: $p(\theta \mid \D) = 1/Z \,\, p(\D \mid \theta) p(\theta)$.
 But, the exact computation of $p(\theta \mid \D)$ is intractable in general due to the need of computing the normalization constant
 
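For context (not shown in this hunk): the normalization constant referred to in the truncated sentence is the evidence

$$
Z = \int_\Theta p(\D \mid \theta) \, p(\theta) \, d\theta ,
$$

a $d$-dimensional integral over the network's parameters, hence intractable for any realistically sized model.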

@@ -49,7 +50,7 @@
 \end{align*}
 $$
 
-For simplicity, let $\varSigma := -\left(\nabla^2_\theta \L\vert\_{\theta\_\map}\right)^{-1}$. Then, using this approximation, we can also obtain an approximation of $Z$:
+For simplicity, let $\varSigma := -\left(\nabla^2_\theta \L\vert_{\theta_\map}\right)^{-1}$. Then, using this approximation, we can also obtain an approximation of $Z$:
 
 $$
 \begin{align*}
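
As a sketch (again not part of the hunk), writing $\L(\theta)$ for the log-joint $\log p(\D \mid \theta) + \log p(\theta)$, the truncated display presumably evaluates the resulting Gaussian integral, giving the standard Laplace estimate

$$
Z \approx \exp\!\left( \L(\theta_\map) \right) (2\pi)^{d/2} \left( \det \varSigma \right)^{1/2} .
$$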
@@ -91,7 +92,7 @@ which in general is less overconfident compared to the MAP-estimate-induced pred
 What we have seen is the most general framework of the LA.
 One can make a specific design decision, such as by imposing a special structure to the Hessian $\nabla^2_\theta \L$, and thus the covariance $\varSigma$.
 
-## The <span style="font-family: monospace; font-size: 15pt">laplace-torch</span> library
+## The laplace-torch library
 
 The simplicity of the LA is not without a drawback.
 Recall that the parameter $\theta$ is in $\Theta \subseteq \R^d$.
@@ -101,44 +102,40 @@ Together with the fact that the LA is an old method (and thus not "trendy" in th
 
 Motivated by this observation, in our NeurIPS 2021 paper titled ["Laplace Redux -- Effortless Bayesian Deep Learning"](https://arxiv.org/abs/2106.14806), we showcase that (i) the Hessian can be obtained cheaply, thanks to recent advances in second-order optimization, and (ii) even the simplest LA can be competitive to more sophisticated VB and MCMC methods, while only being much cheaper than them.
 Of course, numbers alone are not sufficient to promote the goodness of the LA.
-So, in that paper, we also propose an extendible, easy-to-use software library for PyTorch called <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>, which is available at <https://github.com/AlexImmer/Laplace>.
+So, in that paper, we also propose an extendible, easy-to-use software library for PyTorch called `laplace-torch`, which is available at [this Github repo](https://github.com/AlexImmer/Laplace).
 
-The <span style="font-family: monospace; font-size: 12pt">laplace-torch</span> is a simple library for, essentially, "turning standard NNs into BNNs".
+The `laplace-torch` is a simple library for, essentially, "turning standard NNs into BNNs".
 The main class of this library is the class `Laplace`, which can be used to transform a standard PyTorch model into a Laplace-approximated BNN.
 Here is an example.
 
-```python
+```python title="try_laplace.py"
 from laplace import Laplace
 
 model = load_pretrained_model()
-
 la = Laplace(model, 'regression')
 
 # Compute the Hessian
-
 la.fit(train_loader)
 
 # Hyperparameter tuning
-
 la.optimize_prior_precision()
 
 # Make prediction
-
 pred_mean, pred_var = la(x_test)
 ```
 
 The resulting object, `la` is a fully-functioning BNN, yielding the following prediction.
 (Notice the identical regression curves---the LA essentially imbues MAP predictions with uncertainty estimates.)
 
-![Regression]({{ site.baseurl }}/img/2021-10-27-laplace/regression_example.png){:width="50%"}
+<BlogImage imagePath="/img/laplace/regression_example.png" altText="Laplace for regression." />
 
-Of course, <span style="font-family: monospace; font-size: 12pt">laplace-torch</span> is flexible: the `Laplace` class has almost all state-of-the-art features in Laplace approximations.
-Those features, along with the corresponding options in <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>, are summarized in the following flowchart.
+Of course, `laplace-torch` is flexible: the `Laplace` class has almost all state-of-the-art features in Laplace approximations.
+Those features, along with the corresponding options in `laplace-torch`, are summarized in the following flowchart.
 (The options `'subnetwork'` for `subset_of_weights` and `'lowrank'` for `hessian_structure` are in the work, by the time this post is first published.)
 
-![Laplace Flowchart]({{ site.baseurl }}/img/2021-10-27-laplace/flowchart.png){:width="100%"}
+<BlogImage imagePath="/img/laplace/flowchart.png" altText="Modern arts of Laplace approximations." fullWidth />
 
-The <span style="font-family: monospace; font-size: 12pt">laplace-torch</span> library uses a very cheap yet highly-performant flavor of LA by default, based on [4]:
+The `laplace-torch` library uses a very cheap yet highly-performant flavor of LA by default, based on [4]:
 
 ```python
 def Laplace(model, likelihood, subset_of_weights='last_layer', hessian_structure='kron', ...)
@@ -147,19 +144,19 @@ def Laplace(model, likelihood, subset_of_weights='last_layer', hessian_structure
 That is, by default the `Laplace` class will fit a last-layer Laplace with a Kronecker-factored Hessian for approximating the covariance.
 Let us see how this default flavor of LA performs compared to the more sophisticated, recent (all-layer) Bayesian baselines in classification.
 
-![Classification]({{ site.baseurl }}/img/2021-10-27-laplace/classification.png){:width="100%"}
+<BlogImage imagePath="/img/laplace/classification.png" altText="Laplace for classification." fullWidth />
 
 Here we can see that `Laplace`, with default options, improves the calibration (in terms of expected calibration error (ECE)) of the MAP model.
 Moreover, it is guaranteed to preserve the accuracy of the MAP model---something that cannot be said for other baselines.
-Ultimately, this improvement is cheap: <span style="font-family: monospace; font-size: 12pt">laplace-torch</span> only incurs little overhead relative to the MAP model---far cheaper than other Bayesian baselines.
+Ultimately, this improvement is cheap: `laplace-torch` only incurs little overhead relative to the MAP model---far cheaper than other Bayesian baselines.
 
 ## Hyperparameter Tuning
 
 Hyperparameter tuning, especially for the prior variance/precision, is crucial in modern Laplace approximations for BNNs.
-<span style="font-family: monospace; font-size: 12pt">laplace-torch</span> provides several options: (i) cross-validation and (ii) marginal-likelihood maximization (MLM, also known as empirical Bayes and type-II maximum likelihood).
+`laplace-torch` provides several options: (i) cross-validation and (ii) marginal-likelihood maximization (MLM, also known as empirical Bayes and type-II maximum likelihood).
 
 Cross-validation is simple but needs a validation dataset.
-In <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>, this can be done via the following.
+In `laplace-torch`, this can be done via the following.
 
 ```python
 la.optimize_prior_precision(method='CV', val_loader=val_loader)
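
As a usage sketch (editorial, not in the commit): the default flavor discussed above corresponds to spelling the constructor's keyword arguments out explicitly. The alternative values below are the ones named in the post's flowchart; `model` and `train_loader` are the same placeholders as in the post's own example.

```python
from laplace import Laplace

# Default flavor written out: last-layer Laplace with a Kronecker-factored Hessian.
la = Laplace(
    model,
    'classification',                # likelihood; 'regression' in the earlier example
    subset_of_weights='last_layer',  # alternative: 'all' ('subnetwork' in the works)
    hessian_structure='kron',        # alternatives: 'full', 'diag' ('lowrank' in the works)
)
la.fit(train_loader)                 # compute the Hessian
la.optimize_prior_precision()        # tune the prior precision
```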
@@ -170,7 +167,7 @@ Recall that by taking the second-order Taylor expansion over the log-posterior,
 This object is called the marginal likelihood: it is a probability over the dataset $\D$ and crucially, it is a function of the hyperparameter since the parameter $\theta$ is marginalized out.
 Thus, we can find the best values for our hyperparameters by maximizing this function.
 
-In <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>, the marginal likelihood can be accessed via
+In `laplace-torch`, the marginal likelihood can be accessed via
 
 ```python
 ml = la.log_marginal_likelihood(prior_precision)
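
The returned `ml` is an ordinary differentiable PyTorch scalar, so (as an editorial sketch, not part of the commit) gradient-based MLM over the prior precision can be wired up directly. The loop below assumes a single scalar prior precision, log-parameterized to keep it positive; everything else reuses the `la` object from above.

```python
import torch

log_prior_prec = torch.zeros(1, requires_grad=True)  # log-parameterization keeps the precision positive
opt = torch.optim.Adam([log_prior_prec], lr=1e-1)

for _ in range(100):
    opt.zero_grad()
    neg_log_ml = -la.log_marginal_likelihood(log_prior_prec.exp())
    neg_log_ml.backward()  # backprop through the Laplace marginal likelihood
    opt.step()
```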
@@ -182,16 +179,16 @@ This function is compatible with PyTorch's autograd, so we can backpropagate thr
 ml.backward() # Works!
 ```
 
-Thus, MLM can easily be done in <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>.
-By extension, recent methods such as online MLM [5], can also easily be applied using <span style="font-family: monospace; font-size: 12pt">laplace-torch</span>.
+Thus, MLM can easily be done in `laplace-torch`.
+By extension, recent methods such as online MLM [5], can also easily be applied using `laplace-torch`.
 
 ## Outlooks
 
-The <span style="font-family: monospace; font-size: 12pt">laplace-torch</span> library is continuously developed.
+The `laplace-torch` library is continuously developed.
 Support for more likelihood functions and priors, subnetwork Laplace, etc. are on the way.
 
 In any case, we hope to see the revival of the LA in the Bayesian deep learning community.
-So, please try out our library at <https://github.com/AlexImmer/Laplace>!
+So, please try out our library at [https://github.com/AlexImmer/Laplace](https://github.com/AlexImmer/Laplace)!
 
 ## References
