src/content/post/chentsov-theorem.mdx
@@ -63,7 +63,7 @@ be the set of the parametric densities $p_\theta(x)$. We can treat $M$ as a smoo
Let us assume that $\I$ is positive-definite everywhere, and each $\I_{ij}$ is smooth. Then we can use it as (the coordinate representation of) a Riemannian metric on $M$. This is because $\I$ is a covariant 2-tensor. (Recall the definition of a Riemannian metric.)
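Concretely, if $u, v \in T_\theta M$ are tangent vectors with components $u^i, v^i$ in the coordinates $\theta$ (notation introduced here just for illustration), the inner product induced by $\I$ is

$$
\inner{u, v}_g := \sum_{i,j} \I_{ij}(\theta) \, u^i v^j ,
$$

which is bilinear, symmetric, and, under the assumptions above, smooth in $\theta$ and positive-definite, i.e. exactly the data a Riemannian metric requires.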
**Proposition 2.** _The component functions $\I_{ij}$ of $\I$ follow the covariant transformation rule._
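Explicitly, writing $\tilde{\I}_{kl}$ for the components of the Fisher information in the new coordinates $\varphi$ (a symbol used here only for readability), the covariant transformation rule reads

$$
\I_{ij}(\theta) = \sum_{k, l} \frac{\partial \varphi^k}{\partial \theta^i} \frac{\partial \varphi^l}{\partial \theta^j} \, \tilde{\I}_{kl}(\varphi(\theta)) .
$$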
_Proof._ Let $\theta \mapsto \varphi$ be a change of coordinates and let $\ell(\varphi) := \log p_\varphi(x)$. The component function $\I_{ij}(\theta)$ in the "old" coordinates is expressed in terms of the "new" ones, as follows:
@@ -113,13 +113,13 @@ We call this map a **_Markov embedding_**. The name suggests that $f$ embeds $\R
The result of Campbell (1986) characterizes the form of the Riemannian metrics on $\R^n_{>0}$ that are invariant under any Markov embedding.
**Lemma 3 (Campbell, 1986).** _Let $g$ be a Riemannian metric on $\R^n_{>0}$ where $n \geq 2$. Suppose that every Markov embedding on $(\R^n_{>0}, g)$ is an isometry. Then_
_where $\abs{x} = \sum_{i=1}^n x^i$, $\delta_{ij}$ is the Kronecker delta, and $A, B \in C^\infty(\R_{>0})$ satisfy $B > 0$ and $A + B > 0$._
_Proof._ See Campbell (1986) and Amari (2016, Sec. 3.5).
@@ -133,7 +133,7 @@ The fact that the Fisher information is the unique invariant metric under suffic
Let us, therefore, connect the result in Lemma 3 with the Fisher information on $\Delta^{n-1}$. We give the latter in the following lemma.
**Lemma 4.** _The Fisher information of a Categorical distribution $p_\theta(z)$, where $z$ takes values in $\Omega = \\{ 1, \dots, n \\}$ and $\theta = (\theta^1, \dots, \theta^n) \in \Delta^{n-1}$, is given by_
for any $x \in \R^n_{>0}$. Therefore, this is the form of the invariant metric under sufficient statistics in $\Delta^{n-1} \subset \R^n_{>0}$, i.e. when $n=m$ in the Markov embedding.
Let us therefore restrict $g$ to $\Delta^{n-1}$. For each $\theta \in \Delta^{n-1}$, the tangent space $T_\theta \Delta^{n-1}$ is orthogonal to the line $x^1 = x^2 = \dots = x^n$, whose direction is given by the vector $\mathbf{1} = (1, \dots, 1) \in \R^n_{>0}$. This is a vector normal to $\Delta^{n-1}$, implying that any $v \in T_\theta \Delta^{n-1}$ satisfies $\inner{\mathbf{1}, v}_g = 0$, i.e. $\sum_{i=1}^n v^i = 0$.
Moreover, if $\theta \in \Delta^{n-1}$, then $\abs{\theta} = \sum_{i=1}^n \theta^i = 1$ by definition. Thus, $A(1)$ and $B(1)$ are constants. So, if $v, w \in T_\theta \Delta^{n-1}$, we have:
Let $f: X \times \Theta \to Y$, defined by $(x, \theta) \mapsto f_\theta(x)$, be a neural network, where $X \subseteq \R^n$, $\Theta \subseteq \R^d$, and $Y \subseteq \R^c$ are the input, parameter, and output spaces, respectively.
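For concreteness, a minimal PyTorch stand-in for such an $f$ could look as follows (purely illustrative; the post does not fix an architecture, and the dimensions and layer sizes below are arbitrary):

```python
import torch.nn as nn

n, c = 10, 1  # input and output dimensions (arbitrary choices)
# f_theta: R^n -> R^c, with theta collecting all weights and biases of the network
model = nn.Sequential(nn.Linear(n, 50), nn.Tanh(), nn.Linear(50, c))
```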
Given a dataset $\D := \\{ (x_i, y_i) : x_i \in X, y_i \in Y \\}_{i=1}^m$, we define the likelihood $p(\D \mid \theta) := \prod_{i=1}^m p(y_i \mid f_\theta(x_i))$.
Then, given a prior $p(\theta)$, we can obtain the posterior via an application of Bayes' rule: $p(\theta \mid \D) = 1/Z \,\, p(\D \mid \theta) p(\theta)$.
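To make these objects concrete, here is a sketch of the unnormalized log-posterior $\log p(\D \mid \theta) + \log p(\theta)$ for the regression case, assuming (purely for illustration) a Gaussian likelihood with unit observation noise and an isotropic Gaussian prior with precision `prior_prec`:

```python
import torch

def log_unnorm_posterior(model, X, Y, prior_prec=1.0):
    """log p(D | theta) + log p(theta), up to additive constants.

    X has shape (m, n); Y has shape (m, c), matching the model's output.
    """
    # Gaussian likelihood with unit noise: log p(y_i | f_theta(x_i)) = -0.5 ||y_i - f_theta(x_i)||^2 + const
    log_lik = -0.5 * ((model(X) - Y) ** 2).sum()
    # Isotropic Gaussian prior: log p(theta) = -0.5 * prior_prec * ||theta||^2 + const
    log_prior = -0.5 * prior_prec * sum((p ** 2).sum() for p in model.parameters())
    return log_lik + log_prior
```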
But the exact computation of $p(\theta \mid \D)$ is intractable in general due to the need to compute the normalization constant
@@ -49,7 +50,7 @@ $$
\end{align*}
$$
For simplicity, let $\varSigma := -\left(\nabla^2_\theta \L\vert_{\theta_\map}\right)^{-1}$. Then, using this approximation, we can also obtain an approximation of $Z$:
$$
\begin{align*}
@@ -91,7 +92,7 @@ which in general is less overconfident compared to the MAP-estimate-induced pred
What we have seen is the most general framework of the LA.
One can make specific design decisions, such as imposing a special structure on the Hessian $\nabla^2_\theta \L$, and thus on the covariance $\varSigma$.
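As a rough illustration of this general recipe (a toy sketch with a linear "network" and a full Hessian; this is not how `laplace-torch` implements it), one can compute the MAP estimate, the covariance $\varSigma$, and the resulting Laplace estimate of $\log Z$ as follows:

```python
import math

import torch
from torch.autograd.functional import hessian

# Toy data and a "network" that is plain linear regression, f_theta(x) = x @ theta
X, Y = torch.randn(50, 3), torch.randn(50)
prior_prec = 1.0

def log_joint(theta):
    # log p(D | theta) + log p(theta), up to additive constants (Gaussian likelihood and prior)
    log_lik = -0.5 * ((X @ theta - Y) ** 2).sum()
    log_prior = -0.5 * prior_prec * (theta ** 2).sum()
    return log_lik + log_prior

# MAP estimate (closed form for this toy model; an optimizer in general)
theta_map = torch.linalg.solve(X.T @ X + prior_prec * torch.eye(3), X.T @ Y)

# Laplace covariance: Sigma = -(Hessian of the log-joint at the MAP)^{-1}
Sigma = torch.linalg.inv(-hessian(log_joint, theta_map))

# Laplace estimate of the evidence: log Z ~ log-joint at MAP + d/2 log(2 pi) + 1/2 log det(Sigma)
d = theta_map.numel()
log_Z = log_joint(theta_map) + 0.5 * d * math.log(2 * math.pi) + 0.5 * torch.logdet(Sigma)
```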
## The laplace-torch library
The simplicity of the LA is not without a drawback.
Recall that the parameter $\theta$ is in $\Theta \subseteq \R^d$.
@@ -101,44 +102,40 @@ Together with the fact that the LA is an old method (and thus not "trendy" in th
Motivated by this observation, in our NeurIPS 2021 paper titled ["Laplace Redux -- Effortless Bayesian Deep Learning"](https://arxiv.org/abs/2106.14806), we show that (i) the Hessian can be obtained cheaply, thanks to recent advances in second-order optimization, and (ii) even the simplest LA can be competitive with more sophisticated VB and MCMC methods, while being much cheaper than them.
Of course, numbers alone are not sufficient to make the case for the LA.
So, in that paper, we also propose an extensible, easy-to-use software library for PyTorch called `laplace-torch`, which is available at [this GitHub repo](https://github.com/AlexImmer/Laplace).
`laplace-torch` is a simple library for, essentially, "turning standard NNs into BNNs".
The main class of this library is `Laplace`, which can be used to transform a standard PyTorch model into a Laplace-approximated BNN.
Here is an example.
```python title="try_laplace.py"
from laplace import Laplace

model = load_pretrained_model()

la = Laplace(model, 'regression')

# Compute the Hessian
la.fit(train_loader)

# Hyperparameter tuning
la.optimize_prior_precision()

# Make prediction
pred_mean, pred_var = la(x_test)
```
The resulting object, `la`, is a fully functioning BNN, yielding the following prediction.
(Notice the identical regression curves---the LA essentially imbues MAP predictions with uncertainty estimates.)
<BlogImage imagePath="/img/laplace/regression_example.png" altText="Laplace for regression." />
Of course, `laplace-torch` is flexible: the `Laplace` class has almost all state-of-the-art features in Laplace approximations.
Those features, along with the corresponding options in `laplace-torch`, are summarized in the following flowchart.
(The options `'subnetwork'` for `subset_of_weights` and `'lowrank'` for `hessian_structure` are in the works at the time this post is first published.)
<BlogImage imagePath="/img/laplace/flowchart.png" altText="Modern arts of Laplace approximations." fullWidth />
The `laplace-torch` library uses a very cheap yet highly performant flavor of LA by default, based on [4]:
<BlogImage imagePath="/img/laplace/classification.png" altText="Laplace for classification." fullWidth />
Here we can see that `Laplace`, with default options, improves the calibration (in terms of expected calibration error (ECE)) of the MAP model.
Moreover, it is guaranteed to preserve the accuracy of the MAP model---something that cannot be said for other baselines.
Ultimately, this improvement is cheap: `laplace-torch` incurs only a small overhead relative to the MAP model---far less than that of other Bayesian baselines.
## Hyperparameter Tuning
Hyperparameter tuning, especially for the prior variance/precision, is crucial in modern Laplace approximations for BNNs.
`laplace-torch` provides two options: (i) cross-validation and (ii) marginal-likelihood maximization (MLM, also known as empirical Bayes or type-II maximum likelihood).
Cross-validation is simple but needs a validation dataset.
In `laplace-torch`, this can be done via the following.
@@ -170,7 +167,7 @@ Recall that by taking the second-order Taylor expansion over the log-posterior,
This object is called the marginal likelihood: it is a probability over the dataset $\D$ and, crucially, it is a function of the hyperparameters since the parameter $\theta$ is marginalized out.
Thus, we can find the best values for our hyperparameters by maximizing this function.
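In symbols, writing $\gamma$ for the hyperparameters (a symbol introduced just for this equation, standing e.g. for the prior precision), MLM amounts to solving

$$
\gamma_* = \arg\max_\gamma \, \log p(\D \mid \gamma) = \arg\max_\gamma \, \log \int p(\D \mid \theta) \, p(\theta \mid \gamma) \, d\theta ,
$$

where the intractable integral is replaced by its Laplace approximation.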
In `laplace-torch`, the marginal likelihood can be accessed via
```python
ml = la.log_marginal_likelihood(prior_precision)
```
@@ -182,16 +179,16 @@ This function is compatible with PyTorch's autograd, so we can backpropagate thr
```python
ml.backward() # Works!
```
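For instance, one could tune the prior precision by gradient-based MLM along these lines (a sketch, not code from the paper; the variable names and optimizer settings are arbitrary choices):

```python
import torch

log_prior_prec = torch.zeros(1, requires_grad=True)  # optimize in log-space to keep the precision positive
opt = torch.optim.Adam([log_prior_prec], lr=1e-1)

for _ in range(100):
    opt.zero_grad()
    neg_ml = -la.log_marginal_likelihood(log_prior_prec.exp())
    neg_ml.backward()  # gradients flow through the Laplace-approximated marginal likelihood
    opt.step()
```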
Thus, MLM can easily be done in `laplace-torch`.
By extension, recent methods such as online MLM [5] can also easily be applied using `laplace-torch`.
## Outlook
The `laplace-torch` library is under continuous development.
Support for more likelihood functions and priors, subnetwork Laplace, etc., is on the way.
In any case, we hope to see the revival of the LA in the Bayesian deep learning community.
So, please try out our library at [https://github.com/AlexImmer/Laplace](https://github.com/AlexImmer/Laplace)!