From 218ec651380bc889f1466ede68c72ebb13cf58d6 Mon Sep 17 00:00:00 2001
From: armenk

We begin our study into generative modeling with autoregressive models. As before, we assume we are given access to a dataset \(\mathcal{D}\) of \(n\)-dimensional datapoints \(\mathbf{x}\). For simplicity, we assume the datapoints are binary, i.e., \(\mathbf{x} \in \{0,1\}^n\). By the chain rule of probability, we can factorize the joint distribution over the \(n\) dimensions as

where \(\mathbf{x}_{< i}=[x_1, x_2, \ldots, x_{i-1}]\) denotes the vector of random variables with index less than \(i\). The chain rule factorization can be expressed graphically as a Bayesian network. Such a Bayesian network that makes no conditional independence assumptions is said to obey the autoregressive property.
-The term autoregressive originates from the literature on time-series models where observations from the previous time-steps are used to predict the value at the current time step. Here, we fix an ordering of the variables and the distribution for the -th random variable depends on the values of all the preceding random variables in the chosen ordering.

Autoregressive models

Representation
If we allow for every conditional to be specified in a tabular form, then such a representation is fully general and can represent any possible distribution over random variables. However, the space complexity for such a representation grows exponentially with .
+If we allow for every conditional \(p(x_i \vert \mathbf{x}_{< i})\) to be specified in a tabular form, then such a representation is fully general and can represent any possible distribution over \(n\) random variables. However, the space complexity for such a representation grows exponentially with \(n\).
-To see why, let us consider the conditional for the last dimension, given by . In order to fully specify this conditional, we need to specify a probability for configurations of the variables . Since the probabilities should sum to 1, the total number of parameters for specifying this conditional is given by . Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
+To see why, let us consider the conditional for the last dimension, given by \(p(x_n \vert \mathbf{x}_{< n})\). In order to fully specify this conditional, we need to specify the probability of \(x_n\) for each of the \(2^{n-1}\) configurations of the variables \(x_1, x_2, \ldots, x_{n-1}\). Since \(x_n\) is binary and its probabilities must sum to one, each configuration requires only one parameter, so the total number of parameters for specifying this conditional is given by \(2^{n-1}\). Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
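As a small worked example: for \(n=4\) binary variables, the last conditional \(p(x_4 \vert x_1, x_2, x_3)\) alone needs \(2^3 = 8\) parameters, and the full tabular factorization needs \(\sum_{i=1}^4 2^{i-1} = 2^4 - 1 = 15\) parameters, exactly the number of free parameters in a full joint table over the \(2^4\) possible outcomes.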
-In an autoregressive generative model, the conditionals are specified as parameterized functions with a fixed number of parameters. That is, we assume the conditional distributions to correspond to a Bernoulli random variable and learn a function that maps the preceeding random variables to the mean of this distribution. Hence, we have
+In an autoregressive generative model, the conditionals are specified as parameterized functions with a fixed number of parameters. That is, we assume the conditional distributions \(p(x_i \vert \mathbf{x}_{< i})\) to correspond to a Bernoulli random variable and learn a function that maps the preceding random variables \(x_1, x_2, \ldots, x_{i-1}\) to the mean of this distribution. Hence, we have
-where denotes the set of parameters used to specify the mean function.
+where \(\theta_i\) denotes the set of parameters used to specify the mean function \(f_i: \{0,1\}^{i-1}\rightarrow [0,1]\).
-The number of parameters of an autoregressive generative model are given by . As we shall see in the examples below, the number of parameters are much fewer than the tabular setting considered previously. Unlike the tabular setting however, an autoregressive generative model cannot represent all possible distributions. Its expressiveness is limited by the fact that we are limiting the conditional distributions to correspond to a Bernoulli random variable with the mean specified via a restricted class of parameterized functions.
+The number of parameters of an autoregressive generative model is given by \(\sum_{i=1}^n \vert \theta_i \vert\). As we shall see in the examples below, this is far fewer than in the tabular setting considered previously. Unlike the tabular setting, however, an autoregressive generative model cannot represent all possible distributions. Its expressiveness is limited by the fact that we restrict the conditional distributions to correspond to a Bernoulli random variable whose mean is specified via a restricted class of parameterized functions.
where denotes the sigmoid function and denote the parameters of the mean function. The conditional for variable requires parameters, and hence the total number of parameters in the model is given by . Note that the number of parameters are much fewer than the exponential complexity of the tabular case.
+where \(\sigma\) denotes the sigmoid function and \(\theta_i=\{\alpha^{(i)}_0,\alpha^{(i)}_1, \ldots, \alpha^{(i)}_{i-1}\}\) denotes the parameters of the mean function. The conditional for variable \(i\) requires \(i\) parameters, and hence the total number of parameters in the model is given by \(\sum_{i=1}^n i= O(n^2)\). Note that this is far fewer than the exponential complexity of the tabular case.
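To make this parameterization concrete, here is a minimal sketch (in NumPy) of evaluating the log-likelihood of a single binary datapoint under such a logistic-regression-style model; the per-conditional parameter layout `alpha[i]` is a hypothetical convention chosen only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fvsbn_log_likelihood(x, alpha):
    """Log-likelihood of one binary datapoint x (shape [n]).

    alpha[i] is assumed to be a vector of length i + 1 holding the bias
    alpha_0^(i) followed by the i weights of the i-th conditional.
    """
    log_p = 0.0
    for i, a in enumerate(alpha):
        mu = sigmoid(a[0] + a[1:] @ x[:i])        # Bernoulli mean for x_i given x_{<i}
        log_p += x[i] * np.log(mu) + (1 - x[i]) * np.log(1 - mu)
    return log_p
```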
-A natural way to increase the expressiveness of an autoregressive generative model is to use more flexible parameterizations for the mean function e.g., multi-layer perceptrons (MLP). For example, consider the case of a neural network with 1 hidden layer. The mean function for variable can be expressed as
+A natural way to increase the expressiveness of an autoregressive generative model is to use more flexible parameterizations for the mean function, e.g., multi-layer perceptrons (MLPs). For example, consider the case of a neural network with one hidden layer. The mean function for variable \(i\) can be expressed as
-where denotes the hidden layer activations for the MLP and are the set of parameters for the mean function . The total number of parameters in this model is dominated by the matrices and given by .
+where \(\mathbf{h}_i \in \mathbb{R}^d\) denotes the hidden layer activations for the MLP and \(\theta_i = \{A_i \in \mathbb{R}^{d\times (i-1)}, \mathbf{c}_i \in \mathbb{R}^d, \boldsymbol{\alpha}^{(i)}\in \mathbb{R}^d, b_i \in \mathbb{R}\}\) is the set of parameters for the mean function \(\mu_i(\cdot)\). The total number of parameters in this model is dominated by the matrices \(A_i\) and given by \(O(n^2 d)\).
where is the full set of parameters for the mean functions . The weight matrix and the bias vector are shared across the conditionals. Sharing parameters offers two benefits:
+where \(\theta=\{W\in \mathbb{R}^{d\times n}, \mathbf{c} \in \mathbb{R}^d, \{\boldsymbol{\alpha}^{(i)}\in \mathbb{R}^d\}^n_{i=1}, \{b_i \in \mathbb{R}\}^n_{i=1}\}\) is the full set of parameters for the mean functions \(f_1(\cdot), f_2(\cdot), \ldots, f_n(\cdot)\). The weight matrix \(W\) and the bias vector \(\mathbf{c}\) are shared across the conditionals. Sharing parameters offers two benefits:
The total number of parameters gets reduced from to [readers are encouraged to check!].
+The total number of parameters gets reduced from \(O(n^2 d)\) to \(O(nd)\) [readers are encouraged to check!].
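One way to check this, using the parameter list above: the shared \(W \in \mathbb{R}^{d\times n}\) and \(\mathbf{c} \in \mathbb{R}^d\) contribute \(O(nd)\) parameters in total, while each of the \(n\) conditionals only adds its own \(\boldsymbol{\alpha}^{(i)} \in \mathbb{R}^d\) and \(b_i \in \mathbb{R}\), i.e., \(O(d)\) parameters each, giving \(O(nd) + n \cdot O(d) = O(nd)\) overall. The per-conditional matrices \(A_i\), which dominated the earlier \(O(n^2 d)\) count, are no longer needed.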
The hidden unit activations can be evaluated in time via the following recursive strategy:
+The hidden unit activations can be evaluated in \(O(nd)\) time via the following recursive strategy:
-with the base case given by .
+with the base case given by \(\mathbf{a}_1=\mathbf{c}\).
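A minimal sketch of this recursive evaluation of the NADE conditionals (the array shapes and NumPy layout below are assumptions made for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nade_means(x, W, c, alpha, b):
    """Conditional Bernoulli means for one binary datapoint x (shape [n]).

    Assumed shapes: W is [d, n], c is [d], alpha is [n, d], b is [n].
    The activations are built up recursively in O(nd) total time instead
    of recomputing W[:, :i] @ x[:i] from scratch for every conditional.
    """
    n = x.shape[0]
    a = c.copy()                        # base case: a_1 = c
    mus = np.empty(n)
    for i in range(n):
        h = sigmoid(a)                  # hidden activations h_i
        mus[i] = sigmoid(alpha[i] @ h + b[i])
        a = a + W[:, i] * x[i]          # recursion: a_{i+1} = a_i + W[:, i] * x_i
    return mus
```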
The RNADE algorithm extends NADE to learn generative models over real-valued data. Here, the conditionals are modeled via a continuous distribution such as a equi-weighted mixture of Gaussians. Instead of learning a mean function, we know learn the means and variances of the Gaussians for every conditional. For statistical and computational efficiency, a single function outputs all the means and variances of the Gaussians for the -th conditional distribution.
+The RNADE algorithm extends NADE to learn generative models over real-valued data. Here, the conditionals are modeled via a continuous distribution such as an equi-weighted mixture of \(K\) Gaussians. Instead of learning a mean function, we now learn the means \(\mu_{i,1}, \mu_{i,2},\ldots, \mu_{i,K}\) and variances \(\Sigma_{i,1}, \Sigma_{i,2},\ldots, \Sigma_{i,K}\) of the \(K\) Gaussians for every conditional. For statistical and computational efficiency, a single function \(g_i: \mathbb{R}^{i-1}\rightarrow\mathbb{R}^{2K}\) outputs all the means and variances of the \(K\) Gaussians for the \(i\)-th conditional distribution.
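As an illustration, here is a sketch of evaluating one RNADE conditional from the output of \(g_i\); splitting the \(2K\) outputs into means and log standard deviations is an assumed (but common) convention rather than something prescribed above.

```python
import numpy as np
from scipy.special import logsumexp

def rnade_conditional_logpdf(x_i, g_out):
    """Log-density of the scalar x_i under an equi-weighted mixture of K
    Gaussians whose parameters are read off the 2K-dimensional output of g_i."""
    K = g_out.shape[0] // 2
    mu, log_sigma = g_out[:K], g_out[K:]          # assumed layout: K means, K log-stds
    log_comp = (-0.5 * np.log(2 * np.pi) - log_sigma
                - 0.5 * ((x_i - mu) / np.exp(log_sigma)) ** 2)
    return logsumexp(log_comp) - np.log(K)        # equal mixture weights 1/K
```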
Notice that NADE requires specifying a single, fixed ordering of the variables. The choice of ordering can lead to different models. The EoNADE algorithm allows training an ensemble of NADE models with different orderings.
@@ -189,33 +185,32 @@
-Before moving any further, we make two comments about the KL divergence. First, we note that the KL divergence between any two distributions is asymmetric. As we navigate through this chapter, the reader is encouraged to think what could go wrong if we decided to optimize the reverse KL divergence instead. Secondly, the KL divergences heavily penalizes any model distribution which assigns low probability to a datapoint that is likely to be sampled under . In the extreme case, if the density evaluates to zero for a datapoint sampled from , the objective evaluates to .
+Before moving any further, we make two comments about the KL divergence. First, we note that the KL divergence between any two distributions is asymmetric. As we navigate through this chapter, the reader is encouraged to think what could go wrong if we decided to optimize the reverse KL divergence instead. Secondly, the KL divergence heavily penalizes any model distribution \(p_\theta\) which assigns low probability to a datapoint that is likely to be sampled under \(p_{\mathrm{data}}\). In the extreme case, if the density \(p_\theta(\mathbf{x})\) evaluates to zero for a datapoint sampled from \(p_{\mathrm{data}}\), the objective evaluates to \(+\infty\).
-Since does not depend on , we can equivalently recover the optimal parameters via maximizing likelihood estimation.
+Since \(p_{\mathrm{data}}\) does not depend on \(\theta\), we can equivalently recover the optimal parameters via maximum likelihood estimation.
-Here, is referred to as the log-likelihood of the datapoint with respect to the model distribution .
+Here, \(\log p_{\theta}(\mathbf{x})\) is referred to as the log-likelihood of the datapoint \(\mathbf{x}\) with respect to the model distribution \(p_\theta\).
-To approximate the expectation over the unknown , we make an assumption: points in the dataset are sampled i.i.d. from . This allows us to obtain an unbiased Monte Carlo estimate of the objective as
+To approximate the expectation over the unknown \(p_{\mathrm{data}}\), we make an assumption: points in the dataset \(\mathcal{D}\) are sampled i.i.d. from \(p_{\mathrm{data}}\). This allows us to obtain an unbiased Monte Carlo estimate of the objective as
-The maximum likelihood estimation (MLE) objective has an intuitive interpretation: pick the model parameters that maximize the log-probability of the observed datapoints in .
+The maximum likelihood estimation (MLE) objective has an intuitive interpretation: pick the model parameters \(\theta \in \mathcal{M}\) that maximize the log-probability of the observed datapoints in \(\mathcal{D}\).
-In practice, we optimize the MLE objective using mini-batch gradient ascent. The algorithm operates in iterations. At every iteration , we sample a mini-batch of datapoints sampled randomly from the dataset () and compute gradients of the objective evaluated for the mini-batch. These parameters at iteration are then given via the following update rule
+In practice, we optimize the MLE objective using mini-batch gradient ascent. The algorithm operates in iterations. At every iteration \(t\), we sample a mini-batch \(\mathcal{B}_t\) of datapoints drawn randomly from the dataset (\(\vert \mathcal{B}_t\vert < \vert \mathcal{D} \vert\)) and compute gradients of the objective evaluated for the mini-batch. The parameters at iteration \(t+1\) are then given via the following update rule
-where and are the parameters at iterations and respectively, and is the learning rate at iteration . Typically, we only specify the initial learning rate and update the rate based on a schedule. Variants of stochastic gradient ascent, such as RMS prop and Adam, employ modified update rules that work slightly better in practice.
+where \(\theta^{(t+1)}\) and \(\theta^{(t)}\) are the parameters at iterations \(t+1\) and \(t\) respectively, and \(r_t\) is the learning rate at iteration \(t\). Typically, we only specify the initial learning rate \(r_1\) and update the rate based on a schedule. Variants of stochastic gradient ascent, such as RMSprop and Adam, employ modified update rules that work slightly better in practice.
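A minimal sketch of this training loop in PyTorch, assuming a `model` object that exposes a `log_prob` method returning \(\log p_\theta(\mathbf{x})\) for each datapoint in a batch:

```python
import torch

def train_mle(model, dataset, batch_size=64, lr=1e-3, epochs=10):
    """Mini-batch maximum likelihood training (gradient ascent on log-likelihood)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x in loader:                      # x is a randomly sampled mini-batch B_t
            loss = -model.log_prob(x).mean()  # minimizing -log p is ascent on log p
            opt.zero_grad()
            loss.backward()
            opt.step()
```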
-From a practical standpoint, we must think about how to choose hyperaparameters (such as the initial learning rate) and a stopping criteria for the gradient descent. For both these questions, we follow the standard practice in machine learning of monitoring the objective on a validation dataset. Consequently, we choose the hyperparameters with the best performance on the validation dataset and stop updating the parameters when the validation log-likelihoods cease to improve1.
+From a practical standpoint, we must think about how to choose hyperparameters (such as the initial learning rate) and a stopping criterion for the gradient updates. For both these questions, we follow the standard practice in machine learning of monitoring the objective on a validation dataset. Consequently, we choose the hyperparameters with the best performance on the validation dataset and stop updating the parameters when the validation log-likelihoods cease to improve1.
Now that we have a well-defined objective and optimization procedure, the only remaining task is to evaluate the objective in the context of an autoregressive generative model. To this end, we substitute the factorized joint distribution of an autoregressive model in the MLE objective to get
@@ -223,14 +218,12 @@
-where now denotes the collective set of parameters for the conditionals.
+where \(\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}\) now denotes the collective set of parameters for the conditionals.
-Inference in an autoregressive model is straightforward. For density estimation of an arbitrary point , we simply evaluate the log-conditionals for each and add these up to obtain the log-likelihood assigned by the model to . Since we know conditioning vector , each of the conditionals can be evaluated in parallel. Hence, density estimation is efficient on modern hardware.
+Inference in an autoregressive model is straightforward. For density estimation of an arbitrary point \(\mathbf{x}\), we simply evaluate the log-conditionals \(\log p_{\theta_i}(x_i \vert \mathbf{x}_{< i})\) for each \(i\) and add these up to obtain the log-likelihood assigned by the model to \(\mathbf{x}\). Since the full vector \(\mathbf{x}\) is known, each of the conditionals can be evaluated in parallel. Hence, density estimation is efficient on modern hardware.
-Sampling from an autoregressive model is a sequential procedure. Here, we first sample , then we sample conditioned on the sampled , followed by conditioned on both and and so on until we sample conditioned on the previously sampled . For applications requiring real-time generation of high-dimensional data such as audio synthesis, the sequential sampling can be an expensive process. Later in this course, we will discuss how parallel Wavenet, an autoregressive model sidesteps this expensive sampling process.
+Sampling from an autoregressive model is a sequential procedure. Here, we first sample \(x_1\), then we sample \(x_2\) conditioned on the sampled \(x_1\), followed by \(x_3\) conditioned on both \(x_1\) and \(x_2\), and so on until we sample \(x_n\) conditioned on the previously sampled \(\mathbf{x}_{< n}\). For applications requiring real-time generation of high-dimensional data such as audio synthesis, this sequential sampling can be an expensive process. Later in this course, we will discuss how parallel WaveNet, an autoregressive model, sidesteps this expensive sampling process.
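A sketch of the sequential procedure; the `cond_mean` callback standing in for the model's \(i\)-th conditional is a hypothetical interface.

```python
import numpy as np

def sample_autoregressive(cond_mean, n, rng=None):
    """Sequentially sample one binary datapoint of dimension n.

    cond_mean(i, x_prefix) is assumed to return the Bernoulli mean
    p(x_i = 1 | x_{<i}) under the model.
    """
    rng = rng or np.random.default_rng()
    x = np.zeros(n, dtype=int)
    for i in range(n):
        mu = cond_mean(i, x[:i])       # depends only on values sampled so far
        x[i] = rng.binomial(1, mu)
    return x
```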
@@ -244,10 +237,10 @@
-Given the non-convex nature of such problems, the optimization procedure can get stuck in local optima. Hence, early stopping will generally not be optimal but is a very practical strategy. ↩
+Given the non-convex nature of such problems, the optimization procedure can get stuck in local optima. Hence, early stopping will generally not be optimal but is a very practical strategy. ↩
We continue our study over another type of likelihood based generative models. As before, we assume we are given access to a dataset of -dimensional datapoints . So far we have learned two types of likelihood based generative models:
+We continue our study with another type of likelihood-based generative model. As before, we assume we are given access to a dataset \(\mathcal{D}\) of \(n\)-dimensional datapoints \(\mathbf{x}\). So far we have learned two types of likelihood-based generative models:
Autoregressive Models:
+Autoregressive Models: \(p_\theta(\mathbf{x}) = \prod_{i=1}^{n} p_\theta(x_i \vert \mathbf{x}_{<i})\)
Variational autoencoders:
+Variational autoencoders: \(p_\theta(\mathbf{x}) = \int p_\theta(\mathbf{x}, \mathbf{z}) \text{d}\mathbf{z}\)
The two methods have relative strengths and weaknesses. Autoregressive models provide tractable likelihoods but no direct mechanism for learning features, whereas variational autoencoders can learn feature representations but have intractable marginal likelihoods.
-In this section, we introduce normalizing flows a type of method that combines the best of both worlds, allowing both feature learning and tractable marginal likelihood estimation.
+In this section, we introduce normalizing flows: a type of method that combines the best of both worlds, allowing both feature learning and tractable marginal likelihood estimation.
In normalizing flows, we wish to map simple distributions (easy to sample and evaluate densities) to complex ones (learned via data). The change of variables formula describes how to evaluate densities of a random variable that is a deterministic transformation of another variable.
-Change of Variables: and be random variables which are related by a mapping such that and . Then
+Change of Variables: Let \(Z\) and \(X\) be random variables which are related by a mapping \(f: \mathbb{R}^n \to \mathbb{R}^n\) such that \(X = f(Z)\) and \(Z = f^{-1}(X)\). Then
\(\mathbf{x}\) and \(\mathbf{z}\) need to be continuous and have the same dimension.
is a matrix of dimension , where each entry at location is defined as . This matrix is also known as the Jacobian matrix.
+\(\frac{\partial f^{-1}(\mathbf{x})}{\partial \mathbf{x}}\) is a matrix of dimension \(n \times n\), where each entry at location \((i, j)\) is defined as \(\frac{\partial f^{-1}(\mathbf{x})_i}{\partial x_j}\). This matrix is also known as the Jacobian matrix.
denotes the determinant of a square matrix .
+\(\text{det}(A)\) denotes the determinant of a square matrix \(A\).
For any invertible matrix , , so for we have
+For any invertible matrix \(A\), \(\text{det}(A^{-1}) = \text{det}(A)^{-1}\), so for \(\mathbf{z} = f^{-1}(\mathbf{x})\) we have
If , then the mappings is volume preserving, which means that the transformed distribution will have the same “volume” compared to the original one .
+If \(\left \vert \text{det}\left(\frac{\partial f(\mathbf{z})}{\partial \mathbf{z}}\right) \right\vert = 1\), then the mapping is volume preserving, which means that the transformed distribution \(p_X\) will have the same “volume” compared to the original one \(p_Z\).
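A sketch of evaluating \(\log p_X(\mathbf{x})\) directly from the change of variables formula with automatic differentiation (PyTorch), assuming `f_inv` and `base_log_prob` are supplied; practical flow models avoid forming the full Jacobian and instead use transformations whose determinants are cheap, as discussed below.

```python
import torch

def log_prob_change_of_variables(x, f_inv, base_log_prob):
    """log p_X(x) = log p_Z(f^{-1}(x)) + log |det d f^{-1}(x) / d x|."""
    z = f_inv(x)
    J = torch.autograd.functional.jacobian(f_inv, x)   # n x n Jacobian of the inverse map
    _, logabsdet = torch.linalg.slogdet(J)
    return base_log_prob(z) + logabsdet
```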
We are ready to introduce normalizing flow models. Let us consider a directed, latent-variable model over observed variables and latent variables . In a normalizing flow model, the mapping between and , given by , is deterministic and invertible such that and 1.
+We are ready to introduce normalizing flow models. Let us consider a directed, latent-variable model over observed variables \(X\) and latent variables \(Z\). In a normalizing flow model, the mapping between \(Z\) and \(X\), given by \(f_\theta: \mathbb{R}^n \to \mathbb{R}^n\), is deterministic and invertible such that \(X = f_\theta(Z)\) and \(Z = f_\theta^{-1}(X)\)1.
Using change of variables, the marginal likelihood is given by
+Using change of variables, the marginal likelihood \(p(x)\) is given by
-where are parameters.
+where \(\mathbf{u}, \mathbf{w}, b\) are parameters.
The absolute value of the determinant of the Jacobian is given by
@@ -174,31 +173,31 @@
-However, need to be restricted in order to be invertible. For example, and . Note that while is invertible, computing could be difficult analytically. The following models address this problem, where both and have simple analytical forms.
+However, \(\mathbf{u}, \mathbf{w}, b, h(\cdot)\) need to be restricted in order to be invertible. For example, \(h = \tanh\) and \(h'(\mathbf{w}^\top \mathbf{z} + b) \mathbf{u}^\top \mathbf{w} \geq -1\). Note that while \(f_\theta(\mathbf{z})\) is invertible, computing \(f_\theta^{-1}(\mathbf{z})\) could be difficult analytically. The following models address this problem, where both \(f_\theta\) and \(f_\theta^{-1}\) have simple analytical forms.
-The Nonlinear Independent Components Estimation (NICE) model and Real Non-Volume Preserving (RealNVP) model composes two kinds of invertible transformations: additive coupling layers and rescaling layers. The coupling layer in NICE partitions a variable into two disjoints subsets, say and . Then it applies the following transformation:
+The Nonlinear Independent Components Estimation (NICE) model and the Real Non-Volume Preserving (RealNVP) model compose two kinds of invertible transformations: additive coupling layers and rescaling layers. The coupling layer in NICE partitions a variable \(\mathbf{z}\) into two disjoint subsets, say \(\mathbf{z}_1\) and \(\mathbf{z}_2\). Then it applies the following transformation:
-Forward mapping
+Forward mapping \(\mathbf{z} \to \mathbf{x}\)
, which is an identity mapping.
+\(\mathbf{x}_1 = \mathbf{z}_1\), which is an identity mapping.
, where is a neural network.
+\(\mathbf{x}_2 = \mathbf{z}_2 + m_\theta(\mathbf{z_1})\), where \(m_\theta\) is a neural network.
Inverse mapping :
+Inverse mapping \(\mathbf{x} \to \mathbf{z}\):
, which is an identity mapping.
+\(\mathbf{z}_1 = \mathbf{x}_1\), which is an identity mapping.
, which is the inverse of the forward transformation.
+\(\mathbf{z}_2 = \mathbf{x}_2 - m_\theta(\mathbf{x_1})\), which is the inverse of the forward transformation.
Therefore, the Jacobian of the forward mapping is lower trangular, whose determinant is simply the product of the elements on the diagonal, which is 1. Therefore, this defines a volume preserving transformation. RealNVP adds scaling factors to the transformation:
+Therefore, the Jacobian of the forward mapping is lower triangular, whose determinant is simply the product of the elements on the diagonal, which is 1. Therefore, this defines a volume preserving transformation. RealNVP adds scaling factors to the transformation:
-where denotes elementwise product. This results in a non-volume preserving transformation.
+where \(\odot\) denotes elementwise product. This results in a non-volume preserving transformation.
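A sketch of a RealNVP-style coupling layer with the scaling factors included (setting the log-scale to zero recovers NICE's additive, volume-preserving coupling); `scale_net` and `shift_net` are assumed to be small user-supplied networks.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Split the input in half; the first half passes through unchanged and
    parameterizes a scale and shift applied to the second half."""
    def __init__(self, scale_net, shift_net):
        super().__init__()
        self.scale_net, self.shift_net = scale_net, shift_net

    def forward(self, z):                          # forward mapping z -> x
        z1, z2 = z.chunk(2, dim=-1)
        log_s = self.scale_net(z1)
        x2 = z2 * torch.exp(log_s) + self.shift_net(z1)
        logdet = log_s.sum(dim=-1)                 # log |det J| of the forward map
        return torch.cat([z1, x2], dim=-1), logdet

    def inverse(self, x):                          # inverse mapping x -> z
        x1, x2 = x.chunk(2, dim=-1)
        z2 = (x2 - self.shift_net(x1)) * torch.exp(-self.scale_net(x1))
        return torch.cat([x1, z2], dim=-1)
```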
-Some autoregressive models can also be interpreted as flow models. For a Gaussian autoregressive model, one receive some Gaussian noise for each dimension of , which can be treated as the latent variables . Such transformations are also invertible, meaning that given and the model parameters, we can obtain exactly.
+Some autoregressive models can also be interpreted as flow models. For a Gaussian autoregressive model, one receives some Gaussian noise for each dimension of \(\mathbf{x}\), which can be treated as the latent variables \(\mathbf{z}\). Such transformations are also invertible, meaning that given \(\mathbf{x}\) and the model parameters, we can obtain \(\mathbf{z}\) exactly.
-Masked Autoregressive Flow (MAF) uses this interpretation, where the forward mapping is an autoregressive model. However, sampling is sequential and slow, in time where is the dimension of the samples.
+Masked Autoregressive Flow (MAF) uses this interpretation, where the forward mapping is an autoregressive model. However, sampling is sequential and slow, in \(O(n)\) time where \(n\) is the dimension of the samples.
To address the sampling problem, the Inverse Autoregressive Flow (IAF) simply inverts the generating process. In this case, generating from the noise can be parallelized, but computing the likelihood of new data points is slow. However, for generated points the likelihood can be computed efficiently (since the noise are already obtained).
+To address the sampling problem, the Inverse Autoregressive Flow (IAF) simply inverts the generating process. In this case, generating \(\mathbf{x}\) from the noise can be parallelized, but computing the likelihood of new data points is slow. However, for generated points the likelihood can be computed efficiently (since the noise variables are already available).
Why not? In fact, it is not so clear that better likelihood numbers necessarily correspond to higher sample quality. We know that the optimal generative model will give us the best sample quality and highest test log-likelihood. However, models with high test log-likelihoods can still yield poor samples, and vice versa. To see why, consider pathological cases in which our model is comprised almost entirely of noise, or our model simply memorizes the training set. Therefore, we turn to likelihood-free training with the hope that optimizing a different objective will allow us to disentangle our desiderata of obtaining high likelihoods as well as high-quality samples.
-Recall that maximum likelihood required us to evaluate the likelihood of the data under our model . A natural way to set up a likelihood-free objective is to consider the two-sample test, a statistical test that determines whether or not a finite set of samples from two distributions are from the same distribution using only samples from and . Concretely, given and , we compute a test statistic according to the difference in and that, when less than a threshold , accepts the null hypothesis that .
+Recall that maximum likelihood required us to evaluate the likelihood of the data under our model \(p_\theta\). A natural way to set up a likelihood-free objective is to consider the two-sample test, a statistical test that determines whether or not a finite set of samples from two distributions are from the same distribution using only samples from \(P\) and \(Q\). Concretely, given \(S_1 = \{\mathbf{x} \sim P\}\) and \(S_2 = \{\mathbf{x} \sim Q\}\), we compute a test statistic \(T\) according to the difference in \(S_1\) and \(S_2\) that, when less than a threshold \(\alpha\), accepts the null hypothesis that \(P = Q\).
-Analogously, we have in our generative modeling setup access to our training set and . The key idea is to train the model to minimize a two-sample test objective between and . But this objective becomes extremely difficult to work with in high dimensions, so we choose to optimize a surrogate objective that instead maximizes some distance between and .
+Analogously, we have in our generative modeling setup access to our training set \(S_1 = \mathcal{D} = \{\mathbf{x} \sim p_{\textrm{data}} \}\) and \(S_2 = \{\mathbf{x} \sim p_{\theta} \}\). The key idea is to train the model to minimize a two-sample test objective between \(S_1\) and \(S_2\). But this objective becomes extremely difficult to work with in high dimensions, so we choose to optimize a surrogate objective that instead maximizes some distance between \(S_1\) and \(S_2\).
We thus arrive at the generative adversarial network formulation. There are two components in a GAN: (1) a generator and (2) a discriminator. The generator is a directed latent variable model that deterministically generates samples from , and the discriminator is a function whose job is to distinguish samples from the real dataset and the generator. The image below is a graphical model of and . denotes samples (either from data or generator), denotes our noise vector, and denotes the discriminator’s prediction about .
+We thus arrive at the generative adversarial network formulation. There are two components in a GAN: (1) a generator and (2) a discriminator. The generator \(G_\theta\) is a directed latent variable model that deterministically generates samples \(\mathbf{x}\) from \(\mathbf{z}\), and the discriminator \(D_\phi\) is a function whose job is to distinguish samples from the real dataset and the generator. The image below is a graphical model of \(G_\theta\) and \(D_\phi\). \(\mathbf{x}\) denotes samples (either from data or generator), \(\mathbf{z}\) denotes our noise vector, and \(\mathbf{y}\) denotes the discriminator’s prediction about \(\mathbf{x}\).
The generator and discriminator both play a two player minimax game, where the generator minimizes a two-sample test objective () and the discriminator maximizes the objective (). Intuitively, the generator tries to fool the discriminator to the best of its ability by generating samples that look indisginguishable from .
+The generator and discriminator both play a two player minimax game, where the generator minimizes a two-sample test objective (\(p_{\textrm{data}} = p_\theta\)) and the discriminator maximizes the objective (\(p_{\textrm{data}} \neq p_\theta\)). Intuitively, the generator tries to fool the discriminator to the best of its ability by generating samples that look indistinguishable from \(p_{\textrm{data}}\).
Formally, the GAN objective can be written as:
@@ -107,40 +107,40 @@
-Let’s unpack this expression. We know that the discriminator is maximizing this function with respect to its parameters , where given a fixed generator it is performing binary classification: it assigns probability 1 to data points from the training set , and assigns probability 0 to generated samples . In this setup, the optimal discriminator is:
+Let’s unpack this expression. We know that the discriminator is maximizing this function with respect to its parameters \(\phi\), where given a fixed generator \(G_\theta\) it is performing binary classification: it assigns probability 1 to data points from the training set \(\mathbf{x} \sim p_{\textrm{data}}\), and assigns probability 0 to generated samples \(\mathbf{x} \sim p_G\). In this setup, the optimal discriminator is:
-On the other hand, the generator minimizes this objective for a fixed discriminator . And after performing some algebra, plugging in the optimal discriminator into the overall objective gives us:
+On the other hand, the generator minimizes this objective for a fixed discriminator \(D_\phi\). And after performing some algebra, plugging in the optimal discriminator \(D^*_G(\cdot)\) into the overall objective \(V(G_\theta, D^*_G(\mathbf{x}))\) gives us:
-The term is the Jenson-Shannon Divergence, which is also known as the symmetric form of the KL divergence:
+The \(D_{\textrm{JSD}}\) term is the Jensen-Shannon divergence, which is also known as the symmetric form of the KL divergence:
-The JSD satisfies all properties of the KL, and has the additional perk that . With this distance metric, the optimal generator for the GAN objective becomces , and the optimal objective value that we can achieve with optimal generators and discriminators and is .
+The JSD satisfies all properties of the KL, and has the additional perk that \(D_{\textrm{JSD}}[p,q] = D_{\textrm{JSD}}[q,p]\). With this distance metric, the optimal generator for the GAN objective becomes \(p_G = p_{\textrm{data}}\), and the optimal objective value that we can achieve with optimal generators and discriminators \(G^*(\cdot)\) and \(D^*_{G^*}(\mathbf{x})\) is \(-\log 4\).
Thus, the way in which we train a GAN is as follows:
-For epochs do:
+For epochs \(1, \ldots, N\) do:
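A sketch of one iteration of this alternating procedure, assuming generator `G` and discriminator `D` are PyTorch modules and that `D` outputs probabilities:

```python
import torch

def gan_iteration(G, D, opt_G, opt_D, x_real, z_dim):
    z = torch.randn(x_real.size(0), z_dim)

    # Discriminator step: ascend the objective, pushing D(x_real) -> 1 and D(G(z)) -> 0.
    x_fake = G(z).detach()                         # no generator gradients in this step
    d_loss = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake)).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: descend the same objective (minimax form; the
    # non-saturating loss -log D(G(z)) is a common practical alternative).
    g_loss = torch.log(1 - D(G(z))).mean()
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```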
Next, we focus our attention to a few select types of GAN architectures and explore them in more detail.
The f-GAN optimizes the variant of the two-sample test objective that we have discussed so far, but using a very general notion of distance: the . Given two densities and , the -divergence can be written as:
+The f-GAN optimizes the variant of the two-sample test objective that we have discussed so far, but using a very general notion of distance: the \(f\)-divergence. Given two densities \(p\) and \(q\), the \(f\)-divergence can be written as:
-where is any convex1, lower-semicontinuous2 function with . Several of the distance “metrics” that we have seen so far fall under the class of f-divergences, such as KL, Jenson-Shannon, and total variation.
+where \(f\) is any convex1, lower-semicontinuous2 function with \(f(1) = 0\). Several of the distance “metrics” that we have seen so far fall under the class of f-divergences, such as KL, Jenson-Shannon, and total variation.
-To set up the f-GAN objective, we borrow two commonly used tools from convex optimization3: the Fenchel conjugate and duality. Specifically, we obtain a lower bound to any f-divergence via its Fenchel conjugate:
+To set up the f-GAN objective, we borrow two commonly used tools from convex optimization3: the Fenchel conjugate and duality. Specifically, we obtain a lower bound to any f-divergence via its Fenchel conjugate:
-Therefore we can choose any f-divergence that we desire, let and , parameterize by and by , and obtain the following fGAN objective:
+Therefore we can choose any f-divergence that we desire, let \(p = p_{\textrm{data}}\) and \(q = p_G\), parameterize \(T\) by \(\phi\) and \(G\) by \(\theta\), and obtain the following fGAN objective:
CycleGAN is a type of GAN that allows us to do unsupervised image-to-image translation, from two domains \(\mathcal{X} \leftrightarrow \mathcal{Y}\).
-Specifically, we learn two conditional generative models: and . There is a discriminator associated with that compares the true with the generated samples . Similarly, there is another discriminator associated with that compares the true with the generated samples . The figure below illustrates the CycleGAN setup:
+Specifically, we learn two conditional generative models: \(G: \mathcal{X} \leftrightarrow \mathcal{Y}\) and \(F: \mathcal{Y} \leftrightarrow \mathcal{X}\). There is a discriminator \(D_\mathcal{Y}\) associated with \(G\) that compares the true \(Y\) with the generated samples \(\hat{Y} = G(X)\). Similarly, there is another discriminator \(D_\mathcal{X}\) associated with \(F\) that compares the true \(X\) with the generated samples \(\hat{X} = F(Y)\). The figure below illustrates the CycleGAN setup:
CycleGAN enforces a property known as cycle consistency, which states that if we can go from to via , then we should also be able to go from to via . The overall loss function can be written as:
+CycleGAN enforces a property known as cycle consistency, which states that if we can go from \(X\) to \(\hat{Y}\) via \(G\), then we should also be able to go from \(\hat{Y}\) to \(X\) via \(F\). The overall loss function can be written as:
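A sketch of the cycle-consistency portion of that loss, using the L1 reconstruction penalty from the CycleGAN paper; the full objective adds the two adversarial terms with a weighting coefficient.

```python
import torch

def cycle_consistency_loss(G, F, x, y):
    """F(G(x)) should reconstruct x and G(F(y)) should reconstruct y."""
    loss_x = torch.mean(torch.abs(F(G(x)) - x))    # X -> Y_hat -> X
    loss_y = torch.mean(torch.abs(G(F(y)) - y))    # Y -> X_hat -> Y
    return loss_x + loss_y
```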
In this context, convex means a line joining any two points that lies above the function. ↩
+In this context, convex means that the line segment joining any two points on the function’s graph lies above the function. ↩
The function value at any point is close to or greater than . ↩
+For points near \(\mathbf{x}_0\), the function value is close to or greater than \(f(\mathbf{x}_0)\). ↩
This book is an excellent resource to learn more about these topics. ↩
+This book is an excellent resource to learn more about these topics. ↩
In this course, we will study generative models that view the world under the lens of probability. In such a worldview, we can think of any kind of observed data, say \(\mathcal{D}\), as a finite set of samples from an underlying distribution, say \(p_{\mathrm{data}}\). At its very core, the goal of any generative model is then to approximate this data distribution given access to the dataset \(\mathcal{D}\). The hope is that if we are able to learn a good generative model, we can use the learned model for downstream inference.
We will be primarily interested in parametric approximations to the data distribution, which summarize all the information about the dataset \(\mathcal{D}\) in a finite set of parameters. In contrast with non-parametric models, parametric models scale more efficiently with large datasets but are limited in the family of distributions they can represent.
In the parametric setting, we can think of the task of learning a generative model as picking the parameters within a family of model distributions that minimizes some notion of distance1 between the model distribution and the data distribution.
For instance, we might be given access to a dataset of dog images \(\mathcal{D}\) and our goal is to learn the parameters of a generative model \(\theta\) within a model family \(\mathcal{M}\) such that the model distribution \(p_\theta\) is close to the data distribution over dogs \(p_{\mathrm{data}}\). Mathematically, we can specify our goal as the following optimization problem:

\begin{equation}
\min_{\theta\in \mathcal{M}}d(p_{\mathrm{data}}, p_{\theta})
\label{eq:learning_gm}
\tag{1}
\end{equation}

where \(p_{\mathrm{data}}\) is accessed via the dataset \(\mathcal{D}\) and \(d(\cdot)\) is a notion of distance between probability distributions.

As we navigate through this course, it is interesting to take note of the difficulty of the problem at hand. A typical image from a modern phone camera has a resolution of approximately \(700 \times 1400\) pixels. Each pixel has three channels: R(ed), G(reen) and B(lue), and each channel can take a value between 0 and 255. Hence, the number of possible images is given by \(256^{700 \times 1400 \times 3}\approx 10^{7000000}\). In contrast, ImageNet, one of the largest publicly available datasets, consists of only about 15 million images. Hence, learning a generative model with such a limited dataset is a highly underdetermined problem.
@@ -142,13 +142,13 @@
In the next few lectures, we will take a deeper dive into certain
@@ -161,7 +161,7 @@
For a discriminative model such as logistic regression, the fundamental inference task is to predict a label for any given datapoint. Generative models, on the other hand, learn a joint distribution over the entire data.2
+data.2While the range of applications to which generative models have been used continue to grow, we can identify three fundamental inference @@ -169,23 +169,23 @@
Density estimation: Given a datapoint , what is the probability assigned by the model, i.e., ?
+Density estimation: Given a datapoint \(\mathbf{x}\), what is the probability assigned by the model, i.e., \(p_\theta(\mathbf{x})\)?
Sampling: How can we generate novel data from the model distribution, i.e., \(\mathbf{x}_{\mathrm{new}} \sim p_\theta(\mathbf{x})\)?

Unsupervised representation learning: How can we learn meaningful feature representations for a datapoint \(\mathbf{x}\)?

Going back to our example of learning a generative model over dog images, we can intuitively expect a good generative model to work as follows. For density estimation, we expect \(p_\theta(\mathbf{x})\) to be high for dog images and low otherwise. Alluding to the name generative model, sampling involves generating novel images of dogs beyond the ones we observe in our dataset. Finally, representation learning can
@@ -205,18 +205,18 @@
As we shall see later, functions that do not satisfy all properties of a distance metric are also used in practice, e.g., KL divergence. ↩

Technically, a probabilistic discriminative model is also a generative model of the labels conditioned on the data. However, the usage of the term generative models is typically reserved for high dimensional data. ↩

-In the model above, and denote the latent and observed variables respectively. The joint distribution expressed by this model is given as
+In the model above, \(\bz\) and \(\bx\) denote the latent and observed variables respectively. The joint distribution expressed by this model is given as
-From a generative modeling perspective, this model describes a generative process for the observed data using the following procedure
+From a generative modeling perspective, this model describes a generative process for the observed data \(\bx\) using the following procedure
-If one adopts the belief that the latent variables somehow encode semantically meaningful information about , it is natural to view this generative process as first generating the “high-level” semantic information about first before fully generating . Such a perspective motivates generative models with rich latent variable structures such as hierarchical generative models —where information about is generated hierarchically—and temporal models such as the Hidden Markov Model—where temporally-related high-level information is generated first before constructing .
+If one adopts the belief that the latent variables \(\bz\) somehow encode semantically meaningful information about \(\bx\), it is natural to view this generative process as first generating the “high-level” semantic information about \(\bx\) before fully generating \(\bx\). Such a perspective motivates generative models with rich latent variable structures such as hierarchical generative models \(p(\bx, \bz_1, \ldots, \bz_m) = p(\bx \giv \bz_1)\prod_i p(\bz_i \giv \bz_{i+1})\)—where information about \(\bx\) is generated hierarchically—and temporal models such as the Hidden Markov Model—where temporally-related high-level information is generated first before constructing \(\bx\).
-We now consider a family of distributions where describes a probability distribution over . Next, consider a family of conditional distributions where describes a conditional probability distribution over given . Then our hypothesis class of generative models is the set of all possible combinations
+We now consider a family of distributions \(\P_\bz\) where \(p(\bz) \in \P_\bz\) describes a probability distribution over \(\bz\). Next, consider a family of conditional distributions \(\P_{\bx\giv \bz}\) where \(p_\theta(\bx \giv \bz) \in \P_{\bx\giv \bz}\) describes a conditional probability distribution over \(\bx\) given \(\bz\). Then our hypothesis class of generative models is the set of all possible combinations
-Given a dataset , we are interested in the following learning and inference tasks
+Given a dataset \(\D = \set{\bx^{(1)}, \ldots, \bx^{(n)}}\), we are interested in the following learning and inference tasks
One way to measure how closely fits the observed dataset is to measure the Kullback-Leibler (KL) divergence between the data distribution (which we denote as ) and the model’s marginal distribution . The distribution that ``best’’ fits the data is thus obtained by minimizing the KL divergence.
+One way to measure how closely \(p(\bx, \bz)\) fits the observed dataset \(\D\) is to measure the Kullback-Leibler (KL) divergence between the data distribution (which we denote as \(p_{\mathrm{data}}(\bx)\)) and the model’s marginal distribution \(p(\bx) = \int p(\bx, \bz) \d \bz\). The distribution that “best” fits the data is thus obtained by minimizing the KL divergence.
-As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood over
+As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood \(\log p(\bx)\) over \(\D\)
-However, it turns out this problem is generally intractable for high-dimensional as it involves an integration (or sums in the case is discrete) over all the possible latent sources of variation . One option is to estimate the objective via Monte Carlo. For any given datapoint , we can obtain the following estimate for its marginal log-likelihood
+However, it turns out this problem is generally intractable for high-dimensional \(\bz\) as it involves an integration (or a sum in the case \(\bz\) is discrete) over all the possible latent sources of variation \(\bz\). One option is to estimate the objective via Monte Carlo. For any given datapoint \(\bx\), we can obtain the following estimate for its marginal log-likelihood
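A sketch of this simple Monte Carlo average over prior samples (`decoder_log_prob` and a `torch.distributions` prior are assumed interfaces); in practice the estimator has very high variance because most prior samples explain \(\bx\) poorly, which motivates the variational approach below.

```python
import math
import torch

def naive_marginal_log_likelihood(x, decoder_log_prob, prior, k=1000):
    """log of the Monte Carlo estimate (1/k) * sum_j p_theta(x | z^(j)), z^(j) ~ p(z)."""
    z = prior.sample((k,))                    # k samples from the prior p(z)
    log_px_given_z = decoder_log_prob(x, z)   # assumed to return a length-k tensor
    return torch.logsumexp(log_px_given_z, dim=0) - math.log(k)
```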
Rather than maximizing the log-likelihood directly, an alternative is to instead construct a lower bound that is more amenable to optimization. To do so, we note that evaluating the marginal likelihood \(p(\bx)\) is at least as difficult as evaluating the posterior \(p(\bz \mid \bx)\) for any latent vector \(\bz\) since by definition \(p(\bz \mid \bx) = p(\bx, \bz) / p(\bx)\).
-Next, we introduce a variational family of distributions that approximate the true, but intractable posterior . Further henceforth, we will assume a parameteric setting where any distribution in the model family is specified via a set of parameters and distributions in the variational family are specified via a set of parameters .
+Next, we introduce a variational family \(\Q\) of distributions that approximate the true, but intractable posterior \(p(\bz \mid \bx)\). Henceforth, we will assume a parametric setting where any distribution in the model family \(\P_{\bx, \bz}\) is specified via a set of parameters \(\theta \in \Theta\) and distributions in the variational family \(\Q\) are specified via a set of parameters \(\lambda \in \Lambda\).
-Given and , we note that the following relationships hold true1 for any and all variational distributions
+Given \(\P_{\bx, \bz}\) and \(\Q\), we note that the following relationships hold true1 for any \(\bx\) and all variational distributions \(q_\lambda(\bz) \in \Q\)
-so long as it is easy to sample from and evaluate densities for .
+so long as it is easy to sample from and evaluate densities for \(q_\lambda(\bz)\).
-Which variational distribution should we pick? Even though the above derivation holds for any choice of variational parameters , the tightness of the lower bound depends on the specific choice of .
+Which variational distribution should we pick? Even though the above derivation holds for any choice of variational parameters \(\lambda\), the tightness of the lower bound depends on the specific choice of \(q\).
In particular, the gap between the original objective(marginal log-likelihood ) and the ELBO equals the KL divergence between the approximate posterior and the true posterior . The gap is zero when the variational distribution exactly matches .
+In particular, the gap between the original objective (marginal log-likelihood \(\log p_\theta(\bx)\)) and the ELBO equals the KL divergence between the approximate posterior \(q(\bz)\) and the true posterior \(p(\bz \giv \bx)\). The gap is zero when the variational distribution \(q_\lambda(\bz)\) exactly matches \(p_\theta(\bz \giv \bx)\).
-In summary, we can learn a latent variable model by maximizing the ELBO with respect to both the model parameters and the variational parameters for any given datapoint
+In summary, we can learn a latent variable model by maximizing the ELBO with respect to both the model parameters \(\theta\) and the variational parameters \(\lambda\) for any given datapoint \(\bx\)
Step 1
-We first do per-sample optimization of by iteratively applying the update
+We first do per-sample optimization of \(q\) by iteratively applying the update
-where , and denotes an unbiased estimate of the ELBO gradient. This step seeks to approximate the log-likelihood .
+where \(\text{ELBO}(\bx; \theta, \lambda) = \Expect_{q_\lambda(\bz)} \left[\log \frac{p_\theta(\bx, \bz)}{q_\lambda(\bz)}\right]\), and \(\tilde{\nabla}_\lambda\) denotes an unbiased estimate of the ELBO gradient. This step seeks to approximate the log-likelihood \(\log p_\theta(\bx^{(i)})\).
Step 2
@@ -237,30 +237,30 @@
-which corresponds to the step that hopefully moves closer to .
+which corresponds to the step that hopefully moves \(p_\theta\) closer to \(p_{\mathrm{data}}\).
The gradients and can be estimated via Monte Carlo sampling. While it is straightforward to construct an unbiased estimate of by simply pushing through the expectation operator, the same cannot be said for . Instead, we see that
+The gradients \(\nabla_\lambda \ELBO\) and \(\nabla_\theta \ELBO\) can be estimated via Monte Carlo sampling. While it is straightforward to construct an unbiased estimate of \(\nabla_\theta \ELBO\) by simply pushing \(\nabla_\theta\) through the expectation operator, the same cannot be said for \(\nabla_\lambda\). Instead, we see that
-This equality follows from the log-derivative trick (also commonly referred to as the REINFORCE trick). The full derivation involves some simple algebraic manipulations and is left as an exercise for the reader. The gradient estimator is thus
+This equality follows from the log-derivative trick (also commonly referred to as the REINFORCE trick). The full derivation involves some simple algebraic manipulations and is left as an exercise for the reader. The gradient estimator \(\tilde{\nabla}_\lambda \ELBO\) is thus
-However, it is often noted that this estimator suffers from high variance. One of the key contributions of the variational autoencoder paper is the reparameterization trick, which introduces a fixed, auxiliary distribution and a differentiable function such that the procedure
+However, it is often noted that this estimator suffers from high variance. One of the key contributions of the variational autoencoder paper is the reparameterization trick, which introduces a fixed, auxiliary distribution \(p(\veps)\) and a differentiable function \(T(\veps; \lambda)\) such that the procedure
-is equivalent to sampling from . By the Law of the Unconscious Statistician, we can see that
+is equivalent to sampling from \(q_\lambda(\bz)\). By the Law of the Unconscious Statistician, we can see that
So far, we have described \(p_\theta(\bx, \bz)\) and \(q_\lambda(\bz)\) in the abstract. To instantiate these objects, we consider choices of parametric distributions for \(p_\theta(\bz)\), \(p_\theta(\bx \giv \bz)\), and \(q_\lambda(\bz)\). A popular choice for \(p_\theta(\bz)\) is the unit Gaussian
-in which case is simply the empty set since the prior is a fixed distribution. Another alternative often used in practice is a mixture of Gaussians with trainable mean and covariance parameters.
+in which case \(\theta\) is simply the empty set since the prior is a fixed distribution. Another alternative often used in practice is a mixture of Gaussians with trainable mean and covariance parameters.
-The conditional distribution is where we introduce a deep neural network. We note that a conditional distribution can be constructed by defining a distribution family (parameterized by ) in the target space (i.e. defines an unconditional distribution over ) and a mapping function . In other words, defines the conditional distribution
+The conditional distribution \(p_\theta(\bx \giv \bz)\) is where we introduce a deep neural network. We note that a conditional distribution can be constructed by defining a distribution family (parameterized by \(\omega \in \Omega\)) in the target space \(\bx\) (i.e. \(p_\omega(\bx)\) defines an unconditional distribution over \(\bx\)) and a mapping function \(g_\theta: \Z \to \Omega\). In other words, \(g_\theta(\cdot)\) defines the conditional distribution

-The function is also referred to as the decoding distribution since it maps a latent code to the parameters of a distribution over observed variables . In practice, it is typical to specify as a deep neural network.
+The function \(g_\theta\) is also referred to as the decoding distribution since it maps a latent code \(\bz\) to the parameters of a distribution over observed variables \(\bx\). In practice, it is typical to specify \(g_\theta\) as a deep neural network.
-In the case where is a Gaussian distribution, we can thus represent it as
+In the case where \(p_\theta(\bx \giv \bz)\) is a Gaussian distribution, we can thus represent it as
where and are neural networks that specify the mean and covariance matrix for the Gaussian distribution over when conditioned on .
+where \(\mu_\theta(\bz)\) and \(\Sigma_\theta(\bz)\) are neural networks that specify the mean and covariance matrix for the Gaussian distribution over \(\bx\) when conditioned on \(\bz\).
-Finally, the variational family for the proposal distribution needs to be chosen judiciously so that the reparameterization trick is possible. Many continuous distributions in the location-scale family can be reparameterized. In practice, a popular choice is again the Gaussian distribution, where
+Finally, the variational family for the proposal distribution \(q_\lambda(\bz)\) needs to be chosen judiciously so that the reparameterization trick is possible. Many continuous distributions in the location-scale family can be reparameterized. In practice, a popular choice is again the Gaussian distribution, where
-where is the Cholesky decomposition of . For simplicity, practitioners often restrict to be a diagonal matrix (which restricts the distribution family to that of factorized Gaussians).
+where \(\Sigma^{1/2}\) is the Cholesky decomposition of \(\Sigma\). For simplicity, practitioners often restrict \(\Sigma\) to be a diagonal matrix (which restricts the distribution family to that of factorized Gaussians).
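A sketch of a reparameterized draw from this factorized Gaussian family (using log standard deviations, an assumed but common parameterization); because the sample is a deterministic, differentiable function of \((\lambda, \veps)\), gradients with respect to the variational parameters can flow through the sampling step.

```python
import torch

def sample_diag_gaussian(mu, log_sigma):
    """z = T(eps; lambda) with lambda = (mu, log_sigma) and eps ~ N(0, I)."""
    eps = torch.randn_like(mu)               # auxiliary noise from p(eps)
    return mu + torch.exp(log_sigma) * eps   # reparameterized sample from q_lambda(z)
```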
A noticable limitation of black-box variational inference is that Step 1 executes an optimization subroutine that is computationally expensive. Recall that the goal of the Step 1 is to find
+A noticeable limitation of black-box variational inference is that Step 1 executes an optimization subroutine that is computationally expensive. Recall that the goal of Step 1 is to find
-For a given choice of , there is a well-defined mapping from . A key realization is that this mapping can be learned. In particular, one can train an encoding function (parameterized by ) (where is the space of parameters) on the following objective
+For a given choice of \(\theta\), there is a well-defined mapping from \(\bx \mapsto \lambda^\ast\). A key realization is that this mapping can be learned. In particular, one can train an encoding function (parameterized by \(\phi\)) \(f_\phi: \X \to \Lambda\) (where \(\Lambda\) is the space of \(\lambda\) parameters) on the following objective
-It is worth noting at this point that can be interpreted as defining the conditional distribution . With a slight abuse of notation, we define
+It is worth noting at this point that \(f_\phi(\bx)\) can be interpreted as defining the conditional distribution \(q_\phi(\bz \giv \bx)\). With a slight abuse of notation, we define
-It is also worth noting that optimizing over the entire dataset as a subroutine everytime we sample a new mini-batch is clearly not reasonable. However, if we believe that is capable of quickly adapting to a close-enough approximation of given the current choice of , then we can interleave the optimization and . The yields the following procedure, where for each mini-batch , we perform the following two updates jointly
+It is also worth noting that optimizing \(\phi\) over the entire dataset as a subroutine every time we sample a new mini-batch is clearly not reasonable. However, if we believe that \(f_\phi\) is capable of quickly adapting to a close-enough approximation of \(\lambda^\ast\) given the current choice of \(\theta\), then we can interleave the optimization of \(\phi\) and \(\theta\). This yields the following procedure, where for each mini-batch \(\M = \set{\bx^{(1)}, \ldots, \bx^{(m)}}\), we perform the following two updates jointly
-rather than running BBVI’s Step 1 as a subroutine. By leveraging the learnability of , this optimization procedure amortizes the cost of variational inference. If one further chooses to define as a neural network, the result is the variational autoencoder.
+rather than running BBVI’s Step 1 as a subroutine. By leveraging the learnability of \(\bx \mapsto \lambda^\ast\), this optimization procedure amortizes the cost of variational inference. If one further chooses to define \(f_\phi\) as a neural network, the result is the variational autoencoder.
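Putting the pieces together, a compact sketch of the resulting per-mini-batch loss; `encoder` (returning the mean and log standard deviation of \(q_\phi(\bz \giv \bx)\)), `decoder.log_prob`, and the unit Gaussian prior are assumed interfaces.

```python
import torch

def vae_loss(x, encoder, decoder):
    """Negative single-sample ELBO estimate; one optimizer step on this loss
    updates the decoder parameters theta and encoder parameters phi jointly."""
    mu, log_sigma = encoder(x)
    sigma = torch.exp(log_sigma)
    z = mu + sigma * torch.randn_like(mu)                                   # reparameterized z ~ q_phi(z|x)

    log_q = torch.distributions.Normal(mu, sigma).log_prob(z).sum(dim=-1)
    log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(dim=-1)   # unit Gaussian prior
    log_px_given_z = decoder.log_prob(x, z)

    elbo = log_px_given_z + log_pz - log_q
    return -elbo.mean()
```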