Commit daec4f7

committed
updates up to db-vae
1 parent e7d8db8 commit daec4f7

File tree

1 file changed

+16
-16
lines changed


lab2/solutions/Part2_Debiasing_Solution.ipynb

Lines changed: 16 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -370,15 +370,15 @@
370370
"source": [
371371
"## 2.4 Variational autoencoder (VAE) for learning latent structure\n",
372372
"\n",
373-
"As you saw, the accuracy of the CNN varies across the four demographics we looked at. To think about why this may be, consider the dataset the model was trained on, CelebA. If certain features, such as dark skin or hats, are *rare* in CelebA, the model may end up biased against these as a result of training with a biased dataset. That is to say, its classification accuracy will be worse on faces that have under-represented features, such as dark-skinned faces or faces with hats, relevative to faces with features well-represented in the training data! This is a problem. \n",
373+
"The accuracy of facial detection classifiers can vary significantly across different demographics. Consider the dataset the CNN model was trained on, CelebA. If certain features, such as dark skin or hats, are *rare* in CelebA, the model may end up biased against these as a result of training with a biased dataset. That is to say, its classification accuracy will be worse on faces that have under-represented features, such as dark-skinned faces or faces with hats, relative to faces with features well-represented in the training data! This is a problem.\n",
374374
"\n",
375-
"Our goal is to train a *debiased* version of this classifier -- one that accounts for potential disparities in feature representation within the training data. Specifically, to build a debiased facial classifier, we'll train a model that **learns a representation of the underlying latent space** to the face training data. The model then uses this information to mitigate unwanted biases by sampling faces with rare features, like dark skin or hats, *more frequently* during training. The key design requirement for our model is that it can learn an *encoding* of the latent features in the face data in an entirely *unsupervised* way. To achieve this, we'll turn to variational autoencoders (VAEs).\n",
375+
"Our goal is to train a model that **learns a representation of the underlying latent space** of the face training data. Such a learned representation will provide information on which features are under-represented or over-represented in the data. The key design requirement for our model is that it can learn an *encoding* of the latent features in the face data in an entirely *unsupervised* way, without any supervised annotation by us humans. To achieve this, we turn to variational autoencoders (VAEs).\n",
376376
"\n",
377377
"![The concept of a VAE](https://i.ibb.co/3s4S6Gc/vae.jpg)\n",
378378
"\n",
379379
"As shown in the schematic above and in Lecture 4, VAEs rely on an encoder-decoder structure to learn a latent representation of the input data. In the context of computer vision, the encoder network takes in input images, encodes them into a series of variables defined by a mean and standard deviation, and then draws from the distributions defined by these parameters to generate a set of sampled latent variables. The decoder network then \"decodes\" these variables to generate a reconstruction of the original image, which is used during training to help the model identify which latent variables are important to learn. \n",
380380
"\n",
381-
"Let's formalize two key aspects of the VAE model and define relevant functions for each.\n"
381+
"Let's formalize two key aspects of the VAE model and define relevant functions for each."
382382
]
383383
},
384384
{
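The encoder-decoder round trip described in the markdown cell above can be sketched with plain NumPy. This is a minimal illustration, not the notebook's actual TensorFlow implementation: the `encode` and `decode` functions below are hypothetical stand-ins for learned networks, and all shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, latent_dim=2):
    # Stand-in for a learned encoder network: produce a mean vector and a
    # log-standard-deviation vector defining the latent distribution Q(z|X).
    mu = np.full(latent_dim, x.mean())
    logsigma = np.zeros(latent_dim)  # exp(0) = unit standard deviation
    return mu, logsigma

def sample_latent(mu, logsigma):
    # Draw eps ~ N(0, I), then scale and shift: z = mu + exp(0.5*logsigma)*eps
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logsigma) * eps

def decode(z, out_dim=4):
    # Stand-in for a learned decoder network: map the sampled latent vector
    # back to the input's dimensionality, producing a "reconstruction".
    return np.full(out_dim, z.mean())

x = np.array([0.2, 0.4, 0.6, 0.8])       # a toy "image", flattened
mu, logsigma = encode(x)
z = sample_latent(mu, logsigma)
x_hat = decode(z)
assert x_hat.shape == x.shape            # reconstruction matches input shape
```

The reconstruction `x_hat` is what the training loss compares against the input, which is how the model discovers which latent variables are worth learning.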
@@ -456,24 +456,15 @@
456456
" return vae_loss"
457457
]
458458
},
459-
{
460-
"cell_type": "markdown",
461-
"metadata": {
462-
"id": "E8mpb3pJorpu"
463-
},
464-
"source": [
465-
"Great! Now that we have a more concrete sense of how VAEs work, let's explore how we can leverage this network structure to train a *debiased* facial classifier."
466-
]
467-
},
468459
{
469460
"cell_type": "markdown",
470461
"metadata": {
471462
"id": "DqtQH4S5fO8F"
472463
},
473464
"source": [
474-
"### Understanding VAEs: reparameterization \n",
465+
"### Understanding VAEs: sampling and reparameterization \n",
475466
"\n",
476-
"As you may recall from lecture, VAEs use a \"reparameterization trick\" for sampling learned latent variables. Instead of the VAE encoder generating a single vector of real numbers for each latent variable, it generates a vector of means and a vector of standard deviations that are constrained to roughly follow Gaussian distributions. We then sample from the standard deviations and add back the mean to output this as our sampled latent vector. Formalizing this for a latent variable $z$ where we sample $\\epsilon \\sim \\mathcal{N}(0,(I))$ we have:\n",
467+
"As you may recall from lecture, VAEs use a \"reparameterization trick\" for sampling learned latent variables. Instead of the VAE encoder generating a single vector of real numbers for each latent variable, it generates a vector of means and a vector of standard deviations that are constrained to roughly follow Gaussian distributions. We then sample a noise value $\\epsilon$ from a Gaussian distribution, scale it by the standard deviation, and add back the mean to output the result as our sampled latent vector. Formalizing this for a latent variable $z$ where we sample $\\epsilon \\sim \\mathcal{N}(0, I)$ we have:\n",
477468
"\n",
478469
"$$z = \\mu + e^{\\left(\\frac{1}{2} \\cdot \\log{\\Sigma}\\right)}\\circ \\epsilon$$\n",
479470
"\n",
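The equation above can be checked numerically. Below is a minimal NumPy sketch of the reparameterization step, assuming (per the equation) that the second argument holds the log of the diagonal covariance $\log\Sigma$, so that $e^{\frac{1}{2}\log\Sigma}$ recovers the standard deviation; the function body is an illustration, not the notebook's TensorFlow code.

```python
import numpy as np

def sampling(z_mean, z_logsigma, rng=np.random.default_rng(7)):
    # z = mu + exp(0.5 * log Sigma) o eps, with eps ~ N(0, I).
    # exp(0.5 * z_logsigma) converts the stored log-variance into a
    # standard deviation before scaling the noise.
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_logsigma) * eps

z_mean = np.array([0.0, 1.0])
z_logsigma = np.log(np.array([4.0, 4.0]))  # variance 4 -> std dev 2
z = sampling(z_mean, z_logsigma)
assert z.shape == z_mean.shape
```

Note that driving `z_logsigma` toward very negative values collapses the sample onto the mean, since the noise term is scaled by a vanishing standard deviation; crucially, the randomness lives entirely in `eps`, so gradients can flow through `z_mean` and `z_logsigma` during training.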
@@ -490,9 +481,9 @@
490481
},
491482
"outputs": [],
492483
"source": [
493-
"### VAE Reparameterization ###\n",
484+
"### VAE Sampling ###\n",
494485
"\n",
495-
"\"\"\"Reparameterization trick by sampling from an isotropic unit Gaussian.\n",
486+
"\"\"\"Sample latent variables via reparameterization with an isotropic unit Gaussian.\n",
496487
"# Arguments\n",
497488
" z_mean, z_logsigma (tensor): mean and log of standard deviation of latent distribution (Q(z|X))\n",
498489
"# Returns\n",
@@ -510,6 +501,15 @@
510501
" return z"
511502
]
512503
},
504+
{
505+
"cell_type": "markdown",
506+
"source": [
507+
"Great! Now that we have a more concrete sense of how VAEs work, let's explore how we can leverage this network structure to diagnose hidden biases in facial detection classifiers."
508+
],
509+
"metadata": {
510+
"id": "bcpznUHHuR6I"
511+
}
512+
},
513513
{
514514
"cell_type": "markdown",
515515
"metadata": {
