
Commit 7940aac

committed
complete (near final) text and structure
1 parent 2fb127c commit 7940aac

File tree

1 file changed

+93
-31
lines changed


lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb

Lines changed: 93 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -50,7 +50,7 @@
5050
"\n",
5151
"In this lab, we'll explore different ways to make deep learning models more **robust** and **trustworthy**.\n",
5252
"\n",
53-
"To achieve this it is critical to be able to identify and diagnose issues of bias and uncertainty in deep learning models, as we explored in the Facial Detection Lab 2. We need benchmarks that uniformly measure how uncertain a given model is, and we need principled ways of measuring bias and uncertainty. To that end, in this lab, we'll utilize [CAPSA](https://github.com/themis-ai/capsa), a risk-estimation wrapping library developed by [Themis AI](https://themisai.io/). CAPSA supports the estimation of three different types of ***risk***, defined as measures of how robust and trustworthy our model is. These are:\n",
53+
"To achieve this, it is critical to be able to identify and diagnose issues of bias and uncertainty in deep learning models, as we explored in the Facial Detection Lab 2. We need benchmarks that uniformly measure how uncertain a given model is, and we need principled ways of measuring bias and uncertainty. To that end, in this lab, we'll utilize [Capsa](https://github.com/themis-ai/capsa), a risk-estimation wrapping library developed by [Themis AI](https://themisai.io/). Capsa supports the estimation of three different types of ***risk***, defined as measures of how robust and trustworthy our model is. These are:\n",
5454
"1. **Representation bias**: reflects how likely combinations of features are to appear in a given dataset. Often, certain combinations of features are severely under-represented in datasets, which means models learn them less well and can thus lead to unwanted bias.\n",
5555
"2. **Data uncertainty**: reflects noise in the data, for example when sensors have noisy measurements, classes in datasets have low separations, and generally when very similar inputs lead to drastically different outputs. Also known as *aleatoric* uncertainty. \n",
5656
"3. **Model uncertainty**: captures the areas of our underlying data distribution that the model has not yet learned or has difficulty learning. Areas of high model uncertainty can be due to out-of-distribution (OOD) samples or data that is harder to learn. Also known as *epistemic* uncertainty."
@@ -64,13 +64,13 @@
6464
"source": [
6565
"## CAPSA overview\n",
6666
"\n",
67-
"This lab introduces `CAPSA` and its functionalities, to next build automated tools that use `CAPSA` to mitigate the underlying issues of bias and uncertainty.\n",
67+
"This lab introduces Capsa and its functionalities; we'll then build automated tools that use Capsa to mitigate the underlying issues of bias and uncertainty.\n",
6868
"\n",
69-
"The core idea behind `CAPSA` is that any deep learning model of interest can be ***wrapped*** -- just like wrapping a gift -- to be made ***aware of its own risks***. Risk is captured in representation bias, data uncertainty, and model uncertainty.\n",
69+
"The core idea behind Capsa is that any deep learning model of interest can be ***wrapped*** -- just like wrapping a gift -- to be made ***aware of its own risks***. Risk is captured in representation bias, data uncertainty, and model uncertainty.\n",
7070
"\n",
7171
"![alt text](https://raw.githubusercontent.com/aamini/introtodeeplearning/2023/lab3/img/capsa_overview.png)\n",
7272
"\n",
73-
"This means that `CAPSA` takes the user's original model as input, and modifies it minimally to create a risk-aware variant while preserving the model's underlying structure and training pipeline. `CAPSA` is a one-line addition to any training workflow in TensorFlow. In this part of the lab, we'll apply `CAPSA`'s risk estimation methods to a simple regression problem to further explore the notions of bias and uncertainty. "
73+
"This means that Capsa takes the user's original model as input, and modifies it minimally to create a risk-aware variant while preserving the model's underlying structure and training pipeline. Capsa is a one-line addition to any training workflow in TensorFlow. In this part of the lab, we'll apply Capsa's risk estimation methods to a simple regression problem to further explore the notions of bias and uncertainty. "
7474
]
7575
},
7676
{
@@ -128,9 +128,9 @@
128128
"!pip install mitdeeplearning\n",
129129
"import mitdeeplearning as mdl\n",
130130
"\n",
131-
"# Download and import CAPSA\n",
131+
"# Download and import Capsa\n",
132132
"!pip install capsa\n",
133-
"from capsa import *"
133+
"import capsa"
134134
]
135135
},
136136
{
@@ -317,8 +317,8 @@
317317
"\n",
318318
"Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n",
319319
"\n",
320-
"1. Where does the model perform well? How does this relate to aleatoric and epistemic uncertainty?\n",
321-
"2. Where does the model perform poorly? How does this relate to aleatoric and epistemic uncertainty?"
320+
"1. Where does the model perform well?\n",
321+
"2. Where does the model perform poorly?"
322322
],
323323
"metadata": {
324324
"id": "7Vktjwfu0ReH"
@@ -334,9 +334,9 @@
334334
"\n",
335335
"Now that we've seen what the predictions from this model look like, we will identify and quantify bias and uncertainty in this problem. We first consider bias.\n",
336336
"\n",
337-
"Recall that *representation bias* reflects how likely combinations of features are to appear in a given dataset. `Capsa` calculates how likely combinations of features are by using a histogram estimation approach: the `HistogramWrapper`. For low-dimensional data, the `HistogramWrapper` bins the input directly into discrete categories and measures the density. \n",
337+
"Recall that *representation bias* reflects how likely combinations of features are to appear in a given dataset. Capsa calculates how likely combinations of features are by using a histogram estimation approach: the `capsa.HistogramWrapper`. For low-dimensional data, the `capsa.HistogramWrapper` bins the input directly into discrete categories and measures the density. \n",
338338
"\n",
339-
"We start by taking our `dense_NN` and wrapping it with the `HistogramWrapper`:"
339+
"We start by taking our `dense_NN` and wrapping it with the `capsa.HistogramWrapper`:"
340340
]
341341
},
342342
{
@@ -350,11 +350,11 @@
350350
"### Wrap the dense network for bias estimation ###\n",
351351
"\n",
352352
"standard_dense_NN = create_dense_NN()\n",
353-
"bias_wrapped_dense_NN = HistogramWrapper(\n",
353+
"bias_wrapped_dense_NN = capsa.HistogramWrapper(\n",
354354
" standard_dense_NN, # the original model\n",
355355
" queue_size=2000, # how many samples to track\n",
356356
" target_hidden_layer=False # for low-dimensional data, we can estimate densities directly from data\n",
357-
" ) \n"
357+
" )\n"
358358
]
359359
},
360360
{
@@ -477,7 +477,7 @@
477477
"id": "_6iVeeqq0f_H"
478478
},
479479
"source": [
480-
"We can now use our wrapped model to assess the bias for a given test input. With the wrapping capability, `Capsa` neatly allows us to output a *bias score* along with the predicted target value. This bias score reflects the density of data surrounding an input point -- the higher the score, the greater the data representation and density. The wrapped, risk-aware model outputs the predicted target and bias score after it is called!\n",
480+
"We can now use our wrapped model to assess the bias for a given test input. With the wrapping capability, Capsa neatly allows us to output a *bias score* along with the predicted target value. This bias score reflects the density of data surrounding an input point -- the higher the score, the greater the data representation and density. The wrapped, risk-aware model outputs the predicted target and bias score after it is called!\n",
481481
"\n",
482482
"Let's see how it is done:"
483483
]
@@ -554,9 +554,9 @@
554554
"\n",
555555
"As introduced in Lecture 5 on Robust & Trustworthy Deep Learning, in regression we can estimate aleatoric uncertainty by training the model to predict both a target value and a variance for every input. Because we estimate both a mean and variance for every input, this method is called Mean Variance Estimation (MVE). MVE involves modifying the output layer to predict both the mean and variance, and changing the loss to reflect the prediction likelihood.\n",
556556
"\n",
557-
"`Capsa` automatically implements these changes for us: we can wrap a given model using `MVEWrapper` to use MVE to estimate aleatoric uncertainty. All we have to do is define the model and the loss function to evaluate its predictions!\n",
557+
"Capsa automatically implements these changes for us: we can wrap a given model using `capsa.MVEWrapper` to use MVE to estimate aleatoric uncertainty. All we have to do is define the model and the loss function to evaluate its predictions!\n",
558558
"\n",
559-
"Let's take our standard network, wrap it with `MVEWrapper`, build the wrapped model, and then train it for the regression task. Finally, we evaluate performance of the resulting model by quantifying the aleatoric uncertainty across the data space: "
559+
"Let's take our standard network, wrap it with `capsa.MVEWrapper`, build the wrapped model, and then train it for the regression task. Finally, we evaluate performance of the resulting model by quantifying the aleatoric uncertainty across the data space: "
560560
]
561561
},
562562
{
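(Editor's aside: for intuition, the Gaussian negative log-likelihood that MVE-style training minimizes can be written in a few lines of plain Python. This is our own conceptual sketch, not Capsa's exact loss code.)

```python
# Conceptual Gaussian negative log-likelihood, the kind of loss MVE training
# minimizes (our own sketch; not Capsa's exact implementation).
import math

def gaussian_nll(y_true, mu, var):
    """-log N(y_true; mu, var). Predicting a large var is only 'cheap' where
    the squared error is also large, so var becomes a per-input noise
    (aleatoric uncertainty) estimate."""
    return 0.5 * (math.log(2 * math.pi * var) + (y_true - mu) ** 2 / var)

# A confident (low-variance) prediction is penalized more when it is wrong:
print(gaussian_nll(1.0, 0.0, 0.1) > gaussian_nll(1.0, 0.0, 1.0))  # → True
```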
@@ -571,7 +571,7 @@
571571
"\n",
572572
"standard_dense_NN = create_dense_NN()\n",
573573
"# Wrap the dense network for aleatoric uncertainty estimation\n",
574-
"mve_wrapped_NN = MVEWrapper(standard_dense_NN)\n",
574+
"mve_wrapped_NN = capsa.MVEWrapper(standard_dense_NN)\n",
575575
"\n",
576576
"# Build the model for regression, defining the loss function and optimizer\n",
577577
"mve_wrapped_NN.compile(\n",
@@ -582,7 +582,7 @@
582582
"# Train the wrapped model for 30 epochs.\n",
583583
"loss_history_mve_wrap = mve_wrapped_NN.fit(x_train, y_train, epochs=30)\n",
584584
"\n",
585-
"# Call the uncertainty-aware model to generate scores for the test data\n",
585+
"# Call the uncertainty-aware model to generate outputs for the test data\n",
586586
"outputs = mve_wrapped_NN(x_test)\n",
587587
"# Capsa makes the aleatoric uncertainty an attribute of the prediction!\n",
588588
"aleatoric_unc = outputs.aleatoric\n",
@@ -612,8 +612,11 @@
612612
"id": "6FC5WPRT5lAb"
613613
},
614614
"source": [
615-
"## 1.4 Epistemic Estimation\n",
616-
"Finally, let's do the same thing but for epistemic estimation! In this example, we'll use ensembles, which essentially copy the model `N` times and average predictions across all runs for a more robust prediction, and also calculate the variance of the `N` runs. Feel free to play around with any of the epistemic methods shown in the github repository! Which methods perform the best? Why do you think this is?"
615+
"# 1.5 Estimating model uncertainty\n",
616+
"\n",
617+
"Finally, we use Capsa to estimate the uncertainty underlying the model predictions -- the epistemic uncertainty. In this example, we'll use ensembles, which essentially copy the model `N` times, average the predictions of the `N` members for a more robust prediction, and use the variance across the members to estimate the uncertainty.\n",
618+
"\n",
619+
"Capsa provides a neat wrapper, `capsa.EnsembleWrapper`, to make an ensemble from an input model. Just like with aleatoric estimation, we can take our standard dense network model, wrap it with `capsa.EnsembleWrapper`, build the wrapped model, and then train it for the regression task. Finally, we evaluate the resulting model by quantifying the epistemic uncertainty on the test data:"
617620
]
618621
},
619622
{
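(Editor's aside: the averaging-and-variance idea behind ensembling can be sketched in plain Python. This is a toy illustration only; Capsa's `EnsembleWrapper` manages the member models and their training for you, and the function below is invented for the example.)

```python
# Toy sketch of ensemble-based epistemic estimation (illustrative only).
from statistics import mean, pvariance

def ensemble_predict(member_preds):
    """member_preds: one list of predictions per ensemble member, all over the
    same inputs. Returns (mean prediction, per-input variance) -- the variance
    across members serves as the epistemic uncertainty estimate."""
    per_input = list(zip(*member_preds))  # regroup predictions by input point
    return [mean(p) for p in per_input], [pvariance(p) for p in per_input]

# Members agree on the first input but disagree on the second (e.g. OOD):
preds = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
means, epistemic = ensemble_predict(preds)
print(means)      # → [1.0, 2.0]
print(epistemic)  # zero variance on the first input, large on the second
```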
@@ -695,15 +698,29 @@
695698
}
696699
],
697700
"source": [
698-
"standard_classifier = create_standard_classifier()\n",
699-
"ensemble_wrapper = EnsembleWrapper(standard_classifier, num_members=5)\n",
701+
"### Estimating model uncertainty with Capsa wrapping ###\n",
702+
"\n",
703+
"standard_dense_NN = create_dense_NN()\n",
704+
"# Wrap the dense network for epistemic uncertainty estimation with a 5-member Ensemble\n",
705+
"ensemble_NN = capsa.EnsembleWrapper(standard_dense_NN, num_members=5)\n",
700706
"\n",
701-
"ensemble_wrapper.compile(\n",
707+
"# Build the model for regression, defining the loss function and optimizer\n",
708+
"ensemble_NN.compile(\n",
702709
" optimizer=tf.keras.optimizers.Adam(learning_rate=3e-3),\n",
703-
" loss=tf.keras.losses.MeanSquaredError(),\n",
710+
" loss=tf.keras.losses.MeanSquaredError(), # MSE loss for the regression task\n",
704711
")\n",
705712
"\n",
706-
"history = ensemble_wrapper.fit(x, y, epochs=30)"
713+
"# Train the wrapped model for 30 epochs.\n",
714+
"loss_history_ensemble = ensemble_NN.fit(x_train, y_train, epochs=30)\n",
715+
"\n",
716+
"# Call the uncertainty-aware model to generate outputs for the test data\n",
717+
"outputs = ensemble_NN(x_test)\n",
718+
"# Capsa makes the epistemic uncertainty an attribute of the prediction!\n",
719+
"epistemic_unc = outputs.epistemic\n",
720+
"\n",
721+
"# Visualize the epistemic uncertainty across the data space\n",
722+
"plt.scatter(x_test, epistemic_unc, label='epistemic uncertainty', s=0.5)\n",
723+
"plt.legend()"
707724
]
708725
},
709726
{
@@ -749,22 +766,67 @@
749766
},
750767
{
751768
"cell_type": "markdown",
752-
"metadata": {
753-
"id": "VU6eMpYX9m9N"
754-
},
755769
"source": [
756-
"## Conclusion\n",
757-
"As expected, areas where there is no training data have very high epistemic uncertainty, since all of the testing data is OOD. If our training data contained more samples from this region, would you expect the epistemic uncertainty to decrease?"
758-
]
770+
"#### **TODO: Estimating epistemic uncertainty**\n",
771+
"\n",
772+
"Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n",
773+
"\n",
774+
"1. For what values of $x$ is the epistemic uncertainty high or increasing suddenly?\n",
775+
"2. How does your answer in (1) relate to how the $x$ values are distributed (refer back to original plot)? Think about both the train and test data.\n",
776+
"3. How could you reduce the epistemic uncertainty in regions where it is high?"
777+
],
778+
"metadata": {
779+
"id": "N4LMn2tLPBdg"
780+
}
759781
},
760782
{
761783
"cell_type": "markdown",
762784
"metadata": {
763785
"id": "CkpvkOL06jRd"
764786
},
765787
"source": [
788+
"# 1.6 Conclusion\n",
789+
"\n",
790+
"You've just analyzed the bias, aleatoric uncertainty, and epistemic uncertainty for your first risk-aware model! This is a task that data scientists do constantly to determine methods of improving their models and datasets.\n",
791+
"\n",
793+
"### 1.6.1 Submission information\n",
794+
"To be eligible for the Debiasing Faces Lab prize, you must submit a document of your answers to the short-answer `TODO`s with your complete lab submission. **Name your file in the following format: `[FirstName]_[LastName]_Debiasing_Report.pdf`.**\n",
795+
"\n",
796+
"Upload your document write-up as part of your complete lab submission for the Debiasing Faces Lab ([submission upload link](https://www.dropbox.com/request/TTYz3Ikx5wIgOITmm5i2)).\n",
797+
"\n",
798+
"Please see the short-answer `TODO`s replicated again here:\n",
799+
"\n",
800+
"#### **TODO: Inspecting the 2D regression dataset**\n",
801+
"\n",
802+
"1. What are your observations about where the train data and test data lie relative to each other?\n",
803+
"2. What, if any, areas do you expect to have high/low aleatoric (data) uncertainty?\n",
804+
"3. What, if any, areas do you expect to have high/low epistemic (model) uncertainty?\n",
805+
"\n",
806+
"#### **TODO: Analyzing the performance of standard regression model**\n",
807+
"\n",
808+
"1. Where does the model perform well?\n",
809+
"2. Where does the model perform poorly?\n",
810+
"\n",
811+
"#### **TODO: Evaluating bias**\n",
812+
"\n",
813+
"1. How does the bias score relate to the train/test data density from the first plot?\n",
814+
"2. What is one limitation of the Histogram approach that simply bins the data based on frequency?\n",
815+
"\n",
816+
"#### **TODO: Estimating aleatoric uncertainty**\n",
817+
"\n",
818+
"1. For what values of $x$ is the aleatoric uncertainty high or increasing suddenly?\n",
819+
"2. How does your answer in (1) relate to how the $x$ values are distributed?\n",
820+
"\n",
821+
"#### **TODO: Estimating epistemic uncertainty**\n",
822+
"\n",
823+
"1. For what values of $x$ is the epistemic uncertainty high or increasing suddenly?\n",
824+
"2. How does your answer in (1) relate to how the $x$ values are distributed (refer back to original plot)? Think about both the train and test data.\n",
825+
"3. How could you reduce the epistemic uncertainty in regions where it is high?\n",
826+
"\n",
827+
"### 1.6.2 Moving forward\n",
766828
"\n",
767-
"You've just analyzed the bias, aleatoric uncertainty, and epistemic uncertainty for your first risk-aware model! This is a task that data scientists do constantly to determine methods of improving their models and datasets. In the next part, you'll continue to build off of these concepts to *mitigate* these risks, in addition to diagnosing them!\n",
829+
"In the next part of the lab, you'll continue to build off of these concepts to *mitigate* these risks, in addition to diagnosing them!\n",
768830
"\n",
769831
"![alt text](https://raw.githubusercontent.com/aamini/introtodeeplearning/2023/lab3/img/solutions_toy.png)"
770832
]
