NGO-Algorithm-Audit
diff --git a/notebooks/unsupervised bias detection tool/COMPAS_FP.ipynb b/notebooks/unsupervised bias detection tool/COMPAS_FP.ipynb
Lines changed: 53 additions & 14 deletions
@@ -466,7 +466,7 @@
    "source": [
     "**UI text #1**\n",
     "\n",
-    "In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm on the dataset previewed below. The column `is_recid` indicates whether a defendant reoffended or not (1: yes, 0: no). The `score_text` column indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column `false_positive` (FP) represents cases where a defendant was predicted to reoffended by the algorithm, but didn't do so (1: FP, 0: no FP). A preview of the data can be found below. The column `false_positive` is used as the `bias score`.\n",
+    "In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm to the dataset previewed below. The column `is_recid` indicates whether a defendant reoffended (1: yes, 0: no). The column `score_text` indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column `false_positive` (FP) marks cases where the algorithm predicted that a defendant would reoffend, but the defendant did not (1: FP, 0: no FP). The column `false_positive` is used as the `bias variable`.\n",
     "\n",
     "**1. Preview of data**\n",
     "\n",
@@ -591,15 +591,24 @@
     "\n",
     "# Create a false_positive column\n",
     "filtered_df[\"false_positive\"] = ((filtered_df[\"is_recid\"] == 0) & (filtered_df[\"score_text\"] == 1))\n",
+    "\n",
+    "# Assign bias variable\n",
     "bias_variable = \"false_positive\"\n",
     "\n",
     "# Display the updated dataframe\n",
     "filtered_df.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Encode the features (only for categorical data)"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [
     {
@@ -1391,6 +1400,13 @@
     "cluster_label_X_test"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Decode to original format (only for categorical data)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 14,
@@ -1700,7 +1716,10 @@
     }
    ],
    "source": [
+    "# attach bias variable again to the decoded test set (only for categorical data)\n",
     "decoded_X_test[bias_variable] = y_test.values\n",
+    "\n",
+    "# attach predicted cluster label to the decoded test set\n",
     "decoded_X_test[\"cluster_label\"] = cluster_label_X_test\n",
     "decoded_X_test.head()"
    ]
@@ -1769,8 +1788,8 @@
      "output_type": "stream",
      "text": [
       "The label indicating the most disadvantageous bias: -1\n",
-      "Most biased cluster: 83/1169 (0.071)\n",
-      "Rest of dataset: 4/274 (0.015)\n",
+      "Most biased cluster in test set: 83/1169 (0.071)\n",
+      "Rest of test set: 4/274 (0.015)\n",
       "Z-statistic: 3.5304\n",
       "P-value: 0.0002\n",
       "The bias variable occurs statistically significantly more often than in the rest of the dataset.\n"
@@ -1794,8 +1813,8 @@
     "z_stat, p_val = proportions_ztest(counts, nobs, alternative='larger')\n",
     "\n",
    "print(f\"The label indicating the most disadvantageous bias: {most_biased_cluster_label}\")\n",
-    "print(f\"Most biased cluster: {most_biased_count}/{most_biased_total} ({most_biased_count/most_biased_total:.3f})\")\n",
-    "print(f\"Rest of dataset: {rest_count}/{rest_total} ({rest_count/rest_total:.3f})\")\n",
+    "print(f\"Most biased cluster in test set: {most_biased_count}/{most_biased_total} ({most_biased_count/most_biased_total:.3f})\")\n",
+    "print(f\"Rest of test set: {rest_count}/{rest_total} ({rest_count/rest_total:.3f})\")\n",
     "print(f\"Z-statistic: {z_stat:.4f}\")\n",
     "print(f\"P-value: {p_val:.4f}\")\n",
     "\n",
@@ -1826,7 +1845,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Accordion 1**\n",
+    "**Accordion 'Features per cluster'**\n",
     "\n",
     "[if p<0.05]\n",
     "\n",
@@ -1853,17 +1872,18 @@
    ],
    "source": [
     "# Group by cluster_label and count the occurrences\n",
-    "cluster_counts = decoded_X_test[\"cluster_label\"].value_counts()\n",
+    "df = decoded_X_test\n",
+    "cluster_counts = df[\"cluster_label\"].value_counts()\n",
     "\n",
     "# Create subplots for each column\n",
-    "columns_to_analyze = decoded_X_test.columns.drop(['cluster_label', bias_variable]) # exclude cluster_label and bias variable\n",
+    "columns_to_analyze = df.columns.drop(['cluster_label', bias_variable]) # exclude cluster_label and bias variable\n",
     "rows = (len(columns_to_analyze) + 2) // 3  # Calculate the number of rows needed\n",
     "fig, axes = plt.subplots(rows, min(len(columns_to_analyze), 3), figsize=(15, 3 * rows), squeeze=False)\n",
     "axes = axes.flatten()  # Flatten the axes array for easier indexing\n",
     "\n",
     "for i, column in enumerate(columns_to_analyze):\n",
     "    # Group by cluster_label and the column, then calculate percentages\n",
-    "    grouped_data = decoded_X_test.groupby([\"cluster_label\", column]).size().unstack(fill_value=0)\n",
+    "    grouped_data = df.groupby([\"cluster_label\", column]).size().unstack(fill_value=0)\n",
     "    percentages = grouped_data.div(grouped_data.sum(axis=1), axis=0) * 100\n",
     "    \n",
     "    # Plot the percentage data without legend\n",
@@ -1873,7 +1893,7 @@
     "    axes[i].set_xticklabels(percentages.T.index, rotation=0)\n",
     "    \n",
     "    # Calculate and plot the average percentage in the entire dataset for each category value\n",
-    "    overall_counts = decoded_X_test[column].value_counts(normalize=True) * 100\n",
+    "    overall_counts = df[column].value_counts(normalize=True) * 100\n",
     "    for cat_value, avg_pct in overall_counts.items():\n",
     "        # Find the x position for this category value\n",
     "        try:\n",
@@ -1908,7 +1928,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 20,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1979,9 +1999,28 @@
    "            print(f\"{var[0]}: '{var[1]}' doesn't occur statistically significantly more or less often than in the rest of the dataset.\\033[0m\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Accordion 'Statistically significant differences in cluster features'**\n",
+    "\n",
+    "[if p<0.05]\n",
+    "\n",
+    "**UI text #8**\n",
+    "\n",
+    "The following statistical test is conducted for each feature:\n",
+    "\n",
+    "$H_0$: the feature does not differ between the most deviating cluster and the rest of the dataset\n",
+    "\n",
+    "$H_1$: the feature differs between the most deviating cluster and the rest of the dataset\n",
+    "\n",
+    "A chi-squared test is used for categorical data and a two-sided t-test for numerical data. To account for multiple hypothesis testing, a Bonferroni correction is applied."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 21,
    "metadata": {},
    "outputs": [
     {
@@ -2023,7 +2062,7 @@
     "\n",
     "**UI text #9**\n",
     "\n",
-    "**7. Bias report**\n",
+    "**7. Conclusion and bias report**\n",
     "\n",
     "[Download]"
    ]
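
The one-sided two-proportion z-test behind the printed statistics in this diff can be reproduced by hand. The sketch below is an illustration only, not the notebook's code: it assumes the pooled-variance form of the test (which is what `proportions_ztest` uses by default), and the function name `one_sided_proportions_ztest` is hypothetical. The counts are taken from the printed output (83/1169 in the most biased cluster, 4/274 in the rest of the test set).

```python
import math


def one_sided_proportions_ztest(count_a, n_a, count_b, n_b):
    """Two-proportion z-test with alternative 'proportion A is larger than B'.

    Hypothetical helper for illustration; assumes the pooled-variance form.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions
    pooled = (count_a + count_b) / (n_a + n_b)
    # Standard error of the difference in proportions under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Counts from the printed output: most biased cluster vs. rest of test set
z_stat, p_val = one_sided_proportions_ztest(83, 1169, 4, 274)
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```

Because the alternative is one-sided, only the upper tail of the normal distribution contributes to the p-value; a two-sided test on the same counts would double it.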