Add explanation of DIR

ruivieira · ruivieira · commit deba7a7ad6f8 · 2022-12-09T11:54:18.000Z
diff --git a/examples/Group fairness metrics.ipynb b/examples/Group fairness metrics.ipynb
@@ -188,15 +188,15 @@
     "The _Statistical Parity Difference (SPD)_  is the difference in the probability of prediction between the privileged and unprivileged groups. Typically:\n",
     "\n",
     "- $SPD=0$ means that the model is behaving fairly in regards of the selected attribute (e.g. race, gender)\n",
-    "- $-0.1<SPD<0.1$ means that the model is _reasonably fair_ and the score can be attributed to other factor, such as sample size.\n",
+    "- Values in the range $-0.1<SPD<0.1$ mean that the model is _reasonably fair_ and the score can be attributed to other factors, such as sample size.\n",
     "- An $SPD$ outside this range would be an indicator of an _unfair_ model relatively to the protected attributes.\n",
     "    - A *negative* value of statistical parity difference indicates that the unprivileged group is at a disadvantage\n",
     "    - A *positive* value indicates that the privileged group is at a disadvantage.\n",
     "\n",
     "The formal definition of $SPD$ is\n",
     "\n",
     "$$\n",
-    "SPD=P(\\hat{y}=1|\\mathcal{D}_u)-P(\\hat{y}=1|\\mathcal{D}_p)\n",
+    "SPD=p(\\hat{y}=1|\\mathcal{D}_u)-p(\\hat{y}=1|\\mathcal{D}_p)\n",
     "$$\n",
     "\n",
     "where $\\hat{y}=1$ is the favorable outcome and $\\mathcal{D}_u$, $\\mathcal{D}_p$ are respectively the privileged and unpriviledge group data.\n",
@@ -207,7 +207,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 4,
    "id": "8bd3f51b",
    "metadata": {},
    "outputs": [
@@ -222,7 +222,7 @@
        "Name: income, dtype: int64"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 4,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -234,7 +234,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 5,
    "id": "9e8978f6",
    "metadata": {},
    "outputs": [
@@ -244,7 +244,7 @@
        "<AxesSubplot:xlabel='gender'>"
       ]
      },
-     "execution_count": 7,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     },
@@ -265,7 +265,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 7,
    "id": "2b2c678a",
    "metadata": {},
    "outputs": [],
@@ -283,7 +283,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 8,
    "id": "9e548018",
    "metadata": {},
    "outputs": [
@@ -301,7 +301,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "cd296c85",
+   "id": "a13a2ac3",
    "metadata": {},
    "source": [
     "We can see that the $SPD$ for this dataset is between the $[-0.1, 0.1]$ threshold, which classifies the model as _reasonably fair_."
@@ -312,12 +312,12 @@
    "id": "09bb7d45",
    "metadata": {},
    "source": [
-    "## Biased dataset"
+    "### Biased dataset"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 9,
    "id": "63b953c9",
    "metadata": {},
    "outputs": [
@@ -332,7 +332,7 @@
        "Name: income, dtype: int64"
       ]
      },
-     "execution_count": 10,
+     "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -344,7 +344,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 10,
    "id": "aed61b77",
    "metadata": {},
    "outputs": [
@@ -354,7 +354,7 @@
        "<AxesSubplot:xlabel='gender'>"
       ]
      },
-     "execution_count": 11,
+     "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     },
@@ -375,7 +375,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 11,
    "id": "901e5720",
    "metadata": {},
    "outputs": [],
@@ -390,7 +390,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 12,
    "id": "7be544a7",
    "metadata": {},
    "outputs": [
@@ -408,7 +408,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "719cba51",
+   "id": "8e3f2bd4",
    "metadata": {},
    "source": [
     "This dataset, as expected, is outside the $[-0.1, 0.1]$ threshold, which classifies the model as _unfair_.\n",
@@ -420,12 +420,25 @@
    "id": "de0affcf",
    "metadata": {},
    "source": [
-    "# Disparate impact ratio"
+    "## Disparate impact ratio\n",
+    "\n",
+    "\n",
+    "Similarly to the _Statistical Parity Difference_, the _Disparate Impact Ratio (DIR)_ measures imbalances in positive outcome predictions across privileged and unprivileged groups.\n",
+    "Instead of calculating the difference, this metric calculates the ratio of their selection rates. Typically:\n",
+    "\n",
+    "- $DIR=1$ means that the model is fair with regards to the protected attribute.\n",
+    "- $0.8<DIR<1.2$ means that the model is _reasonably fair_.\n",
+    "\n",
+    "The formal definition of the _Disparate Impact Ratio_ is:\n",
+    "\n",
+    "$$\n",
+    "DIR=\\dfrac{p(\\hat{y}=1|\\mathcal{D}_u)}{p(\\hat{y}=1|\\mathcal{D}_p)}\n",
+    "$$\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 13,
    "id": "949fae2f",
    "metadata": {},
    "outputs": [],
@@ -439,7 +452,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 14,
    "id": "2e601762",
    "metadata": {},
    "outputs": [
@@ -455,9 +468,17 @@
     "print(score)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7dfc3077-0739-4b19-bfc3-9c16c70e048c",
+   "metadata": {},
+   "source": [
+    "As with the $SPD$, we can see that the $DIR$ indicates a reasonably fair model (close to $1$) for the unbiased dataset."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 15,
    "id": "3231326d",
    "metadata": {},
    "outputs": [],
@@ -469,7 +490,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 16,
    "id": "4b88eec8",
    "metadata": {},
    "outputs": [
@@ -485,6 +506,14 @@
     "print(score)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5b7c07d4-c216-41dc-b259-df8fd6b4d064",
+   "metadata": {},
+   "source": [
+    "As expected, the $DIR$ also indicates a biased model for the biased dataset."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7e9ca225",