Commit deba7a7

Add explanation of DIR
1 parent 4f36fe7 commit deba7a7

1 file changed

+51
-22
lines changed

examples/Group fairness metrics.ipynb

Lines changed: 51 additions & 22 deletions
@@ -188,15 +188,15 @@
188188
"The _Statistical Parity Difference (SPD)_ is the difference in the probability of prediction between the privileged and unprivileged groups. Typically:\n",
189189
"\n",
190190
"- $SPD=0$ means that the model is behaving fairly with regard to the selected attribute (e.g. race, gender).\n",
191-
"- $-0.1<SPD<0.1$ means that the model is _reasonably fair_ and the score can be attributed to other factor, such as sample size.\n",
191+
"- Values in the range $-0.1<SPD<0.1$ mean that the model is _reasonably fair_ and the score can be attributed to other factors, such as sample size.\n",
192192
"- An $SPD$ outside this range indicates an _unfair_ model with respect to the protected attribute.\n",
193193
" - A *negative* value of statistical parity difference indicates that the unprivileged group is at a disadvantage\n",
194194
" - A *positive* value indicates that the privileged group is at a disadvantage.\n",
195195
"\n",
196196
"The formal definition of $SPD$ is\n",
197197
"\n",
198198
"$$\n",
199-
"SPD=P(\\hat{y}=1|\\mathcal{D}_u)-P(\\hat{y}=1|\\mathcal{D}_p)\n",
199+
"SPD=p(\\hat{y}=1|\\mathcal{D}_u)-p(\\hat{y}=1|\\mathcal{D}_p)\n",
200200
"$$\n",
201201
"\n",
202202
"where $\\hat{y}=1$ is the favorable outcome and $\\mathcal{D}_u$, $\\mathcal{D}_p$ are, respectively, the unprivileged and privileged group data.\n",
@@ -207,7 +207,7 @@
207207
},
208208
{
209209
"cell_type": "code",
210-
"execution_count": 6,
210+
"execution_count": 4,
211211
"id": "8bd3f51b",
212212
"metadata": {},
213213
"outputs": [
@@ -222,7 +222,7 @@
222222
"Name: income, dtype: int64"
223223
]
224224
},
225-
"execution_count": 6,
225+
"execution_count": 4,
226226
"metadata": {},
227227
"output_type": "execute_result"
228228
}
@@ -234,7 +234,7 @@
234234
},
235235
{
236236
"cell_type": "code",
237-
"execution_count": 7,
237+
"execution_count": 5,
238238
"id": "9e8978f6",
239239
"metadata": {},
240240
"outputs": [
@@ -244,7 +244,7 @@
244244
"<AxesSubplot:xlabel='gender'>"
245245
]
246246
},
247-
"execution_count": 7,
247+
"execution_count": 5,
248248
"metadata": {},
249249
"output_type": "execute_result"
250250
},
@@ -265,7 +265,7 @@
265265
},
266266
{
267267
"cell_type": "code",
268-
"execution_count": 22,
268+
"execution_count": 7,
269269
"id": "2b2c678a",
270270
"metadata": {},
271271
"outputs": [],
@@ -283,7 +283,7 @@
283283
},
284284
{
285285
"cell_type": "code",
286-
"execution_count": 23,
286+
"execution_count": 8,
287287
"id": "9e548018",
288288
"metadata": {},
289289
"outputs": [
@@ -301,7 +301,7 @@
301301
},
302302
{
303303
"cell_type": "markdown",
304-
"id": "cd296c85",
304+
"id": "a13a2ac3",
305305
"metadata": {},
306306
"source": [
307307
"We can see that the $SPD$ for this dataset lies within the $[-0.1, 0.1]$ interval, which classifies the model as _reasonably fair_."
@@ -312,12 +312,12 @@
312312
"id": "09bb7d45",
313313
"metadata": {},
314314
"source": [
315-
"## Biased dataset"
315+
"### Biased dataset"
316316
]
317317
},
318318
{
319319
"cell_type": "code",
320-
"execution_count": 10,
320+
"execution_count": 9,
321321
"id": "63b953c9",
322322
"metadata": {},
323323
"outputs": [
@@ -332,7 +332,7 @@
332332
"Name: income, dtype: int64"
333333
]
334334
},
335-
"execution_count": 10,
335+
"execution_count": 9,
336336
"metadata": {},
337337
"output_type": "execute_result"
338338
}
@@ -344,7 +344,7 @@
344344
},
345345
{
346346
"cell_type": "code",
347-
"execution_count": 11,
347+
"execution_count": 10,
348348
"id": "aed61b77",
349349
"metadata": {},
350350
"outputs": [
@@ -354,7 +354,7 @@
354354
"<AxesSubplot:xlabel='gender'>"
355355
]
356356
},
357-
"execution_count": 11,
357+
"execution_count": 10,
358358
"metadata": {},
359359
"output_type": "execute_result"
360360
},
@@ -375,7 +375,7 @@
375375
},
376376
{
377377
"cell_type": "code",
378-
"execution_count": 12,
378+
"execution_count": 11,
379379
"id": "901e5720",
380380
"metadata": {},
381381
"outputs": [],
@@ -390,7 +390,7 @@
390390
},
391391
{
392392
"cell_type": "code",
393-
"execution_count": 13,
393+
"execution_count": 12,
394394
"id": "7be544a7",
395395
"metadata": {},
396396
"outputs": [
@@ -408,7 +408,7 @@
408408
},
409409
{
410410
"cell_type": "markdown",
411-
"id": "719cba51",
411+
"id": "8e3f2bd4",
412412
"metadata": {},
413413
"source": [
414414
"As expected, the $SPD$ for this dataset falls outside the $[-0.1, 0.1]$ interval, which classifies the model as _unfair_.\n",
@@ -420,12 +420,25 @@
420420
"id": "de0affcf",
421421
"metadata": {},
422422
"source": [
423-
"# Disparate impact ratio"
423+
"## Disparate impact ratio\n",
424+
"\n",
425+
"\n",
426+
"Similarly to the _Statistical Parity Difference_, the _Disparate Impact Ratio (DIR)_ measures imbalances in positive outcome predictions across privileged and unprivileged groups.\n",
427+
"Instead of calculating the difference, this metric calculates the ratio of such selection rates. Typically:\n",
428+
"\n",
429+
"- $DIR=1$ means that the model is fair with regard to the protected attribute.\n",
430+
"- $0.8<DIR<1.2$ means that the model is _reasonably fair_.\n",
431+
"\n",
432+
"The formal definition of the _Disparate Impact Ratio_ is:\n",
433+
"\n",
434+
"$$\n",
435+
"DIR=\\dfrac{p(\\hat{y}=1|\\mathcal{D}_u)}{p(\\hat{y}=1|\\mathcal{D}_p)}\n",
436+
"$$\n"
424437
]
425438
},
426439
{
427440
"cell_type": "code",
428-
"execution_count": 14,
441+
"execution_count": 13,
429442
"id": "949fae2f",
430443
"metadata": {},
431444
"outputs": [],
@@ -439,7 +452,7 @@
439452
},
440453
{
441454
"cell_type": "code",
442-
"execution_count": 15,
455+
"execution_count": 14,
443456
"id": "2e601762",
444457
"metadata": {},
445458
"outputs": [
@@ -455,9 +468,17 @@
455468
"print(score)"
456469
]
457470
},
471+
{
472+
"cell_type": "markdown",
473+
"id": "7dfc3077-0739-4b19-bfc3-9c16c70e048c",
474+
"metadata": {},
475+
"source": [
476+
"As with the $SPD$, we can see that the $DIR$ indicates a reasonably fair model (close to $1$) for the unbiased dataset."
477+
]
478+
},
458479
{
459480
"cell_type": "code",
460-
"execution_count": 16,
481+
"execution_count": 15,
461482
"id": "3231326d",
462483
"metadata": {},
463484
"outputs": [],
@@ -469,7 +490,7 @@
469490
},
470491
{
471492
"cell_type": "code",
472-
"execution_count": 17,
493+
"execution_count": 16,
473494
"id": "4b88eec8",
474495
"metadata": {},
475496
"outputs": [
@@ -485,6 +506,14 @@
485506
"print(score)"
486507
]
487508
},
509+
{
510+
"cell_type": "markdown",
511+
"id": "5b7c07d4-c216-41dc-b259-df8fd6b4d064",
512+
"metadata": {},
513+
"source": [
514+
"As expected, the $DIR$ also indicates a biased model for the biased dataset."
515+
]
516+
},
488517
{
489518
"cell_type": "markdown",
490519
"id": "7e9ca225",

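The two definitions in this diff, $SPD$ as a difference and $DIR$ as a ratio of the same two selection rates, can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the notebook's cells (which this diff does not show): the helper names `selection_rate`, `statistical_parity_difference` and `disparate_impact_ratio` are hypothetical, the sample predictions are made up, and the thresholds are the ones quoted in the markdown cells.

```python
def selection_rate(predictions, favorable=1):
    """Return the fraction of predictions equal to the favorable outcome."""
    predictions = list(predictions)
    if not predictions:
        raise ValueError("cannot compute a selection rate for an empty group")
    return sum(1 for y in predictions if y == favorable) / len(predictions)


def statistical_parity_difference(unprivileged, privileged, favorable=1):
    """SPD = p(y_hat=1 | D_u) - p(y_hat=1 | D_p)."""
    return selection_rate(unprivileged, favorable) - selection_rate(privileged, favorable)


def disparate_impact_ratio(unprivileged, privileged, favorable=1):
    """DIR = p(y_hat=1 | D_u) / p(y_hat=1 | D_p)."""
    privileged_rate = selection_rate(privileged, favorable)
    if privileged_rate == 0:
        # The ratio is undefined when no privileged sample gets the favorable outcome.
        raise ZeroDivisionError("privileged selection rate is zero; DIR is undefined")
    return selection_rate(unprivileged, favorable) / privileged_rate


# Made-up predictions, for illustration only (1 = favorable outcome).
unprivileged = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 of 10 favorable
privileged = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]    # 5 of 10 favorable

spd = statistical_parity_difference(unprivileged, privileged)
dir_score = disparate_impact_ratio(unprivileged, privileged)

# Thresholds as quoted in the notebook text.
spd_reasonably_fair = -0.1 < spd < 0.1
dir_reasonably_fair = 0.8 < dir_score < 1.2

print(f"SPD = {spd:.2f}, reasonably fair: {spd_reasonably_fair}")
print(f"DIR = {dir_score:.2f}, reasonably fair: {dir_reasonably_fair}")
```

With these made-up predictions the selection rates are 0.2 (unprivileged) and 0.5 (privileged), so the $SPD$ is about $-0.3$ and the $DIR$ is about $0.4$. Both fall outside their "reasonably fair" ranges and point in the same direction: the unprivileged group is at a disadvantage.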