|
| 1 | +# Historical EqualizedOdds Metric Experiments |
| 2 | + |
| 3 | +This document provides a historical record of experiments that were conducted to test and validate the `EqualizedOddsImprovement` metric in SDMetrics. These experiments have since been removed from the codebase and can only be accessed through the git history. They are documented here to avoid repeating similar approaches in the future. |
| 4 | + |
| 5 | +## Background |
| 6 | + |
| 7 | +The EqualizedOdds metric measures fairness by checking whether the true positive rate and false positive rate are similar across the groups defined by a sensitive attribute. The historical experiments documented below all used the Adult dataset with: |
| 8 | + |
| 9 | +- **Target variable**: `income` (where `'>50K'` is the positive class) |
| 10 | +- **Sensitive attribute**: `sex` (potential gender-based bias) |
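The equalized-odds notion described above can be illustrated with a minimal sketch. It is not the SDMetrics implementation and does not reproduce how `EqualizedOddsImprovement` is scored; the helper names `group_rates` and `equalized_odds_gap` are hypothetical, and the "gap" here is simply the larger of the TPR and FPR spreads across groups.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups, positive='>50K'):
    """Return {group: (true_positive_rate, false_positive_rate)}."""
    counts = defaultdict(lambda: {'tp': 0, 'fn': 0, 'fp': 0, 'tn': 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            key = 'tp' if pred == positive else 'fn'
        else:
            key = 'fp' if pred == positive else 'tn'
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c['tp'] + c['fn']
        negatives = c['fp'] + c['tn']
        # Fall back to 0.0 when a group has no positives or no negatives.
        tpr = c['tp'] / positives if positives else 0.0
        fpr = c['fp'] / negatives if negatives else 0.0
        rates[group] = (tpr, fpr)
    return rates


def equalized_odds_gap(rates):
    """Largest spread in TPR or FPR across groups (0.0 means equal odds)."""
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A gap near 0 means a classifier treats both `sex` groups similarly on `income` predictions; a large gap indicates that one group sees noticeably more true or false positives.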
| 11 | + |
| 12 | +These experiments were removed because they either failed to achieve an `EqualizedOddsImprovement` score above 0.5 or were deemed flawed. |
| 13 | + |
| 14 | +## Historical Experiment Overview |
| 15 | + |
| 16 | +### Experiment 1: Basic SDV Synthesis with Conditional Sampling |
| 17 | + |
| 18 | +**Objective**: Test whether conditional sampling could reduce bias in synthetic data. This is detailed in GitHub Issue #776. |
| 19 | + |
| 20 | +**Methodology**: |
| 21 | + |
| 22 | +1. Split the Adult dataset from single-table demo datasets into training and test sets |
| 23 | +2. Ensured both sets contained all combinations of prediction target and sensitive attributes |
| 24 | +3. Trained an SDV synthesizer (TVAESynthesizer) on the training set |
| 25 | +4. Generated synthetic data and measured EqualizedOddsImprovement against real data |
| 26 | +5. Applied conditional sampling to generate balanced synthetic data: |
| 27 | + - 25% with `income='>50K'` and `sex='Female'` |
| 28 | + - 25% with `income='<=50K'` and `sex='Female'` |
| 29 | + - 25% with `income='>50K'` and `sex='Male'` |
| 30 | + - 25% with `income='<=50K'` and `sex='Male'` |
| 31 | +6. Compared results between regular and conditionally sampled synthetic data |
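The balanced split in step 5 can be sketched as an even quota per cell, assuming the intended cells are the full cross product of `income` and `sex` values. The helper `balanced_quotas` is hypothetical; its output uses `num_rows` and `column_values` keys on the assumption that each entry would be turned into a condition for the synthesizer's conditional sampling, which is not shown here.

```python
from itertools import product


def balanced_quotas(total_rows, column_values):
    """Split total_rows evenly across every combination of column values."""
    columns = list(column_values)
    combos = list(product(*(column_values[col] for col in columns)))
    base, remainder = divmod(total_rows, len(combos))

    quotas = []
    for index, combo in enumerate(combos):
        # Spread any leftover rows over the first few combinations.
        rows = base + (1 if index < remainder else 0)
        quotas.append({
            'num_rows': rows,
            'column_values': dict(zip(columns, combo)),
        })
    return quotas


quotas = balanced_quotas(
    1000,
    {'income': ['>50K', '<=50K'], 'sex': ['Female', 'Male']},
)
```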
| 32 | + |
| 33 | +### Experiment 2: Artificially Introducing Bias |
| 34 | + |
| 35 | +**Objective**: Create a more biased dataset to better demonstrate the metric's effectiveness. |
| 36 | + |
| 37 | +**Methodology**: |
| 38 | + |
| 39 | +1. Started with the setup from Experiment 1 |
| 40 | +2. Introduced artificial bias by flipping income values for `sex='Female'` rows |
| 41 | +3. Tested synthetic data generation and conditional sampling on this biased dataset |
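A minimal sketch of the bias injection in step 2, assuming "flipping" means swapping the two `income` labels for every `sex='Female'` row. The original experiment code is only available in the git history, so the function name `flip_income` and the list-of-dicts row format are illustrative assumptions.

```python
def flip_income(rows, sex='Female'):
    """Return a copy of rows with the income label swapped for one group."""
    swapped = {'>50K': '<=50K', '<=50K': '>50K'}
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if row['sex'] == sex:
            new_row['income'] = swapped[row['income']]
        result.append(new_row)
    return result
```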
| 42 | + |
| 43 | +**Original Hypothesis**: |
| 44 | + |
| 45 | +- Sex would become a significant factor in income prediction |
| 46 | +- Baseline equalized odds would be poor |
| 47 | +- Synthetic data, especially conditionally sampled, would show improvement |
| 48 | + |
| 49 | +### Experiment 3: Refined Bias Introduction |
| 50 | + |
| 51 | +**Objective**: Create a bias scenario in which the positive class remained the minority. |
| 52 | + |
| 53 | +**Methodology**: |
| 54 | + |
| 55 | +1. Started with the setup from Experiment 1 |
| 56 | +2. For `sex='Male'` rows only: |
| 57 | + - If `income='<=50K'`, flipped to `'>50K'` with 25% probability |
| 58 | + - If `income='>50K'`, kept unchanged |
| 59 | +3. Kept `sex='Female'` rows unchanged |
| 60 | +4. Ran EqualizedOddsImprovement metric with specified parameters |
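The probabilistic flip in steps 2 and 3 can be sketched as follows. This is an illustration under stated assumptions: the target column is taken to be `income` as in the Background section, and the function name `inject_male_bias`, the `seed` parameter, and the list-of-dicts row format are hypothetical.

```python
import random


def inject_male_bias(rows, flip_probability=0.25, seed=0):
    """Flip '<=50K' to '>50K' for Male rows with the given probability."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    biased = []
    for row in rows:
        new_row = dict(row)
        is_low_earning_male = row['sex'] == 'Male' and row['income'] == '<=50K'
        if is_low_earning_male and rng.random() < flip_probability:
            new_row['income'] = '>50K'
        # Female rows and Male rows already at '>50K' are copied unchanged.
        biased.append(new_row)
    return biased
```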
| 61 | + |
| 62 | +**Original Hypothesis**: The flipping would make `'>50K'` several times more likely for males while keeping it a minority label overall, leaving females under-represented among high earners. |
| 63 | + |
| 64 | +### Experiment 4: Training/Validation Imbalance |
| 65 | + |
| 66 | +**Objective**: Test the metric with imbalanced training data but fair validation data. |
| 67 | + |
| 68 | +**Methodology**: |
| 69 | + |
| 70 | +1. Created class imbalance in training data |
| 71 | +2. Converted validation data into a fair set where the number of high/low earners was equal across Female/Male groups |
| 72 | +3. Measured how synthetic data performed under these conditions |
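One possible reading of step 2 is that every combination of `sex` and `income` was downsampled to the same number of rows. The sketch below implements that reading only; the helper `balance_cells`, its `seed` parameter, and the list-of-dicts row format are assumptions and may not match the removed experiment code.

```python
import random
from collections import defaultdict


def balance_cells(rows, keys=('sex', 'income'), seed=0):
    """Downsample so every combination of the key columns has equal size."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[key] for key in keys)].append(row)

    # Every cell is reduced to the size of the smallest one.
    target = min(len(members) for members in cells.values())
    balanced = []
    for members in cells.values():
        balanced.extend(rng.sample(members, target))
    return balanced
```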