
Commit 8af100a

Merge pull request #89 from NGO-Algorithm-Audit/JFP_edits
Jfp edits
2 parents 61a105b + 4502e80 commit 8af100a


9 files changed, 3723 additions and 2151 deletions


notebooks/synthetic data generation tool/CART_LawSchoolAdmissionBar.ipynb

Lines changed: 154 additions & 415 deletions
Large diffs are not rendered by default.

notebooks/synthetic data generation tool/GC_LawSchoolAdmissionBar.ipynb

Lines changed: 0 additions & 1679 deletions
This file was deleted.

notebooks/synthetic data generation tool/GC_drop_LawSchoolAdmissionBar.ipynb

Lines changed: 1734 additions & 0 deletions
Large diffs are not rendered by default.

notebooks/synthetic data generation tool/GC_impute_LawSchoolAdmissionBar.ipynb

Lines changed: 1746 additions & 0 deletions
Large diffs are not rendered by default.

notebooks/synthetic data generation tool/GC_SocialDiagnosis2011.ipynb renamed to notebooks/synthetic data generation tool/GC_impute_SocialDiagnosis2011.ipynb

File renamed without changes.

package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

src/assets/bias-detection-python-code.tsx

Lines changed: 12 additions & 9 deletions
```diff
@@ -565,6 +565,17 @@ def run():
 setOutputData("mostBiasedCluster", df_most_biased_cluster.to_json(orient='records'))
 setOutputData("otherClusters", df_other.to_json(orient='records'))
 
+
+setResult(json.dumps({
+'type': 'heading',
+'headingKey': 'biasAnalysis.conclusion'
+}))
+
+setResult(json.dumps({
+'type': 'text',
+'key': 'biasAnalysis.conclusionDescription'
+}))
+
 # Calculate the difference in percentage for each category value between cluster 0 and the entire dataset
 diff_percentages = {}
 
```
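The hunk keeps the comment describing a per-category comparison ("the difference in percentage for each category value between cluster 0 and the entire dataset") but does not show how `diff_percentages` is filled. A minimal sketch of that idea in plain Python, assuming the differences are expressed in percentage points; the helper name `category_percentage_diff` and the toy data are hypothetical and not taken from the repository:

```python
from collections import Counter


def category_percentage_diff(cluster_values, all_values):
    """Percentage-point difference per category value: share in the cluster minus share in the full dataset."""
    if not cluster_values or not all_values:
        raise ValueError("both the cluster and the dataset must be non-empty")
    cluster_counts = Counter(cluster_values)
    all_counts = Counter(all_values)
    diffs = {}
    for value, count in all_counts.items():
        # Counter returns 0 for categories absent from the cluster
        pct_cluster = 100.0 * cluster_counts[value] / len(cluster_values)
        pct_all = 100.0 * count / len(all_values)
        diffs[value] = pct_cluster - pct_all
    return diffs


# Toy example: the "sex" column for cluster 0 versus the whole dataset
all_sex = ["Male", "Male", "Female", "Male", "Female", "Male", "Male", "Female"]
cluster0_sex = ["Male", "Male", "Male", "Male"]
print(category_percentage_diff(cluster0_sex, all_sex))  # {'Male': 37.5, 'Female': -37.5}
```

A positive value means the category is over-represented in the cluster relative to the dataset as a whole.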
```diff
@@ -590,15 +601,7 @@ def run():
 }))
 
 
-setResult(json.dumps({
-'type': 'heading',
-'headingKey': 'biasAnalysis.conclusion'
-}))
-
-setResult(json.dumps({
-'type': 'text',
-'key': 'biasAnalysis.conclusionDescription'
-}))
+
 
 setResult(json.dumps({
 'type': 'export-button',
```
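The Python code reports results as JSON messages whose `headingKey` or `key` field is a locale key; the moved `biasAnalysis.conclusion` heading corresponds to the `conclusion` string that this same commit renumbers in `en.ts`. The real key resolution happens in the TypeScript front end, which is not part of this diff, so the `lookup` and `render` helpers below are hypothetical Python stand-ins, and `EN` copies only two strings from `en.ts`:

```python
import json

# Two strings copied from src/locales/en.ts as they read after this commit
EN = {
    "biasAnalysis": {
        "conclusion": "7. Conclusion and bias report",
        "conclusionDescription": "From the above figures and statistical tests, it can be concluded that:",
    }
}


def lookup(key, locale):
    """Resolve a dotted key such as 'biasAnalysis.conclusion' in a nested dict."""
    node = locale
    for part in key.split("."):
        node = node[part]
    return node


def render(payload, locale=EN):
    """Hypothetical renderer: turn one setResult JSON message into display text."""
    message = json.loads(payload)
    if message["type"] == "heading":
        return lookup(message["headingKey"], locale)
    if message["type"] == "text":
        return lookup(message["key"], locale)
    # Other message types (e.g. 'export-button') are not text; return a placeholder
    return f"[{message['type']}]"


# The two messages this commit moves ahead of the per-category comparison
emitted = [
    json.dumps({'type': 'heading', 'headingKey': 'biasAnalysis.conclusion'}),
    json.dumps({'type': 'text', 'key': 'biasAnalysis.conclusionDescription'}),
]
for payload in emitted:
    print(render(payload))
```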

src/locales/en.ts

Lines changed: 34 additions & 31 deletions
```diff
@@ -7,7 +7,7 @@ export const en = {
 fileUploadError: 'Please upload a valid csv file.',
 removeButton: 'Remove',
 dropzoneLabel:
-'Drag and drop your csv file here, click to select one of your own files or use the "Demo dataset" button',
+'Drag and drop your csv file here, click to select a local file or use the "Demo dataset" button',
 datasetPreview: 'Dataset preview showing the first 5 rows.',
 error: 'Sorry, something went wrong.',
 loadingMessage: 'Setting up environment...',
```
```diff
@@ -30,14 +30,14 @@ export const en = {
 dataSet: 'Dataset',
 dataSetTooltip: `Preprocess your data such that:
 - missing values are removed or replaced;
-- all columns (except your bias metric column) should have the same datatypes, e.g., numerical or categorical;
-- the bias metric column is numerical`,
-performanceMetric: 'Bias metric',
+- all columns (except your outcome label column) should have the same datatypes, e.g., numerical or categorical;
+- the outcome label column is numerical`,
+performanceMetric: 'Outcome label',
 performanceMetricTooltip:
-'Clustering will be performed on the bias metrics. The bias metric should be numerical. Examples of bias metrics are "being classified as high risk" or "selected for an investigation"',
+'Clustering will be performed on the outcome labels. The outcome label should be numerical. Examples of outcome labels are "being classified as high risk" or "selected for an investigation"',
 dataType: 'Type of data',
 dataTypeTooltip:
-'Specify whether the data are categorical or numerical. All columns (except your bias metric column) should have the same data type',
+'Specify whether the data are categorical or numerical. All columns (except your outcome label column) should have the same data type',
 categoricalData: 'Categorical data',
 numericalData: 'Numerical data',
 filterSelect:
```
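The `dataSetTooltip` lists three preconditions: no missing values, a single data type across all columns except the outcome label, and a numerical outcome label. A minimal checker for them, assuming the CSV has already been parsed into a list of dicts with numbers converted to `int`/`float`; `check_dataset` and its helpers are hypothetical and are not the tool's own validation, which this diff does not show:

```python
def is_number(value):
    """True for int/float values (bools excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_missing(value):
    return value is None or value == ""


def check_dataset(rows, outcome_column):
    """Return a list of problems; an empty list means the three preconditions hold.

    rows: list of dicts, one per record, with numbers already converted to int/float.
    """
    if not rows:
        return ["dataset is empty"]
    columns = list(rows[0].keys())
    if outcome_column not in columns:
        return [f"outcome label column {outcome_column!r} not found"]
    problems = []
    # 1. Missing values must be removed or replaced
    for i, row in enumerate(rows):
        for col in columns:
            if is_missing(row.get(col)):
                problems.append(f"missing value in row {i}, column {col!r}")
    # 2. All columns except the outcome label share one data type
    kinds = set()
    for col in columns:
        if col == outcome_column:
            continue
        values = [row.get(col) for row in rows if not is_missing(row.get(col))]
        kinds.add("numerical" if all(is_number(v) for v in values) else "categorical")
    if len(kinds) > 1:
        problems.append("feature columns mix numerical and categorical data")
    # 3. The outcome label column is numerical
    outcome_values = [row.get(outcome_column) for row in rows if not is_missing(row.get(outcome_column))]
    if not all(is_number(v) for v in outcome_values):
        problems.append(f"outcome label column {outcome_column!r} is not numerical")
    return problems


good = [
    {"age_cat": "25 - 45", "sex": "Male", "false_positive": 0},
    {"age_cat": "Less than 25", "sex": "Female", "false_positive": 1},
]
bad = [{"age": 30, "sex": "Male", "false_positive": "yes"}]
print(check_dataset(good, "false_positive"))
print(check_dataset(bad, "false_positive"))
```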
```diff
@@ -48,11 +48,11 @@ export const en = {
 iterations: 'Iterations',
 minClusterSize: 'Minimal cluster size',
 performanceInterpretation: {
-title: 'Bias metric interpretation',
-lower: 'Lower value of bias metric is better, such as error rate',
-higher: 'Higher value of bias metric is better, such as accuracy',
+title: 'Outcome label interpretation',
+lower: 'Lower value of outcome label is better, such as error rate',
+higher: 'Higher value of outcome label is better, such as accuracy',
 tooltip:
-'When error rate or misclassifications are chosen as the bias metric, a lower value is preferred, as the goal is to minimize errors. Conversely, when accuracy or precision is selected as the bias metric, a higher value is preferred, reflecting the aim to maximize performance.',
+'When error rate or misclassifications are chosen as the outcome label, a lower value is preferred, as the goal is to minimize errors. Conversely, when accuracy or precision is selected as the outcome label, a higher value is preferred, reflecting the aim to maximize performance.',
 },
 iterationsTooltip:
 'Number of times the dataset is split in smaller clusters until the minimal cluster size is reached',
```
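The `performanceInterpretation` setting tells the tool which direction of the outcome label is favourable. One way such a setting could be used, sketched here as an assumption rather than the tool's own logic, is to decide which cluster counts as worst off: the lowest mean when higher is better (e.g. accuracy), the highest mean when lower is better (e.g. error rate or false positives). `cluster_means`, `worst_cluster` and the toy data are hypothetical:

```python
from statistics import mean


def cluster_means(outcomes, labels):
    """Mean outcome label per cluster index."""
    groups = {}
    for outcome, label in zip(outcomes, labels):
        groups.setdefault(label, []).append(outcome)
    return {label: mean(values) for label, values in groups.items()}


def worst_cluster(outcomes, labels, higher_is_better):
    """Cluster whose mean outcome label is least favourable under the chosen interpretation."""
    means = cluster_means(outcomes, labels)
    pick = min if higher_is_better else max
    return pick(means, key=means.get)


# Toy data: false_positive per data point and the cluster it was assigned to
outcomes = [1, 1, 0, 0, 0, 1, 0, 0]
labels = [0, 0, 0, 1, 1, 1, 2, 2]
print(worst_cluster(outcomes, labels, higher_is_better=False))  # 0
print(worst_cluster(outcomes, labels, higher_is_better=True))   # 2
```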
```diff
@@ -62,7 +62,7 @@ export const en = {
 },
 errors: {
 csvRequired: 'Please upload a csv file.',
-targetColumnRequired: 'Please select a bias metric.',
+targetColumnRequired: 'Please select a outcome label.',
 dataTypeRequired: 'Please select a data type.',
 noNumericColumns:
 'No numeric columns found. Please upload a valid dataset.',
```
```diff
@@ -83,7 +83,8 @@ export const en = {
 },
 demoCard: {
 title: 'Try it out!',
-description: 'Alternatively, use our demo dataset.',
+description:
+"Use our demo dataset if you don't have a dataset at hand",
 },
 },
 syntheticData: {
```
```diff
@@ -163,7 +164,7 @@ export const en = {
 correlationRealdata: 'Correlation matrix',
 correlationSyntheticData: 'Correlation matrix',
 dataSetPreview: {
-heading: '0. Preview of real data',
+heading: '1. Preview of data',
 },
 columnsInDataset: '1. Data types detection',
 handlingMissingDataTitle: '2. Handling missing data',
```
```diff
@@ -255,17 +256,17 @@ we recommend to impute the missing data. For {tooltip:syntheticData.missingDataM
 },
 
 biasAnalysis: {
-testingStatisticalSignificance: `**4. Testing statistical significance for the bias score difference between the most deviating cluster and the rest of the dataset**
+testingStatisticalSignificance: `**5. Testing cluster differences wrt. outcome labels**
 
-- <i class="font-serif">H</i><sub>0</sub>: no difference in bias between the most deviating cluster and the rest of the dataset
-- <i class="font-serif">H</i><sub>1</sub>: difference in bias between the most deviating cluster and the rest of the dataset
+- <i class="font-serif">H</i><sub>0</sub>: no difference in outcome labels between the most deviating cluster and the rest of the dataset
+- <i class="font-serif">H</i><sub>1</sub>: difference in outcome labels between the most deviating cluster and the rest of the dataset
 
 A two-sided t-test is performed to accept or reject <i class="font-serif">H</i><sub>0</sub>:.
 
 p_value : {{p_val}}
 `,
 dataSetPreview: {
-heading: '0. Preview of real data',
+heading: '1. Preview of data',
 },
 demo: {
 heading: 'Information about demo dataset',
```
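The `testingStatisticalSignificance` text states only that "a two-sided t-test" compares the outcome labels of the most deviating cluster with those of the rest of the dataset. As a sketch of what such a test computes, the stdlib-only code below implements Welch's unequal-variance t-test, with the two-sided p-value taken from the regularized incomplete beta function. Welch's variant is an assumption (the diff does not say whether variances are pooled), as are the 0.05 significance level, the toy outcome labels and all helper names:

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(a, b):
    """Two-sided Welch t-test; returns (t statistic, degrees of freedom, p-value)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    if v1 + v2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)


# Toy outcome labels (e.g. false_positive) in the most deviating cluster vs. the rest
cluster = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rest = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
t, df, p = welch_t_test(cluster, rest)
ALPHA = 0.05  # assumed significance level; not stated in the diff
key = "higherAverage" if p < ALPHA else "noSignificance"
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f} -> {key}")
```

Where SciPy is available, `scipy.stats.ttest_ind(cluster, rest, equal_var=False)` performs the same Welch test; `equal_var=True` gives Student's pooled-variance version instead.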
```diff
@@ -283,18 +284,20 @@ p_value : {{p_val}}
 | false_positive | Defendant predicted to reoffend, but didn't | 0: no FP, 1: FP |
 
 
-In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm on the dataset previewed below. The column "is_recid" indicates whether a defendant reoffended or not (1: yes, 0: no). The "score_text" column indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column "false_positive" (FP) represents cases where a defendant was predicted to reoffended by the algorithm, but didn't do so (1: FP, 0: no FP). A preview of the data can be found below. The column "false_positive" is used as the "bias metric".
+<br>
+
+In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm on the dataset previewed below. The column "is_recid" indicates whether a defendant reoffended or not (1: yes, 0: no). The "score_text" column indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column "false_positive" (FP) represents cases where a defendant was predicted to reoffended by the algorithm, but didn't do so (1: FP, 0: no FP). A preview of the data can be found below. The column "false_positive" is used as the outcome label.
 `,
 },
 parameters: {
-heading: '1. Parameters selected for clustering',
+heading: '2. Parameters selected for clustering',
 iterations: 'Number of iterations: {{value}}',
 minClusterSize: 'Minimal cluster size: {{value}}',
-performanceMetric: 'Performance metric column: {{value}}',
+performanceMetric: 'Outcome label: {{value}}',
 dataType: 'Data type: {{value}}',
 description: `- Number of iterations: {{iterations}}
 - Minimal cluster size: {{minClusterSize}}
-- Performance metric column: {{performanceMetric}}
+- Outcome label: {{performanceMetric}}
 - Data type: {{dataType}}
 `,
 },
```
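The demo description defines `false_positive` in terms of two other columns: the defendant was predicted to reoffend (`score_text` = 1) but did not (`is_recid` = 0). Assuming both columns are already coded 0/1 as described, the column can be derived as below; this illustrates the definition only and is not the preprocessing that produced the demo dataset:

```python
def false_positive(score_text, is_recid):
    """1 if predicted to reoffend (score_text == 1) but did not (is_recid == 0), else 0."""
    return int(score_text == 1 and is_recid == 0)


rows = [
    {"score_text": 1, "is_recid": 0},  # predicted high risk, did not reoffend: FP
    {"score_text": 1, "is_recid": 1},  # predicted high risk, reoffended
    {"score_text": 0, "is_recid": 0},  # not predicted high risk, did not reoffend
    {"score_text": 0, "is_recid": 1},  # not predicted high risk, reoffended (a false negative)
]
for row in rows:
    row["false_positive"] = false_positive(row["score_text"], row["is_recid"])
print([row["false_positive"] for row in rows])  # [1, 0, 0, 0]
```

Averaging this column within a cluster gives the share of that cluster's defendants who are false positives, which is the kind of outcome-label average the significance test compares.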
```diff
@@ -334,32 +337,32 @@ In this example, we analyze which group is most adversely affected by the risk p
 },
 nodifference: {
 heading:
-'No significant difference in average bias metric between the most biased cluster and the rest of the dataset.',
+'No significant difference in average uitkomstlabel between the most biased cluster and the rest of the dataset.',
 },
 distribution: {
-mainHeading: '5. Cluster characteristics',
+mainHeading: '6. Testing cluster differences wrt. features',
 heading:
-'The "{{variable}}" variable distribution across the different clusters:',
+'"{{variable}}" distribution across the different clusters:',
 },
 splittingDataset: {
-heading: '2. Splitting dataset',
-description: `To reduce the possibility that the clustering method detects noise, the dataset is split in a train (80%) and test dataset (20%). The clustering method is first fitted on the train dataset. Then, the presence of statistically significant bias in the most deviating clusters is evaluated using the test dataset.`,
+heading: '3. Splitting dataset',
+description: `To reduce the possibility that the clustering method detects noise, the dataset is split in a train (80%) and test dataset (20%). The clustering method is first fitted on the train dataset. Then, the presence of statistically significant signal in the most deviating clusters is evaluated using the test dataset.`,
 },
 clusterinResults: {
-heading: '3. Clustering results',
+heading: '4. Clustering results',
 description: `
 - Number of clusters detected: {{clusterCount}}
 `,
 label: 'Choose cluster to show number of datapoints for',
 valueText: 'Number of datapoints in cluster {{index}}: {{value}}',
 },
-higherAverage: `The most biased cluster has a statistically significant higher average bias score than the rest of the dataset.`,
-noSignificance: `No statistically significant difference in average bias score between the most biased cluster and the rest of the dataset.`,
+higherAverage: `The most deviating cluster has statistically significant different outcome labels than the rest of the dataset.`,
+noSignificance: `No statistically significant difference in outcome labels between the most biased cluster and the rest of the dataset.`,
 
-conclusion: `6. Conclusion and bias report`,
+conclusion: `7. Conclusion and bias report`,
 conclusionDescription: `From the above figures and statistical tests, it can be concluded that:`,
 
-moreInformationHeading: `7. More information`,
+moreInformationHeading: `8. More information`,
 moreInformationDescription: `- [Scientific article](https://arxiv.org/pdf/2502.01713)
 - [Github repository](https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection)`,
 },
```
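The `splittingDataset` description fixes an 80/20 train/test split but not how rows are assigned to each part. A minimal sketch, assuming a seeded uniform random shuffle of row indices; the function name and seed are hypothetical, and the tool may instead rely on a library routine such as scikit-learn's `train_test_split`:

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train (80%) and test (20%) index lists."""
    indices = list(range(n_rows))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))  # 800 200
```

Following the description, the clustering would be fitted on the `train_idx` rows only, and the significance of the most deviating cluster would then be assessed on the held-out `test_idx` rows, so that a cluster found by chance in the training data is less likely to pass the test.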

src/locales/nl.ts

Lines changed: 40 additions & 14 deletions
```diff
@@ -193,20 +193,33 @@ export const nl = {
 biasAnalysis: {
 demo: {
 heading: 'Informatie over de demodataset',
-description:
-'Als demo wordt de [Twitter15](https://www.dropbox.com/scl/fi/flgahafqckxtup2s9eez8/rumdetect2017.zip?dl=0&e=1&file_subpath=%2Frumor_detection_acl2017%2Ftwitter15&rlkey=b7v86v3q1dpvcutxqk0xi7oej) dataset hieronder geladen. De dataset bevat kenmerken van tweets en de voorspelling van een BERT-gebaseerd misinformatie detectie algoritme of een tweet nepnieuws is of niet. Fout-positieve classificaties zijn gemarkeerd als FP. Een FP geeft aan dat tweet ten onrechte door het misinformatie algoritme zijn geclassificeerd als nepnieuws. De FP-metriek wordt in dit voorbeeld gebruikt als metriek om bias te meten. \n \n&nbsp;&nbsp;\n\n In dit voorbeeld onderzoeken we welk type tweets vaker/minder vaak door het misinformatie algoritme worden geclassificeerd als nepnieuws.',
-},
-testingStatisticalSignificance: `**4. Testing statistical significance for the bias score difference between the most deviating cluster and the rest of the dataset**
+description: `Als demonstratie wordt de [COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset](https://github.com/propublica/compas-analysis/tree/master) geladen. De dataset bevat kenmerken van criminele verdachten en hun risico op recidive, zoals voorspeld door het COMPAS-algoritme. De dataset bevat demografische gegevens zoals leeftijd, geslacht en ras, evenals strafblad, details over de aanklacht en het voorspelde risicolabel. Deze dataset wordt gebruikt als benchmark voor het bestuderen van algoritmische discriminatie. Een beschrijving van alle variabelen is te vinden in de onderstaande tabel.
+
+**Variabelebeschrijving**
 
-- <i class="font-serif">H</i><sub>0</sub>: no difference in bias between the most deviating cluster and the rest of the dataset
-- <i class="font-serif">H</i><sub>1</sub>: difference in bias between the most deviating cluster and the rest of the dataset
+| Variabelenaam | Beschrijving | Waarden |
+| ----------------- | --------------------------------------------------------- | ------------------------------------------------------------------------ |
+| age_cat | Leeftijdscategorie | Jonger dan 25, 25-45, Ouder dan 45 |
+| sex | Geslacht | Man, Vrouw |
+| race | Ras | Afro-Amerikaans, Aziatisch, Blank, Spaans, Inheems-Amerikaans, Overig |
+| c_charge_degree | Ernst van de strafrechtelijke aanklacht | M: Overtreding – Minder ernstige feiten, F: Misdrijf – Ernstigere feiten |
+| is_recid | Of de verdachte opnieuw de fout in ging (recidive) | 0: Nee, 1: Ja |
+| score_text | Voorspeld risicolabel van de verdachte | 0: Geen hoog risico, 1: Hoog risico |
+| false_positive | Verdachte voorspeld om te recidiveren, maar deed dat niet | 0: geen valse positieve, 1: valse positieve |
 
-A two-sided t-test is performed to accept or reject <i class="font-serif">H</i><sub>0</sub>:.
+<br>
+
+In dit voorbeeld analyseren we welke groep het meest nadelig wordt beïnvloed door het risicovoorspellingsalgoritme. Dit doen we door het clusteralgoritme toe te passen op de onderstaande datasetweergave. De kolom "is_recid" geeft aan of een verdachte daadwerkelijk opnieuw de fout in ging (1: ja, 0: nee). De kolom "score_text" geeft aan of werd voorspeld dat een verdachte opnieuw de fout in zou gaan (1: ja, 0: nee). De kolom "false_positive" (FP) vertegenwoordigt gevallen waarin het algoritme voorspelde dat een verdachte opnieuw de fout in zou gaan, maar dit niet gebeurde (1: FP, 0: geen FP). Een voorbeeldweergave van de gegevens is hieronder te vinden. De kolom "false_positive" wordt gebruikt als uitkomstlabel.
+`,
+},
+testingStatisticalSignificance: `**5. Testen van clusterverschillen ten opzichte van uitkomstlabels**
 
+- <i class="font-serif">H</i><sub>0</sub>: er is geen verschil in uitkomstlabels tussen het meest afwijkende cluster en de rest van de dataset
+- <i class="font-serif">H</i><sub>1</sub>: er is een verschil in uitkomstlabels tussen het meest afwijkende cluster en de rest van de dataset
 
-T_statistic : {{t_stat}}
+Er wordt een tweezijdige t-toets uitgevoerd om <i class="font-serif">H</i><sub>0</sub> te aanvaarden of te verwerpen.
 
-p_value : {{p_val}}
+p-waarde : {{p_val}}
 `,
 parameters: {
 heading: 'Geselecteerde parameters',
```
```diff
@@ -222,6 +235,15 @@ p_value : {{p_val}}
 - Gegevenstype: {{dataType}}
 `,
 },
+distribution: {
+mainHeading:
+'6. Testen van clusterverschillen ten opzichte van kenmerken',
+heading: '"{{variable}}" verdeling over de verschillende clusters:',
+},
+splittingDataset: {
+heading: '3. Splitsen dataset',
+description: `Om de kans te verkleinen dat de clusteringmethode ruis detecteert, wordt de dataset opgesplitst in een trainingsset (80%) en een testset (20%). De clusteringmethode wordt eerst getraind op de trainingsset. Vervolgens wordt met behulp van de testset beoordeeld of er sprake is van een statistisch significant signaal in de meest afwijkende clusters.`,
+},
 distributionOfFeaturesAcrossClustersAccordeonTitle:
 'Verdeling van kenmerken over clusters',
 numericalVariableDistributionAcrossClustersAccordeonTitle:
```
```diff
@@ -256,17 +278,21 @@ p_value : {{p_val}}
 `,
 },
 },
-distribution: {
-heading:
-'De "{{variable}}" variabele verdeling over de verschillende clusters:',
+clusterinResults: {
+heading: '4. Cluster resultaten',
+description: `
+- Aantal gevonden clusters: {{clusterCount}}
+`,
+label: 'Kies cluster om het aantal datapunten voor weer te geven',
+valueText: 'Aantal datapunten in cluster {{index}}: {{value}}',
 },
 higherAverage: `De meest bevooroordeelde cluster heeft een statistisch significant hogere gemiddelde bias score dan de rest van de dataset.`,
 noSignificance: `Geen statistisch significant verschil in gemiddelde bias score tussen de meest bevooroordeelde cluster en de rest van de dataset.`,
 
-conclusion: `6. Conclusie en bias rapport`,
+conclusion: `7. Conclusie en bias rapport`,
 conclusionDescription: `Uit de bovenstaande figuren en statistische tests kan worden geconcludeerd dat:`,
 
-moreInformationHeading: `7. Meer informatie`,
+moreInformationHeading: `8. Meer informatie`,
 moreInformationDescription: `- [Scientific article](https://arxiv.org/pdf/2502.01713)
 - [Github repository](https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection)`,
 },
```

0 commit comments
