Commit 961549e

Textual edits
1 parent c179256 commit 961549e

2 files changed: 36 additions and 34 deletions

package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

src/locales/en.ts

Lines changed: 33 additions & 31 deletions
@@ -7,7 +7,7 @@ export const en = {
  fileUploadError: 'Please upload a valid csv file.',
  removeButton: 'Remove',
  dropzoneLabel:
- 'Drag and drop your csv file here, click to select one of your own files or use the "Demo dataset" button',
+ 'Drag and drop your csv file here, click to select a local file or use the "Demo dataset" button',
  datasetPreview: 'Dataset preview showing the first 5 rows.',
  error: 'Sorry, something went wrong.',
  loadingMessage: 'Setting up environment...',
@@ -30,14 +30,14 @@ export const en = {
  dataSet: 'Dataset',
  dataSetTooltip: `Preprocess your data such that:
  - missing values are removed or replaced;
- - all columns (except your bias metric column) should have the same datatypes, e.g., numerical or categorical;
- - the bias metric column is numerical`,
- performanceMetric: 'Bias metric',
+ - all columns (except your outcome label column) should have the same datatypes, e.g., numerical or categorical;
+ - the outcome label column is numerical`,
+ performanceMetric: 'Outcome label',
  performanceMetricTooltip:
- 'Clustering will be performed on the bias metrics. The bias metric should be numerical. Examples of bias metrics are "being classified as high risk" or "selected for an investigation"',
+ 'Clustering will be performed on the outcome labels. The outcome label should be numerical. Examples of outcome labels are "being classified as high risk" or "selected for an investigation"',
  dataType: 'Type of data',
  dataTypeTooltip:
- 'Specify whether the data are categorical or numerical. All columns (except your bias metric column) should have the same data type',
+ 'Specify whether the data are categorical or numerical. All columns (except your outcome label column) should have the same data type',
  categoricalData: 'Categorical data',
  numericalData: 'Numerical data',
  filterSelect:
@@ -48,11 +48,11 @@ export const en = {
  iterations: 'Iterations',
  minClusterSize: 'Minimal cluster size',
  performanceInterpretation: {
- title: 'Bias metric interpretation',
- lower: 'Lower value of bias metric is better, such as error rate',
- higher: 'Higher value of bias metric is better, such as accuracy',
+ title: 'Outcome label interpretation',
+ lower: 'Lower value of outcome label is better, such as error rate',
+ higher: 'Higher value of outcome label is better, such as accuracy',
  tooltip:
- 'When error rate or misclassifications are chosen as the bias metric, a lower value is preferred, as the goal is to minimize errors. Conversely, when accuracy or precision is selected as the bias metric, a higher value is preferred, reflecting the aim to maximize performance.',
+ 'When error rate or misclassifications are chosen as the outcome label, a lower value is preferred, as the goal is to minimize errors. Conversely, when accuracy or precision is selected as the outcome label, a higher value is preferred, reflecting the aim to maximize performance.',
  },
  iterationsTooltip:
  'Number of times the dataset is split in smaller clusters until the minimal cluster size is reached',
@@ -62,7 +62,7 @@ export const en = {
  },
  errors: {
  csvRequired: 'Please upload a csv file.',
- targetColumnRequired: 'Please select a bias metric.',
+ targetColumnRequired: 'Please select a outcome label.',
  dataTypeRequired: 'Please select a data type.',
  noNumericColumns:
  'No numeric columns found. Please upload a valid dataset.',
@@ -83,7 +83,7 @@ export const en = {
  },
  demoCard: {
  title: 'Try it out!',
- description: 'Alternatively, use our demo dataset.',
+ description: "Use our demo dataset if you don't have a dataset at hand",
  },
  },
  syntheticData: {
@@ -163,7 +163,7 @@ export const en = {
  correlationRealdata: 'Correlation matrix',
  correlationSyntheticData: 'Correlation matrix',
  dataSetPreview: {
- heading: '0. Preview of real data',
+ heading: '1. Preview of data',
  },
  columnsInDataset: '1. Data types detection',
  handlingMissingDataTitle: '2. Handling missing data',
@@ -255,17 +255,17 @@ we recommend to impute the missing data. For {tooltip:syntheticData.missingDataM
  },

  biasAnalysis: {
- testingStatisticalSignificance: `**4. Testing statistical significance for the bias score difference between the most deviating cluster and the rest of the dataset**
+ testingStatisticalSignificance: `**5. Testing cluster differences wrt. outcome labels**

- - <i class="font-serif">H</i><sub>0</sub>: no difference in bias between the most deviating cluster and the rest of the dataset
- - <i class="font-serif">H</i><sub>1</sub>: difference in bias between the most deviating cluster and the rest of the dataset
+ - <i class="font-serif">H</i><sub>0</sub>: no difference in outcome labels between the most deviating cluster and the rest of the dataset
+ - <i class="font-serif">H</i><sub>1</sub>: difference in outcome labels between the most deviating cluster and the rest of the dataset

  A two-sided t-test is performed to accept or reject <i class="font-serif">H</i><sub>0</sub>:.

  p_value : {{p_val}}
  `,
  dataSetPreview: {
- heading: '0. Preview of real data',
+ heading: '1. Preview of data',
  },
  demo: {
  heading: 'Information about demo dataset',
@@ -283,18 +283,20 @@ p_value : {{p_val}}
  | false_positive | Defendant predicted to reoffend, but didn't | 0: no FP, 1: FP |


- In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm on the dataset previewed below. The column "is_recid" indicates whether a defendant reoffended or not (1: yes, 0: no). The "score_text" column indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column "false_positive" (FP) represents cases where a defendant was predicted to reoffended by the algorithm, but didn't do so (1: FP, 0: no FP). A preview of the data can be found below. The column "false_positive" is used as the "bias metric".
+ <br>
+
+ In this example, we analyze which group is most adversely affected by the risk prediction algorithm. We do this by applying the clustering algorithm on the dataset previewed below. The column "is_recid" indicates whether a defendant reoffended or not (1: yes, 0: no). The "score_text" column indicates whether a defendant was predicted to reoffend (1: yes, 0: no). The column "false_positive" (FP) represents cases where a defendant was predicted to reoffended by the algorithm, but didn't do so (1: FP, 0: no FP). A preview of the data can be found below. The column "false_positive" is used as the outcome label.
  `,
  },
  parameters: {
- heading: '1. Parameters selected for clustering',
+ heading: '2. Parameters selected for clustering',
  iterations: 'Number of iterations: {{value}}',
  minClusterSize: 'Minimal cluster size: {{value}}',
- performanceMetric: 'Performance metric column: {{value}}',
+ performanceMetric: 'Outcome label: {{value}}',
  dataType: 'Data type: {{value}}',
  description: `- Number of iterations: {{iterations}}
  - Minimal cluster size: {{minClusterSize}}
- - Performance metric column: {{performanceMetric}}
+ - Outcome label: {{performanceMetric}}
  - Data type: {{dataType}}
  `,
  },
@@ -334,32 +336,32 @@ In this example, we analyze which group is most adversely affected by the risk p
  },
  nodifference: {
  heading:
- 'No significant difference in average bias metric between the most biased cluster and the rest of the dataset.',
+ 'No significant difference in average uitkomstlabel between the most biased cluster and the rest of the dataset.',
  },
  distribution: {
- mainHeading: '5. Cluster characteristics',
+ mainHeading: '5. Testing cluster differences wrt. features',
  heading:
- 'The "{{variable}}" variable distribution across the different clusters:',
+ '"{{variable}}" distribution across the different clusters:',
  },
  splittingDataset: {
- heading: '2. Splitting dataset',
- description: `To reduce the possibility that the clustering method detects noise, the dataset is split in a train (80%) and test dataset (20%). The clustering method is first fitted on the train dataset. Then, the presence of statistically significant bias in the most deviating clusters is evaluated using the test dataset.`,
+ heading: '3. Splitting dataset',
+ description: `To reduce the possibility that the clustering method detects noise, the dataset is split in a train (80%) and test dataset (20%). The clustering method is first fitted on the train dataset. Then, the presence of statistically significant signal in the most deviating clusters is evaluated using the test dataset.`,
  },
  clusterinResults: {
- heading: '3. Clustering results',
+ heading: '4. Clustering results',
  description: `
  - Number of clusters detected: {{clusterCount}}
  `,
  label: 'Choose cluster to show number of datapoints for',
  valueText: 'Number of datapoints in cluster {{index}}: {{value}}',
  },
- higherAverage: `The most biased cluster has a statistically significant higher average bias score than the rest of the dataset.`,
- noSignificance: `No statistically significant difference in average bias score between the most biased cluster and the rest of the dataset.`,
+ higherAverage: `The most deviating cluster has statistically significant different outcome labels than the rest of the dataset.`,
+ noSignificance: `No statistically significant difference in outcome labels between the most biased cluster and the rest of the dataset.`,

- conclusion: `6. Conclusion and bias report`,
+ conclusion: `7. Conclusion and bias report`,
  conclusionDescription: `From the above figures and statistical tests, it can be concluded that:`,

- moreInformationHeading: `7. More information`,
+ moreInformationHeading: `8. More information`,
  moreInformationDescription: `- [Scientific article](https://arxiv.org/pdf/2502.01713)
  - [Github repository](https://github.com/NGO-Algorithm-Audit/unsupervised-bias-detection)`,
  },
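
The `splittingDataset.description` string in this diff describes an 80%/20% train/test split, and `testingStatisticalSignificance` mentions a two-sided t-test comparing the most deviating cluster with the rest of the dataset. The TypeScript below is a rough sketch of those two steps, not the project's implementation. The helper names `trainTestSplit`, `mean`, `sampleVariance` and `welchT` are hypothetical. The strings do not say which t-test variant is used, so Welch's unequal-variance form is assumed here. The p-value (`{{p_val}}` in the strings) is left out because it needs a Student-t distribution function that is outside the scope of this sketch.

```typescript
// Illustrative sketch only: these helpers are hypothetical and do not come from the repository.

/** Shuffle a copy of `rows` (Fisher-Yates) and hold out `testFraction` of it as the test set. */
function trainTestSplit<T>(
  rows: readonly T[],
  testFraction = 0.2,
  rng: () => number = Math.random,
): { train: T[]; test: T[] } {
  const copy = rows.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = copy[i] as T;
    copy[i] = copy[j] as T;
    copy[j] = tmp;
  }
  const testSize = Math.round(copy.length * testFraction);
  return { train: copy.slice(testSize), test: copy.slice(0, testSize) };
}

/** Arithmetic mean of a non-empty array. */
function mean(xs: readonly number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

/** Unbiased sample variance (divides by n - 1); needs at least two values. */
function sampleVariance(xs: readonly number[]): number {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

/**
 * Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
 * groups of outcome labels (assumed variant; each group needs >= 2 values
 * and the groups must not both have zero variance).
 */
function welchT(
  a: readonly number[],
  b: readonly number[],
): { t: number; df: number } {
  const va = sampleVariance(a) / a.length;
  const vb = sampleVariance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df =
    (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Usage: hold out 20% of the rows, then compare the outcome labels of one
// cluster against the remaining rows of the held-out set.
const { train, test } = trainTestSplit([0, 1, 0, 0, 1, 1, 0, 1, 0, 0]);
const stats = welchT([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(train.length, test.length, stats.t, stats.df);
```

Passing the random source as a parameter keeps the split reproducible in tests; a two-sided decision would then compare `|t|` against the critical value for `df` degrees of freedom.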
