Commit 698ea45

Merge pull request #92 from NGO-Algorithm-Audit/JFP_edits
Textual edits web app SDG NL EN
2 parents 885628c + d11205d commit 698ea45

2 files changed: +45 -24 lines changed
src/locales/en.ts

Lines changed: 27 additions & 17 deletions
@@ -97,7 +97,7 @@ export const en = {
 description:
 'A subset of the [Law School Admission Bar](https://www.kaggle.com/datasets/danofer/law-school-admissions-bar-passage)* dataset is used as a demo. Synthetic data will be generated for the following variables:',
 'post.description':
-'The CART method is used to generate the synthetic data. CART generally produces high quality synthetic data, but might not work well on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n \n*The original paper can be found [here](https://files.eric.ed.gov/fulltext/ED469370.pdf)',
+'<br> The CART method is used to generate the synthetic data. CART generally produces high quality synthetic data, but might not work well on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.\n \n*The original paper can be found [here](https://files.eric.ed.gov/fulltext/ED469370.pdf)',
 'data.column.Variable_name': 'Variable name',
 'data.sex': 'sex',
 'data.race1': 'race1',
@@ -131,33 +131,33 @@ export const en = {
 columnsCountError: 'File may contain a maximum of 8 columns.',
 },
 fieldset: {
-sourceDataset: 'Source data',
+sourceDataset: 'Input',
 sdgMethod: {
 title: 'Method',
 cart: 'CART',
 gc: 'Gaussian Copula',
 tooltip:
-'By default, the CART method is used to generate synthetic data. CART generally produces higher quality synthetic data, but might not work well on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases.',
+'By default, the CART method is used to generate synthetic data. CART generally produces higher quality synthetic data, but might not work well on datasets with categorical variables with 20+ categories. Use Gaussian Copula in those cases',
 },
 nanTreatment: {
 title: 'NaN values treatment',
 drop: 'Drop rows with NaN values',
 impute: 'Impute NaN values',
 tooltip:
-'When using Gaussian Copula, you can choose how to handle missing values (NaN values) in your dataset. Dropping rows with NaN values removes them completely, while imputation replaces them with mean values for numerical columns and mode values for categorical columns.',
+"When using Gaussian Copula, you can choose how to handle missing values (NaN values) in your dataset. 'Drop rows with NaN values' removes them completely, while 'Imputate NaN values' replaces them with mean values for numerical columns and mode values for categorical columns",
 },
 samples: 'Number of synthetic datapoints',
 },
 actions: {
-tryItOut: 'Try it out',
+tryItOut: 'Demo dataset',
 runGeneration: 'Run synthetic data generation',
 analyzing: 'Analyzing...',
 initializing: 'Initialising...',
 },
 },
 demoCard: {
 title: 'Try it out!',
-description: 'No dataset at hand? Use our demo dataset.',
+description: 'No dataset at hand? Use our demo dataset',
 },
 columnsInDatasetInfo:
 'If the detected data types are incorrect, please change this locally in the source dataset before attaching it to the web app.',
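Note on the `nanTreatment.tooltip` string in the hunk above: it describes two strategies, dropping rows that contain NaN values, or imputing them with the column mean (numerical) or mode (categorical). A minimal TypeScript sketch of that behaviour follows. It is illustrative only and not taken from the repository; the names `DataRow`, `dropRowsWithNaN` and `imputeNaN` are hypothetical, and the sketch assumes every row has the same keys.

```typescript
// Hypothetical illustration of the two NaN treatments described in the tooltip.
type DataCell = number | string | null;
type DataRow = Record<string, DataCell>;

// A value counts as missing when it is null, undefined, or a numeric NaN.
function isMissing(v: DataCell | undefined): boolean {
  return v === null || v === undefined || (typeof v === 'number' && isNaN(v));
}

// 'Drop rows with NaN values': keep only rows without any missing cell.
function dropRowsWithNaN(rows: DataRow[]): DataRow[] {
  return rows.filter((row) => Object.keys(row).every((k) => !isMissing(row[k])));
}

// 'Impute NaN values': mean for numerical columns, mode for categorical ones.
function imputeNaN(rows: DataRow[]): DataRow[] {
  // Assumption: all rows share the keys of the first row.
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const fill: Record<string, DataCell> = {};

  for (const col of columns) {
    const present = rows.map((r) => r[col]).filter((v) => !isMissing(v));
    if (present.length === 0) {
      fill[col] = null; // nothing to learn a fill value from
    } else if (present.every((v) => typeof v === 'number')) {
      const nums = present as number[];
      fill[col] = nums.reduce((a, b) => a + b, 0) / nums.length; // mean
    } else {
      // Mode: most frequent value; ties go to the value that reaches the count first.
      const counts = new Map<DataCell, number>();
      let mode: DataCell = present[0];
      let best = 0;
      for (const v of present) {
        const c = (counts.get(v) ?? 0) + 1;
        counts.set(v, c);
        if (c > best) {
          best = c;
          mode = v;
        }
      }
      fill[col] = mode;
    }
  }

  return rows.map((r) => {
    const out: DataRow = {};
    for (const col of columns) {
      out[col] = isMissing(r[col]) ? fill[col] : r[col];
    }
    return out;
  });
}
```

Dropping shrinks the dataset, while imputing keeps every row at the cost of pulling values toward the column centre, which is the trade-off the tooltip asks the user to choose between.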
@@ -168,7 +168,7 @@ export const en = {
 correlationRealdata: 'Correlation matrix',
 correlationSyntheticData: 'Correlation matrix',
 dataSetPreview: {
-heading: '1. Preview of data',
+heading: '0. Preview of data',
 },
 columnsInDataset: '1. Data types detection',
 handlingMissingDataTitle: '2. Handling missing data',
@@ -221,25 +221,35 @@ For categorical (or boolean) columns the following metrics are computed:
 efficacyMetricsDescription: `Efficacy metrics comparing real and synthetic datasets for downstream predictive tasks. The idea is to train a predictive model on synthetic data and evaluate its performance on real data. The type of metrics computed depends on the task:
 
 For regression (when the target is numerical):
-- Mean Squared Error (MSE)
-- Mean Absolute Error (MAE)
-- R^2 Score
+- {tooltip:syntheticData.meanSquaredError}Mean squared error (MSE){/tooltip}
+- {tooltip:syntheticData.meanAbsoluteError}Mean Absolute Error (MAE){/tooltip}
+- {tooltip:syntheticData.R2}R² Score{/tooltip}
 
-For classification (when the target is categorical/boolean):
-- Accuracy Score
-- Weighted F1 Score`,
+For classification (when the target is categorical):
+- {tooltip:syntheticData.accuracyScore}Accuracy Score{/tooltip}
+- {tooltip:syntheticData.weightedF1Score}Weighted F1 Score{/tooltip}`,
 disclosureProtectionTitle: 'Privacy metrics',
-disclosureProtectionDescription: `A class to compute the disclosure protection metric for synthetic data. This metric measures the proportion of synthetic records that are too similar (within a defined threshold) to real records, posing a disclosure risk.`,
-outputDataTitle: '5. Generated synthetic data',
+disclosureProtectionDescription: `The disclosure protection metric measures the proportion of synthetic data points that closely resemble real data points (within a predefined threshold), posing a risk of traceability to personal data. A low 'risk\_rate' and a high 'disclosure\_protection\_rate' indicate effective protection against the unintentional exposure of personal data.`,
+outputDataTitle: '5. Download synthetic data and evaluation report',
 moreInfoTitle: '6. More information',
+meanSquaredError:
+'Average squared difference between predicted and actual values, quantifying the accuracy of a model’s predictions by penalizing larger errors more heavily',
+meanAbsoluteError:
+'Average magnitude of the errors between predicted and actual values, providing a straightforward assessment of model accuracy without emphasizing large errors',
+R2:
+'Quantifies how well a model’s predictions match the actual data by measuring the proportion of variance in the target variable explained by the model',
+accuracyScore:
+'Measures the proportion of correctly predicted instances out of the total instances, providing an overall assessment of a model’s performance in classification tasks',
+weightedF1Score:
+'Harmonic mean of precision and recall, calculated for each class and weighted by the class’s support (number of true instances), providing a balanced performance measure for imbalanced datasets',
 correlationDifference:
 'Correlation difference: {{correlationDifference}}',
 univariateText:
-'{{samples}} synthetic data points are generated using CART. The figures below display the value frequency for each variable. The synthetic data is of high quality when the frequencies are approximately the same.',
+'<br>{{samples}} synthetic data points are generated using CART. The figures below display the value frequency for each variable. The synthetic data is of high quality when the frequencies are approximately the same.',
 bivariateText:
 'The figures below display the differences in value frequency for a combination of variables. For comparing two categorical variables, bar charts are plotted. For comparing a numerical and a categorical variables, a so called [violin plot](https://en.wikipedia.org/wiki/Violin_plot) is shown. For comparing two numercial variables, a [LOESS plot](https://en.wikipedia.org/wiki/Local_regression) is created. For all plots holds: the synthetic data is of high quality when the shape of the distributions in the synthetic data equal the distributions in the real data.',
 moreInfo:
-'Do you want to learn more about synthetic data?\n \n \n \n- [python-synthpop on Github](https://github.com/NGO-Algorithm-Audit/python-synthpop)\n- [local-first web app on Github](https://github.com/NGO-Algorithm-Audit/local-first-web-tool/tree/main)\n- [Synthetic Data: what, why and how?](https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/Synthetic_Data_Survey-24.pdf)\n- [Knowledge Network Synthetic Data](https://online.rijksinnovatiecommunity.nl/groups/399-kennisnetwerk-synthetischedata/welcome) (for Dutch public organizations)\n- [Synthetic data portal of Dutch Executive Agency for Education](https://duo.nl/open_onderwijsdata/footer/synthetische-data.jsp) (DUO)\n- [CART: synthpop resources](https://synthpop.org.uk/resources.html)\n- [Gaussian Copula - Synthetic Data Vault](https://docs.sdv.dev/sdv)',
+'Do you want to learn more about synthetic data?\n \n \n \n- [python-synthpop on Github](https://github.com/NGO-Algorithm-Audit/python-synthpop)\n- [local-first web app on Github](https://github.com/NGO-Algorithm-Audit/local-first-web-tool/tree/main)\n- [Synthetic Data: what, why and how?](https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/Synthetic_Data_Survey-24.pdf)\n- [Knowledge Network Synthetic Data](https://online.rijksinnovatiecommunity.nl/groups/399-kennisnetwerk-synthetischedata/welcome) (Dutch public organizations)\n- [Synthetic data portal of Dutch Executive Agency for Education](https://duo.nl/open_onderwijsdata/footer/synthetische-data.jsp) (DUO)\n- [CART: synthpop resources](https://synthpop.org.uk/resources.html)\n- [Gaussian Copula - Synthetic Data Vault](https://docs.sdv.dev/sdv)',
 missingData: `For {tooltip:syntheticData.missingDataMARTooltip}Missing At Random (MAR){/tooltip} and {tooltip:syntheticData.missingDataMNARTooltip}Missing Not At Random (MNAR){/tooltip} data,
 we recommend to impute the missing data. For {tooltip:syntheticData.missingDataMCARTooltip}Missing Completely At Random (MCAR){/tooltip}, we recommend to remove the missing data.`,
 missingDataMARTooltip: `**MAR (Missing At Random)**:
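Note on the new tooltip strings in `en.ts` (`meanSquaredError`, `meanAbsoluteError`, `R2`, `accuracyScore`, `weightedF1Score`): the definitions they give can be written out as plain functions. The TypeScript below is a sketch for illustration, not the code the app runs; all function names are hypothetical, and the handling of a constant target in `r2Score` is an assumption.

```typescript
// Hypothetical helpers mirroring the tooltip definitions; not from the repository.
function assertComparable(a: unknown[], b: unknown[]): void {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error('inputs must be non-empty and of equal length');
  }
}

// Average squared difference between predicted and actual values.
function meanSquaredError(yTrue: number[], yPred: number[]): number {
  assertComparable(yTrue, yPred);
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) sum += (yTrue[i] - yPred[i]) ** 2;
  return sum / yTrue.length;
}

// Average magnitude of the errors.
function meanAbsoluteError(yTrue: number[], yPred: number[]): number {
  assertComparable(yTrue, yPred);
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) sum += Math.abs(yTrue[i] - yPred[i]);
  return sum / yTrue.length;
}

// Proportion of variance in the target explained by the predictions.
function r2Score(yTrue: number[], yPred: number[]): number {
  assertComparable(yTrue, yPred);
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < yTrue.length; i++) {
    ssRes += (yTrue[i] - yPred[i]) ** 2;
    ssTot += (yTrue[i] - mean) ** 2;
  }
  // Assumption: for a constant target, report 1 for a perfect fit and 0 otherwise.
  if (ssTot === 0) return ssRes === 0 ? 1 : 0;
  return 1 - ssRes / ssTot;
}

// Proportion of correctly predicted instances.
function accuracyScore(yTrue: string[], yPred: string[]): number {
  assertComparable(yTrue, yPred);
  let correct = 0;
  for (let i = 0; i < yTrue.length; i++) if (yTrue[i] === yPred[i]) correct++;
  return correct / yTrue.length;
}

// Per-class F1 (harmonic mean of precision and recall), weighted by class support.
function weightedF1Score(yTrue: string[], yPred: string[]): number {
  assertComparable(yTrue, yPred);
  const classes = Array.from(new Set(yTrue));
  let weightedSum = 0;
  for (const cls of classes) {
    let tp = 0;
    let fp = 0;
    let fn = 0;
    for (let i = 0; i < yTrue.length; i++) {
      const isTrue = yTrue[i] === cls;
      const isPred = yPred[i] === cls;
      if (isTrue && isPred) tp++;
      else if (!isTrue && isPred) fp++;
      else if (isTrue && !isPred) fn++;
    }
    const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
    const recall = tp / (tp + fn); // support is > 0 because cls comes from yTrue
    const f1 =
      precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
    weightedSum += f1 * (tp + fn); // weight by support
  }
  return weightedSum / yTrue.length;
}
```

In the efficacy setting described by `efficacyMetricsDescription`, `yPred` would come from a model trained on the synthetic data and `yTrue` from the real data.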

src/locales/nl.ts

Lines changed: 18 additions & 7 deletions
@@ -94,7 +94,7 @@ export const nl = {
 description:
 'Een subset van de [Law School Admission Bar](https://www.kaggle.com/datasets/danofer/law-school-admissions-bar-passage)* dataset wordt gebruikt als demo. Synthetische data worden gegenereerd voor de volgende variablen:\n \n&nbsp;&nbsp;\n\n',
 'post.description':
-'De CART-methode wordt gebruikt om synthetische gegevens te genereren.\n CART produceert doorgaan een goede kwaliteit synthetische data, maar werkt minder goed voor data met categorische data met meer dan 20 categorieën. Gebruik in dit geval Gaussian Copula. \n&nbsp;&nbsp;\n\n*Het oorspronkelijke artikel is [hier](https://files.eric.ed.gov/fulltext/ED469370.pdf) te vinden.',
+'<br>De CART-methode wordt gebruikt om synthetische gegevens te genereren.\n CART produceert doorgaan een goede kwaliteit synthetische data, maar werkt minder goed voor data met categorische data met meer dan 20 categorieën. Gebruik in dit geval Gaussian Copula. \n&nbsp;&nbsp;\n\n*Het oorspronkelijke artikel is [hier](https://files.eric.ed.gov/fulltext/ED469370.pdf) te vinden.',
 'data.column.Variable_name': 'Variabele name',
 'data.sex': 'sex',
 'data.race1': 'race1',
@@ -131,7 +131,7 @@ export const nl = {
 columnsCountError: 'File mag maximaal 8 kolommen bevatten.',
 },
 fieldset: {
-sourceDataset: 'Brondata',
+sourceDataset: 'Input',
 sdgMethod: {
 title: 'Methode',
 cart: 'CART',
@@ -149,17 +149,17 @@ export const nl = {
 samples: 'Aantal synthetische datapunten',
 },
 actions: {
-tryItOut: 'Uitproberen',
+tryItOut: 'Demo dataset',
 runGeneration: 'Start synthetische data generatie',
 analyzing: 'Analyseren...',
 initializing: 'Initialiseren...',
 },
 univariateText:
-'{{samples}} synthetic datapunten via de CART-methode gegeneerd. De grafieken tonen de frequentie waarmee een variabele een bepaalde waarde aanneemt. De synthetische data is van hoge kwaliteit als de frequenties ongeveer gelijke zijn.',
+'<br> {{samples}} synthetic datapunten via de CART-methode gegeneerd. De grafieken tonen de frequentie waarmee een variabele een bepaalde waarde aanneemt. De synthetische data is van hoge kwaliteit als de frequenties ongeveer gelijke zijn.',
 },
 demoCard: {
 title: 'Probeer het uit!',
-description: 'Geen dataset bij de hand? Gebruik onze demodata.',
+description: 'Geen dataset bij de hand? Gebruik onze demo dataset',
 },
 columnsInDatasetInfo:
 'Als de gedetecteerd data types niet correct zijn, pas dit dan lokaal aan in de dataset voordat u deze opnieuw aan de app koppelt.',
@@ -188,11 +188,22 @@ export const nl = {
 correlationMatrixTitle: 'Correlatie matrix',
 efficacyMetricsTitle: 'Doeltreffendheid metrieken',
 disclosureProtectionTitle: 'Privacy metrieken',
-outputDataTitle: '5. Output data',
+disclosureProtectionDescription: `De onthullings beschermings metriek meet het aandeel synthetische datapunten die te veel lijkt op echte datapunten (binnen een vooraf gedefinieerde drempelwaarde), wat een risico op herleidbaarheid naar persoonsgegevens vormt. Een lage 'risk_rate' en hoge 'disclosure_protection_rate' duidt op een goede bescherming tegen het onbedoeld prijsgeven van persoonsgegevens.`,
+outputDataTitle: '5. Download synthetische data en evaluatierapport',
 moreInfoTitle: '6. Meer informatie',
+meanSquaredError:
+'Gemiddeld kwadraatverschil tussen voorspelde en werkelijke waarden, dat de nauwkeurigheid van de voorspellingen van een model kwantificeert door grotere fouten zwaarder te bestraffen',
+meanAbsoluteError:
+'Gemiddelde grootte van de fouten tussen voorspelde en werkelijke waarden, die een eenvoudige beoordeling van de nauwkeurigheid van het model biedt zonder de nadruk te leggen op grote fouten',
+R2:
+'Kwantificeert hoe goed de voorspellingen van een model overeenkomen met de werkelijke gegevens door het aandeel van de variantie in de doelvariabele te meten dat door het model wordt verklaard',
+accuracyScore:
+'Meet het aandeel correct voorspelde gevallen ten opzichte van het totaal, en geeft zo een algemene beoordeling van de prestaties van het classificatiemodel',
+weightedF1Score:
+'Het harmonisch gemiddelde van precisie en recall, berekend per klasse en gewogen naar het aantal echte gevallen per klasse, wat een metriek biedt voor datasets met ongelijke klassenverdeling',
 correlationDifference: 'Correlatie verschil: {{correlationDifference}}',
 moreInfo:
-'&nbsp;&nbsp;\n \n \n \nWil je meer weten over synthetische data?\n \n \n \n- [python-synthpop op Github](https://github.com/NGO-Algorithm-Audit/python-synthpop)\n- [local-first web app op Github](https://github.com/NGO-Algorithm-Audit/local-first-web-tool/tree/main)\n- [Synthetische Data: wat, waarom en hoe?](https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/Synthetic_Data_Survey-24.pdf)\n- [Kennis Netwerk Synthetische Data](https://online.rijksinnovatiecommunity.nl/groups/399-kennisnetwerk-synthetischedata/welcome) (for Dutch public organizations)\n- [Synthetische data portaal van DUO](https://duo.nl/open_onderwijsdata/footer/synthetische-data.jsp)\n- [CART: synthpop resources](https://synthpop.org.uk/resources.html)\n- [Gaussian Copula - Synthetic Data Vault](https://docs.sdv.dev/sdv)',
+'&nbsp;&nbsp;\n \n \n \nWil je meer weten over synthetische data?\n \n \n \n- [python-synthpop op Github](https://github.com/NGO-Algorithm-Audit/python-synthpop)\n- [local-first web app op Github](https://github.com/NGO-Algorithm-Audit/local-first-web-tool/tree/main)\n- [Synthetische Data: wat, waarom en hoe?](https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/Synthetic_Data_Survey-24.pdf)\n- [Kennis Netwerk Synthetische Data](https://online.rijksinnovatiecommunity.nl/groups/399-kennisnetwerk-synthetischedata/welcome) (Nederlandse organisaties)\n- [Synthetische data portaal van DUO](https://duo.nl/open_onderwijsdata/footer/synthetische-data.jsp)\n- [CART: synthpop resources](https://synthpop.org.uk/resources.html)\n- [Gaussian Copula - Synthetic Data Vault](https://docs.sdv.dev/sdv)',
 },
 biasAnalysis: {
 dataSetPreview: {
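Note on `disclosureProtectionDescription` (added in both locale files): the string describes the metric as the share of synthetic data points that lie within a predefined threshold of a real data point. One way to read that is sketched below in TypeScript. The sketch rests on assumptions that the diff does not confirm: rows are numeric vectors, closeness is Euclidean distance, and `disclosure_protection_rate` is taken to be `1 - risk_rate`. The function and type names are hypothetical.

```typescript
// Hypothetical sketch of a threshold-based disclosure metric; not the app's implementation.
interface DisclosureProtection {
  riskRate: number; // share of synthetic rows too close to some real row
  disclosureProtectionRate: number; // assumed here to be 1 - riskRate
}

// Assumption: rows are numeric vectors of equal length, compared by Euclidean distance.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function disclosureProtection(
  real: number[][],
  synthetic: number[][],
  threshold: number,
): DisclosureProtection {
  if (real.length === 0 || synthetic.length === 0) {
    throw new Error('both datasets must contain at least one row');
  }
  let risky = 0;
  for (const s of synthetic) {
    // A synthetic row is risky when any real row lies within the threshold.
    if (real.some((r) => euclideanDistance(r, s) <= threshold)) risky++;
  }
  const riskRate = risky / synthetic.length;
  return { riskRate, disclosureProtectionRate: 1 - riskRate };
}
```

Under this reading, a low `riskRate` and a high `disclosureProtectionRate` match the interpretation given in the new description string.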
