Commit 3bcbf0b

Add efficacy metrics description and pre-content for synthetic data reports
1 parent f57d73d commit 3bcbf0b

2 files changed: +14 -0 lines changed

src/assets/synthetic-data.tsx

Lines changed: 4 additions & 0 deletions
@@ -386,6 +386,10 @@ def run():
             'reportType': 'table',
             'titleKey': 'syntheticData.efficacyMetricsTitle',
             'showIndex' : False,
+            'preContent' : [{
+                'contentType': 'text',
+                'textKey': 'syntheticData.efficacyMetricsDescription'
+            }],
             'data': metrics_df.to_json(orient="records"),
         },
         {

src/locales/en.ts

Lines changed: 10 additions & 0 deletions
@@ -188,6 +188,16 @@ For categorical (or boolean) columns the following metrics are computed:
     correlationMatrixDescription: `The matrix below illustrates the differences in pairwise correlations between variables in the original and synthetic data.
 Green cells signify that the pairwise correlation was accurately captured, with 0 representing the best possible score. Red cells indicate poor capture of the pairwise correlation.`,
     efficacyMetricsTitle: 'Efficacy metrics',
+    efficacyMetricsDescription: `Efficacy metrics comparing real and synthetic datasets for downstream predictive tasks. The idea is to train a predictive model on synthetic data and evaluate its performance on real data. The type of metrics computed depends on the task:
+
+For regression (when the target is numerical):
+- Mean Squared Error (MSE)
+- Mean Absolute Error (MAE)
+- R^2 Score
+
+For classification (when the target is categorical/boolean):
+- Accuracy Score
+- Weighted F1 Score`,
     disclosureProtectionTitle: 'Privacy metrics',
     disclosureProtectionDescription: `A class to compute the disclosure protection metric for synthetic data. This metric measures the proportion of synthetic records that are too similar (within a defined threshold) to real records, posing a disclosure risk.`,
     outputDataTitle: '5. Generated synthetic data',
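
Note on the new description: the commit does not show how `metrics_df` is built, so the following is only a minimal sketch of the "train on synthetic, evaluate on real" idea and of the five listed metrics, written in plain Python. All function names, the one-feature least-squares model, and the toy data are assumptions made for illustration; the actual report may well use a library such as scikit-learn and a different model.

```python
from collections import Counter

def mse(y_true, y_pred):
    # Mean Squared Error
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    # Fraction of exactly matching labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    # Per-class F1, averaged with weights equal to each class's
    # number of occurrences (support) in y_true
    total = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = support - tp
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom if denom else 0.0) * support
    return total / len(y_true)

def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b (hypothetical stand-in model)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Toy data (made up): fit on synthetic rows, score on real rows
synthetic_x, synthetic_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
real_x, real_y = [4.0, 5.0], [9.0, 12.0]

a, b = fit_line(synthetic_x, synthetic_y)
real_pred = [a * x + b for x in real_x]
regression_metrics = {
    "mse": mse(real_y, real_pred),
    "mae": mae(real_y, real_pred),
    "r2": r2(real_y, real_pred),
}

# Classification metrics on made-up labels and predictions
real_labels = ["a", "a", "b", "b"]
pred_labels = ["a", "b", "b", "b"]
classification_metrics = {
    "accuracy": accuracy(real_labels, pred_labels),
    "weighted_f1": weighted_f1(real_labels, pred_labels),
}
```

The regression branch applies when the target is numerical and the classification branch when it is categorical or boolean, mirroring the split in `efficacyMetricsDescription`.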
