Commit b8a278a

Merge pull request #11 from NGO-Algorithm-Audit/main
update branch
2 parents 7094ecb + 4261a8f commit b8a278a

2 files changed, +121 -100 lines changed

README.md

Lines changed: 34 additions & 2 deletions
@@ -35,7 +35,7 @@ python setup.py install
 # Examples
 
 #### Social Diagnosis 2011 dataset
-We will use the Social Diagnosis 2011 dataset as an example, which is a comprehensive survey conducted in Poland. This dataset includes a wide range of variables related to the social and economic conditions of Polish households and individuals. It covers aspects such as income, employment, education, health, and overall quality of life.
+We will use the [Social Diagnosis 2011](https://search.r-project.org/CRAN/refmans/synthpop/html/SD2011.html) dataset as an example, which is a comprehensive survey conducted in Poland. This dataset includes a wide range of variables related to the social and economic conditions of Polish households and individuals. It covers aspects such as income, employment, education, health, and overall quality of life.
 
 ```
 In [1]: import pandas as pd
@@ -54,7 +54,7 @@ Out[2]:
 
 ### python-synthpop
 
-Using default parameters the six steps are applied on the Social Diagnosis example to generate synthetic data. See also [link](./example_notebooks/00_readme.ipynb).
+Using default parameters the six steps are applied on the Social Diagnosis example to generate synthetic data. See also the corresponding [notebook](./example_notebooks/00_readme.ipynb).
 
 ```
 In [1]: from synthpop import MissingDataHandler, DataProcessor, CARTMethod
@@ -173,4 +173,36 @@ In [11]: # 6. Evaluate the synthetic data
 4 ls categorical 1.0 N/A N/A N/A 0.9224 N/A 0.857143 1.0
 5 smoke categorical 1.0 N/A N/A N/A 0.9754 N/A 1.0 1.0
 
+In [12]: # 6.2 Efficacy metrics
+
+# 6.2.1 Regression
+reg_efficacy = EfficacyMetrics(task='regression', target_column="income")
+reg_metrics = reg_efficacy.evaluate(real_df, synthetic_df)
+print("=== Regression Efficacy Metrics ===")
+print(reg_metrics)
+
+=== Regression Efficacy Metrics ===
+{'mse': 1669726.6979087007, 'mae': 904.2202005090558, 'r2': -0.19619130295207743}
+
+In [13]: # 6.2.2 Classification
+clf_efficacy = EfficacyMetrics(task='classification', target_column="smoke")
+clf_metrics = clf_efficacy.evaluate(real_df, synthetic_df)
+print("\n=== Classification Efficacy Metrics ===")
+print(clf_metrics)
+
+=== Classification Efficacy Metrics ===
+{'accuracy': 0.6058, 'f1_score': 0.6184739077074358}
+
+In [14]: # 6.3 Privacy
+dp = DisclosureProtection(real_df, synthetic_df)
+dp_score = dp.score()
+dp_report = dp.report()
+
+print("\n=== Disclosure Protection ===")
+print(f"Score: {dp_score:.3f}")
+print("Detailed Report:", dp_report)
+
+=== Disclosure Protection ===
+Score: 1.000
+Detailed Report: {'threshold': 0.0, 'risk_rate': 0.0, 'disclosure_protection_score': 1.0}
 ```
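
The regression output added in this commit reports a negative `r2`, which can look like a bug at first glance. As a reading aid, here is a minimal pure-Python sketch of the textbook MSE, MAE, and R² definitions. It does not reproduce the internals of `EfficacyMetrics`, which this diff does not show; the helper name `regression_metrics` and the toy numbers are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Textbook MSE, MAE and R^2 for two equal-length numeric sequences.

    Hypothetical helper for illustration only; not part of python-synthpop.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")

    return {
        "mse": ss_res / n,
        "mae": sum(abs(r) for r in residuals) / n,
        # R^2 drops below zero whenever ss_res > ss_tot, i.e. the predictions
        # are worse than always predicting the mean of y_true.
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy data (made up): predictions that run opposite to the true trend.
y_true = [1.0, 2.0, 3.0]
y_pred = [3.0, 2.0, 1.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions, a negative R² such as the one in the README output means the predictions fit the evaluation data worse than a constant mean predictor would.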

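
The disclosure report added in this commit has three keys: `threshold`, `risk_rate`, and `disclosure_protection_score`. One plausible reading, and it is only an assumption since the diff does not show how `DisclosureProtection` computes them, is that a synthetic row counts as risky when its nearest real row lies within `threshold`, and that the score is one minus the share of risky rows. The sketch below illustrates that reading on made-up numeric rows; `disclosure_report` is a hypothetical helper, not the library's API.

```python
import math


def disclosure_report(real_rows, synthetic_rows, threshold=0.0):
    """Nearest-neighbour disclosure check (assumed definition, illustration only).

    A synthetic row is flagged as risky when its Euclidean distance to the
    closest real row is <= threshold. With threshold=0.0 only exact copies
    of real rows are flagged.
    """
    if not real_rows or not synthetic_rows:
        raise ValueError("both row collections must be non-empty")

    risky = 0
    for synth in synthetic_rows:
        nearest = min(math.dist(synth, real) for real in real_rows)
        if nearest <= threshold:
            risky += 1

    risk_rate = risky / len(synthetic_rows)
    return {
        "threshold": threshold,
        "risk_rate": risk_rate,
        "disclosure_protection_score": 1.0 - risk_rate,
    }


# Toy numeric rows (made up): one synthetic row is an exact copy of a real row.
real = [(1.0, 2.0), (3.0, 4.0)]
synthetic = [(1.0, 2.0), (5.0, 5.0), (2.9, 4.2), (0.0, 0.0)]

report = disclosure_report(real, synthetic)                      # exact matches only
loose_report = disclosure_report(real, synthetic, threshold=0.5)  # near matches too
print(report)
print(loose_report)
```

Under this assumed definition, a score of 1.0 at threshold 0.0 would only say that no synthetic row duplicates a real row exactly; rows that are merely very close would not be flagged.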