Commit f3e540b

Update readme
1 parent 4bcb65f commit f3e540b

3 files changed: +11 -9 lines changed


README.md

Lines changed: 10 additions & 8 deletions
@@ -1,6 +1,8 @@
-# Synthpop
+![image](https://github.com/NGO-Algorithm-Audit/python-synthpop/blob/main/images/header.png)
 
-Python implementation of the R package synthpop.
+# python-synthpop
+
+Python implementation of the R package [synthpop](https://cran.r-project.org/web/packages/synthpop/index.html).
 
 The R implementation of synthpop is a tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the dataset. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models.
 
@@ -11,24 +13,24 @@ This project is in Alpha status and the roadmap can be found here.
 
 # Installation
 
-Pip
+#### Pip
 
 ```
-pip install py-synthpop
+pip install python-synthpop
 ```
 
-Source
+#### Source
 
 ```
-git clone <url>
-cd synthpop
+git clone https://github.com/NGO-Algorithm-Audit/python-synthpop.git
+cd python-synthpop
 pip install -r requirements.txt
 python setup.py install
 ```
 
 # Examples
 
-Adult dataset
+#### Adult dataset
 We will use the US adult census dataset, which is a freely available open dataset extracted from the US census bureau database. The dataset is initially designed for a binary classification problem and the task is to predict whether a person earns over $50,000 a year. The dataset is a mixture of discrete and continuous features, including age, working status (workclass), education, marital status, race, sex, relationship and hours worked per week.
 
 ```
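
Reviewer note: the README paragraph in the diff above describes variables being synthesised one by one, each drawn from a conditional distribution fitted to the original data. The sketch below illustrates that idea only and is not the python-synthpop API. The function name `synthesise_sequentially`, the toy records, and the choice to condition each column on just the preceding one are all assumptions made for illustration. The package itself is described as using parametric or CART models rather than the simple empirical lookup used here.

```python
import random
from collections import defaultdict


def synthesise_sequentially(rows, columns, n_samples, seed=0):
    """Toy sequential synthesis for categorical columns (hypothetical helper).

    The first column is drawn from its observed marginal distribution. Each
    later column is drawn from the values observed in original rows that share
    the synthetic value of the preceding column.
    """
    rng = random.Random(seed)
    first, rest = columns[0], columns[1:]

    # Empirical marginal of the first column: sampling from the raw list
    # reproduces the observed frequencies.
    marginal = [row[first] for row in rows]

    # Empirical conditional tables: value of previous column -> observed values.
    conditionals = {}
    for prev, col in zip(columns, rest):
        table = defaultdict(list)
        for row in rows:
            table[row[prev]].append(row[col])
        conditionals[col] = (prev, table)

    synthetic = []
    for _ in range(n_samples):
        record = {first: rng.choice(marginal)}
        for col in rest:
            prev, table = conditionals[col]
            record[col] = rng.choice(table[record[prev]])
        synthetic.append(record)
    return synthetic


# Made-up toy records using column names mentioned in the README (not real census data).
original = [
    {"workclass": "Private", "education": "Bachelors", "sex": "Female"},
    {"workclass": "Private", "education": "HS-grad", "sex": "Male"},
    {"workclass": "State-gov", "education": "Masters", "sex": "Female"},
    {"workclass": "Self-emp", "education": "HS-grad", "sex": "Male"},
]
columns = ["workclass", "education", "sex"]
synthetic = synthesise_sequentially(original, columns, n_samples=5)
```

Because each value is sampled conditionally on the previous column, every adjacent pair of values in a synthetic record also occurs somewhere in the original data, while whole records may be new combinations.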

images/Header.png

98.8 KB

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 numpy>=1.20.0
 pandas>=1.3.0
 scikit-learn>=1.0.0
-pytest>=7.0.0
+pytest>=7.0.0
