Commit eacc077

Update README.md
1 parent 9f59d21 commit eacc077

1 file changed, +5 -6 lines changed

README.md

Lines changed: 5 additions & 6 deletions
@@ -4,7 +4,7 @@

 Python implementation of the R package [synthpop](https://cran.r-project.org/web/packages/synthpop/index.html).

-The R implementation of synthpop is a tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the dataset. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models.
+This library produces synthetic versions of tabular data containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the dataset. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models.

 This is a reimplementation in Python which allows synthetic data to be generated via the method .generate() after the algorithm had been fit to the original data via the method .fit(). The process can be largely automated, if default settings are used, or with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthetic data.
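
The sequential modelling described in the README text above can be illustrated with a small sketch. This is not the library's implementation: it swaps the parametric or CART models for plain empirical conditional frequencies, handles categorical columns only, and the function name `synthesise_sequentially` and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def synthesise_sequentially(rows, visit_sequence, n, seed=0):
    """Draw n synthetic rows, one column at a time.

    Illustrative only: each column is sampled from the empirical
    distribution of the original values, conditioned on the columns
    already synthesised (a stand-in for a fitted parametric or CART model).
    """
    rng = random.Random(seed)
    synthetic = [{} for _ in range(n)]

    for i, col in enumerate(visit_sequence):
        predictors = visit_sequence[:i]

        # "Fit": group the observed values of col by the predictor values.
        conditional = defaultdict(list)
        for row in rows:
            key = tuple(row[p] for p in predictors)
            conditional[key].append(row[col])

        # "Generate": draw from the matching conditional distribution.
        for syn in synthetic:
            key = tuple(syn[p] for p in predictors)
            syn[col] = rng.choice(conditional[key])

    return synthetic


# Hypothetical toy records, not taken from the Adult dataset.
original = [
    {"sex": "Female", "income": "<=50K"},
    {"sex": "Female", "income": "<=50K"},
    {"sex": "Male", "income": ">50K"},
    {"sex": "Male", "income": "<=50K"},
]
synthetic = synthesise_sequentially(original, ["sex", "income"], n=5)
```

Because each column is drawn conditionally on the columns before it, the order of the visit sequence determines which relationships the synthetic rows can reproduce.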

@@ -46,12 +46,12 @@ Out[2]:
 4 28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
 ```

-### synthpop
+### python-synthpop

 Use default parameters for the Adult dataset:

 ```
-In [1]: from synthpop import Synthpop
+In [1]: from python-synthpop import Synthpop

 In [2]: from datasets.adult import df, dtypes
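
A note on the import line changed in this commit: a hyphen is not allowed in a module name inside a Python `import` statement, so `from python-synthpop import Synthpop` does not parse. The check below uses only the standard library to show this. It assumes the importable module is still named `synthpop`, as on the removed line, and the helper name `is_valid_python` is hypothetical.

```python
import ast


def is_valid_python(source):
    """Return True if source parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


new_import = "from python-synthpop import Synthpop"  # line added in this diff
old_import = "from synthpop import Synthpop"         # line removed in this diff

print(is_valid_python(new_import))  # False
print(is_valid_python(old_import))  # True
```

A distribution may be published under a hyphenated name such as `python-synthpop`, but the name used in the `import` statement has to be a valid identifier.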
@@ -161,7 +161,7 @@ income 1 1 1 1 1
 ### Define the visit sequence for the Adult dataset:

 ```
-In [1]: from synthpop import Synthpop
+In [1]: from python-synthpop import Synthpop

 In [2]: from datasets.adult import df, dtypes
@@ -226,5 +226,4 @@ workclass 1 0 0 0 0
 fnlwgt 1 1 0 1 1
 education 1 1 0 0 1
 marital-status 1 1 0 0 0
-```
-
+```
