docs/concepts/person_sampling.md
+24 -16 lines changed: 24 additions & 16 deletions
````diff
@@ -6,20 +6,21 @@ Person sampling in Data Designer allows you to generate synthetic person data fo

 Data Designer provides two ways to generate synthetic people:

-1. **Faker-based sampling** - Quick, basic PII generation for testing
+1. **Faker-based sampling** - Quick, basic PII generation for testing or when realistic demographic distributions are not relevant for your use case
 2. **Nemotron Personas datasets** - Demographically accurate, rich persona data

 ---

 ## Approach 1: Faker-Based Sampling

 ### What It Does
-Uses the Faker library to generate random personal information. The data is basic and not demographically accurate, but is useful for quick testing and prototyping.
+Uses the Faker library to generate random personal information. The data is basic and not demographically accurate, but is useful for quick testing, prototyping, or when realistic demographic distributions are not relevant for your use case.

 ### Features
-- Leverages all PII data features that Faker exposes
+- Gives you access to person attributes that Faker exposes
 - Quick to set up with no additional downloads
 - Generates random names, emails, addresses, phone numbers, etc.
 - **Not demographically grounded** - data patterns don't reflect real-world demographics

 ### Usage Example
````
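The hunk describes Faker data as "not demographically grounded". A small standard-library sketch can show what that distinction means in practice. It is not Data Designer code. It samples ages in two ways:

- uniformly, which is how a generator with no population model behaves;
- from weighted age brackets, which is how a sampler grounded in reference statistics would behave.

The brackets, weights, and function names are hypothetical placeholders. They are not census figures.

```python
import random
from collections import Counter

# Hypothetical age brackets and target shares, for illustration only.
# These are NOT census figures.
AGE_BRACKETS = [(18, 34), (35, 54), (55, 90)]
BRACKET_WEIGHTS = [0.5, 0.3, 0.2]


def sample_age_uniform(rng: random.Random) -> int:
    """Ungrounded sampling: every age from 18 to 90 is equally likely."""
    return rng.randint(18, 90)


def sample_age_weighted(rng: random.Random) -> int:
    """Grounded sampling: pick a bracket by its target share, then an age inside it."""
    low, high = rng.choices(AGE_BRACKETS, weights=BRACKET_WEIGHTS, k=1)[0]
    return rng.randint(low, high)


def bracket_shares(ages: list[int]) -> dict[tuple[int, int], float]:
    """Fraction of the given ages that falls into each bracket."""
    counts = Counter(
        next(b for b in AGE_BRACKETS if b[0] <= age <= b[1]) for age in ages
    )
    return {b: counts[b] / len(ages) for b in AGE_BRACKETS}


if __name__ == "__main__":
    rng = random.Random(42)
    uniform = [sample_age_uniform(rng) for _ in range(10_000)]
    weighted = [sample_age_weighted(rng) for _ in range(10_000)]
    # Uniform shares follow bracket widths (17, 20, and 36 of 73 possible ages),
    # so the widest bracket gets the most people; weighted shares follow
    # BRACKET_WEIGHTS regardless of bracket width.
    print("uniform: ", bracket_shares(uniform))
    print("weighted:", bracket_shares(weighted))
```

Both samplers produce valid ages. Only the weighted one reproduces a chosen population shape, and that shape is the property the Nemotron Personas approach targets.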
````diff
@@ -35,21 +36,25 @@ config_builder.add_column(
         name="customer",
         sampler_type=SamplerType.PERSON_FROM_FAKER,
         params=PersonFromFakerSamplerParams(
-            locale="en_US",  # Any Faker-supported locale
-            age_range=[25, 65],  # Optional: filter by age range
-            sex="Female",  # Optional: filter by sex ("Male" or "Female")
+            locale="en_US",
+            age_range=[25, 65],
+            sex="Female",
         ),
     )
 )
 ```

+See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documentation for more details.
+
 ---

 ## Approach 2: Nemotron Personas Datasets

 ### What It Does
 Uses curated Nemotron Personas datasets from NVIDIA GPU Cloud (NGC) to generate demographically accurate person data with rich personality profiles and behavioral characteristics.

+The NGC datasets are extended versions of the [open-source Nemotron Personas datasets on HuggingFace](https://huggingface.co/collections/nvidia/nemotron-personas), with additional fields and enhanced data quality.
+
 ### Features
 - **Demographically accurate personal details**: Names, ages, sex, marital status, education, occupation based on census data
 - **Rich persona details**: Comprehensive behavioral profiles including:
````
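The usage example in this hunk passes `age_range=[25, 65]` and `sex="Female"` to `PersonFromFakerSamplerParams`. The sketch below shows one plausible reading of those filters in plain Python: ages come from an inclusive range, and names come from a pool that matches the requested sex.

This is not Data Designer's implementation. The following are all assumptions made for illustration:

- that the bounds are inclusive;
- the `sample_person` helper;
- the `Person` type;
- the tiny name pools.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical name pools keyed by sex. A real generator such as Faker uses
# much larger, locale-specific lists; these exist only for illustration.
FIRST_NAMES = {
    "Female": ["Ana", "Grace", "Mei", "Priya"],
    "Male": ["Ben", "Carlos", "Kenji", "Omar"],
}


@dataclass
class Person:
    first_name: str
    sex: str
    age: int


def sample_person(
    rng: random.Random,
    age_range: Sequence[int] = (18, 90),
    sex: Optional[str] = None,
) -> Person:
    """Hypothetical helper: draw one person whose age and sex satisfy the filters.

    age_range is treated as an inclusive [min, max]; sex=None means either value.
    """
    low, high = age_range
    if low > high:
        raise ValueError("age_range must be [min, max] with min <= max")
    chosen_sex = sex if sex is not None else rng.choice(sorted(FIRST_NAMES))
    if chosen_sex not in FIRST_NAMES:
        raise ValueError(f"sex must be one of {sorted(FIRST_NAMES)}, got {sex!r}")
    return Person(
        first_name=rng.choice(FIRST_NAMES[chosen_sex]),
        sex=chosen_sex,
        age=rng.randint(low, high),
    )


if __name__ == "__main__":
    rng = random.Random(7)
    # Mirrors the filters in the usage example: age_range=[25, 65], sex="Female".
    for _ in range(3):
        print(sample_person(rng, age_range=[25, 65], sex="Female"))
```

The filters only constrain which values are allowed. Within those bounds every value is still picked uniformly, which is why this style of sampling stays "not demographically grounded".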
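In the Nemotron Personas approach, each person comes from a curated dataset rather than from fields generated independently. The sketch below assumes the dataset has already been loaded as a list of dictionaries. It filters the rows and then draws whole rows with replacement. As a result, fields such as age, sex, and occupation always belong to the same underlying record.

The field names, the records, and the `sample_personas` helper are hypothetical. They do not reflect the actual NGC or HuggingFace schema, and they do not describe how Data Designer samples internally.

```python
import random
from typing import Any, Optional, Sequence

# Hypothetical persona records. Field names and values are invented for
# illustration and do not reflect the actual Nemotron Personas schema.
PERSONAS: list[dict[str, Any]] = [
    {"first_name": "Maria", "sex": "Female", "age": 31, "occupation": "nurse"},
    {"first_name": "James", "sex": "Male", "age": 47, "occupation": "electrician"},
    {"first_name": "Aisha", "sex": "Female", "age": 58, "occupation": "teacher"},
    {"first_name": "Linda", "sex": "Female", "age": 72, "occupation": "retired"},
    {"first_name": "Tom", "sex": "Male", "age": 23, "occupation": "student"},
]


def sample_personas(
    records: Sequence[dict[str, Any]],
    n: int,
    rng: random.Random,
    age_range: Optional[Sequence[int]] = None,
    sex: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: filter whole records, then draw n of them with replacement.

    Each draw returns an entire row, so attributes never mix across people,
    and on average the sampled rows mirror the filtered dataset's own mix.
    """
    pool = [
        r
        for r in records
        if (sex is None or r["sex"] == sex)
        and (age_range is None or age_range[0] <= r["age"] <= age_range[1])
    ]
    if not pool:
        raise ValueError("no records match the requested filters")
    return rng.choices(pool, k=n)


if __name__ == "__main__":
    rng = random.Random(3)
    for persona in sample_personas(PERSONAS, 3, rng, age_range=[25, 65], sex="Female"):
        print(persona)
```

Drawing rows instead of fields keeps every attribute of a sampled person consistent with the others. The trade-off is that you can only produce combinations that exist in the dataset.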