docs/concepts/person_sampling.md
+24 -13 (24 additions & 13 deletions)
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ Person sampling in Data Designer allows you to generate synthetic person data fo
7
7
Data Designer provides two ways to generate synthetic people:
8
8
9
9
1. **Faker-based sampling** - Quick, basic PII generation for testing or when realistic demographic distributions are not relevant for your use case
10
-
2. **NemotronPersonas datasets** - Demographically accurate, rich persona data
10
+
2. **Nemotron-Personas datasets** - Demographically accurate, rich persona data
11
11
12
12
---
13
13
@@ -44,18 +44,19 @@ config_builder.add_column(
44
44
)
45
45
```
46
46
47
-
See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documentation for more details.
47
+
For more details, see the documentation for [`SamplerColumnConfig`](../code_reference/column_configs.md#data_designer.config.column_configs.SamplerColumnConfig) and [`PersonFromFakerSamplerParams`](../code_reference/sampler_params.md#data_designer.config.sampler_params.PersonFromFakerSamplerParams).
48
48
49
49
---
50
50
51
-
## Approach 2: NemotronPersonas Datasets
51
+
## Approach 2: Nemotron-Personas Datasets
52
52
53
53
### What It Does
54
-
Uses curated NemotronPersonas datasets from NVIDIA GPU Cloud (NGC) to generate demographically accurate person data with rich personality profiles and behavioral characteristics.
54
+
Uses curated Nemotron-Personas datasets from NVIDIA GPU Cloud (NGC) to generate demographically accurate person data with rich personality profiles and behavioral characteristics.
55
55
56
-
The NGC datasets are extended versions of the [open-source NemotronPersonas datasets on HuggingFace](https://huggingface.co/collections/nvidia/nemotron-personas), with additional fields and enhanced data quality.
56
+
The NGC datasets are extended versions of the [open-source Nemotron-Personas datasets on HuggingFace](https://huggingface.co/collections/nvidia/nemotron-personas), with additional fields and enhanced data quality.
57
57
58
58
Supported locales:
59
+
59
60
- `en_US`: United States
60
61
- `ja_JP`: Japan
61
62
- `en_IN`: India
@@ -75,19 +76,26 @@ Supported locales:
75
76
76
77
### Prerequisites
77
78
78
-
You need to download the Nemotron Personas datasets that you want to use from NGC, they are available [here](https://catalog.ngc.nvidia.com/search?orderBy=scoreDESC&query=nemotron+personas)
79
+
To use the extended Nemotron-Personas datasets with Data Designer, you need to download them [from NGC](https://catalog.ngc.nvidia.com/search?orderBy=scoreDESC&query=nemotron+personas) and move them to the Data Designer managed assets directory.
80
+
81
+
See below for step-by-step instructions.
82
+
83
+
### Nemotron-Personas Datasets Setup Instructions
84
+
85
+
#### Step 0: Obtain an NGC API Key and install the NGC CLI
86
+
87
+
To download the Nemotron-Personas datasets from NGC, you will need to obtain an NGC API key and install the NGC CLI.
79
88
80
89
1. **NGC API Key**: Obtain from [NVIDIA GPU Cloud](https://ngc.nvidia.com/)
ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-en_us"
110
118
111
-
# For NemotronPersonas IN
119
+
# For Nemotron-Personas IN
112
120
ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-hi_deva_in"
113
121
ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-hi_latn_in"
114
122
ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-en_in"
115
123
116
-
# For NemotronPersonas JP
124
+
# For Nemotron-Personas JP
117
125
ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-ja_jp"
118
126
```
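The download commands above follow one pattern: a fixed resource prefix plus a locale-specific suffix. As a rough sketch (not part of Data Designer or the NGC CLI), the Python helper below builds those same command lines for a chosen set of regions. The function name `build_download_commands`, the `RESOURCE_SUFFIXES` mapping, and the region keys `US`, `IN`, and `JP` are hypothetical names introduced for illustration; only the resource paths themselves come from the commands shown in this diff.

```python
import shlex

# Resource prefix copied from the ngc commands shown above.
NGC_RESOURCE_PREFIX = "nvidia/nemotron-personas/nemotron-personas-dataset-"

# Hypothetical grouping of the resource suffixes that appear in the commands above.
# The region keys are an assumption made for this sketch.
RESOURCE_SUFFIXES = {
    "US": ["en_us"],
    "IN": ["hi_deva_in", "hi_latn_in", "en_in"],
    "JP": ["ja_jp"],
}


def build_download_commands(regions):
    """Return one argument list per dataset for the requested regions."""
    commands = []
    for region in regions:
        if region not in RESOURCE_SUFFIXES:
            raise ValueError(f"Unknown region: {region!r}")
        for suffix in RESOURCE_SUFFIXES[region]:
            commands.append(
                [
                    "ngc",
                    "registry",
                    "resource",
                    "download-version",
                    NGC_RESOURCE_PREFIX + suffix,
                ]
            )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in build_download_commands(["US", "JP"]):
        print(shlex.join(command))
```

The helper only prints the commands. Running them still requires the NGC CLI to be installed and configured with an API key, as described in Step 0.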
119
127
@@ -145,7 +153,7 @@ config_builder.add_column(
145
153
)
146
154
```
147
155
148
-
See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documentation for more details.
156
+
For more details, see the documentation for [`SamplerColumnConfig`](../code_reference/column_configs.md#data_designer.config.column_configs.SamplerColumnConfig) and [`PersonSamplerParams`](../code_reference/sampler_params.md#data_designer.config.sampler_params.PersonSamplerParams).
149
157
150
158
### Available Data Fields
151
159
@@ -176,9 +184,11 @@ See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documenta