docs/concepts/person_sampling.md
25 additions & 4 deletions
@@ -55,6 +55,13 @@ Uses curated Nemotron Personas datasets from NVIDIA GPU Cloud (NGC) to generate
 
 The NGC datasets are extended versions of the [open-source Nemotron Personas datasets on HuggingFace](https://huggingface.co/collections/nvidia/nemotron-personas), with additional fields and enhanced data quality.
 
+Supported locales:
+- `en_US`: United States
+- `ja_JP`: Japan
+- `en_IN`: India
+- `hi_Deva_IN`: India (Devanagari script)
+- `hi_Latn_IN`: India (Latin script)
+
 ### Features
 - **Demographically accurate personal details**: Names, ages, sex, marital status, education, occupation based on census data
 - **Rich persona details**: Comprehensive behavioral profiles including:
@@ -80,7 +87,22 @@ You need to download the Nemotron Personas datasets that you want to use from NG
 export NGC_API_KEY="your-ngc-api-key-here"
 ```
 
-#### Step 2: Download Nemotron Personas Datasets
+#### Step 2 (option 1): Download Nemotron Personas Datasets via the Data Designer CLI
+
+Once you have the NGC CLI and your NGC API key set up, you can download the datasets via the Data Designer CLI.
+
+You can pass the locales you want to download as arguments to the CLI command:
+```bash
+data-designer download personas --locale en_US --locale ja_JP
+```
+
+Or you can use the interactive mode to select the locales you want to download:
+```bash
+data-designer download personas
+```
+
+#### Step 2 (option 2): Download Nemotron Personas Datasets Directly
+
 Use the NGC CLI to download the datasets:
 ```bash
 # For Nemotron Personas USA
@@ -156,7 +178,7 @@ See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documenta
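
The "Step 2 (option 1)" flow in this diff could be wrapped in a small helper that checks locale codes before invoking the CLI. The following is a minimal sketch, not part of Data Designer: `is_supported_locale` and `build_download_cmd` are hypothetical names, and the only facts taken from the diff are the five locale codes and the `data-designer download personas [--locale ...]` invocation. The helper only prints the command, so it can be reviewed before anything is downloaded.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Data Designer CLI).
# Locale codes and the --locale flag are taken from the diff above.

# Succeeds if $1 is one of the locale codes listed under "Supported locales".
is_supported_locale() {
  case "$1" in
    en_US|ja_JP|en_IN|hi_Deva_IN|hi_Latn_IN) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the download command for the given locales without running it.
# With no arguments it prints the bare command, which the docs describe
# as the interactive mode.
build_download_cmd() {
  cmd="data-designer download personas"
  for loc in "$@"; do
    if ! is_supported_locale "$loc"; then
      echo "unsupported locale: $loc" >&2
      return 1
    fi
    cmd="$cmd --locale $loc"
  done
  echo "$cmd"
}

build_download_cmd en_US ja_JP
# prints: data-designer download personas --locale en_US --locale ja_JP
```

Printing instead of executing keeps the sketch free of side effects; once the output looks right, the printed command can be run as-is, assuming the NGC CLI and `NGC_API_KEY` are already set up as described in Step 1.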