Commit b635e41

update docs (#151)
1 parent 0a60f86 commit b635e41

1 file changed: +25 -4 lines changed

docs/concepts/person_sampling.md

Lines changed: 25 additions & 4 deletions
@@ -55,6 +55,13 @@ Uses curated Nemotron Personas datasets from NVIDIA GPU Cloud (NGC) to generate
 
 The NGC datasets are extended versions of the [open-source Nemotron Personas datasets on HuggingFace](https://huggingface.co/collections/nvidia/nemotron-personas), with additional fields and enhanced data quality.
 
+Supported locales:
+- `en_US`: United States
+- `ja_JP`: Japan
+- `en_IN`: India
+- `hi_Deva_IN`: India (Devanagari script)
+- `hi_Latn_IN`: India (Latin script)
+
 ### Features
 - **Demographically accurate personal details**: Names, ages, sex, marital status, education, occupation based on census data
 - **Rich persona details**: Comprehensive behavioral profiles including:
@@ -80,7 +87,22 @@ You need to download the Nemotron Personas datasets that you want to use from NG
 export NGC_API_KEY="your-ngc-api-key-here"
 ```
 
-#### Step 2: Download Nemotron Personas Datasets
+#### Step 2 (option 1): Download Nemotron Personas Datasets via the Data Designer CLI
+
+Once you have the NGC CLI and your NGC API key set up, you can download the datasets via the Data Designer CLI.
+
+You can pass the locales you want to download as arguments to the CLI command:
+```bash
+data-designer download personas --locale en_US --locale ja_JP
+```
+
+Or you can use the interactive mode to select the locales you want to download:
+```bash
+data-designer download personas
+```
+
+#### Step 2 (option 2): Download Nemotron Personas Datasets Directly
+
 Use the NGC CLI to download the datasets:
 ```bash
 # For Nemotron Personas USA
@@ -156,7 +178,7 @@ See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documenta
 **Japan-Specific Fields (`ja_JP`):**
 - `area`
 
-**India-Specific Fields (`en_IN`, `hi_IN`):**
+**India-Specific Fields (`en_IN`, `hi_IN`, `hi_Deva_IN`, `hi_Latn_IN`):**
 - `religion` - Census-reported religion
 - `education_degree` - Census-reported education degree
 - `first_language` - Native language
@@ -176,10 +198,9 @@ See the [`SamplerColumnConfig`](../api/columns.md#samplercolumnconfig) documenta
 
 | Parameter | Type | Description |
 |-----------|------|-------------|
-| `locale` | str | Language/region code - must be one of: "en_US", "ja_JP", "en_IN", "hi_IN" |
+| `locale` | str | Language/region code - must be one of: "en_US", "ja_JP", "en_IN", "hi_Deva_IN", "hi_Latn_IN" |
 | `sex` | str (optional) | Filter by "Male" or "Female" |
 | `city` | str or list[str] (optional) | Filter by specific city or cities within locale |
 | `age_range` | list[int] (optional) | Two-element list [min_age, max_age] (default: [18, 114]) |
 | `with_synthetic_personas` | bool (optional) | Include rich personality profiles (default: False) |
 | `select_field_values` | dict (optional) | Custom field-based filtering (e.g., {"state": ["NY", "CA"], "education_level": ["bachelors"]}) |
-
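
As a reading aid for the parameter table in the last hunk, the constraints it documents can be sketched as a small standalone check. This is only an illustration under stated assumptions: `validate_person_params` and `SUPPORTED_LOCALES` are hypothetical names that are not part of Data Designer, and the allowed locale codes are copied from the updated `locale` row (which no longer lists `hi_IN`).

```python
from typing import Any

# Locale codes copied from the updated `locale` row in this commit.
SUPPORTED_LOCALES = {"en_US", "ja_JP", "en_IN", "hi_Deva_IN", "hi_Latn_IN"}


def validate_person_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid.

    Hypothetical helper for illustration only; not a Data Designer API.
    """
    problems: list[str] = []

    # `locale` is the only required parameter in the table.
    locale = params.get("locale")
    if not isinstance(locale, str) or locale not in SUPPORTED_LOCALES:
        problems.append(f"unsupported locale: {locale!r}")

    # `sex` is optional, but limited to the two documented values.
    sex = params.get("sex")
    if sex is not None and sex not in ("Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    # `age_range` is a two-element [min_age, max_age] list (documented default: [18, 114]).
    age_range = params.get("age_range", [18, 114])
    if (
        not isinstance(age_range, list)
        or len(age_range) != 2
        or not all(isinstance(a, int) for a in age_range)
        or age_range[0] > age_range[1]
    ):
        problems.append(f"age_range must be [min_age, max_age], got {age_range!r}")

    return problems


# The old `hi_IN` code is rejected under the updated table.
print(validate_person_params({"locale": "hi_IN"}))
```

A check like this would flag configurations still using `hi_IN` after the switch to the script-specific `hi_Deva_IN` and `hi_Latn_IN` codes; whether the library itself rejects `hi_IN` is not stated in this diff.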
