Commit 387fff1

Merge branch 'main' into ewt/extract-reasoning-content
2 parents 90f99ab + b6d400e

25 files changed: +744 -411 lines changed

docs/blog/.authors.yml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+authors:
+  nvidia:
+    name: NVIDIA NeMo Data Designer Team
+    description: NeMo Data Designer Core Team
+    avatar: https://avatars.githubusercontent.com/u/1728152?s=200&v=4

docs/blog/index.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Dev Notes
+
+Welcome to NeMo Data Designer Dev Notes! Here you'll find in-depth guides, tutorials, and insights about synthetic data generation.
+
+<!-- Dev notes will automatically appear below -->

docs/blog/posts/welcome.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+---
+date: 2026-01-22
+authors:
+  - nvidia
+---
+
+# Welcome to Data Designer Dev Notes
+
+We're excited to launch the Data Designer Dev Notes section!
+
+<!-- more -->
+
+This space will feature in-depth technical articles, best practices, and real-world case studies from the Data Designer team and community.
+
+## What to Expect
+
+- **Deep Dives**: Detailed explorations of Data Designer's capabilities and architecture
+- **Best Practices**: Tips and patterns for building high-quality synthetic datasets
+- **Use Cases**: Real-world applications and success stories from the community
+- **Research Highlights**: Insights from cutting-edge work in synthetic data generation
+
+## Stay Tuned
+
+Watch this space for technical deep dives into synthetic data generation, model customization, and more. We'll be sharing insights from our work and the broader community.
+
+In the meantime, check out our [Welcome guide](../../index.md) and [tutorial notebooks](../../notebooks/README.md) to get started with Data Designer.
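
Review note: the post's front matter refers to its author by the id `nvidia`, which has to exist as a key under `authors:` in `docs/blog/.authors.yml`. The sketch below illustrates that lookup with plain dicts that mirror the two files. It assumes the site's blog plugin resolves author ids this way; `resolve_authors` is a hypothetical helper written for this note, not the plugin's code.

```python
# Plain-dict mirrors of the two files added in this commit.
authors_file = {
    "authors": {
        "nvidia": {
            "name": "NVIDIA NeMo Data Designer Team",
            "description": "NeMo Data Designer Core Team",
            "avatar": "https://avatars.githubusercontent.com/u/1728152?s=200&v=4",
        }
    }
}
post_front_matter = {"date": "2026-01-22", "authors": ["nvidia"]}


def resolve_authors(front_matter, authors):
    """Hypothetical helper: map a post's author ids to their records.

    Raises KeyError when an id is not defined in the authors file.
    """
    known = authors.get("authors", {})
    ids = front_matter.get("authors", [])
    missing = [author_id for author_id in ids if author_id not in known]
    if missing:
        raise KeyError(f"unknown author ids: {missing}")
    return [known[author_id] for author_id in ids]


resolved = resolve_authors(post_front_matter, authors_file)
print(resolved[0]["name"])  # NVIDIA NeMo Data Designer Team
```

A post that lists an id missing from the authors file would raise `KeyError` here, which is the kind of mismatch worth checking before merging new posts.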

docs/colab_notebooks/1-the-basics.ipynb

Lines changed: 32 additions & 32 deletions
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "markdown",
-"id": "2af6c991",
+"id": "c79eea7a",
 "metadata": {},
 "source": [
 "# 🎨 Data Designer Tutorial: The Basics\n",
@@ -14,7 +14,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "2e365f58",
+"id": "2476f160",
 "metadata": {},
 "source": [
 "### 📦 Import Data Designer\n",
@@ -26,7 +26,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "b6b60c35",
+"id": "3646f62e",
 "metadata": {},
 "source": [
 "### ⚡ Colab Setup\n",
@@ -37,7 +37,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "352ad006",
+"id": "3348e5c8",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -48,7 +48,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "d48fcd7d",
+"id": "19cd9249",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -66,7 +66,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "7fa5c652",
+"id": "5a6d13a9",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -76,20 +76,20 @@
 },
 {
 "cell_type": "markdown",
-"id": "c70454d5",
+"id": "d445af5b",
 "metadata": {},
 "source": [
 "### ⚙️ Initialize the Data Designer interface\n",
 "\n",
-"- `DataDesigner` is the main object is responsible for managing the data generation process.\n",
+"- `DataDesigner` is the main object responsible for managing the data generation process.\n",
 "\n",
 "- When initialized without arguments, the [default model providers](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/default-model-settings/) are used.\n"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "bb2765d2",
+"id": "4df0031d",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -98,7 +98,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "8cd5279b",
+"id": "0f69b576",
 "metadata": {},
 "source": [
 "### 🎛️ Define model configurations\n",
@@ -115,7 +115,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "cd90811a",
+"id": "65d9be99",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -145,7 +145,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "ca306aca",
+"id": "72582d09",
 "metadata": {},
 "source": [
 "### 🏗️ Initialize the Data Designer Config Builder\n",
@@ -160,7 +160,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "8dda0de2",
+"id": "8d7992b4",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -169,7 +169,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "2bc4a65c",
+"id": "741a15a0",
 "metadata": {},
 "source": [
 "## 🎲 Getting started with sampler columns\n",
@@ -186,7 +186,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "5bc65ad1",
+"id": "c3879c70",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -195,7 +195,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "34f81d0b",
+"id": "1575ef81",
 "metadata": {},
 "source": [
 "Let's start designing our product review dataset by adding product category and subcategory columns.\n"
@@ -204,7 +204,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "2318c75a",
+"id": "87a88d7b",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -285,7 +285,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "f8c68da2",
+"id": "8c74b738",
 "metadata": {},
 "source": [
 "Next, let's add samplers to generate data related to the customer and their review.\n"
@@ -294,7 +294,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "b8cb78e5",
+"id": "4eb1da1f",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -331,7 +331,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "57b604d5",
+"id": "4324d869",
 "metadata": {},
 "source": [
 "## 🦜 LLM-generated columns\n",
@@ -346,7 +346,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "b615760d",
+"id": "1302a503",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -382,7 +382,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "eeb124f7",
+"id": "7cf8241b",
 "metadata": {},
 "source": [
 "### 🔁 Iteration is key – preview the dataset!\n",
@@ -399,7 +399,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "fffc9d8e",
+"id": "6fc6cf39",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -409,7 +409,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "643ee6a1",
+"id": "c929e068",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -420,7 +420,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "69dfb926",
+"id": "dfb04e2a",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -430,7 +430,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "4f38ca0d",
+"id": "adb879da",
 "metadata": {},
 "source": [
 "### 📊 Analyze the generated data\n",
@@ -443,7 +443,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "a044678d",
+"id": "ff58dd9f",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -453,7 +453,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "8c7147ab",
+"id": "57c7355d",
 "metadata": {},
 "source": [
 "### 🆙 Scale up!\n",
@@ -466,7 +466,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "d6a92b76",
+"id": "df49db99",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -476,7 +476,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "fb888621",
+"id": "2bbc48dd",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -489,7 +489,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "521d34ae",
+"id": "dc0673fa",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -501,7 +501,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "910b21a2",
+"id": "7688217b",
 "metadata": {},
 "source": [
 "## ⏭️ Next Steps\n",
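
Review note: of the 32 changed line pairs in this notebook, 31 only regenerate a cell `id`; the one content change is the grammar fix in the "Initialize the Data Designer interface" cell. A minimal sketch of how to confirm that programmatically is below. The helper names are hypothetical, the two notebooks are two-cell stand-ins trimmed to one source line per cell, and the comparison assumes both versions have the same number of cells in the same order.

```python
import copy


def strip_cell_ids(notebook):
    """Return a deep copy of a notebook dict with each cell's "id" removed."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        cell.pop("id", None)
    return stripped


def changed_cells(old, new):
    """Indices of cells that still differ once ids are ignored.

    Assumes both notebooks have the same cells in the same order.
    """
    old_cells = strip_cell_ids(old)["cells"]
    new_cells = strip_cell_ids(new)["cells"]
    return [i for i, (a, b) in enumerate(zip(old_cells, new_cells)) if a != b]


# Stand-in notebooks built from values visible in the diff above.
old_nb = {"cells": [
    {"cell_type": "markdown", "id": "2af6c991", "metadata": {},
     "source": ["# 🎨 Data Designer Tutorial: The Basics\n"]},
    {"cell_type": "markdown", "id": "c70454d5", "metadata": {},
     "source": ["- `DataDesigner` is the main object is responsible for managing the data generation process.\n"]},
]}
new_nb = {"cells": [
    {"cell_type": "markdown", "id": "c79eea7a", "metadata": {},
     "source": ["# 🎨 Data Designer Tutorial: The Basics\n"]},
    {"cell_type": "markdown", "id": "d445af5b", "metadata": {},
     "source": ["- `DataDesigner` is the main object responsible for managing the data generation process.\n"]},
]}

print(changed_cells(old_nb, new_nb))  # [1]
```

For the real files, each version can be loaded with `json.load` and passed to `changed_cells` in the same way; only the cell with the wording fix should be reported.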
