|
2 | 2 | "cells": [ |
3 | 3 | { |
4 | 4 | "cell_type": "markdown", |
5 | | - "id": "709c75cf", |
| 5 | + "id": "9f804f90", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | 8 | "# 🎨 Data Designer Tutorial: The Basics\n", |
|
14 | 14 | }, |
15 | 15 | { |
16 | 16 | "cell_type": "markdown", |
17 | | - "id": "3cb2774e", |
| 17 | + "id": "9cb786eb", |
18 | 18 | "metadata": {}, |
19 | 19 | "source": [ |
20 | 20 | "### ⚡ Colab Setup\n", |
|
25 | 25 | { |
26 | 26 | "cell_type": "code", |
27 | 27 | "execution_count": null, |
28 | | - "id": "b886272b", |
| 28 | + "id": "7f45ea56", |
29 | 29 | "metadata": {}, |
30 | 30 | "outputs": [], |
31 | 31 | "source": [ |
|
36 | 36 | { |
37 | 37 | "cell_type": "code", |
38 | 38 | "execution_count": null, |
39 | | - "id": "f5cf20f9", |
| 39 | + "id": "ea86e81e", |
40 | 40 | "metadata": {}, |
41 | 41 | "outputs": [], |
42 | 42 | "source": [ |
|
53 | 53 | }, |
54 | 54 | { |
55 | 55 | "cell_type": "markdown", |
56 | | - "id": "e11a4288", |
| 56 | + "id": "16611c7b", |
57 | 57 | "metadata": {}, |
58 | 58 | "source": [ |
59 | 59 | "### 📦 Import the essentials\n", |
|
64 | 64 | { |
65 | 65 | "cell_type": "code", |
66 | 66 | "execution_count": null, |
67 | | - "id": "e8faecea", |
| 67 | + "id": "875342bb", |
68 | 68 | "metadata": {}, |
69 | 69 | "outputs": [], |
70 | 70 | "source": [ |
|
85 | 85 | }, |
86 | 86 | { |
87 | 87 | "cell_type": "markdown", |
88 | | - "id": "314d17c1", |
| 88 | + "id": "b58ac676", |
89 | 89 | "metadata": {}, |
90 | 90 | "source": [ |
91 | 91 | "### ⚙️ Initialize the Data Designer interface\n", |
|
98 | 98 | { |
99 | 99 | "cell_type": "code", |
100 | 100 | "execution_count": null, |
101 | | - "id": "be3b5c6f", |
| 101 | + "id": "3ce805ad", |
102 | 102 | "metadata": {}, |
103 | 103 | "outputs": [], |
104 | 104 | "source": [ |
|
107 | 107 | }, |
108 | 108 | { |
109 | 109 | "cell_type": "markdown", |
110 | | - "id": "1c2852e1", |
| 110 | + "id": "50e961ed", |
111 | 111 | "metadata": {}, |
112 | 112 | "source": [ |
113 | 113 | "### 🎛️ Define model configurations\n", |
|
124 | 124 | { |
125 | 125 | "cell_type": "code", |
126 | 126 | "execution_count": null, |
127 | | - "id": "5ad52a10", |
| 127 | + "id": "1b07a6a5", |
128 | 128 | "metadata": {}, |
129 | 129 | "outputs": [], |
130 | 130 | "source": [ |
131 | 131 | "# This name is set in the model provider configuration.\n", |
132 | 132 | "MODEL_PROVIDER = \"nvidia\"\n", |
133 | 133 | "\n", |
134 | 134 | "# The model ID is from build.nvidia.com.\n", |
135 | | - "MODEL_ID = \"nvidia/nvidia-nemotron-nano-9b-v2\"\n", |
| 135 | + "MODEL_ID = \"nvidia/nemotron-3-nano-30b-a3b\"\n", |
136 | 136 | "\n", |
137 | 137 | "# We choose this alias to be descriptive for our use case.\n", |
138 | 138 | "MODEL_ALIAS = \"nemotron-nano-v2\"\n", |
|
156 | 156 | }, |
157 | 157 | { |
158 | 158 | "cell_type": "markdown", |
159 | | - "id": "25cce9f7", |
| 159 | + "id": "6d873251", |
160 | 160 | "metadata": {}, |
161 | 161 | "source": [ |
162 | 162 | "### 🏗️ Initialize the Data Designer Config Builder\n", |
|
171 | 171 | { |
172 | 172 | "cell_type": "code", |
173 | 173 | "execution_count": null, |
174 | | - "id": "8ff7190c", |
| 174 | + "id": "d45fac13", |
175 | 175 | "metadata": {}, |
176 | 176 | "outputs": [], |
177 | 177 | "source": [ |
|
180 | 180 | }, |
181 | 181 | { |
182 | 182 | "cell_type": "markdown", |
183 | | - "id": "6bc3b23e", |
| 183 | + "id": "c35b0274", |
184 | 184 | "metadata": {}, |
185 | 185 | "source": [ |
186 | 186 | "## 🎲 Getting started with sampler columns\n", |
|
197 | 197 | { |
198 | 198 | "cell_type": "code", |
199 | 199 | "execution_count": null, |
200 | | - "id": "4cff01cb", |
| 200 | + "id": "14cb9967", |
201 | 201 | "metadata": {}, |
202 | 202 | "outputs": [], |
203 | 203 | "source": [ |
|
206 | 206 | }, |
207 | 207 | { |
208 | 208 | "cell_type": "markdown", |
209 | | - "id": "f981ec58", |
| 209 | + "id": "40945aea", |
210 | 210 | "metadata": {}, |
211 | 211 | "source": [ |
212 | 212 | "Let's start designing our product review dataset by adding product category and subcategory columns.\n" |
|
215 | 215 | { |
216 | 216 | "cell_type": "code", |
217 | 217 | "execution_count": null, |
218 | | - "id": "70ba24a6", |
| 218 | + "id": "a7d87e00", |
219 | 219 | "metadata": {}, |
220 | 220 | "outputs": [], |
221 | 221 | "source": [ |
|
296 | 296 | }, |
297 | 297 | { |
298 | 298 | "cell_type": "markdown", |
299 | | - "id": "6f1a6c59", |
| 299 | + "id": "48699878", |
300 | 300 | "metadata": {}, |
301 | 301 | "source": [ |
302 | 302 | "Next, let's add samplers to generate data related to the customer and their review.\n" |
|
305 | 305 | { |
306 | 306 | "cell_type": "code", |
307 | 307 | "execution_count": null, |
308 | | - "id": "d45b925f", |
| 308 | + "id": "df84faf3", |
309 | 309 | "metadata": {}, |
310 | 310 | "outputs": [], |
311 | 311 | "source": [ |
|
342 | 342 | }, |
343 | 343 | { |
344 | 344 | "cell_type": "markdown", |
345 | | - "id": "bf49c2b1", |
| 345 | + "id": "8288352d", |
346 | 346 | "metadata": {}, |
347 | 347 | "source": [ |
348 | 348 | "## 🦜 LLM-generated columns\n", |
|
357 | 357 | { |
358 | 358 | "cell_type": "code", |
359 | 359 | "execution_count": null, |
360 | | - "id": "669fe324", |
| 360 | + "id": "157919b4", |
361 | 361 | "metadata": {}, |
362 | 362 | "outputs": [], |
363 | 363 | "source": [ |
|
394 | 394 | }, |
395 | 395 | { |
396 | 396 | "cell_type": "markdown", |
397 | | - "id": "4d93ad9a", |
| 397 | + "id": "009646e4", |
398 | 398 | "metadata": {}, |
399 | 399 | "source": [ |
400 | 400 | "### 🔁 Iteration is key – preview the dataset!\n", |
|
411 | 411 | { |
412 | 412 | "cell_type": "code", |
413 | 413 | "execution_count": null, |
414 | | - "id": "7b2466d1", |
| 414 | + "id": "a9c90236", |
415 | 415 | "metadata": {}, |
416 | 416 | "outputs": [], |
417 | 417 | "source": [ |
|
421 | 421 | { |
422 | 422 | "cell_type": "code", |
423 | 423 | "execution_count": null, |
424 | | - "id": "508a2866", |
| 424 | + "id": "3cfe180e", |
425 | 425 | "metadata": {}, |
426 | 426 | "outputs": [], |
427 | 427 | "source": [ |
|
432 | 432 | { |
433 | 433 | "cell_type": "code", |
434 | 434 | "execution_count": null, |
435 | | - "id": "6fbdaf64", |
| 435 | + "id": "65b2f595", |
436 | 436 | "metadata": {}, |
437 | 437 | "outputs": [], |
438 | 438 | "source": [ |
|
442 | 442 | }, |
443 | 443 | { |
444 | 444 | "cell_type": "markdown", |
445 | | - "id": "154e8e71", |
| 445 | + "id": "2134fa0f", |
446 | 446 | "metadata": {}, |
447 | 447 | "source": [ |
448 | 448 | "### 📊 Analyze the generated data\n", |
|
455 | 455 | { |
456 | 456 | "cell_type": "code", |
457 | 457 | "execution_count": null, |
458 | | - "id": "7e031c7b", |
| 458 | + "id": "8a37dd61", |
459 | 459 | "metadata": {}, |
460 | 460 | "outputs": [], |
461 | 461 | "source": [ |
|
465 | 465 | }, |
466 | 466 | { |
467 | 467 | "cell_type": "markdown", |
468 | | - "id": "a60a1fab", |
| 468 | + "id": "b715bc3a", |
469 | 469 | "metadata": {}, |
470 | 470 | "source": [ |
471 | 471 | "### 🆙 Scale up!\n", |
|
478 | 478 | { |
479 | 479 | "cell_type": "code", |
480 | 480 | "execution_count": null, |
481 | | - "id": "e07c6718", |
| 481 | + "id": "565f03a1", |
482 | 482 | "metadata": {}, |
483 | 483 | "outputs": [], |
484 | 484 | "source": [ |
|
488 | 488 | { |
489 | 489 | "cell_type": "code", |
490 | 490 | "execution_count": null, |
491 | | - "id": "7a5406da", |
| 491 | + "id": "9d4c91ad", |
492 | 492 | "metadata": {}, |
493 | 493 | "outputs": [], |
494 | 494 | "source": [ |
|
501 | 501 | { |
502 | 502 | "cell_type": "code", |
503 | 503 | "execution_count": null, |
504 | | - "id": "f0360b0e", |
| 504 | + "id": "93c5a082", |
505 | 505 | "metadata": {}, |
506 | 506 | "outputs": [], |
507 | 507 | "source": [ |
|
513 | 513 | }, |
514 | 514 | { |
515 | 515 | "cell_type": "markdown", |
516 | | - "id": "d365dda0", |
| 516 | + "id": "13f7c942", |
517 | 517 | "metadata": {}, |
518 | 518 | "source": [ |
519 | 519 | "## ⏭️ Next Steps\n", |
|