Merged
75 changes: 39 additions & 36 deletions docs/colab_notebooks/1-the-basics.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
-"id": "a4ac4d55",
+"id": "39d7d274",
"metadata": {},
"source": [
"# 🎨 Data Designer Tutorial: The Basics\n",
@@ -14,7 +14,7 @@
},
{
"cell_type": "markdown",
-"id": "9e9f3c47",
+"id": "60f1d002",
"metadata": {},
"source": [
"### ⚡ Colab Setup\n",
@@ -25,17 +25,18 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "41b31194",
+"id": "99c42292",
"metadata": {},
"outputs": [],
"source": [
-"!pip install -qU data-designer"
+"%%capture\n",
+"!pip install -U data-designer"
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "502b3aba",
+"id": "2c959ca9",
"metadata": {},
"outputs": [],
"source": [
@@ -52,7 +53,7 @@
},
{
"cell_type": "markdown",
-"id": "8c512fbc",
+"id": "bc185897",
"metadata": {},
"source": [
"### 📦 Import the essentials\n",
@@ -63,7 +64,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "8fae521f",
+"id": "dc3a2d9d",
"metadata": {},
"outputs": [],
"source": [
@@ -84,20 +85,20 @@
},
{
"cell_type": "markdown",
-"id": "e71d0256",
+"id": "36c5f571",
"metadata": {},
"source": [
"### ⚙️ Initialize the Data Designer interface\n",
"\n",
"- `DataDesigner` is the main object responsible for managing the data generation process.\n",
"\n",
-"- When initialized without arguments, the [default model providers](https://nvidia-nemo.github.io/DataDesigner/concepts/models/default-model-settings/) are used.\n"
+"- When initialized without arguments, the [default model providers](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/default-model-settings/) are used.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "68fc7172",
+"id": "61b23c70",
"metadata": {},
"outputs": [],
"source": [
@@ -106,7 +107,7 @@
},
{
"cell_type": "markdown",
-"id": "9a821a27",
+"id": "3c9b7cb6",
"metadata": {},
"source": [
"### 🎛️ Define model configurations\n",
@@ -115,15 +116,15 @@
"\n",
"- The \"model alias\" is used to reference the model in the Data Designer config (as we will see below).\n",
"\n",
-"- The \"model provider\" is the external service that hosts the model (see the [model config](https://nvidia-nemo.github.io/DataDesigner/concepts/models/default-model-settings/) docs for more details).\n",
+"- The \"model provider\" is the external service that hosts the model (see the [model config](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/default-model-settings/) docs for more details).\n",
"\n",
"- By default, we use [build.nvidia.com](https://build.nvidia.com/models) as the model provider.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "a9515141",
+"id": "b86f6217",
"metadata": {},
"outputs": [],
"source": [
@@ -155,7 +156,7 @@
},
{
"cell_type": "markdown",
-"id": "3b940ab9",
+"id": "1f089871",
"metadata": {},
"source": [
"### 🏗️ Initialize the Data Designer Config Builder\n",
@@ -170,7 +171,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "ec21da7e",
+"id": "3d666193",
"metadata": {},
"outputs": [],
"source": [
@@ -179,7 +180,7 @@
},
{
"cell_type": "markdown",
-"id": "85b2324e",
+"id": "e88c8881",
"metadata": {},
"source": [
"## 🎲 Getting started with sampler columns\n",
@@ -196,7 +197,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "f49f435e",
+"id": "79fb85c6",
"metadata": {},
"outputs": [],
"source": [
@@ -205,7 +206,7 @@
},
{
"cell_type": "markdown",
-"id": "f582b642",
+"id": "5106cc10",
"metadata": {},
"source": [
"Let's start designing our product review dataset by adding product category and subcategory columns.\n"
@@ -214,7 +215,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "8cfc43b1",
+"id": "22b97af1",
"metadata": {},
"outputs": [],
"source": [
@@ -295,7 +296,7 @@
},
{
"cell_type": "markdown",
-"id": "2d0eea21",
+"id": "4857b085",
"metadata": {},
"source": [
"Next, let's add samplers to generate data related to the customer and their review.\n"
@@ -304,7 +305,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "b5e65724",
+"id": "9e90b3cb",
"metadata": {},
"outputs": [],
"source": [
@@ -341,7 +342,7 @@
},
{
"cell_type": "markdown",
-"id": "e6788771",
+"id": "b36a153b",
"metadata": {},
"source": [
"## 🦜 LLM-generated columns\n",
@@ -356,7 +357,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "a2705cd9",
+"id": "4da88fe6",
"metadata": {},
"outputs": [],
"source": [
@@ -393,7 +394,7 @@
},
{
"cell_type": "markdown",
-"id": "e3dd2f69",
+"id": "5f1b9ac8",
"metadata": {},
"source": [
"### 🔁 Iteration is key – preview the dataset!\n",
@@ -410,7 +411,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "c6e43147",
+"id": "543e2f9c",
"metadata": {},
"outputs": [],
"source": [
@@ -420,7 +421,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "fab77d01",
+"id": "26136a8a",
"metadata": {},
"outputs": [],
"source": [
@@ -431,7 +432,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "875ee6a6",
+"id": "aca4360d",
"metadata": {},
"outputs": [],
"source": [
@@ -441,7 +442,7 @@
},
{
"cell_type": "markdown",
-"id": "87b59e4b",
+"id": "35ca0470",
"metadata": {},
"source": [
"### 📊 Analyze the generated data\n",
@@ -454,7 +455,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "5d347f4c",
+"id": "d55b402d",
"metadata": {},
"outputs": [],
"source": [
@@ -464,7 +465,7 @@
},
{
"cell_type": "markdown",
-"id": "d2fb84f2",
+"id": "245b48cf",
"metadata": {},
"source": [
"### 🆙 Scale up!\n",
@@ -477,7 +478,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "71a31e85",
+"id": "fc803eb0",
"metadata": {},
"outputs": [],
"source": [
@@ -487,7 +488,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "501e9092",
+"id": "881c2043",
"metadata": {},
"outputs": [],
"source": [
@@ -500,7 +501,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "6f217b4a",
+"id": "d79860d4",
"metadata": {},
"outputs": [],
"source": [
@@ -512,16 +513,18 @@
},
{
"cell_type": "markdown",
-"id": "4da82b0f",
+"id": "b4b45176",
"metadata": {},
"source": [
"## ⏭️ Next Steps\n",
"\n",
"Now that you've seen the basics of Data Designer, check out the following notebooks to learn more about:\n",
"\n",
-"- [Structured outputs and jinja expressions](/notebooks/2-structured-outputs-and-jinja-expressions/)\n",
+"- [Structured outputs and jinja expressions](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/2-structured-outputs-and-jinja-expressions/)\n",
"\n",
-"- [Seeding synthetic data generation with an external dataset](/notebooks/3-seeding-with-a-dataset/)\n"
+"- [Seeding synthetic data generation with an external dataset](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/3-seeding-with-a-dataset/)\n",
+"\n",
+"- [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/)\n"
]
}
],
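The link changes in this diff replace site-relative targets (`/notebooks/...`) and docs URLs without a `latest/` segment with absolute `https://nvidia-nemo.github.io/DataDesigner/latest/...` URLs. A check along these lines could catch similar links in other notebooks. This is only a minimal sketch and not part of the PR: the helper name `find_unversioned_links` and the regex are hypothetical, and the docs root is assumed from the URLs shown above.

```python
import json
import re

# Docs root assumed from the URLs that appear in this diff.
DOCS_ROOT = "https://nvidia-nemo.github.io/DataDesigner/"
# Matches the target of a markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_unversioned_links(notebook_json: str) -> list[str]:
    """Return link targets in markdown cells that are site-relative or that
    point at the docs root without the `latest/` segment (hypothetical helper)."""
    notebook = json.loads(notebook_json)
    flagged = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # only markdown cells carry rendered links
        source = cell.get("source", [])
        # nbformat allows "source" to be a list of lines or a single string
        text = "".join(source) if isinstance(source, list) else source
        for target in LINK_RE.findall(text):
            is_relative = target.startswith("/")
            is_unversioned = target.startswith(DOCS_ROOT) and not target.startswith(
                DOCS_ROOT + "latest/"
            )
            if is_relative or is_unversioned:
                flagged.append(target)
    return flagged
```

Calling `find_unversioned_links(open("docs/colab_notebooks/1-the-basics.ipynb").read())` on the pre-change notebook would be expected to flag the old link targets from the removed lines above; links to other hosts such as `https://build.nvidia.com/models` are left alone.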