diff --git a/docs/colab_notebooks/1-the-basics.ipynb b/docs/colab_notebooks/1-the-basics.ipynb index 091200d7..e77c4a1b 100644 --- a/docs/colab_notebooks/1-the-basics.ipynb +++ b/docs/colab_notebooks/1-the-basics.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "3599c474", + "id": "709c75cf", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: The Basics\n", @@ -14,7 +14,7 @@ }, { "cell_type": "markdown", - "id": "ee8bed13", + "id": "3cb2774e", "metadata": {}, "source": [ "### ⚑ Colab Setup\n", @@ -25,7 +25,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f43069d1", + "id": "b886272b", "metadata": {}, "outputs": [], "source": [ @@ -36,7 +36,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c136bf4f", + "id": "f5cf20f9", "metadata": {}, "outputs": [], "source": [ @@ -53,7 +53,7 @@ }, { "cell_type": "markdown", - "id": "48739393", + "id": "e11a4288", "metadata": {}, "source": [ "### 📦 Import the essentials\n", @@ -64,7 +64,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e459cd98", + "id": "e8faecea", "metadata": {}, "outputs": [], "source": [ @@ -85,7 +85,7 @@ }, { "cell_type": "markdown", - "id": "b705d204", + "id": "314d17c1", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -98,7 +98,7 @@ { "cell_type": "code", "execution_count": null, - "id": "aee62c85", + "id": "be3b5c6f", "metadata": {}, "outputs": [], "source": [ @@ -107,7 +107,7 @@ }, { "cell_type": "markdown", - "id": "ae65c557", + "id": "1c2852e1", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -124,7 +124,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1079200d", + "id": "5ad52a10", "metadata": {}, "outputs": [], "source": [ @@ -156,7 +156,7 @@ }, { "cell_type": "markdown", - "id": "9f15426a", + "id": "25cce9f7", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -171,7 +171,7 @@ { "cell_type": "code", "execution_count":
null, - "id": "79b8212c", + "id": "8ff7190c", "metadata": {}, "outputs": [], "source": [ @@ -180,7 +180,7 @@ }, { "cell_type": "markdown", - "id": "cd1d9e09", + "id": "6bc3b23e", "metadata": {}, "source": [ "## 🎲 Getting started with sampler columns\n", @@ -197,7 +197,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b3f469d6", + "id": "4cff01cb", "metadata": {}, "outputs": [], "source": [ @@ -206,7 +206,7 @@ }, { "cell_type": "markdown", - "id": "e44adc6c", + "id": "f981ec58", "metadata": {}, "source": [ "Let's start designing our product review dataset by adding product category and subcategory columns.\n" @@ -215,7 +215,7 @@ { "cell_type": "code", "execution_count": null, - "id": "82b32804", + "id": "70ba24a6", "metadata": {}, "outputs": [], "source": [ @@ -296,7 +296,7 @@ }, { "cell_type": "markdown", - "id": "bd65456c", + "id": "6f1a6c59", "metadata": {}, "source": [ "Next, let's add samplers to generate data related to the customer and their review.\n" @@ -305,7 +305,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6d6d4eef", + "id": "d45b925f", "metadata": {}, "outputs": [], "source": [ @@ -342,7 +342,7 @@ }, { "cell_type": "markdown", - "id": "eb7b415c", + "id": "bf49c2b1", "metadata": {}, "source": [ "## 🦜 LLM-generated columns\n", @@ -357,7 +357,7 @@ { "cell_type": "code", "execution_count": null, - "id": "ed811560", + "id": "669fe324", "metadata": {}, "outputs": [], "source": [ @@ -394,7 +394,7 @@ }, { "cell_type": "markdown", - "id": "fdc0a2c8", + "id": "4d93ad9a", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -411,7 +411,7 @@ { "cell_type": "code", "execution_count": null, - "id": "59987c81", + "id": "7b2466d1", "metadata": {}, "outputs": [], "source": [ @@ -421,7 +421,7 @@ { "cell_type": "code", "execution_count": null, - "id": "0823ca7f", + "id": "508a2866", "metadata": {}, "outputs": [], "source": [ @@ -432,7 +432,7 @@ { "cell_type": "code", "execution_count": null, - "id": "eca4f0bc",
+ "id": "6fbdaf64", "metadata": {}, "outputs": [], "source": [ @@ -442,7 +442,7 @@ }, { "cell_type": "markdown", - "id": "edd57f85", + "id": "154e8e71", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -455,7 +455,7 @@ { "cell_type": "code", "execution_count": null, - "id": "5c681eee", + "id": "7e031c7b", "metadata": {}, "outputs": [], "source": [ @@ -465,7 +465,7 @@ }, { "cell_type": "markdown", - "id": "14bf06f2", + "id": "a60a1fab", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -478,7 +478,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b7ffead1", + "id": "e07c6718", "metadata": {}, "outputs": [], "source": [ @@ -488,7 +488,7 @@ { "cell_type": "code", "execution_count": null, - "id": "aa966388", + "id": "7a5406da", "metadata": {}, "outputs": [], "source": [ @@ -501,7 +501,7 @@ { "cell_type": "code", "execution_count": null, - "id": "98e1085c", + "id": "f0360b0e", "metadata": {}, "outputs": [], "source": [ @@ -513,7 +513,7 @@ }, { "cell_type": "markdown", - "id": "e0b9c65a", + "id": "d365dda0", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb b/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb index fd9a2d69..84ece867 100644 --- a/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb +++ b/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "928a8d4a", + "id": "75360052", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Structured Outputs and Jinja Expressions\n", @@ -16,7 +16,7 @@ }, { "cell_type": "markdown", - "id": "ad16de35", + "id": "5a028f03", "metadata": {}, "source": [ "### ⚑ Colab Setup\n", @@ -27,7 +27,7 @@ { "cell_type": "code", "execution_count": null, - "id": "cfb08d4c", + "id": "ba8c8f3f", "metadata": {}, "outputs": [], "source": [ @@ -38,7 +38,7 @@ { "cell_type": "code", "execution_count": null, -
"id": "eddeceea", + "id": "b0825a6a", "metadata": {}, "outputs": [], "source": [ @@ -55,7 +55,7 @@ }, { "cell_type": "markdown", - "id": "08e2b3bf", + "id": "e18ab9a1", "metadata": {}, "source": [ "### 📦 Import the essentials\n", @@ -66,7 +66,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f057319b", + "id": "0cc3443c", "metadata": {}, "outputs": [], "source": [ @@ -87,7 +87,7 @@ }, { "cell_type": "markdown", - "id": "e7d5e529", + "id": "abfce2e0", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -100,7 +100,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e30fdeb1", + "id": "e2a67d69", "metadata": {}, "outputs": [], "source": [ @@ -109,7 +109,7 @@ }, { "cell_type": "markdown", - "id": "07b8bfe7", + "id": "65cc7285", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -126,7 +126,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4ae518af", + "id": "a78a0747", "metadata": {}, "outputs": [], "source": [ @@ -158,7 +158,7 @@ }, { "cell_type": "markdown", - "id": "a3fa2eaf", + "id": "94e33a9a", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -173,7 +173,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e1b59bd4", + "id": "840863dd", "metadata": {}, "outputs": [], "source": [ @@ -182,7 +182,7 @@ }, { "cell_type": "markdown", - "id": "5e220f78", + "id": "2451048a", "metadata": {}, "source": [ "### 🧑‍🎨 Designing our data\n", @@ -209,7 +209,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e814ca47", + "id": "fda11990", "metadata": {}, "outputs": [], "source": [ @@ -237,7 +237,7 @@ }, { "cell_type": "markdown", - "id": "303ebd6f", + "id": "b5f6ced7", "metadata": {}, "source": [ "Next, let's design our product review dataset using a few more tricks compared to the previous notebook.\n" @@ -246,7 +246,7 @@ { "cell_type": "code", "execution_count": null, - "id": "361afe50", + "id": "eb4538fd",
"metadata": {}, "outputs": [], "source": [ @@ -355,7 +355,7 @@ }, { "cell_type": "markdown", - "id": "b18e511d", + "id": "5f003b9e", "metadata": {}, "source": [ "Next, we will use more advanced Jinja expressions to create new columns.\n", @@ -372,7 +372,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e669c009", + "id": "eb5d8fb1", "metadata": {}, "outputs": [], "source": [ @@ -426,7 +426,7 @@ }, { "cell_type": "markdown", - "id": "9ca31fb8", + "id": "edd7d429", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -443,7 +443,7 @@ { "cell_type": "code", "execution_count": null, - "id": "402dede2", + "id": "ae4bf40d", "metadata": {}, "outputs": [], "source": [ @@ -453,7 +453,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c028223a", + "id": "ff96200d", "metadata": {}, "outputs": [], "source": [ @@ -464,7 +464,7 @@ { "cell_type": "code", "execution_count": null, - "id": "32d6fdaa", + "id": "8743034c", "metadata": {}, "outputs": [], "source": [ @@ -474,7 +474,7 @@ }, { "cell_type": "markdown", - "id": "3b76da46", + "id": "bc4f5aa2", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -487,7 +487,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1a9ac292", + "id": "7e276b36", "metadata": {}, "outputs": [], "source": [ @@ -497,7 +497,7 @@ }, { "cell_type": "markdown", - "id": "8e4b20ed", + "id": "af921b28", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -510,7 +510,7 @@ { "cell_type": "code", "execution_count": null, - "id": "eed6e04b", + "id": "6053798d", "metadata": {}, "outputs": [], "source": [ @@ -520,7 +520,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b6d1d270", + "id": "49bf6b4f", "metadata": {}, "outputs": [], "source": [ @@ -533,7 +533,7 @@ { "cell_type": "code", "execution_count": null, - "id": "058d6b65", + "id": "e033facc", "metadata": {}, "outputs": [], "source": [ @@ -545,7 +545,7 @@ }, { "cell_type": "markdown", - "id": "d3affdac", + "id":
"7b8641c8", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb b/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb index af97af89..3b0fd70f 100644 --- a/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb +++ b/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "ce777a5d", + "id": "738d16f1", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Seeding Synthetic Data Generation with an External Dataset\n", @@ -16,7 +16,7 @@ }, { "cell_type": "markdown", - "id": "442b70e6", + "id": "64d9526d", "metadata": {}, "source": [ "### ⚑ Colab Setup\n", @@ -27,7 +27,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9ee97c1e", + "id": "1767dc37", "metadata": {}, "outputs": [], "source": [ @@ -38,7 +38,7 @@ { "cell_type": "code", "execution_count": null, - "id": "18c06754", + "id": "e04a061e", "metadata": {}, "outputs": [], "source": [ @@ -55,7 +55,7 @@ }, { "cell_type": "markdown", - "id": "9394bcd4", + "id": "bb02bd01", "metadata": {}, "source": [ "### 📦 Import the essentials\n", @@ -66,7 +66,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6f1099d1", + "id": "f030c7a3", "metadata": {}, "outputs": [], "source": [ @@ -81,7 +81,7 @@ }, { "cell_type": "markdown", - "id": "bc74d436", + "id": "8fc24d87", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -94,7 +94,7 @@ { "cell_type": "code", "execution_count": null, - "id": "0f9c59fa", + "id": "fbd4f8d1", "metadata": {}, "outputs": [], "source": [ @@ -103,7 +103,7 @@ }, { "cell_type": "markdown", - "id": "c92d4c3c", + "id": "2b84a1cf", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -120,7 +120,7 @@ { "cell_type": "code", "execution_count": null, - "id": "543805b1", + "id": "a973341a", "metadata": {}, "outputs": [], "source": [ @@ -152,7 +152,7 @@ }, { "cell_type": "markdown", - "id": "29f69761", + "id":
"56ed7cf4", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -167,7 +167,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7b18a399", + "id": "031c5ca6", "metadata": {}, "outputs": [], "source": [ @@ -176,7 +176,7 @@ }, { "cell_type": "markdown", - "id": "5439e926", + "id": "f617bc93", "metadata": {}, "source": [ "## 🏥 Prepare a seed dataset\n", @@ -201,7 +201,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d4cbb09e", + "id": "f2ad0c58", "metadata": {}, "outputs": [], "source": [ @@ -219,7 +219,7 @@ }, { "cell_type": "markdown", - "id": "367fd06d", + "id": "44a5f487", "metadata": {}, "source": [ "## 🎨 Designing our synthetic patient notes dataset\n", @@ -236,7 +236,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1cbcf034", + "id": "e1825523", "metadata": {}, "outputs": [], "source": [ @@ -326,7 +326,7 @@ }, { "cell_type": "markdown", - "id": "ef3d8dcf", + "id": "6aacdb7d", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -343,7 +343,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6aa843f8", + "id": "7aa0d1fb", "metadata": {}, "outputs": [], "source": [ @@ -353,7 +353,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2134a59d", + "id": "1cc1dcf2", "metadata": {}, "outputs": [], "source": [ @@ -364,7 +364,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3c3c39bf", + "id": "6369e0e2", "metadata": {}, "outputs": [], "source": [ @@ -374,7 +374,7 @@ }, { "cell_type": "markdown", - "id": "bf4b07b5", + "id": "d8af020a", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -387,7 +387,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e7813016", + "id": "09c662e5", "metadata": {}, "outputs": [], "source": [ @@ -397,7 +397,7 @@ }, { "cell_type": "markdown", - "id": "db2d39b6", + "id": "35fe454e", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -410,7 +410,7 @@ { "cell_type":
"code", "execution_count": null, - "id": "2fcc5cb3", + "id": "c59d0395", "metadata": {}, "outputs": [], "source": [ @@ -420,7 +420,7 @@ { "cell_type": "code", "execution_count": null, - "id": "036581fa", + "id": "fab7e2b7", "metadata": {}, "outputs": [], "source": [ @@ -433,7 +433,7 @@ { "cell_type": "code", "execution_count": null, - "id": "74e91fe3", + "id": "3c8e9a8e", "metadata": {}, "outputs": [], "source": [ @@ -445,7 +445,7 @@ }, { "cell_type": "markdown", - "id": "6f65cadc", + "id": "6591a0e0", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/4-providing-images-as-context.ipynb b/docs/colab_notebooks/4-providing-images-as-context.ipynb index ddc6488a..58b33d87 100644 --- a/docs/colab_notebooks/4-providing-images-as-context.ipynb +++ b/docs/colab_notebooks/4-providing-images-as-context.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "560882a6", + "id": "356b84c3", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation" @@ -10,7 +10,7 @@ }, { "cell_type": "markdown", - "id": "b443ea18", + "id": "01f2dc2a", "metadata": {}, "source": [ "#### 📚 What you'll learn\n", @@ -25,7 +25,7 @@ }, { "cell_type": "markdown", - "id": "d59495b4", + "id": "6c308378", "metadata": {}, "source": [ "### ⚑ Colab Setup\n", @@ -36,7 +36,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e80db6fd", + "id": "58ebaf5e", "metadata": {}, "outputs": [], "source": [ @@ -47,7 +47,7 @@ { "cell_type": "code", "execution_count": null, - "id": "15ac4714", + "id": "f6bde12b", "metadata": {}, "outputs": [], "source": [ @@ -64,7 +64,7 @@ }, { "cell_type": "markdown", - "id": "01741037", + "id": "96d77833", "metadata": {}, "source": [ "### 📦 Import the essentials\n", @@ -75,7 +75,7 @@ { "cell_type": "code", "execution_count": null, - "id": "097a9059", + "id": "e9f3ab9f", "metadata": {}, "outputs": [], "source": [ @@ -106,7 +106,7 @@ }, { "cell_type": "markdown", -
"id": "4f6ca947", + "id": "a708b480", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -119,7 +119,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6508f2b7", + "id": "c5553f64", "metadata": {}, "outputs": [], "source": [ @@ -128,7 +128,7 @@ }, { "cell_type": "markdown", - "id": "7343463e", + "id": "98375953", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -145,7 +145,7 @@ { "cell_type": "code", "execution_count": null, - "id": "31019979", + "id": "f46d9767", "metadata": {}, "outputs": [], "source": [ @@ -168,7 +168,7 @@ }, { "cell_type": "markdown", - "id": "186f9f98", + "id": "beda84e0", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -183,7 +183,7 @@ { "cell_type": "code", "execution_count": null, - "id": "80fbc002", + "id": "510fd21b", "metadata": {}, "outputs": [], "source": [ @@ -192,7 +192,7 @@ }, { "cell_type": "markdown", - "id": "203663e2", + "id": "fbfe9dc4", "metadata": {}, "source": [ "### 🌱 Seed Dataset Creation\n", @@ -209,7 +209,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1a2b0ce1", + "id": "2f06c086", "metadata": {}, "outputs": [], "source": [ @@ -224,7 +224,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8cc0f6ea", + "id": "18f42dee", "metadata": {}, "outputs": [], "source": [ @@ -272,7 +272,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9fe4b02c", + "id": "e5b0c59b", "metadata": {}, "outputs": [], "source": [ @@ -290,7 +290,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e546cc4a", + "id": "b76739c8", "metadata": {}, "outputs": [], "source": [ @@ -300,7 +300,7 @@ { "cell_type": "code", "execution_count": null, - "id": "eebc9963", + "id": "d57302c5", "metadata": {}, "outputs": [], "source": [ @@ -314,7 +314,7 @@ { "cell_type": "code", "execution_count": null, - "id": "99b4a3b5", + "id": "919dce3b", "metadata": { "lines_to_next_cell": 2 }, @@ -343,7 +343,7 @@ }, {
"cell_type": "markdown", - "id": "58ab7a96", + "id": "3e0039d4", "metadata": { "lines_to_next_cell": 2 }, @@ -351,7 +351,7 @@ }, { "cell_type": "markdown", - "id": "7d735a85", + "id": "36385331", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -368,7 +368,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6404dc34", + "id": "e3612ca0", "metadata": {}, "outputs": [], "source": [ @@ -378,7 +378,7 @@ { "cell_type": "code", "execution_count": null, - "id": "44f3678c", + "id": "6fde5225", "metadata": {}, "outputs": [], "source": [ @@ -389,7 +389,7 @@ { "cell_type": "code", "execution_count": null, - "id": "06009be0", + "id": "ca5c91b5", "metadata": {}, "outputs": [], "source": [ @@ -399,7 +399,7 @@ }, { "cell_type": "markdown", - "id": "43e9af07", + "id": "64d4a3d1", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -412,7 +412,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7bce2b2e", + "id": "a06c4a54", "metadata": {}, "outputs": [], "source": [ @@ -422,7 +422,7 @@ }, { "cell_type": "markdown", - "id": "dc361acc", + "id": "500d2cb3", "metadata": {}, "source": [ "### 🔎 Visual Inspection\n", @@ -433,7 +433,7 @@ { "cell_type": "code", "execution_count": null, - "id": "574bbc62", + "id": "9ec41533", "metadata": { "lines_to_next_cell": 2 }, @@ -457,7 +457,7 @@ }, { "cell_type": "markdown", - "id": "cc5ebe1c", + "id": "b107e9f3", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -470,7 +470,7 @@ { "cell_type": "code", "execution_count": null, - "id": "305f90ec", + "id": "c9fe690d", "metadata": {}, "outputs": [], "source": [ @@ -480,7 +480,7 @@ { "cell_type": "code", "execution_count": null, - "id": "ef40fc6b", + "id": "1c07b691", "metadata": {}, "outputs": [], "source": [ @@ -493,7 +493,7 @@ { "cell_type": "code", "execution_count": null, - "id": "72756cf3", + "id": "c52efcc9", "metadata": {}, "outputs": [], "source": [ @@ -505,7 +505,7 @@ }, { "cell_type": "markdown", - "id":
"e9bac314", + "id": "ca47004d", "metadata": {}, "source": [ "## ⏭️ Next Steps\n",
diff --git a/docs/concepts/models/configure-model-settings-with-the-cli.md b/docs/concepts/models/configure-model-settings-with-the-cli.md
index 56315cdf..e7baed3f 100644
--- a/docs/concepts/models/configure-model-settings-with-the-cli.md
+++ b/docs/concepts/models/configure-model-settings-with-the-cli.md
@@ -132,4 +132,5 @@ The CLI will show which configuration files exist and ask for confirmation befor
 - **[Model Providers](model-providers.md)**: Learn about the `ModelProvider` class and provider configuration
 - **[Model Configurations](model-configs.md)**: Learn about `ModelConfig`
 - **[Default Model Settings](default-model-settings.md)**: Pre-configured providers and model settings included with Data Designer
+- **[Custom Model Settings](custom-model-settings.md)**: Learn how to create custom providers and model configurations
 - **[Quick Start Guide](../../quick-start.md)**: Get started with a simple example
diff --git a/docs/concepts/models/custom-model-settings.md b/docs/concepts/models/custom-model-settings.md
new file mode 100644
index 00000000..84aed5aa
--- /dev/null
+++ b/docs/concepts/models/custom-model-settings.md
@@ -0,0 +1,229 @@
+# Custom Model Settings
+
+While Data Designer ships with pre-configured model providers and configurations, you can create custom configurations to use different models, adjust inference parameters, or connect to custom API endpoints.
+
+## When to Use Custom Settings
+
+Use custom model settings when you need to:
+
+- Use models not included in the defaults
+- Adjust inference parameters (`temperature`, `top_p`, `max_tokens`) for specific use cases
+- Add distribution-based inference parameters for variability
+- Connect to self-hosted or custom model endpoints
+- Create multiple variants of the same model with different settings
+
+## Creating and Using Custom Settings
+
+### Custom Models with Default Providers
+
+Create custom model configurations that use the default providers (no need to define providers yourself):
+
+```python
+from data_designer.essentials import (
+    CategorySamplerParams,
+    ChatCompletionInferenceParams,
+    DataDesigner,
+    DataDesignerConfigBuilder,
+    LLMTextColumnConfig,
+    ModelConfig,
+    SamplerColumnConfig,
+    SamplerType,
+)
+
+# Create custom models using default providers
+custom_models = [
+    # High-temperature for more variability
+    ModelConfig(
+        alias="creative-writer",
+        model="nvidia/nvidia-nemotron-nano-9b-v2",
+        provider="nvidia",  # Uses default NVIDIA provider
+        inference_parameters=ChatCompletionInferenceParams(
+            temperature=1.2,
+            top_p=0.98,
+            max_tokens=4096,
+        ),
+    ),
+    # Low-temperature for less variability
+    ModelConfig(
+        alias="fact-checker",
+        model="nvidia/nvidia-nemotron-nano-9b-v2",
+        provider="nvidia",  # Uses default NVIDIA provider
+        inference_parameters=ChatCompletionInferenceParams(
+            temperature=0.1,
+            top_p=0.9,
+            max_tokens=2048,
+        ),
+    ),
+]
+
+# Create DataDesigner (uses default providers)
+data_designer = DataDesigner()
+
+# Pass custom models to config builder
+config_builder = DataDesignerConfigBuilder(model_configs=custom_models)
+
+# Add a topic column using a categorical sampler
+config_builder.add_column(
+    SamplerColumnConfig(
+        name="topic",
+        sampler_type=SamplerType.CATEGORY,
+        params=CategorySamplerParams(
+            values=["Artificial Intelligence", "Space Exploration", "Ancient History", "Climate Science"],
+        ),
+    )
+)
+
+# Use your custom models
+config_builder.add_column(
+    LLMTextColumnConfig(
+        name="creative_story",
+        model_alias="creative-writer",
+        prompt="Write a creative short story about {{topic}}.",
+    )
+)
+
+config_builder.add_column(
+    LLMTextColumnConfig(
+        name="facts",
+        model_alias="fact-checker",
+        prompt="List 3 facts about {{topic}}.",
+    )
+)
+
+# Preview your dataset
+preview_result = data_designer.preview(config_builder=config_builder)
+preview_result.display_sample_record()
+```
+
+!!! note "Default Providers Always Available"
+    When you specify only `model_configs`, the default model providers (NVIDIA and OpenAI) are still available. You only need to create custom providers if you want to connect to different endpoints or modify provider settings.
+
+!!! tip "Mixing Custom and Default Models"
+    When you provide custom `model_configs` to `DataDesignerConfigBuilder`, they **replace** the defaults entirely. To use custom model configs in addition to the default configs, use the `add_model_config` method:
+
+    ```python
+    # Load defaults first
+    config_builder = DataDesignerConfigBuilder()
+
+    # Add custom model to defaults
+    config_builder.add_model_config(
+        ModelConfig(
+            alias="my-custom-model",
+            model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
+            provider="nvidia",  # Uses default provider
+            inference_parameters=ChatCompletionInferenceParams(
+                temperature=0.6,
+                max_tokens=8192,
+            ),
+        )
+    )
+
+    # Now you can use both default and custom models
+    # Default: nvidia-text, nvidia-reasoning, nvidia-vision, etc.
+    # Custom: my-custom-model
+    ```
+
+### Custom Providers with Custom Models
+
+Define both custom providers and custom model configurations when you need to connect to services not included in the defaults:
+
+!!! warning "Network Accessibility"
+    The custom provider endpoints must be reachable from where Data Designer runs. Ensure network connectivity, firewall rules, and any VPN requirements are properly configured.
+
+```python
+from data_designer.essentials import (
+    CategorySamplerParams,
+    ChatCompletionInferenceParams,
+    DataDesigner,
+    DataDesignerConfigBuilder,
+    LLMTextColumnConfig,
+    ModelConfig,
+    ModelProvider,
+    SamplerColumnConfig,
+    SamplerType,
+)
+
+# Step 1: Define custom providers
+custom_providers = [
+    ModelProvider(
+        name="my-custom-provider",
+        endpoint="https://api.my-llm-service.com/v1",
+        provider_type="openai",  # OpenAI-compatible API
+        api_key="MY_SERVICE_API_KEY",  # Environment variable name
+    ),
+    ModelProvider(
+        name="my-self-hosted-provider",
+        endpoint="https://my-org.internal.com/llm/v1",
+        provider_type="openai",
+        api_key="SELF_HOSTED_API_KEY",
+    ),
+]
+
+# Step 2: Define custom models
+custom_models = [
+    ModelConfig(
+        alias="my-text-model",
+        model="openai/some-model-id",
+        provider="my-custom-provider",  # References provider by name
+        inference_parameters=ChatCompletionInferenceParams(
+            temperature=0.85,
+            top_p=0.95,
+            max_tokens=2048,
+        ),
+    ),
+    ModelConfig(
+        alias="my-self-hosted-text-model",
+        model="openai/some-hosted-model-id",
+        provider="my-self-hosted-provider",
+        inference_parameters=ChatCompletionInferenceParams(
+            temperature=0.7,
+            top_p=0.9,
+            max_tokens=1024,
+        ),
+    ),
+]
+
+# Step 3: Create DataDesigner with custom providers
+data_designer = DataDesigner(model_providers=custom_providers)
+
+# Step 4: Create config builder with custom models
+config_builder = DataDesignerConfigBuilder(model_configs=custom_models)
+
+# Step 5: Add a topic column using a categorical sampler
+config_builder.add_column(
+    SamplerColumnConfig(
+        name="topic",
+        sampler_type=SamplerType.CATEGORY,
+        params=CategorySamplerParams(
+            values=["Technology", "Healthcare", "Finance", "Education"],
+        ),
+    )
+)
+
+# Step 6: Use your custom model by referencing its alias
+config_builder.add_column(
+    LLMTextColumnConfig(
+        name="short_news_article",
+        model_alias="my-text-model",  # Reference custom alias
+        prompt="Write a short news article about the '{{topic}}' topic in 10 sentences.",
+    )
+)
+
+config_builder.add_column(
+    LLMTextColumnConfig(
+        name="long_news_article",
+        model_alias="my-self-hosted-text-model",  # Reference custom alias
+        prompt="Write a detailed news article about the '{{topic}}' topic.",
+    )
+)
+
+# Step 7: Preview your dataset
+preview_result = data_designer.preview(config_builder=config_builder)
+preview_result.display_sample_record()
+```
+
+## See Also
+
+- **[Default Model Settings](default-model-settings.md)**: Pre-configured providers and model settings
+- **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: CLI-based configuration
+- **[Quick Start Guide](../../quick-start.md)**: Basic usage example
diff --git a/docs/concepts/models/default-model-settings.md b/docs/concepts/models/default-model-settings.md
index bfeff617..d6c31cdd 100644
--- a/docs/concepts/models/default-model-settings.md
+++ b/docs/concepts/models/default-model-settings.md
@@ -50,6 +50,12 @@ The following model configurations are automatically available when `OPENAI_API_
 | `openai-vision` | `gpt-5` | Vision and image understanding | 0.85 | 0.95 |
 
 
+## Using Default Settings
+
+Default settings work out of the box; no configuration is needed. Simply create `DataDesigner` and `DataDesignerConfigBuilder` instances without any arguments, and reference the default model aliases in your column configurations.
+
+For a complete example showing how to use default model settings, see the **[Quick Start Guide](../../quick-start.md)**.
+
 ### How Default Model Providers and Configurations Work
 
 When the Data Designer library or the CLI is initialized, default model configurations and providers are stored in the Data Designer home directory for easy access and customization if they do not already exist. These configuration files serve as the single source of truth for model settings. By default they are saved to the following paths:
@@ -90,6 +96,6 @@ Both methods operate on the same files, ensuring consistency across your entire
 
 ## See Also
 
-- **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: Learn how to use the CLI to manage model settings.
-- **[Quick Start Guide](../../quick-start.md)**: Get started with a simple example
+- **[Custom Model Settings](custom-model-settings.md)**: Learn how to create custom providers and model configurations
+- **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: Learn how to use the CLI to manage model settings
 - **[Model Configurations](model-configs.md)**: Learn about model configurations
diff --git a/docs/concepts/models/inference-parameters.md b/docs/concepts/models/inference-parameters.md
index 6d7d079f..c77b22df 100644
--- a/docs/concepts/models/inference-parameters.md
+++ b/docs/concepts/models/inference-parameters.md
@@ -142,6 +142,7 @@ The `EmbeddingInferenceParams` class controls how models generate embeddings. Th
Th ## See Also +- **[Default Model Settings](default-model-settings.md)**: Pre-configured model settings included with Data Designer +- **[Custom Model Settings](custom-model-settings.md)**: Learn how to create custom providers and model configurations - **[Model Configurations](model-configs.md)**: Learn about configuring model settings - **[Model Providers](model-providers.md)**: Learn about configuring model providers -- **[Default Model Settings](default-model-settings.md)**: Pre-configured model settings included with Data Designer diff --git a/docs/concepts/models/model-configs.md b/docs/concepts/models/model-configs.md index c78aa2ee..b91640b0 100644 --- a/docs/concepts/models/model-configs.md +++ b/docs/concepts/models/model-configs.md @@ -116,5 +116,8 @@ model_configs = [ - **[Inference Parameters](inference-parameters.md)**: Detailed guide to inference parameters and how to configure them - **[Model Providers](model-providers.md)**: Learn about configuring model providers - **[Default Model Settings](default-model-settings.md)**: Pre-configured model settings included with Data Designer +- **[Custom Model Settings](custom-model-settings.md)**: Learn how to create custom providers and model configurations +- **[Inference Parameters](inference-parameters.md)**: Detailed guide to inference parameters and how to configure them +- **[Model Providers](model-providers.md)**: Learn about configuring model providers - **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: Use the CLI to manage model settings - **[Column Configurations](../../code_reference/column_configs.md)**: Learn how to use models in column configurations diff --git a/docs/concepts/models/model-providers.md b/docs/concepts/models/model-providers.md index f21ae4ca..93d911f3 100644 --- a/docs/concepts/models/model-providers.md +++ b/docs/concepts/models/model-providers.md @@ -47,5 +47,8 @@ provider = ModelProvider( - **[Model Configurations](model-configs.md)**: 
Learn about configuring models - **[Inference Parameters](inference-parameters.md)**: Detailed guide to inference parameters and how to configure them - **[Default Model Settings](default-model-settings.md)**: Pre-configured providers and model settings included with Data Designer +- **[Custom Model Settings](custom-model-settings.md)**: Learn how to create custom providers and model configurations +- **[Model Configurations](model-configs.md)**: Learn about configuring models +- **[Inference Parameters](inference-parameters.md)**: Detailed guide to inference parameters and how to configure them - **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: Use the CLI to manage providers and model settings - **[Quick Start Guide](../../quick-start.md)**: Get started with a simple example diff --git a/mkdocs.yml b/mkdocs.yml index 84517616..4b02475b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -11,6 +11,7 @@ nav: - Concepts: - Models: - Default Model Settings: concepts/models/default-model-settings.md + - Custom Model Settings: concepts/models/custom-model-settings.md - Configure with the CLI: concepts/models/configure-model-settings-with-the-cli.md - Model Providers: concepts/models/model-providers.md - Model Configs: concepts/models/model-configs.md
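
The `creative-writer` and `fact-checker` configurations in the new `custom-model-settings.md` page differ mainly in `temperature` (1.2 versus 0.1). A common textbook formulation divides the model's next-token logits by the temperature before the softmax, so low values concentrate probability on the top candidate and high values spread it out. The following is a minimal, self-contained sketch of that formula using made-up logits; it is not Data Designer code, and individual providers may apply temperature differently.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

creative = softmax_with_temperature(logits, temperature=1.2)  # like "creative-writer"
factual = softmax_with_temperature(logits, temperature=0.1)  # like "fact-checker"

print("T=1.2:", [round(p, 3) for p in creative])
print("T=0.1:", [round(p, 3) for p in factual])
```

At `T=0.1` almost all of the probability lands on the top token, while `T=1.2` leaves noticeable mass on the alternatives, which is the "less variability" versus "more variability" contrast the page's comments describe.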
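
The same two configurations also set `top_p` (0.98 for `creative-writer`, 0.9 for `fact-checker`). Assuming the provider implements standard nucleus sampling, `top_p` keeps only the smallest set of most-likely tokens whose cumulative probability reaches the threshold and renormalizes over that set. The sketch below uses a made-up distribution and a hypothetical `nucleus_filter` helper to show how a lower `top_p` prunes the long tail; it illustrates the general technique, not any specific provider's implementation.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest high-probability set whose cumulative mass reaches top_p,
    then renormalize so the kept probabilities sum to 1."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least likely until the threshold is reached.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Made-up next-token distribution.
probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}

print(nucleus_filter(probs, top_p=0.9))   # "fact-checker" setting: drops "zebra"
print(nucleus_filter(probs, top_p=0.98))  # "creative-writer" setting: keeps all four
```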
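
In the custom-providers example, each `ModelConfig` points at a `ModelProvider` through its `provider` name, each LLM column points at a `ModelConfig` through `model_alias`, and the provider's `api_key` holds the name of an environment variable. That lookup chain can be pictured with the toy resolver below. `ToyProvider`, `ToyModel`, and `resolve` are hypothetical stand-ins written only for illustration, not Data Designer classes, and the real library may validate or resolve these references differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProvider:
    """Hypothetical stand-in for a provider entry."""

    name: str
    endpoint: str
    api_key_env: str  # name of the environment variable that holds the key


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a model entry that references a provider by name."""

    alias: str
    model: str
    provider: str


def resolve(
    alias: str, models: list[ToyModel], providers: list[ToyProvider]
) -> tuple[str, str, str]:
    """Follow alias -> model -> provider -> env var, failing loudly on any broken link."""
    models_by_alias = {m.alias: m for m in models}
    providers_by_name = {p.name: p for p in providers}

    if alias not in models_by_alias:
        raise KeyError(f"unknown model alias: {alias!r}")
    model = models_by_alias[alias]

    if model.provider not in providers_by_name:
        raise KeyError(f"model {alias!r} references unknown provider {model.provider!r}")
    provider = providers_by_name[model.provider]

    api_key = os.environ.get(provider.api_key_env)
    if api_key is None:
        raise RuntimeError(f"environment variable {provider.api_key_env!r} is not set")

    return provider.endpoint, model.model, api_key


providers = [
    ToyProvider("my-custom-provider", "https://api.my-llm-service.com/v1", "MY_SERVICE_API_KEY"),
]
models = [
    ToyModel("my-text-model", "openai/some-model-id", "my-custom-provider"),
]

# Dummy value for illustration only; a real key would come from your shell or a secrets manager.
os.environ["MY_SERVICE_API_KEY"] = "dummy-key"
print(resolve("my-text-model", models, providers))
```

Failing at lookup time on a misspelled alias or provider name surfaces configuration mistakes before any network call is made, which is why the sketch checks each link separately.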
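
Both examples on the new page pair a categorical `topic` sampler with LLM columns whose prompts contain `{{topic}}`. Conceptually, each record first receives a sampled `topic`, and that value is substituted into the prompt before the model is called. The sketch below mimics that flow in plain Python under stated assumptions: topics are drawn uniformly, and the regex-based `render` function is a hypothetical stand-in that handles only bare `{{ name }}` placeholders, not full Jinja templating. `build_records` is likewise a made-up helper, not a Data Designer API.

```python
import random
import re

TOPICS = ["Artificial Intelligence", "Space Exploration", "Ancient History", "Climate Science"]
PROMPT_TEMPLATE = "Write a creative short story about {{topic}}."

# Matches {{name}} or {{ name }}; a tiny stand-in for Jinja, with no filters or logic.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, record: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with record[name]."""
    return _PLACEHOLDER.sub(lambda m: record[m.group(1)], template)


def build_records(n: int, seed: int = 0) -> list[dict[str, str]]:
    """Sample a topic per record, then render the prompt an LLM column would send."""
    rng = random.Random(seed)  # seeded so repeated runs give the same records
    records = []
    for _ in range(n):
        record = {"topic": rng.choice(TOPICS)}  # sampler column value
        record["prompt"] = render(PROMPT_TEMPLATE, record)  # the prompt, not the model's output
        records.append(record)
    return records


for record in build_records(3):
    print(record["prompt"])
```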