Commit 2b2bb35

Merge pull request #363 from johnnygreco/25.10-getting-started
25.10 getting started
2 parents 46d34fb + 81cb41c commit 2b2bb35

30 files changed: 2972 additions and 38 deletions

.gitignore

Lines changed: 10 additions & 0 deletions
@@ -9,6 +9,10 @@
 RAG/examples/**/volumes
 uploaded_files/

+
+# Mac OS exclusions
+.DS_Store
+
 # Visual Studio Code
 .vscode

@@ -38,3 +42,9 @@ RAG/notebooks/langchain/data/save_embedding

 # egg-info directories
 **/egg-info
+
+# uv exclusion
+uv.lock
+
+# data designer exclusion
+data-designer-tutorial-output/

nemo/NeMo-Data-Designer/README.md

Lines changed: 17 additions & 37 deletions
@@ -1,10 +1,10 @@
 # 🎨 NeMo Data Designer Tutorial Notebooks

-This directory contains the tutorial notebooks for getting started with NeMo Data Designer.
+This directory contains tutorial notebooks for getting started with NeMo Data Designer.

 ## 📦 Set Up the Environment

-We will use the `uv` package manager to set up our environment and install the necessary dependencies. If you don't have `uv` installed, you can follow the installation instructions from the [uv documentation](https://docs.astral.sh/uv/getting-started/installation/).
+We will use the `uv` package manager to set up our environment and install the necessary dependencies. If you don't have `uv` installed, follow the installation instructions from the [uv documentation](https://docs.astral.sh/uv/getting-started/installation/).

 Once you have `uv` installed, be sure you are in the `Nemo-Data-Designer` directory and run the following command:

@@ -22,48 +22,28 @@ Be sure to select this virtual environment as your kernel when running the noteb

 ## 🚀 Deploying the NeMo Data Designer Microservice

-To run these notebooks, you'll need the NeMo Data Designer microservice. You have two deployment options:
+To run the tutorial notebooks in this repository, you'll need access to a running instance of the NeMo Data Designer microservice.

-### ⚙️ Using the NeMo Data Designer Managed Service
-We have a [managed service of NeMo Data Designer](https://build.nvidia.com/nemo/data-designer) to help you get started quickly.
-
-Please refer to the [intro-tutorials](./intro-tutorials/) notebooks to learn how to connect to this service.
+You have two deployment options:

-**Note**: This managed service of NeMo Data Designer is intended to only help you get started. As a result, it can only be used to launch `preview` jobs. It can **not** be used to launch long running jobs. If you need to launch long-running jobs please deploy an instance of [NeMo Data Designer locally](#-deploy-the-nemo-data-designer-microservice-locally)
+### 🐳 Self-Hosted Deployment
+Deploy the NeMo Data Designer microservice locally via Docker Compose.

+Please see the [Installation Options](https://docs.nvidia.com/nemo/microservices/latest/design-synthetic-data-from-scratch-or-seeds/index.html#installation-options) section of the [NeMo Data Designer documentation](https://docs.nvidia.com/nemo/microservices/latest/design-synthetic-data-from-scratch-or-seeds/index.html) for more information.

-### 🐳 Deploy the NeMo Data Designer Microservice Locally

-Alternatively, you can deploy the NeMo Data Designer microservice locally via Docker Compose.
+### ⚙️ Managed Service
+We have a [managed service of NeMo Data Designer](https://build.nvidia.com/nemo/data-designer) to help you get started quickly.

-To run the tutorial notebooks in the [advanced](./advanced/), you will need to have NeMo Data Designer deployed locally. Please see the [deployment guide](http://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html) for more details.
+**Note**: This managed service can only be used to launch `preview` jobs. It can **not** be used to launch long-running jobs. If you need to launch long-running jobs please deploy an instance of NeMo Data Designer locally.


 ## 📚 Tutorial Directory

-### 🚀 Intro Tutorials
-
-| Notebook | Description |
-|---------------------------------------------------|----------------------------------------------------------------------------------|
-| [1-the-basics.ipynb](./intro-tutorials/1-the-basics.ipynb) | Learn the basics of Data Designer by generating a simple product review dataset |
-| [2-structured-outputs-and-jinja-expressions.ipynb](./intro-tutorials/2-structured-outputs-and-jinja-expressions.ipynb) | Explore advanced data generation using structured outputs and Jinja expressions |
-| [3-seeding-with-a-dataset.ipynb](./intro-tutorials/3-seeding-with-a-dataset.ipynb) | Discover how to seed synthetic data generation with an external dataset |
-| [4-custom-model-configs.ipynb](./intro-tutorials/4-custom-model-configs.ipynb) | Master creating and using custom model configurations |
-
-### 🎯 Advanced Tutorials
-
-| Notebook | Domain | Description |
-|---------------------------------------------------|---------------------|-----------------------------------------------------------------|
-| [person-sampler-tutorial.ipynb](./advanced/person-samplers/person-sampler-tutorial.ipynb) | Persona Samplers | Generate realistic personas using the person sampler |
-| [clinical-trials.ipynb](./advanced/healthcare-datasets/clinical-trials.ipynb) | Healthcare | Build synthetic clinical trial datasets with realistic PII for testing data protection |
-| [insurance-claims.ipynb](./advanced/healthcare-datasets/insurance-claims.ipynb) | Healthcare | Create synthetic insurance claims datasets with realistic claim data and processing information |
-| [physician-notes-with-realistic-personal-details.ipynb](./advanced/healthcare-datasets/physician-notes-with-realistic-personal-details.ipynb) | Healthcare | Generate realistic patient data and physician notes with embedded personal information |
-| [w2-dataset.ipynb](./advanced/forms/w2-dataset.ipynb) | Forms & Documents | Generate synthetic W-2 tax form datasets with realistic employee and employer information |
-| [multi-turn-conversation.ipynb](./advanced/multi-turn-chat/multi-turn-conversation.ipynb) | Conversational AI | Build synthetic conversational data with realistic person details and multi-turn dialogues |
-| [visual-question-answering-using-vlm.ipynb](./advanced/multimodal/visual-question-answering-using-vlm.ipynb) | Multimodal | Create visual question answering datasets using Vision Language Models |
-| [product-question-answer-generator.ipynb](./advanced/qa-generation/product-question-answer-generator.ipynb) | Q&A Generation | Build product information datasets with corresponding questions and answers |
-| [generate-rag-evaluation-dataset.ipynb](./advanced/rag-examples/generate-rag-evaluation-dataset.ipynb) | RAG & Retrieval | Generate diverse RAG evaluation datasets for testing retrieval-augmented generation systems |
-| [reasoning-traces.ipynb](./advanced/reasoning/reasoning-traces.ipynb) | Reasoning | Build synthetic reasoning traces to demonstrate step-by-step problem-solving processes |
-| [text-to-python.ipynb](./advanced/text-to-code/text-to-python.ipynb) | Text-to-Code | Generate Python code from natural language instructions with validation and evaluation |
-| [text-to-python-evol.ipynb](./advanced/text-to-code/text-to-python-evol.ipynb) | Text-to-Code | Build advanced Python code generation with evolutionary improvements and iterative refinement |
-| [text-to-sql.ipynb](./advanced/text-to-code/text-to-sql.ipynb) | Text-to-Code | Create SQL queries from natural language descriptions with validation and testing |
+If you find yourself writing Data Designer tutorial notebooks (thank you 🫶), please check out the [TUTORIAL_STYLE_GUIDE.md](./TUTORIAL_STYLE_GUIDE.md) for best practices and style guidelines.
+
+#### Self-hosted tutorials:
+
+- [Getting Started](./self-hosted-tutorials/getting-started): Learn the foundations of generating synthetic data with Data Designer.
+
+- [Community Contributions](./self-hosted-tutorials/community-contributions/): Explore diverse use cases and advanced features in community-contributed notebooks.
