Commit dc4ff19

trying something out
1 parent 5082514 commit dc4ff19

2 files changed: +69 -62 lines changed

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: Pack Tutorials
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+
+jobs:
+  pack-tutorials:
+    runs-on: ubuntu-latest
+
+    steps:
+      # Step 1: Checkout the repository
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      # Step 2: Zip the subfolder
+      - name: Create ZIP of subfolder
+        run: |
+          SUBFOLDER="docs/tutorials"
+          ZIP_NAME="${SUBFOLDER}.zip"
+
+          if [ ! -d "$SUBFOLDER" ]; then
+            echo "Error: Subfolder '$SUBFOLDER' does not exist."
+            exit 1
+          fi
+
+          zip -r "$ZIP_NAME" "$SUBFOLDER"
+
+      # Step 3: Upload the ZIP as an artifact
+      - name: Upload ZIP artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: tutorials
+          path: tutorials.zip
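
Review note: the zip step sets `ZIP_NAME="${SUBFOLDER}.zip"`, which expands to `docs/tutorials.zip`, while the upload step looks for `tutorials.zip` at the workspace root. As written, the upload step would likely find no file. The sketch below shows the two names side by side; `COMMITTED_ZIP_NAME` and `ROOT_ZIP_NAME` are hypothetical variable names used only for illustration, and deriving the name with `basename` is one possible fix, not necessarily what the author intends.

```shell
#!/usr/bin/env bash
set -euo pipefail

SUBFOLDER="docs/tutorials"

# As committed: the archive name keeps the directory prefix,
# so zip writes the file inside docs/.
COMMITTED_ZIP_NAME="${SUBFOLDER}.zip"

# Hypothetical alternative: keep only the last path component,
# so the archive is written to the current working directory.
ROOT_ZIP_NAME="$(basename "$SUBFOLDER").zip"

echo "committed: $COMMITTED_ZIP_NAME"
echo "at root:   $ROOT_ZIP_NAME"
```

The other option is to leave the script alone and change the upload step's `path:` to `docs/tutorials.zip`. Either way, the two steps need to agree on one path.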

docs/notebooks/intro.md

Lines changed: 33 additions & 62 deletions
@@ -2,41 +2,6 @@
 
 Welcome to the Data Designer tutorial series! These hands-on notebooks will guide you through the core concepts and features of Data Designer, from basic synthetic data generation to advanced techniques like structured outputs and dataset seeding.
 
-## 📚 Tutorial Series
-
-The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:
-
-### [1. The Basics](1-the-basics.ipynb)
-
-Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:
-
-- Setting up the `DataDesigner` interface
-- Configuring models and inference parameters
-- Using built-in samplers (Category, Person, Uniform)
-- Generating LLM text columns with dependencies
-- Understanding the generation workflow
-
-**Start here if you're new to Data Designer!**
-
-### [2. Structured Outputs and Jinja Expressions](2-structured-outputs-and-jinja-expressions.ipynb)
-
-Explore more advanced data generation capabilities:
-
-- Creating structured JSON outputs with schemas
-- Using Jinja expressions for derived columns
-- Combining samplers with structured data
-- Building complex data dependencies
-- Working with nested data structures
-
-### [3. Seeding with an External Dataset](3-seeding-with-a-dataset.ipynb)
-
-Learn how to leverage existing datasets to guide synthetic data generation:
-
-- Loading and using seed datasets
-- Sampling from real data distributions
-- Combining seed data with LLM generation
-- Creating realistic synthetic data based on existing patterns
-
 ## 🚀 Setting Up Your Environment
 
 ### Local Setup Best Practices
@@ -92,14 +57,40 @@ export OPENAI_API_KEY="your-api-key-here"
 
 For more information, check the [Quick Start](../quick-start.md), [Default Model Settings](../models/default-model-settings.md) and how to [Configure Model Settings Using The CLI](../models/configure-model-settings-with-the-cli.md).
 
-### Recommended Setup
+## 📚 Tutorial Series
+
+The tutorials are designed to be completed in sequence, building upon concepts introduced in previous notebooks:
 
-For the best experience with Data Designer notebooks:
+### [1. The Basics](1-the-basics.ipynb)
 
-1. **Python Version:** Use Python 3.11 or later
-2. **Virtual Environment:** Always use a virtual environment to avoid dependency conflicts
-3. **Jupyter Extensions:** Consider installing JupyterLab for a more modern interface
-4. **API Access:** Ensure you have valid API keys for your chosen LLM provider
+Learn the fundamentals of Data Designer by generating a simple product review dataset. This notebook covers:
+
+- Setting up the `DataDesigner` interface
+- Configuring models and inference parameters
+- Using built-in samplers (Category, Person, Uniform)
+- Generating LLM text columns with dependencies
+- Understanding the generation workflow
+
+**Start here if you're new to Data Designer!**
+
+### [2. Structured Outputs and Jinja Expressions](2-structured-outputs-and-jinja-expressions.ipynb)
+
+Explore more advanced data generation capabilities:
+
+- Creating structured JSON outputs with schemas
+- Using Jinja expressions for derived columns
+- Combining samplers with structured data
+- Building complex data dependencies
+- Working with nested data structures
+
+### [3. Seeding with an External Dataset](3-seeding-with-a-dataset.ipynb)
+
+Learn how to leverage existing datasets to guide synthetic data generation:
+
+- Loading and using seed datasets
+- Sampling from real data distributions
+- Combining seed data with LLM generation
+- Creating realistic synthetic data based on existing patterns
 
 ## 📖 Important Documentation Sections
 
@@ -125,24 +116,4 @@ Quick reference guides for the main configuration objects:
 - **[column_configs](../code_reference/column_configs.md)** - All column configuration types
 - **[config_builder](../code_reference/config_builder.md)** - The `DataDesignerConfigBuilder` API
 - **[data_designer_config](../code_reference/data_designer_config.md)** - Main configuration schema
-- **[validator_params](../code_reference/validator_params.md)** - Validator configuration options
-
-## 💡 Tips for Success
-
-- **Start Simple:** Begin with Tutorial 1 and work through the examples step by step
-- **Experiment:** Modify the examples to generate your own datasets
-- **Check Logs:** Data Designer provides detailed logging to help debug generation issues
-- **Read Error Messages:** Error messages are designed to be helpful and actionable
-- **Use the Registry:** Explore built-in samplers and generators using the registry system
-
-## 🆘 Getting Help
-
-If you run into issues:
-
-1. Check the [GitHub Issues](https://github.com/NVIDIA-NeMo/DataDesigner/issues) for known problems
-2. Review the [Contributing Guide](../CONTRIBUTING.md) if you'd like to report a bug or contribute
-3. Consult the [Code Reference](../code_reference/column_configs.md) for detailed API documentation
-
----
-
-Ready to get started? Head to [The Basics](1-the-basics.ipynb) to begin your journey! 🎨
+- **[validator_params](../code_reference/validator_params.md)** - Validator configuration options
