Commit 6bc4c17

Merge branch 'main' into johnny/vibe-cli
2 parents 1491e17 + affc46f commit 6bc4c17

111 files changed: +2350 -1047 lines changed


.github/workflows/build_docs.yml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+name: build_docs
+# Disable for now while the repo is private
+# on:
+#   push:
+#     branches:
+#       - main
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v2
+      - uses: astral-sh/setup-uv@v6
+        with:
+          version: "0.9.5"
+      - run: uv python install
+      - run: uv sync --group docs
+      - run: uv run mkdocs gh-deploy --force --clean --verbose

.github/workflows/dco-assistant.yml

Lines changed: 2 additions & 2 deletions
@@ -27,14 +27,14 @@ jobs:
     steps:
       - name: "DCO Assistant"
         if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the Contributor Agreement including DCO and I hereby sign the Contributor Agreement and DCO') || github.event_name == 'pull_request_target'
-        uses: contributor-assistant/github-action@v2.6.1
+        uses: contributor-assistant/github-action@ca4a40a7d1004f18d9960b404b97e5f30a505a08
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           PERSONAL_ACCESS_TOKEN: ${{ secrets.DCO_ASSISTANT_TOKEN }}
         with:
           path-to-signatures: "dco-signatures.json"
           path-to-document: 'https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/DCO'
-          branch: 'main'
+          branch: 'signatures'
           allowlist: dependabot
           create-file-commit-message: "chore: create file to store dco signatures"
           signed-commit-message: "chore: $contributorName has signed the dco in #$pullRequestNo"

CONTRIBUTING.md

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
+# 🎨✨ Contributing to NeMo Data Designer 🎨✨
+
+Thank you for your interest in contributing to Data Designer!
+
+We welcome contributions from the community and sincerely appreciate your efforts to improve the project. Whether you're fixing a typo, reporting a bug, proposing a new feature, or implementing a major enhancement, your work helps make Data Designer better for everyone 🎉.
+
+This guide will help you get started with the contribution process.
+
+## Table of Contents
+
+- [Getting Started](#getting-started)
+- [Ways to Contribute](#ways-to-contribute)
+- [Feature Requests](#feature-requests)
+- [Development Guide](#development-guide)
+- [Code Quality Standards](#code-quality-standards)
+- [Submitting Changes](#submitting-changes)
+- [Code of Conduct](#code-of-conduct)
+- [Signing off on your work](#signing-off-on-your-work)
+
+
+## Getting Started
+👋 Welcome to the Data Designer community! We're excited to have you here.
+
+Whether you're new to the project or ready to dive in, the resources below will help you get oriented and productive quickly:
+
+1. **[README.md](README.md)** – best place to start to learn the basics of the project
+
+2. **[AGENTS.md](AGENTS.md)** – context and instructions to help AI coding agents work on Data Designer (it's also useful for human developers!)
+
+3. **[Documentation](docs/)** – detailed documentation on Data Designer's capabilities and usage
+
+## Ways to Contribute
+
+There are many ways to contribute to Data Designer:
+
+### 🐛 Bug Fixes
+
+Found a bug? Before reporting, please:
+1. Verify you're using the latest version: `uv pip install --upgrade data-designer`
+2. Search for duplicates in the [issue tracker](https://github.com/NVIDIA-NeMo/DataDesigner/issues)
+
+When [creating a bug report](https://github.com/NVIDIA-NeMo/DataDesigner/issues/new), please include:
+- Data Designer version: `python -c "import data_designer; print(data_designer.__version__)"`
+- Python version and operating system
+- Minimal reproducible example
+- Expected vs. actual behavior
+- Full error messages and stack traces
+
+If you are interested in fixing the bug yourself, that's AWESOME! Please follow the [development guide](#development-guide) to get started.
+
+### ✨ Feature Implementation
+Want to add new functionality? Great! Please review [our development approach](#feature-requests) and open a feature request to discuss the idea and get feedback before investing significant time on the implementation.
+
+### 📖 Documentation Improvements
+Documentation is crucial for user adoption. Contributions that clarify usage, add examples, or fix typos are highly valued.
+
+### 💡 Examples and Tutorials
+Share your use cases! Example notebooks and tutorials help others understand how to leverage Data Designer effectively.
+
+### 🧪 Test Coverage
+Help us improve test coverage by adding tests for untested code paths or edge cases.
+
+## Feature Requests
+Data Designer is designed to be as flexible and extensible as possible, and we welcome your ideas for pushing its capabilities even further! To keep the core library maintainable, while also supporting innovation, we take an incremental approach when adding new features – we explore what's already possible, extend through plugins when needed, and integrate the most broadly useful features into the core library:
+
+### How We Grow Data Designer
+1. 🧗 **Explore what's possible**: Can your use case be achieved with current features? We've designed Data Designer to be composable – sometimes creative combinations of existing tools can accomplish what you need. Check out our examples or open an issue if you'd like help exploring this!
+
+2. 🔌 **Extend through plugins**: If existing features aren't quite enough, consider implementing your idea as a plugin that extends the core library. Plugins let you experiment and share functionality while keeping the core library focused.
+
+3. ⚙️ **Integrate into the core library**: If your feature or plugin proves broadly useful and aligns with Data Designer's goals, we'd love to integrate it into the core library! We're happy to discuss whether it's a good fit and how to move forward together.
+
+This approach helps us grow thoughtfully while keeping Data Designer focused and maintainable.
+
+### Submitting a Feature Request
+Open a [new issue](https://github.com/NVIDIA-NeMo/DataDesigner/issues/new) with:
+
+- **Clear title**: Concise description of the feature
+- **Use case**: Explain what problem this solves and why it's important
+- **Proposed solution**: Describe how you envision the feature working
+- **Alternatives considered**: Other approaches you've thought about
+- **Examples**: Code examples or mockups of how users would interact with the feature
+- **Willingness to implement**: Are you interested in implementing this yourself?
+
+## Development Guide
+Data Designer uses [`uv`](https://github.com/astral-sh/uv) for dependency management. If you don't have uv installed, follow their [installation instructions](https://docs.astral.sh/uv/getting-started/installation/).
+
+### Initial Setup
+0. **Create or find an issue**
+
+   Before starting work, ensure there's an issue tracking your contribution:
+   - For bug fixes: Search [existing issues](https://github.com/NVIDIA-NeMo/DataDesigner/issues) or [create a new one](https://github.com/NVIDIA-NeMo/DataDesigner/issues/new)
+   - For new features: Open a [feature request](#feature-requests) to discuss the approach first
+   - Comment on the issue to let maintainers know you're working on it
+
+1. **Fork and clone the repository**
+
+   Start by [forking the Data Designer repository](https://github.com/NVIDIA-NeMo/DataDesigner/fork), then clone your fork and add the upstream remote:
+
+   ```bash
+   git clone https://github.com/YOUR_GITHUB_USERNAME/DataDesigner.git
+
+   cd DataDesigner
+
+   git remote add upstream https://github.com/NVIDIA-NeMo/DataDesigner.git
+   ```
+
+2. **Install dependencies**
+
+   ```bash
+   # Install project with dev dependencies
+   make install-dev
+
+   # Or, if you use Jupyter / IPython for development
+   make install-dev-notebooks
+   ```
+
+3. **Verify your setup**
+
+   ```bash
+   make test && make check-all
+   ```
+
+   If no errors are reported, you're ready to develop 🚀
+
+### Making Changes
+
+1. **Create a feature branch**
+
+   ```bash
+   git checkout main
+   git pull upstream main
+   git checkout -b <username>/<type-of-change>/<issue-number>-<short-description>
+   ```
+
+   Example types of change:
+   - `feat` for new features
+   - `fix` for bug fixes
+   - `docs` for documentation updates
+   - `test` for testing changes
+   - `refactor` for code refactoring
+   - `chore` for chore tasks
+   - `style` for style changes
+   - `perf` for performance improvements
+
+   Example branch name:
+   - `johnnygreco/feat/123-add-xyz-generator` for a new feature by @johnnygreco, addressing issue #123
+
+2. **Develop your changes**
+
+   Please follow the patterns and conventions used throughout the codebase, as well as those outlined in [AGENTS.md](AGENTS.md).
+
+3. **Test and validate**
+
+   ```bash
+   make check-all-fix # Format code and fix linting issues
+   make test # Run all tests
+   make coverage # Check test coverage (must be >90%)
+   ```
+
+   **Writing tests**: Place tests in [tests/](tests/) mirroring the source structure. Use fixtures from [tests/conftest.py](tests/conftest.py), mock external services with `unittest.mock` or `pytest-httpx`, and test both success and failure cases. See [AGENTS.md](AGENTS.md) for patterns and examples.
+
+4. **Commit your work**
+
+   Write clear, descriptive commit messages, optionally including a brief summary (50 characters or less) and reference issue numbers when applicable (e.g., "Fixes #123").
+
+   ```bash
+   git commit -m "Add XYZ generator for synthetic data" -m "Fixes #123"
+   ```
+
+5. **Stay up to date**
+
+   Regularly sync your branch with upstream changes:
+
+   ```bash
+   git fetch upstream
+   git merge upstream/main
+   ```
+
+## Submitting Changes
+
+### Before Submitting
+
+Ensure your changes meet the following criteria:
+
+- All tests pass (`make test`)
+- Code is formatted and linted (`make check-all-fix`)
+- New functionality includes tests
+- Documentation is updated (README, docstrings, examples)
+- License headers are present on all new files
+- Commit messages are clear and descriptive
+
+### Creating a Pull Request
+
+1. **Push your changes** to your fork:
+
+   ```bash
+   git push origin <username>/<type-of-change>/<issue-number>-<short-description>
+   ```
+
+2. **Open a pull request** on GitHub from your fork to the main repository
+
+3. **Respond to review feedback** and update your PR as needed
+
+### Pull Request Review Process
+
+- Maintainers will review your PR and may request changes
+- Address feedback by pushing additional commits to your branch
+- Reply to the feedback comment with a link to the commit that addresses it
+- Once approved, a maintainer will merge your PR
+- Your contribution will be included in the next release!
+
+## Code of Conduct
+Data Designer follows the Contributor Covenant Code of Conduct. We are committed to providing a welcoming and inclusive environment for all contributors.
+
+**Please read our complete [Code of Conduct](CODE_OF_CONDUCT.md)** for full details on our standards and expectations.
+
+### License File Headers
+All code files that are added to this repository must include the appropriate NVIDIA copyright header:
+
+```python
+# SPDX-FileCopyrightText: Copyright (c) {YEAR} NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+```
+
+Use `make update-license-headers` to add headers automatically.
+
+## Getting Help
+Need help with your contribution?
+
+- **Documentation**: Check the [documentation](docs/) and [AGENTS.md](AGENTS.md) for additional information
+- **Issues**: Browse [existing issues](https://github.com/NVIDIA-NeMo/DataDesigner/issues) for similar questions
+- **Contact**: Reach out to the core maintainers at [[email protected]](mailto:[email protected])
+
+
+## Signing off on your work
+
+When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license. All contributors are asked to sign the Data Designer [Developer Certificate of Origin (DCO)](DCO) when submitting their first pull request. The process is automated by a bot that will comment on the pull request. Our DCO is the same as the one the Linux Foundation requires its contributors to sign.
+
+---
+
+Thank you for contributing to NeMo Data Designer! Your efforts help make synthetic data generation more accessible and powerful for everyone. 🎨✨
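
Note on the branch naming convention added in CONTRIBUTING.md (`<username>/<type-of-change>/<issue-number>-<short-description>`): it is regular enough to check mechanically. The following is a minimal sketch and not part of this commit. `CHANGE_TYPES`, `BRANCH_RE`, and `parse_branch` are hypothetical names, the type list is copied from the guide's examples (which the guide does not present as exhaustive), and the lowercase, hyphen-separated description is an assumption.

```python
import re

# Change types listed as examples in CONTRIBUTING.md (assumed complete here).
CHANGE_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "style", "perf")

# <username>/<type-of-change>/<issue-number>-<short-description>
BRANCH_RE = re.compile(
    r"^(?P<username>[^/\s]+)"
    r"/(?P<type>" + "|".join(CHANGE_TYPES) + r")"
    r"/(?P<issue>\d+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"  # assumption: lowercase, hyphenated
)


def parse_branch(name):
    """Return the named parts of a conforming branch name, or None."""
    match = BRANCH_RE.match(name)
    return match.groupdict() if match else None


print(parse_branch("johnnygreco/feat/123-add-xyz-generator"))
```

A name that lacks the type or issue segment, such as `johnny/vibe-cli`, yields `None` under these assumptions.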

Makefile

Lines changed: 6 additions & 0 deletions
@@ -35,6 +35,7 @@ help:
 	@echo ""
 	@echo "🛠️ Utilities:"
 	@echo " clean - Remove coverage reports and cache files"
+	@echo " serve-docs-locally - Serve documentation locally"
 	@echo " check-license-headers - Check if all files have license headers"
 	@echo " update-license-headers - Add license headers to all files"
 	@echo ""
@@ -82,6 +83,11 @@ test:
 	@echo "🧪 Running unit tests..."
 	uv run --group dev pytest
 
+serve-docs-locally:
+	@echo "📝 Building and serving docs..."
+	uv sync --group docs
+	uv run mkdocs serve
+
 check-license-headers:
 	@echo "🔍 Checking license headers in all files..."
 	uv run python $(REPO_PATH)/scripts/update_license_headers.py --check

docs/css/code_select.css

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+.highlight .gp, .highlight .go { /* Generic.Prompt, Generic.Output */
+  user-select: none;
+}

docs/index.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Data Designer
+
+Welcome to the Data Designer documentation.

mkdocs.yml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+site_name: Data Designer
+theme:
+  name: material
+plugins:
+  - search
+  - gen-files:
+      scripts:
+        - scripts/gen_ref_pages.py
+  - literate-nav:
+      nav_file: SUMMARY.md
+  - section-index
+  - mkdocstrings:
+      handlers:
+        python:
+          paths: [src]
+          filters: "!^(__.*__$)"
+nav:
+  - Welcome: index.md
+  - Code Reference: reference/
+extra_css:
+  - css/code_select.css
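
Note on the `filters` value in mkdocs.yml: read as a regular expression with a leading `!` negation marker, `"!^(__.*__$)"` targets dunder names only. The sketch below is not part of this commit and only illustrates which names the pattern matches. `is_excluded` is a hypothetical helper, and treating `!` as "exclude on match" is an assumption about how the docs handler interprets the value.

```python
import re

FILTER = "!^(__.*__$)"  # value copied from mkdocs.yml


def is_excluded(name, rule=FILTER):
    """Return True if `name` matches an exclusion rule (one with a leading '!')."""
    if not rule.startswith("!"):
        return False  # assumption: only exclusion rules are modelled here
    return re.search(rule[1:], name) is not None


for candidate in ("__init__", "__version__", "_helper", "load"):
    print(candidate, is_excluded(candidate))
```

Under this reading, single-underscore names such as `_helper` are not matched by the pattern.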

pyproject.toml

Lines changed: 9 additions & 0 deletions
@@ -66,6 +66,15 @@ dev = [
     "pytest-cov>=7.0.0",
     "pytest-httpx>=0.35.0",
 ]
+docs = [
+    "mkdocs>=1.6.1",
+    "mkdocstrings>=0.30.1",
+    "mkdocstrings-python>=1.18.2",
+    "mkdocs-gen-files>=0.5.0",
+    "mkdocs-literate-nav>=0.6.2",
+    "mkdocs-section-index>=0.3.10",
+    "mkdocs-material>=9.6.22",
+]
 notebooks = [
     "jupyter>=1.0.0",
     "ipykernel>=6.29.0",

scripts/gen_ref_pages.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""Generate the code reference pages and navigation."""
+
+from pathlib import Path
+
+import griffe
+import mkdocs_gen_files
+
+data = griffe.load("data_designer")
+nav = mkdocs_gen_files.Nav()
+
+root = Path(__file__).parent.parent
+src = root / "src"
+
+for path in sorted(src.rglob("*.py")):
+    module_path = path.relative_to(src).with_suffix("")
+    doc_path = path.relative_to(src).with_suffix(".md")
+    full_doc_path = Path("reference", doc_path)
+
+    parts = tuple(module_path.parts)
+
+    if parts[-1] == "__main__":
+        continue
+
+    if parts[-1] == "__init__":
+        parts = parts[:-1]
+        doc_path = doc_path.with_name("index.md")
+        full_doc_path = full_doc_path.with_name("index.md")
+
+        init_module = data[parts[1:]] if len(parts) > 1 else data
+        has_docstrings = init_module.has_docstring or any(
+            member.has_docstrings
+            for member in init_module.members.values()
+            if not (member.is_alias or member.is_module)
+        )
+    else:
+        has_docstrings = data[parts[1:]].has_docstrings
+
+    if not has_docstrings:
+        continue
+
+    nav[parts] = doc_path.as_posix()
+    with mkdocs_gen_files.open(full_doc_path, "w") as fd:
+        ident = ".".join(parts)
+        fd.write(f"::: {ident}")
+
+    mkdocs_gen_files.set_edit_path(full_doc_path, path.relative_to(root))
+
+with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file:
+    nav_file.writelines(nav.build_literate_nav())
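
Note on scripts/gen_ref_pages.py: the path handling in the loop can be followed in isolation. The sketch below is not part of this commit. It reproduces only the mapping from a source file to its module parts and generated page, using `pathlib` alone; the docstring check via `griffe` and the file writing via `mkdocs_gen_files` are left out. `doc_target` is a hypothetical helper, and the module paths in the usage lines are made up for illustration.

```python
from pathlib import PurePosixPath


def doc_target(rel_path):
    """Map a path relative to src/ to (module parts, generated doc path).

    Returns None for __main__ modules, which the script skips.
    """
    path = PurePosixPath(rel_path)
    parts = tuple(path.with_suffix("").parts)
    doc_path = PurePosixPath("reference", path.with_suffix(".md"))

    if parts[-1] == "__main__":
        return None
    if parts[-1] == "__init__":
        # A package is documented on its index page, under the package name.
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
    return parts, doc_path.as_posix()


# Hypothetical module paths, for illustration only.
print(doc_target("data_designer/__init__.py"))
print(doc_target("data_designer/config/models.py"))
print(doc_target("data_designer/__main__.py"))
```

In the script, the joined parts become the `::: dotted.module.name` directive written into each generated page, and the same parts key the navigation entry.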
