This document covers local setup, day-to-day development workflow, testing, and pre-commit usage for DataDesigner contributors.
For architectural invariants and project identity, see AGENTS.md. For code style, naming, and import conventions, see STYLEGUIDE.md. For the contribution workflow (issues, PRs, agent-assisted development), see CONTRIBUTING.md.
```bash
git clone https://github.com/NVIDIA-NeMo/DataDesigner.git
cd DataDesigner

# Install with dev dependencies
make install-dev

# Or, if you use Jupyter / IPython for development
make install-dev-notebooks
```

Verify the installation:

```bash
make test && make check-all
```

If no errors are reported, you're ready to develop.
```bash
git checkout main
git pull origin main
git checkout -b <username>/<type>/<issue-number>-<short-description>
```

Branch name types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`, `style`, `perf`.

Example: `nmulepati/feat/123-add-xyz-generator`
If you're working from a fork, add the upstream remote first:
```bash
git remote add upstream https://github.com/NVIDIA-NeMo/DataDesigner.git
```

Then sync:

```bash
git fetch upstream
git merge upstream/main
```

The most common commands during development:

```bash
make check-all-fix  # format + lint (ruff)
make test           # run all test suites
```

Individual checks:

```bash
make lint           # Run ruff linter
make lint-fix       # Fix linting issues automatically
make format         # Format code with ruff
make format-check   # Check code formatting without changes
make check-all      # Run all checks (format-check + lint)
make check-all-fix  # Run all checks with autofix (format + lint-fix)
```

To invoke ruff directly:

```bash
uv run ruff check          # Lint all files
uv run ruff check --fix    # Lint with autofix
uv run ruff format         # Format all files
uv run ruff format --check # Check formatting
```

`make test` runs all three package test suites in sequence (config, engine, interface). When iterating on a single package, run its tests directly:
```bash
# Run all tests (config + engine + interface)
make test

# Run a single package's tests
make test-config     # data-designer-config
make test-engine     # data-designer-engine
make test-interface  # data-designer (interface)

# Run a specific test file
uv run pytest tests/config/test_sampler_constraints.py

# Run tests with verbose output
uv run pytest -v

# Run tests with coverage
make coverage
# View htmlcov/index.html in browser

# E2E and example tests (slower, require API keys — see README.md for setup)
make test-e2e            # end-to-end tests
make test-run-tutorials  # run tutorial notebooks as tests
make test-run-recipes    # run recipe scripts as tests
```

The project uses pytest with the following patterns:
- Flat test functions: Write standalone `test_*` functions, not class-based test suites. Use fixtures and parametrize for shared setup instead of class inheritance.
- Fixtures: Shared fixtures are provided via `pytest_plugins` from `data_designer.config.testing.fixtures` and `data_designer.engine.testing.fixtures`, plus local `conftest.py` files in each test directory.
- Stub configs: YAML-based configuration stubs for testing (see the `stub_data_designer_config_str` fixture).
- Mocking: Use `unittest.mock.patch` for external services and dependencies.
- Async support: pytest-asyncio for async tests (`asyncio_default_fixture_loop_scope = "session"`).
- HTTP mocking: pytest-httpx for mocking HTTP requests.
- Coverage: Track test coverage with pytest-cov.
- Test public APIs only: Tests should exercise public interfaces, not `_`-prefixed functions or classes. If something is hard to test without reaching into private internals, consider refactoring the code to expose a public entry point.
- Type annotations required: Test functions and fixtures must include type annotations — `-> None` for tests, typed parameters, and typed return values for fixtures.
- Imports at module level: Follow the same import rules as production code — keep imports at the top of the file, not inside test functions.
- Parametrize over duplicate: Use `@pytest.mark.parametrize` (with `ids=` for readable names) instead of writing multiple test functions for variations of the same behavior.
- Minimal fixtures: Fixtures should be simple — one fixture, one responsibility, just setup with no behavior logic.
- Shared fixtures in `conftest.py`: Place fixtures shared across a test directory in `conftest.py`.
- Mock at boundaries: Mock external dependencies (APIs, databases, third-party services), not internal functions.
- Test behavior, not implementation: Assert on outputs and side effects, not internal call counts (unless verifying routing).
- Keep mocking shallow: If a test requires deeply nested mocking, the code under test may need refactoring.
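The parametrize guideline above can be sketched with a minimal standalone test; the `normalize_name` helper here is hypothetical and exists only for illustration, not part of DataDesigner:

```python
import pytest


def normalize_name(name: str) -> str:
    """Hypothetical helper under test: trim and lowercase a column name."""
    return name.strip().lower()


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("  Age ", "age"),
        ("HEIGHT", "height"),
        ("weight", "weight"),
    ],
    ids=["strips-whitespace", "lowercases", "already-normalized"],
)
def test_normalize_name(raw: str, expected: str) -> None:
    """One flat test function covers all variations via parametrize."""
    assert normalize_name(raw) == expected
```

The `ids=` names show up in pytest's output, so a failing case is identifiable at a glance instead of being `test_normalize_name[1]`.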
```python
from typing import Any

from data_designer.config.config_builder import DataDesignerConfigBuilder


def test_something(stub_model_configs: dict[str, Any]) -> None:
    """Test description."""
    builder = DataDesignerConfigBuilder(model_configs=stub_model_configs)
    # ... test implementation
    assert expected == actual
```

The project uses pre-commit hooks to enforce code quality. Install them with:
```bash
uv run pre-commit install
```

Hooks include:
- Trailing whitespace removal
- End-of-file fixer
- YAML/JSON/TOML validation
- Merge conflict detection
- Debug statement detection
- Ruff linting and formatting
```bash
make clean                      # Clean up generated files
make update-license-headers     # Add SPDX headers to new files
make check-all-fix              # Format + lint before committing
make test                       # Run all tests
make coverage                   # Run tests with coverage report
make perf-import                # Profile import time
make perf-import CLEAN=1        # Clean cache first, then profile
make convert-execute-notebooks  # Regenerate .ipynb from docs/notebook_source/*.py
make generate-colab-notebooks   # Generate Colab-compatible notebooks
```

After adding heavy third-party dependencies, verify import performance:

```bash
make perf-import CLEAN=1
```

There is also a CI test (`test_import_performance` in `packages/data-designer/tests/test_import_perf.py`) that runs 5 import cycles (1 cold + 4 warm) and fails if the average exceeds 3 seconds. If your dependency causes a regression, add it to `lazy_heavy_imports.py` — see STYLEGUIDE.md for the lazy loading pattern.