Skip to content

Latest commit

 

History

History
218 lines (153 loc) · 7.03 KB

File metadata and controls

218 lines (153 loc) · 7.03 KB

Development Guide

This document covers local setup, day-to-day development workflow, testing, and pre-commit usage for DataDesigner contributors.

For architectural invariants and project identity, see AGENTS.md. For code style, naming, and import conventions, see STYLEGUIDE.md. For the contribution workflow (issues, PRs, agent-assisted development), see CONTRIBUTING.md.


Prerequisites

  • Python 3.10+
  • uv for dependency management
  • GNU Make for common tasks

Local Setup

Clone and Install

git clone https://github.com/NVIDIA-NeMo/DataDesigner.git
cd DataDesigner

# Install with dev dependencies
make install-dev

# Or, if you use Jupyter / IPython for development
make install-dev-notebooks

Verify Your Setup

make test && make check-all

If no errors are reported, you're ready to develop.


Day-to-Day Workflow

Branching

git checkout main
git pull origin main
git checkout -b <username>/<type>/<issue-number>-<short-description>

Branch name types: feat, fix, docs, test, refactor, chore, style, perf.

Example: nmulepati/feat/123-add-xyz-generator

Syncing with Upstream

If you're working from a fork, add the upstream remote first:

git remote add upstream https://github.com/NVIDIA-NeMo/DataDesigner.git

Then sync:

git fetch upstream
git merge upstream/main

Validation Before Committing

make check-all-fix   # format + lint (ruff)
make test            # run all test suites

Code Quality

Using Make (Recommended)

make lint              # Run ruff linter
make lint-fix          # Fix linting issues automatically
make format            # Format code with ruff
make format-check      # Check code formatting without changes
make check-all         # Run all checks (format-check + lint)
make check-all-fix     # Run all checks with autofix (format + lint-fix)

Direct Commands

uv run ruff check                # Lint all files
uv run ruff check --fix          # Lint with autofix
uv run ruff format               # Format all files
uv run ruff format --check       # Check formatting

Testing

Running Tests

make test runs all three package test suites in sequence (config, engine, interface). When iterating on a single package, run its tests directly:

# Run all tests (config + engine + interface)
make test

# Run a single package's tests
make test-config       # data-designer-config
make test-engine       # data-designer-engine
make test-interface    # data-designer (interface)

# Run a specific test file
uv run pytest tests/config/test_sampler_constraints.py

# Run tests with verbose output
uv run pytest -v

# Run tests with coverage
make coverage
# View htmlcov/index.html in browser

# E2E and example tests (slower, require API keys — see README.md for setup)
make test-e2e              # end-to-end tests
make test-run-tutorials    # run tutorial notebooks as tests
make test-run-recipes      # run recipe scripts as tests

Test Patterns

The project uses pytest with the following patterns:

  • Flat test functions: Write standalone test_* functions, not class-based test suites. Use fixtures and parametrize for shared setup instead of class inheritance.
  • Fixtures: Shared fixtures are provided via pytest_plugins from data_designer.config.testing.fixtures and data_designer.engine.testing.fixtures, plus local conftest.py files in each test directory
  • Stub configs: YAML-based configuration stubs for testing (see stub_data_designer_config_str fixture)
  • Mocking: Use unittest.mock.patch for external services and dependencies
  • Async support: pytest-asyncio for async tests (asyncio_default_fixture_loop_scope = "session")
  • HTTP mocking: pytest-httpx for mocking HTTP requests
  • Coverage: Track test coverage with pytest-cov

Test Guidelines

  • Test public APIs only: Tests should exercise public interfaces, not _-prefixed functions or classes. If something is hard to test without reaching into private internals, consider refactoring the code to expose a public entry point
  • Type annotations required: Test functions and fixtures must include type annotations — -> None for tests, typed parameters, and typed return values for fixtures
  • Imports at module level: Follow the same import rules as production code — keep imports at the top of the file, not inside test functions
  • Parametrize over duplicate: Use @pytest.mark.parametrize (with ids= for readable names) instead of writing multiple test functions for variations of the same behavior
  • Minimal fixtures: Fixtures should be simple — one fixture, one responsibility, just setup with no behavior logic
  • Shared fixtures in conftest.py: Place fixtures shared across a test directory in conftest.py
  • Mock at boundaries: Mock external dependencies (APIs, databases, third-party services), not internal functions
  • Test behavior, not implementation: Assert on outputs and side effects, not internal call counts (unless verifying routing)
  • Keep mocking shallow: If a test requires deeply nested mocking, the code under test may need refactoring

Example Test

from typing import Any

from data_designer.config.config_builder import DataDesignerConfigBuilder


def test_something(stub_model_configs: dict[str, Any]) -> None:
    """Test description."""
    builder = DataDesignerConfigBuilder(model_configs=stub_model_configs)
    # ... test implementation
    assert expected == actual

Pre-commit Hooks

The project uses pre-commit hooks to enforce code quality. Install them with:

uv run pre-commit install

Hooks include:

  • Trailing whitespace removal
  • End-of-file fixer
  • YAML/JSON/TOML validation
  • Merge conflict detection
  • Debug statement detection
  • Ruff linting and formatting

Common Tasks

make clean                       # Clean up generated files
make update-license-headers      # Add SPDX headers to new files
make check-all-fix               # Format + lint before committing
make test                        # Run all tests
make coverage                    # Run tests with coverage report
make perf-import                 # Profile import time
make perf-import CLEAN=1         # Clean cache first, then profile
make convert-execute-notebooks   # Regenerate .ipynb from docs/notebook_source/*.py
make generate-colab-notebooks    # Generate Colab-compatible notebooks

Import Performance

After adding heavy third-party dependencies, verify import performance:

make perf-import CLEAN=1

There is also a CI test (test_import_performance in packages/data-designer/tests/test_import_perf.py) that runs 5 import cycles (1 cold + 4 warm) and fails if the average exceeds 3 seconds. If your dependency causes a regression, add it to lazy_heavy_imports.py — see STYLEGUIDE.md for the lazy loading pattern.