Testing Strategy

This document describes the testing strategy for AutoDeploy, covering the multi-tiered approach used to ensure quality and reliability.

Testing Philosophy

AutoDeploy uses a multi-tiered testing approach that balances fast feedback with comprehensive coverage:

┌─────────────────────────────────────────────────────────┐
│                    Dashboard                            │
│         (Broad model coverage + performance)            │
├─────────────────────────────────────────────────────────┤
│                 Integration Tests                       │
│            (Accuracy tests, CI-registered)              │
├─────────────────────────────────────────────────────────┤
│                  E2E Mini Tests                         │
│           (Compile + prompt workflows)                  │
├─────────────────────────────────────────────────────────┤
│                    Unit Tests                           │
│      (Component testing: patches, transforms, etc.)     │
└─────────────────────────────────────────────────────────┘
  • Unit Tests: Fast, isolated tests for individual components (patches, transforms, custom ops)
  • E2E Mini Tests: End-to-end workflows testing compile + prompt for unique model combinations
  • Integration Tests: Important accuracy tests registered individually in CI
  • Dashboard: Broad model coverage and performance testing across all supported models

Unit Tests

Unit tests verify individual components like patches, transformations, custom operations, and utilities.

Location

All unit tests are located in tests/unittest/auto_deploy/:

tests/unittest/auto_deploy/
├── _utils_test/                    # Shared test utilities
├── singlegpu/                      # Single GPU tests
│   ├── compile/                    # Compilation tests
│   ├── custom_ops/                 # Custom operations tests
│   ├── models/                     # Model-specific patch tests
│   ├── shim/                       # Executor/engine tests
│   ├── smoke/                      # E2E mini tests (see below)
│   ├── transformations/            # Graph transformation tests
│   └── utils/                      # Utility function tests
└── multigpu/                       # Multi-GPU tests
    ├── custom_ops/                 # Multi-GPU custom ops
    ├── smoke/                      # Multi-GPU E2E mini tests
    └── transformations/            # Multi-GPU transformation tests

CI Registration

Tests run automatically in CI once their folder is registered. New test files and functions are picked up automatically as long as they live in an already-registered folder.

Tests are registered in tests/integration/test_lists/test-db/l0_*.yml files under the backend: autodeploy section:

backend: autodeploy
tests:
- unittest/auto_deploy/singlegpu/compile
- unittest/auto_deploy/singlegpu/custom_ops
- unittest/auto_deploy/singlegpu/models
- unittest/auto_deploy/singlegpu/shim
- unittest/auto_deploy/singlegpu/smoke
- unittest/auto_deploy/singlegpu/transformations
- unittest/auto_deploy/singlegpu/utils

Adding a New Folder

If you create a new folder (not just a new file in an existing folder), you must register it in the appropriate YAML files:

  1. Edit tests/integration/test_lists/test-db/l0_a30.yml (and other GPU-specific files as needed)
  2. Add the new folder path under the backend: autodeploy section
  3. Example: - unittest/auto_deploy/singlegpu/my_new_folder
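
For example, registering the placeholder folder from step 3 would extend the block shown earlier like this (existing entries abbreviated; my_new_folder is just the placeholder name from step 3):

backend: autodeploy
tests:
- unittest/auto_deploy/singlegpu/compile
- unittest/auto_deploy/singlegpu/custom_ops
# ... other existing entries ...
- unittest/auto_deploy/singlegpu/my_new_folder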

Parallel Execution

Most unit tests run in parallel using pytest-xdist for faster execution. The exceptions are the smoke/ subfolders, which run sequentially (see E2E Mini Tests below).

E2E Mini Tests (Smoke Tests)

E2E mini tests verify complete end-to-end workflows including model compilation and prompt execution for unique model combinations.

Location

  • Single GPU: tests/unittest/auto_deploy/singlegpu/smoke/
  • Multi GPU: tests/unittest/auto_deploy/multigpu/smoke/

Purpose

These tests ensure that the full AutoDeploy pipeline works correctly for various model architectures and configurations:

  • test_ad_build_small_single.py - Tests multiple model configurations (Llama, Mixtral, Qwen, Phi-3, DeepSeek, Mistral, Nemotron)
  • test_ad_trtllm_bench.py - Benchmarking functionality
  • test_ad_trtllm_serve.py - Serving functionality
  • test_ad_speculative_decoding.py - Speculative decoding
  • test_ad_export_onnx.py - ONNX export functionality

Execution

Smoke tests are not executed in parallel to avoid resource contention during full model compilation and execution. They run sequentially within the CI pipeline.

Integration Tests

Integration tests cover important accuracy tests and other scenarios that require explicit CI registration.

Registration

Unlike unit tests (where new files in existing folders are auto-discovered), each individual integration test case must be explicitly registered in the CI YAML files.

Format: path/to/test_file.py::test_function_name[param_id]

Example from l0_a30.yml:

- accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized
- examples/test_multimodal.py::test_llm_multimodal_general[Qwen2-VL-7B-Instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:4]

Example: Adding an Accuracy Test

For reference, see PR #10717, which added a Nemotron 3 super accuracy test. The workflow is:

  1. Create the test function in the appropriate test file
  2. Register the specific test case in the relevant l0_*.yml file(s)
  3. Ensure the test passes locally before submitting
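
Following the format above, the registration line added in step 2 would look something like this (file, class, and test names here are placeholders for illustration, not the actual entries from that PR):

- accuracy/test_my_model.py::TestMyModel::test_my_accuracy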

Location

Integration tests are typically located in:

  • examples/ - Model-specific integration tests
  • accuracy/ - Accuracy validation tests

Dashboard (Model Coverage Testing)

The dashboard provides broad model coverage and performance testing for all supported models in AutoDeploy.

Model Registry

Models are registered in examples/auto_deploy/model_registry/models.yaml. For detailed instructions, see the Model Registry README.

Format (Version 2.0)

The registry uses a flat list format with composable configurations:

version: '2.0'
description: AutoDeploy Model Registry - Flat format with composable configs
models:
- name: meta-llama/Llama-3.1-8B-Instruct
  yaml_extra: [dashboard_default.yaml, world_size_2.yaml]

- name: meta-llama/Llama-3.3-70B-Instruct
  yaml_extra: [dashboard_default.yaml, world_size_4.yaml, llama3_3_70b.yaml]

Key Concepts

  • Flat list: Models are in a single list (not grouped)
  • Composable configs: Each model references YAML config files via yaml_extra
  • Deep merging: Config files are merged in order (later files override earlier ones)
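
To make the merge order concrete, here is a minimal sketch assuming two config files with purely illustrative keys (these are not actual AutoDeploy options):

# dashboard_default.yaml (illustrative keys only)
runtime: trtllm
benchmark:
  enabled: true
  iterations: 10

# world_size_2.yaml (illustrative keys only)
world_size: 2
benchmark:
  iterations: 5

# Result of deep merging [dashboard_default.yaml, world_size_2.yaml]:
#   runtime: trtllm        # from dashboard_default.yaml
#   world_size: 2          # from world_size_2.yaml
#   benchmark:
#     enabled: true        # kept from dashboard_default.yaml
#     iterations: 5        # later file overrides the earlier one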

Configuration Files

Config files are stored in examples/auto_deploy/model_registry/configs/:

File                     Purpose
dashboard_default.yaml   Baseline settings for all models
world_size_N.yaml        GPU count configuration (1, 2, 4, or 8)
multimodal.yaml          Vision + text models
demollm_triton.yaml      DemoLLM runtime with Triton backend
Model-specific configs   Custom settings for specific models
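
A model-specific config is just another YAML fragment that is merged last. As a hypothetical sketch only (the keys shown are illustrative, not verified AutoDeploy options), such a file might look like:

# configs/my_model.yaml (hypothetical; keys are illustrative only)
max_batch_size: 4
kv_cache_config:
  free_gpu_memory_fraction: 0.8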

World Size Guidelines

World Size   Model Size Range   Example Models
1            < 2B params        TinyLlama, Qwen 0.5B, Phi-4-mini
2            2-15B params       Llama 3.1 8B, Qwen 7B, Mistral 7B
4            20-80B params      Llama 3.3 70B, QwQ 32B, Gemma 27B
8            80B+ params        DeepSeek V3, Llama 405B, Nemotron Ultra

Adding a New Model

  1. Add the model entry to models.yaml:

     - name: organization/my-new-model-7b
       yaml_extra: [dashboard_default.yaml, world_size_2.yaml]

  2. For models with special requirements, create a custom config in configs/ and reference it:

     - name: organization/my-custom-model
       yaml_extra: [dashboard_default.yaml, world_size_4.yaml, my_model.yaml]

  3. Validate with prepare_model_coverage_v2.py from the autodeploy-dashboard repository.

The model will be automatically picked up by the dashboard testing infrastructure on the next run.