This document describes the testing strategy for AutoDeploy, covering the multi-tiered approach used to ensure quality and reliability.
AutoDeploy uses a multi-tiered testing approach that balances fast feedback with comprehensive coverage:
```
┌─────────────────────────────────────────────────────────┐
│                        Dashboard                        │
│           (Broad model coverage + performance)          │
├─────────────────────────────────────────────────────────┤
│                    Integration Tests                    │
│             (Accuracy tests, CI-registered)             │
├─────────────────────────────────────────────────────────┤
│                     E2E Mini Tests                      │
│               (Compile + prompt workflows)              │
├─────────────────────────────────────────────────────────┤
│                       Unit Tests                        │
│     (Component testing: patches, transforms, etc.)      │
└─────────────────────────────────────────────────────────┘
```
- Unit Tests: Fast, isolated tests for individual components (patches, transforms, custom ops)
- E2E Mini Tests: End-to-end workflows testing compile + prompt for unique model combinations
- Integration Tests: Important accuracy tests registered individually in CI
- Dashboard: Broad model coverage and performance testing across all supported models
Unit tests verify individual components like patches, transformations, custom operations, and utilities.
All unit tests are located in `tests/unittest/auto_deploy/`:

```
tests/unittest/auto_deploy/
├── _utils_test/          # Shared test utilities
├── singlegpu/            # Single-GPU tests
│   ├── compile/          # Compilation tests
│   ├── custom_ops/       # Custom operations tests
│   ├── models/           # Model-specific patch tests
│   ├── shim/             # Executor/engine tests
│   ├── smoke/            # E2E mini tests (see below)
│   ├── transformations/  # Graph transformation tests
│   └── utils/            # Utility function tests
└── multigpu/             # Multi-GPU tests
    ├── custom_ops/       # Multi-GPU custom ops
    ├── smoke/            # Multi-GPU E2E mini tests
    └── transformations/  # Multi-GPU transformation tests
```
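As a minimal illustration of what lives in these folders, a unit test in the style of `singlegpu/transformations/` might look like the sketch below. The function under test here is a stand-in, not a real AutoDeploy API: the pattern is simply "compare the transformed path against an unfused reference".

```python
# Hypothetical unit test in the style of
# tests/unittest/auto_deploy/singlegpu/transformations/.
# `fuse_add_relu` is a stand-in for a fused op, not a real AutoDeploy API.

def add_relu_reference(x, y):
    """Unfused reference: elementwise add followed by ReLU."""
    return [max(a + b, 0.0) for a, b in zip(x, y)]

def fuse_add_relu(x, y):
    """Stand-in for the fused op a graph transformation would produce."""
    return [a + b if a + b > 0.0 else 0.0 for a, b in zip(x, y)]

def test_fused_op_matches_reference():
    x = [1.0, -2.0, 3.0]
    y = [-0.5, 0.5, 1.5]
    # The transformed path must be numerically identical to the reference.
    assert fuse_add_relu(x, y) == add_relu_reference(x, y)
```

Because unit tests like this are small and isolated, they are cheap enough to run on every CI pass.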
Registered tests run automatically in CI. New test files and test functions are picked up without further registration, as long as they live in a folder that is already registered.
Tests are registered in `tests/integration/test_lists/test-db/l0_*.yml` files under the `backend: autodeploy` section:
```yaml
backend: autodeploy
tests:
  - unittest/auto_deploy/singlegpu/compile
  - unittest/auto_deploy/singlegpu/custom_ops
  - unittest/auto_deploy/singlegpu/models
  - unittest/auto_deploy/singlegpu/shim
  - unittest/auto_deploy/singlegpu/smoke
  - unittest/auto_deploy/singlegpu/transformations
  - unittest/auto_deploy/singlegpu/utils
```

If you create a new folder (not just a new file in an existing folder), you must register it in the appropriate YAML files:
- Edit `tests/integration/test_lists/test-db/l0_a30.yml` (and other GPU-specific files as needed)
- Add the new folder path under the `backend: autodeploy` section
- Example:

```yaml
  - unittest/auto_deploy/singlegpu/my_new_folder
```
Most unit tests run in parallel using `pytest-xdist` for faster execution. The exception is the `smoke/` subfolders, which run sequentially (see E2E Mini Tests below).
E2E mini tests verify complete end-to-end workflows including model compilation and prompt execution for unique model combinations.
- Single GPU: `tests/unittest/auto_deploy/singlegpu/smoke/`
- Multi GPU: `tests/unittest/auto_deploy/multigpu/smoke/`
These tests ensure that the full AutoDeploy pipeline works correctly for various model architectures and configurations:
- `test_ad_build_small_single.py` - Tests multiple model configurations (Llama, Mixtral, Qwen, Phi-3, DeepSeek, Mistral, Nemotron)
- `test_ad_trtllm_bench.py` - Benchmarking functionality
- `test_ad_trtllm_serve.py` - Serving functionality
- `test_ad_speculative_decoding.py` - Speculative decoding
- `test_ad_export_onnx.py` - ONNX export functionality
Smoke tests are not executed in parallel to avoid resource contention during full model compilation and execution. They run sequentially within the CI pipeline.
Integration tests cover important accuracy tests and other scenarios that require explicit CI registration.
Unlike unit tests (where new files in existing folders are auto-discovered), each individual integration test case must be explicitly registered in the CI YAML files.
Format: `path/to/test_file.py::test_function_name[param_id]`
Example from `l0_a30.yml`:

```yaml
- accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized
- examples/test_multimodal.py::test_llm_multimodal_general[Qwen2-VL-7B-Instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:4]
```

For reference, see PR #10717, which added a Nemotron 3 super accuracy test. The workflow is:
- Create the test function in the appropriate test file
- Register the specific test case in the relevant `l0_*.yml` file(s)
- Ensure the test passes locally before submitting
Integration tests are typically located in:

- `examples/` - Model-specific integration tests
- `accuracy/` - Accuracy validation tests
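The registration format is mechanical enough to parse programmatically. The helper below is purely illustrative (it is not an AutoDeploy or pytest utility); it splits a registered test ID into its file path, test node, and optional parameter ID:

```python
# Illustrative helper (not part of the codebase): split a registered
# test ID of the form path/to/test_file.py::test_name[param_id].

def parse_test_id(test_id: str):
    """Return (file_path, node_id, param_id or None) for a CI test entry."""
    path, sep, node = test_id.partition("::")
    if not sep:
        raise ValueError(f"missing '::' separator: {test_id!r}")
    param = None
    # A trailing [...] holds the parametrization ID, if any.
    if node.endswith("]") and "[" in node:
        node, _, param = node[:-1].partition("[")
    return path, node, param
```

Note that the node part may itself contain `::` when the test lives inside a class (e.g. `TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized`).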
The dashboard provides broad model coverage and performance testing for all supported models in AutoDeploy.
Models are registered in `examples/auto_deploy/model_registry/models.yaml`. For detailed instructions, see the Model Registry README.
The registry uses a flat list format with composable configurations:

```yaml
version: '2.0'
description: AutoDeploy Model Registry - Flat format with composable configs
models:
  - name: meta-llama/Llama-3.1-8B-Instruct
    yaml_extra: [dashboard_default.yaml, world_size_2.yaml]
  - name: meta-llama/Llama-3.3-70B-Instruct
    yaml_extra: [dashboard_default.yaml, world_size_4.yaml, llama3_3_70b.yaml]
```

- Flat list: Models are in a single list (not grouped)
- Composable configs: Each model references YAML config files via `yaml_extra`
- Deep merging: Config files are merged in order (later files override earlier ones)
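The deep-merge rule can be sketched as follows. This is a simplified stand-in for the registry's actual merge logic, shown only to make the "later files override earlier ones" behavior concrete:

```python
# Simplified sketch of composable-config merging: dicts are merged in
# yaml_extra order, with later files overriding earlier ones.
# This is an illustration, not the registry's real implementation.

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested dicts
        else:
            merged[key] = value  # scalar/list values are replaced outright
    return merged

# A later file overrides only the keys it sets; everything else survives:
default = {"runtime": {"backend": "trtllm", "world_size": 1}}
world_size_2 = {"runtime": {"world_size": 2}}
print(deep_merge(default, world_size_2))
# {'runtime': {'backend': 'trtllm', 'world_size': 2}}
```

Because merging is recursive, a model-specific config only needs to state the keys it changes, not a full copy of the baseline.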
Config files are stored in `examples/auto_deploy/model_registry/configs/`:
| File | Purpose |
|---|---|
| `dashboard_default.yaml` | Baseline settings for all models |
| `world_size_N.yaml` | GPU count configuration (1, 2, 4, or 8) |
| `multimodal.yaml` | Vision + text models |
| `demollm_triton.yaml` | DemoLLM runtime with Triton backend |
| Model-specific configs | Custom settings for specific models |
Choose a world size based on the model's parameter count:

| World Size | Model Size Range | Example Models |
|---|---|---|
| 1 | < 2B params | TinyLlama, Qwen 0.5B, Phi-4-mini |
| 2 | 2-15B params | Llama 3.1 8B, Qwen 7B, Mistral 7B |
| 4 | 20-80B params | Llama 3.3 70B, QwQ 32B, Gemma 27B |
| 8 | 80B+ params | DeepSeek V3, Llama 405B, Nemotron Ultra |
To add a new model:

- Add the model entry to `models.yaml`:

```yaml
- name: organization/my-new-model-7b
  yaml_extra: [dashboard_default.yaml, world_size_2.yaml]
```

- For models with special requirements, create a custom config in `configs/` and reference it:

```yaml
- name: organization/my-custom-model
  yaml_extra: [dashboard_default.yaml, world_size_4.yaml, my_model.yaml]
```

- Validate with `prepare_model_coverage_v2.py` from the autodeploy-dashboard repository
The model will be automatically picked up by the dashboard testing infrastructure on the next run.
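Before relying on the automatic pickup, it can help to sanity-check entries locally. The validator below is hypothetical (it is not `prepare_model_coverage_v2.py`); it only shows the kind of checks such a tool performs:

```python
# Hypothetical sanity check for models.yaml entries -- a sketch, not the
# real prepare_model_coverage_v2.py validator.

def validate_entry(entry: dict, known_configs: set) -> list:
    """Return a list of problems found in a single registry entry."""
    problems = []
    if "name" not in entry:
        problems.append("missing 'name'")
    extras = entry.get("yaml_extra", [])
    if not extras:
        problems.append("missing 'yaml_extra'")
    for cfg in extras:
        # Every referenced config file must exist in configs/.
        if cfg not in known_configs:
            problems.append(f"unknown config file: {cfg}")
    return problems
```

An empty result means the entry at least has the required shape; the real validator in the autodeploy-dashboard repository is the source of truth.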