-
Notifications
You must be signed in to change notification settings - Fork 47
feat: add parameter sweeping support #699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Lokiiiiii
wants to merge
49
commits into
ai-dynamo:main
Choose a base branch
from
Lokiiiiii:param-sweep
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 41 commits
Commits
Show all changes
49 commits
Select commit
Hold shift + click to select a range
052264b
feat: Add parameter sweeping functionality
Lokiiiiii a7987ad
refactor: make parameter sweep feature parameter-agnostic
Lokiiiiii cf0de56
test: add integration test for sweep-only mode without confidence
Lokiiiiii 4103ddb
fix: add validation for single-element concurrency lists
Lokiiiiii 7133a2a
fix: address critical review findings
Lokiiiiii e2a5575
chore: add .kiro/ to .gitignore
Lokiiiiii 8287926
refactor: address nitpick review comments
Lokiiiiii de66597
docs: fix markdown formatting and endpoint type in parameter sweep docs
Lokiiiiii dc45565
fix: address final review comments
Lokiiiiii 76d3eaa
Refactor parameter sweep aggregation and remove plot integration tests
Lokiiiiii 0e710c3
docs: update sweep documentation to match refactored aggregation
Lokiiiiii 6e07fd4
refactor: move aggregation logic from CLI runner to orchestrator
Lokiiiiii a527fd4
refactor: move export logic to orchestrator for consistency
Lokiiiiii f8df734
test: update cli_runner tests for orchestrator refactoring
Lokiiiiii 5f3b931
fix: update integration tests for parameter sweep API changes
Lokiiiiii 9b00249
docs: update parameter sweep and multi-run documentation
Lokiiiiii 6222f08
refactor: fix code quality issues in parameter sweep code
Lokiiiiii d13f03a
fix: address code review findings for parameter sweep
Lokiiiiii 2954077
refactor: improve type safety and performance in exporters
Lokiiiiii 4ddfc6f
fix: code review fixes - test naming, type annotations, and numpy ser…
Lokiiiiii 7c773c4
docs: fix docstring inconsistency in sweep JSON exporter
Lokiiiiii 9dfe295
refactor: rename test functions in TestConcurrencyListParsing to foll…
Lokiiiiii 243419a
refactor(tests): rename test functions in test_sweep_exporters.py
Lokiiiiii 8eae540
refactor(tests): use _is_run_directory for filtering in test_mode_det…
Lokiiiiii 82932dc
test: add unit tests for orchestrator aggregation methods
Lokiiiiii 4e008f3
test: rename 4 tests to be more explicit about expected outcomes
Lokiiiiii 33e78c5
fix: reject non-integer floats in concurrency list to prevent silent …
Lokiiiiii c86e95c
docs(tests): clarify sweep parameter validation tests document defaul…
Lokiiiiii 4b1459a
docs: fix incorrect Pareto dominance explanation in parameter sweepin…
Lokiiiiii 4a20cd6
docs: clarify sweep aggregate directory structure for both modes
Lokiiiiii b333f3a
refactor: remove redundant 'what' comments in orchestrator
Lokiiiiii b00c6bc
test: add unit tests for _compute_sweep_aggregates method
Lokiiiiii c0e770a
docs: update coverage tracking with sweep aggregation test results
Lokiiiiii 25df4b4
docs: mark Phase 1 item 1 as completed
Lokiiiiii d5cfe88
refactor: remove commented-out trend analysis tests
Lokiiiiii 5869b64
test: remove skipped integration tests from unit test file
Lokiiiiii cdb8855
docs: fix incorrect JSON structure in parameter sweeping tutorial
Lokiiiiii b566306
docs: complete Pareto optimal documentation update
Lokiiiiii fb211e4
test: fix misleading test names in sweep parameter validators
Lokiiiiii 9081483
fix: use user config for confidence strategy parameters
Lokiiiiii 518e4a8
fix: handle None return from _compute_sweep_aggregates
Lokiiiiii 2055eb7
Fix hardcoded 'concurrency' breaking other sweep types
Lokiiiiii 920598c
Improve test naming convention in parameter sweep properties
Lokiiiiii 20f756c
Add language identifiers to fenced code blocks in parameter-sweeping.md
Lokiiiiii c088be7
Remove empty test class TestProperty14PBT
Lokiiiiii 615c708
Fix random.randint breaking Hypothesis reproducibility
Lokiiiiii 7a4ef1d
Update stale docstring notes about CLI/cyclopts
Lokiiiiii c681df0
Remove redundant 'what' comments in orchestrator
Lokiiiiii d6d9606
Add comprehensive unit tests for orchestrator execution logic
Lokiiiiii File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,6 +10,8 @@ htmlcov/ | |
| .pytest_cache/ | ||
| .ruff_cache/ | ||
| .mypy_cache/ | ||
| .hypothesis/ | ||
| .kiro/ | ||
| .env | ||
| .env.* | ||
| *.log | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,346 @@ | ||
| # Test Coverage Improvement Plan | ||
|
|
||
| ## Overview | ||
|
|
||
| This document tracks the effort to improve test coverage for the parameter sweep feature. The Codecov report identified 268 lines missing coverage across 6 files. | ||
|
|
||
| **Current Status**: 75% overall coverage, 49% for orchestrator.py (the main gap) | ||
|
|
||
| **Goal**: Achieve 85%+ coverage for all parameter sweep related files | ||
|
|
||
| --- | ||
|
|
||
| ## Coverage Summary by File | ||
|
|
||
| | File | Current Coverage | Missing Lines | Priority | Change | | ||
| |------|-----------------|---------------|----------|--------| | ||
| | orchestrator.py | 63% | 153 miss + 21 partial | HIGH | +14% ⬆️ | | ||
| | cli_runner.py | 75% | 28 miss + 9 partial | MEDIUM | No change | | ||
| | aggregate_sweep_csv_exporter.py | 88% | 6 miss + 6 partial | LOW | No change | | ||
| | sweep.py | 96% | 3 miss + 2 partial | LOW | No change | | ||
| | loadgen_config.py | 94% | 7 miss + 6 partial | LOW | -1% ⬇️ | | ||
| | user_config.py | 92% | 24 miss + 22 partial | LOW | No change | | ||
|
|
||
| **Overall Progress**: orchestrator.py improved from 49% → 63% (+14%) due to new aggregation and sweep aggregation unit tests. | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 1: orchestrator.py (49% → 85% target) | ||
|
|
||
| **File**: `src/aiperf/orchestrator/orchestrator.py` | ||
| **Missing**: 221 lines + 17 partial branches | ||
| **Current Tests**: Integration tests only, minimal unit tests | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| #### 1. Aggregation and Export Methods (Lines 80-157) | ||
| **Lines**: 80-85, 97-109, 124-157 | ||
| **Functions**: `_aggregate_and_export`, `_export_confidence_aggregate`, `_export_sweep_aggregates` | ||
| **Reason**: These are only tested via integration tests | ||
| **Test Needed**: Unit tests for: | ||
| - Confidence-only aggregation export | ||
| - Sweep aggregation export | ||
| - Mixed sweep + confidence export | ||
| - Error handling when aggregation fails | ||
|
|
||
| #### 2. Sweep Execution Logic (Lines 172-303) | ||
| **Lines**: 172-303 | ||
| **Functions**: `_execute_parameter_sweep`, `_collect_failed_sweep_values` | ||
| **Reason**: Complex sweep orchestration logic not unit tested | ||
| **Test Needed**: Unit tests for: | ||
| - Parameter sweep execution with different strategies | ||
| - Failed run collection and reporting | ||
| - Sweep value iteration | ||
| - Error handling during sweep execution | ||
|
|
||
| #### 3. Per-Value Export Logic (Lines 329-383) | ||
| **Lines**: 329-357, 371-383 | ||
| **Functions**: `_export_per_value_aggregates`, `export_all_per_value` | ||
| **Reason**: Async export logic not unit tested | ||
| **Test Needed**: Unit tests for: | ||
| - Per-value aggregate export | ||
| - Directory creation | ||
| - Concurrent export handling | ||
| - Export failure scenarios | ||
|
|
||
| #### 4. Result Aggregation (Lines 400-433) | ||
| **Lines**: 400-402, 422-433 | ||
| **Functions**: `aggregate_results` | ||
| **Reason**: Aggregation routing logic not fully tested | ||
| **Test Needed**: Unit tests for: | ||
| - Confidence-only aggregation | ||
| - Sweep-only aggregation | ||
| - Mixed mode aggregation | ||
| - Empty results handling | ||
|
|
||
| #### 5. Strategy Execution (Lines 499-635) | ||
| **Lines**: 499-505, 538-635 | ||
| **Functions**: `_execute_with_strategy`, `_execute_single_run` | ||
| **Reason**: Core execution logic tested via integration only | ||
| **Test Needed**: Unit tests for: | ||
| - Strategy execution with mocks | ||
| - Single run execution | ||
| - Error handling | ||
| - Result collection | ||
|
|
||
| #### 6. Sweep Aggregation Computation (Lines 662-760) | ||
| **Lines**: 662-760 | ||
| **Functions**: `_compute_sweep_aggregate` | ||
| **Reason**: Complex aggregation logic not unit tested | ||
| **Test Needed**: Unit tests for: | ||
| - Sweep aggregate computation | ||
| - Per-value aggregate grouping | ||
| - Best configuration identification | ||
| - Pareto optimal calculation | ||
|
|
||
| ### Test Files to Create/Enhance | ||
|
|
||
| 1. **tests/unit/orchestrator/test_orchestrator_aggregation.py** ✅ **COMPLETED** | ||
| - Test `_aggregate_and_export` routing logic ✅ | ||
| - Test `_export_confidence_aggregate` ✅ | ||
| - Test `_export_sweep_aggregates` ✅ | ||
| - Test per-value aggregate export (within sweep export) ✅ | ||
| - **Status**: All 14 tests passing | ||
| - **Commit**: 82932dc6 (original), verified passing | ||
|
|
||
| 2. **tests/unit/orchestrator/test_orchestrator_sweep_execution.py** (NEW) | ||
| - Test `_execute_parameter_sweep` | ||
| - Test `_collect_failed_sweep_values` | ||
| - Test sweep iteration logic | ||
| - Test error handling | ||
|
|
||
| 3. **tests/unit/orchestrator/test_orchestrator.py** (ENHANCE) | ||
| - Add tests for `aggregate_results` | ||
| - Add tests for `_execute_with_strategy` | ||
| - Add tests for `_execute_single_run` | ||
|
|
||
| 4. **tests/unit/orchestrator/test_orchestrator_sweep_aggregation.py** ✅ **COMPLETED** | ||
| - Test `_compute_sweep_aggregate` ✅ | ||
| - Test per-value grouping ✅ | ||
| - Test best configuration logic ✅ | ||
| - Test Pareto optimal calculation ✅ | ||
| - **Status**: All 9 tests passing, coverage improved +1% | ||
| - **Commit**: b00c6bc5 | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 2: cli_runner.py (75% → 90% target) | ||
|
|
||
| **File**: `src/aiperf/cli_runner.py` | ||
| **Missing**: 28 lines + 9 partial branches | ||
| **Current Tests**: `tests/unit/test_cli_runner_aggregation.py` exists but incomplete | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| #### Lines 116-118, 135-146 | ||
| **Function**: Sweep aggregation setup in `run_benchmark` | ||
| **Reason**: Sweep-specific CLI logic not fully tested | ||
| **Test Needed**: Unit tests for: | ||
| - Sweep mode detection | ||
| - Sweep aggregation configuration | ||
| - Sweep-specific validation | ||
|
|
||
| #### Lines 197, 206, 219-229 | ||
| **Function**: Result aggregation and export | ||
| **Reason**: Aggregation routing not fully tested | ||
| **Test Needed**: Unit tests for: | ||
| - Confidence-only aggregation | ||
| - Sweep aggregation | ||
| - Mixed mode aggregation | ||
| - Export path handling | ||
|
|
||
| #### Lines 266-270 | ||
| **Function**: Error handling | ||
| **Reason**: Error paths not tested | ||
| **Test Needed**: Unit tests for: | ||
| - Aggregation failures | ||
| - Export failures | ||
| - Invalid configuration handling | ||
|
|
||
| ### Test Files to Enhance | ||
|
|
||
| 1. **tests/unit/test_cli_runner_aggregation.py** (ENHANCE) | ||
| - Add sweep mode tests | ||
| - Add aggregation routing tests | ||
| - Add error handling tests | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 3: aggregate_sweep_csv_exporter.py (88% → 95% target) | ||
|
|
||
| **File**: `src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py` | ||
| **Missing**: 6 lines + 6 partial branches | ||
| **Current Tests**: `tests/unit/exporters/aggregate/test_sweep_exporters.py` exists | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| #### Lines 63, 102, 171, 174, 176, 178 | ||
| **Functions**: Edge cases in CSV formatting | ||
| **Reason**: Specific formatting edge cases not tested | ||
| **Test Needed**: Unit tests for: | ||
| - Empty metric values | ||
| - Missing units | ||
| - Null/None handling | ||
| - Special characters in parameter names | ||
|
|
||
| ### Test Files to Enhance | ||
|
|
||
| 1. **tests/unit/exporters/aggregate/test_sweep_exporters.py** (ENHANCE) | ||
| - Add edge case tests for CSV formatting | ||
| - Add tests for missing/null values | ||
| - Add tests for special characters | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 4: sweep.py (96% → 98% target) | ||
|
|
||
| **File**: `src/aiperf/orchestrator/aggregation/sweep.py` | ||
| **Missing**: 3 lines + 2 partial branches | ||
| **Current Tests**: `tests/unit/orchestrator/aggregation/test_sweep.py` exists | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| #### Lines 233, 268-271 | ||
| **Functions**: Edge cases in Pareto optimal calculation | ||
| **Reason**: Specific edge cases not tested | ||
| **Test Needed**: Unit tests for: | ||
| - Empty metrics handling | ||
| - Single point Pareto optimal | ||
| - All points Pareto optimal | ||
|
|
||
| ### Test Files to Enhance | ||
|
|
||
| 1. **tests/unit/orchestrator/aggregation/test_sweep.py** (ENHANCE) | ||
| - Add Pareto optimal edge case tests | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 5: loadgen_config.py (95% → 98% target) | ||
|
|
||
| **File**: `src/aiperf/common/config/loadgen_config.py` | ||
| **Missing**: 6 lines + 5 partial branches | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| #### Lines 110, 134, 700, 719-722, 758 | ||
| **Functions**: Validation edge cases | ||
| **Reason**: Specific validation paths not tested | ||
| **Test Needed**: Unit tests for: | ||
| - Edge cases in concurrency list parsing | ||
| - Validation error messages | ||
| - Boundary conditions | ||
|
|
||
| ### Test Files to Enhance | ||
|
|
||
| 1. **tests/unit/common/config/test_loadgen_config_validators.py** (ENHANCE) | ||
| - Add edge case validation tests | ||
|
|
||
| --- | ||
|
|
||
| ## Priority 6: user_config.py (92% → 95% target) | ||
|
|
||
| **File**: `src/aiperf/common/config/user_config.py` | ||
| **Missing**: 24 lines + 22 partial branches | ||
|
|
||
| ### Missing Coverage Analysis | ||
|
|
||
| **Lines**: Various validation and edge case paths | ||
| **Reason**: Complex configuration validation not fully tested | ||
| **Test Needed**: Unit tests for: | ||
| - Sweep-specific validation | ||
| - Configuration edge cases | ||
| - Error message formatting | ||
|
|
||
| ### Test Files to Enhance | ||
|
|
||
| 1. **tests/unit/common/config/test_user_config.py** (ENHANCE) | ||
| - Add sweep validation tests | ||
| - Add edge case tests | ||
|
|
||
| --- | ||
|
|
||
| ## Implementation Plan | ||
|
|
||
| ### Phase 1: High Priority (orchestrator.py) | ||
| **Goal**: 49% → 70% coverage | ||
| **Estimated Effort**: 3-4 hours | ||
| **Tasks**: | ||
| 1. Create `test_orchestrator_aggregation.py` - Test export methods | ||
| 2. Create `test_orchestrator_sweep_execution.py` - Test sweep execution | ||
| 3. Enhance `test_orchestrator.py` - Test core methods | ||
|
|
||
| ### Phase 2: Medium Priority (cli_runner.py) | ||
| **Goal**: 75% → 90% coverage | ||
| **Estimated Effort**: 1-2 hours | ||
| **Tasks**: | ||
| 1. Enhance `test_cli_runner_aggregation.py` - Add sweep mode tests | ||
|
|
||
| ### Phase 3: Low Priority (Remaining Files) | ||
| **Goal**: Bring all files to 95%+ coverage | ||
| **Estimated Effort**: 2-3 hours | ||
| **Tasks**: | ||
| 1. Add edge case tests to existing test files | ||
| 2. Add error handling tests | ||
| 3. Add boundary condition tests | ||
|
|
||
| --- | ||
|
|
||
| ## Testing Strategy | ||
|
|
||
| ### Unit Test Principles | ||
| 1. **Mock External Dependencies**: Use mocks for file I/O, async operations | ||
| 2. **Test One Thing**: Each test should verify one specific behavior | ||
| 3. **Use Fixtures**: Create reusable test data fixtures | ||
| 4. **Test Edge Cases**: Focus on error paths and boundary conditions | ||
| 5. **Fast Execution**: Unit tests should run in milliseconds | ||
|
|
||
| ### Coverage Goals | ||
| - **Critical Paths**: 100% coverage (main execution flows) | ||
| - **Error Handling**: 90%+ coverage (error paths) | ||
| - **Edge Cases**: 85%+ coverage (boundary conditions) | ||
| - **Overall**: 85%+ coverage per file | ||
|
|
||
| --- | ||
|
|
||
| ## Progress Tracking | ||
|
|
||
| ### Completed | ||
| - ✅ Coverage analysis complete | ||
| - ✅ Tracking document created | ||
| - ✅ Test plan defined | ||
| - ✅ Phase 1 started: orchestrator.py unit tests (49% → 64%) | ||
| - ✅ test_orchestrator_aggregation.py created (14 tests) | ||
| - ✅ Aggregation routing tests | ||
| - ✅ Export method tests | ||
| - ✅ Failed sweep value collection tests | ||
| - ✅ Test naming convention improvements (commit 4e008f3d) | ||
|
|
||
| ### In Progress | ||
| - ⏳ Phase 1: orchestrator.py unit tests (target: 70%+) | ||
| - Need: Sweep execution tests (_execute_parameter_sweep) | ||
| - Need: Strategy execution tests (_execute_with_strategy, _execute_single_run) | ||
| - Need: Sweep aggregation computation tests (_compute_sweep_aggregate) | ||
|
|
||
| ### Todo | ||
| - ⬜ Phase 1: Complete remaining orchestrator.py tests | ||
| - ⬜ Phase 2: cli_runner.py unit tests | ||
| - ⬜ Phase 3: Edge case tests for remaining files | ||
| - ⬜ Final coverage verification | ||
|
|
||
| --- | ||
|
|
||
| ## Notes | ||
|
|
||
| - Integration tests provide good end-to-end coverage but don't catch edge cases | ||
| - Unit tests are needed for error handling, edge cases, and boundary conditions | ||
| - Focus on testing the "unhappy paths" that integration tests don't cover | ||
| - Use mocks extensively to isolate units under test | ||
| - Aim for fast, focused tests that can run in CI | ||
|
|
||
| --- | ||
|
|
||
| ## References | ||
|
|
||
| - Codecov Report: https://codecov.io/gh/NVIDIA/aiperf | ||
| - Integration Tests: `tests/integration/test_parameter_sweep.py` | ||
| - Existing Unit Tests: `tests/unit/orchestrator/`, `tests/unit/exporters/aggregate/` | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hyphenate compound modifier.
Use “parameter-sweep” (or “parameter‑sweep”) before “related files” for correct compound-modifier usage.
🧰 Tools
🪛 LanguageTool
[grammar] ~9-~9: Use a hyphen to join words.
Context: ...ve 85%+ coverage for all parameter sweep related files --- ## Coverage Summary ...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents