Releases: ai-dynamo/aiconfigurator
AIConfigurator Release v0.6.0
Release v0.6.0
This release focuses on collector upgrades, new/updated performance datasets (H100/H200/B200/Blackwell), and more robust config generation + CI automation.
Highlights
Collector upgrades + compatibility (SGLang/VLLM)
SGLang non-wideep collector upgraded to 0.5.6 (compatible with 0.5.5) (#176)
VLLM bumped to 0.12.0 (#181)
VLLM MLA collector updated for v0.12.0 (#197)
New attention/MLA collection + fixes
Added MLA attention collectors for VLLM (#177)
Fixed 1.2.0rc5 MLA + all-reduce generation (#196)
Blackwell / B200 enablement + datasets
Non-wideep SGLang collector Blackwell support (#218)
Added B200 TRTLLM 1.2.0rc5 data (#202)
Added B200 SGLang 0.5.6.post2 (no wideep) data (#223)
Fixed head dimension handling when not collecting Blackwell data (#236)
Performance DB refresh (H100/H200) + data cleanup
Removed old 0.20.0 DB and added new data from 1.2.0rc5 (H100 & H200) (#198)
Added new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Added new performance data for SGLang 0.5.6.post2 (#200, #201)
Cleaned incomplete/old datasets (VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2) (#204)
Updated H200 SGLang DB (#235)
More reliable generation + automation
“Lowest latency under SLA” support (#182)
Config/task/perf DB made more error-proof (+ L40S custom all-reduce data) (#183)
Added hf_token support in generated configs (#230)
Auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: improved daily support matrix workflow automation/comparisons (#247)
Added cherry-pick workflow (#205)
Cherry-pick: add k8s_hf_home option (#305)
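The "lowest latency under SLA" mode (#182) changes the search objective: instead of maximizing throughput, it selects the candidate with the lowest end-to-end latency among those that still meet the TTFT/TPOT constraints. A minimal sketch of that selection logic, using illustrative field names (`ttft_ms`, `tpot_ms`) that are assumptions for this example, not aiconfigurator's actual data model:

```python
# Illustrative sketch of "lowest latency under SLA" candidate selection.
# Candidate fields are hypothetical names, not aiconfigurator internals.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    ttft_ms: float   # time to first token
    tpot_ms: float   # time per output token

def lowest_latency_under_sla(candidates, ttft_sla_ms, tpot_sla_ms, osl):
    """Among candidates meeting both SLAs, return the one with the
    lowest end-to-end latency for a response of `osl` output tokens."""
    feasible = [c for c in candidates
                if c.ttft_ms <= ttft_sla_ms and c.tpot_ms <= tpot_sla_ms]
    if not feasible:
        return None
    # E2E latency = TTFT + TPOT * (OSL - 1)
    return min(feasible, key=lambda c: c.ttft_ms + c.tpot_ms * (osl - 1))
```

The key point is that the SLA acts as a hard filter and latency becomes the ranking key, rather than the other way around.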
What's Changed
🚀 Features & Improvements
Upgrade SGLang non-wideep collector to 0.5.6 (compatible with 0.5.5) (#176)
Rename and simplify power-law functions for DeepEP MoE (#174)
Add MLA attention collectors for VLLM (#177)
Bump VLLM to 0.12.0 (#181)
Support “lowest latency under SLA” (#182)
Support 1-GPU collector (#185)
Make perf DB and task config more error-proof; add L40S SGLang custom all-reduce data (#183)
Delete 0.20.0 database and add new data from 1.2.0rc5 (H100 & H200) (#198)
Add new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Add new performance data for SGLang 0.5.6.post2 (#200)
Add new data for SGLang 0.5.6.post2 on H200 (#201)
Make VLLM MLA collector compatible with v0.12.0 (#197)
Add B200 TRTLLM 1.2.0rc5 data (#202)
Refactor wideep collectors for collect.py framework with multiprocess support (#188)
Create cherry-pick.yml (#205)
SGLang non-wideep collector: Blackwell support (#218)
Add B200 SGLang 0.5.6.post2 data without wideep (#223)
Refactor tests and add marks for better management (#224)
Add hf_token support in AIC generated config (#230)
Collector: auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: update daily support matrix workflow to enhance automation and comparison features (#247)
Cherry-pick: add k8s_hf_home option (#305)
🐛 Bug Fixes
Fix FP8 block GEMM collector (#171)
Use TTFT to filter prefill candidates (#169)
MoE args and workload distribution fallback (#168)
Delete wideep MLP for SGLang; improve DB/op query returns; fix collector repeat handling (#170)
Update DeepEP interface for SGLang 0.5.6+ compatibility (#172)
Use model_family for checks instead of model_name (#186)
Fix broken SGLang wideep deepseek path (#195)
Fix 1.2.0rc5 MLA and all-reduce generation (#196)
Delete incomplete data for VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2 (#204)
Fix config generator missing MoE parallel config when using huggingface_id (#193)
Fix eval FileNotFoundError for service_mode=disagg output path (#194)
Add common code owners to avoid blocking merge (#225)
Update copyright date to 2025–2026 (#220)
Remove nvfp4 shape restriction (#221)
Fix automation pipeline bug (#217)
Fix ISL=1 and smaller local heads (#222)
Support matrix: update CSV + fix daily workflow (#226)
Default cache_transceiver_config.backend to DEFAULT (#231)
AIC eval: support replica > 1 (#234)
Include --max-model-len and --max-num-batched-tokens in VLLM run.sh (#238)
Update H200 SGLang database (#235)
Fix config generator for multiple replicas (#232)
Improve generator MoE parallelism for different backends (#237)
Add generator doc (#241)
Enable hybrid TP/DP/EP mode in wideep SGLang (#229)
Add w4a16_mxfp4 MoE data and set proper moe_quant_mode default for gpt-oss (#240)
Correct v_head_dim and head_dim_total when not collecting data for Blackwell (#236)
Fix multinode disagg config generator for GB200 (#242)
Fix TRTLLM tp=moe_tp × moe_ep behavior (#248)
CI: use self-hosted runners to avoid GitHub runner OOM (#252)
Add SGLang enable-mix-chunk for generator (#257)
Fix SGLang enable mixed chunk (#258)
Support matrix update (#270)
Update generator doc + allow graceful CLI exit when lacking DB data (#286)
Align generator run script with dynamo 0.8.0 (#283)
Use nixl as default disagg transfer backend for SGLang 0.5.6.post2 + allow CLI override (#287)
Fix VLLM/SGLang k8s template missing k8s_model_cache param (#285)
Move PVC support from frontend to workers for SGLang backend (#292)
Docs/guide updates on dynamo deployment + remove dynamoNamespace field (#300, #299)
Handle SGLang L40S missing data gracefully (#306)
AIConfigurator Release v0.5.0.post0
AIConfigurator 0.5.0.post0
AIConfigurator 0.5.0.post0 is a patch release that updates container image compatibility and fixes copyright headers.
Release Highlights
This is a maintenance release for AIConfigurator 0.5.0 that ensures compatibility with Dynamo container image 0.8.0.
Changes
- Dynamo Container Compatibility: Updated AIConfigurator 0.5.0 to use the matched Dynamo container image 0.8.0 (#262)
- Copyright Update: Updated copyright date to 2025-2026 to pass CI checks (#264)
Full Changelog: v0.5.0...v0.5.0.post0
AIConfigurator Release v0.5.0
AIConfigurator 0.5.0
AIConfigurator 0.5.0 brings significant performance optimizations, expands backend support for vLLM and SGLang, and introduces new modeling capabilities including Power Estimation and Power Law workload distribution. This release also adds comprehensive support matrix testing.
Release Highlights
This version focuses on performance efficiency with optimizations to the generation engine and database lookups. New hardware data support includes L40S for SGLang, and we have expanded MoE (Mixture of Experts) support to the vLLM backend. Additionally, users can now target End-to-End (E2E) latency and estimate power consumption.
Features and Improvements
1. Performance Optimizations
- Engine Optimization: Optimized the implementation of run_generation and num_gpu lookups for faster execution (by @anish-shanbhag in #113, #114).
- Efficient Data Handling: Replaced dataframes with dictionaries for batch operations in InferenceSummary generation and added caching for repeated queries to improve speed (by @anish-shanbhag in #115, #128).
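The caching change in #128 exploits the fact that a perf-DB lookup is deterministic for a given key, so repeated queries can be memoized. A minimal sketch of the idea with `functools.lru_cache`; `query_latency_ms` is a hypothetical stand-in, not the actual query function:

```python
# Sketch of memoizing repeated perf-DB queries (the idea behind #128).
# query_latency_ms is a hypothetical stand-in for an expensive
# database/interpolation lookup.
from functools import lru_cache

@lru_cache(maxsize=None)
def query_latency_ms(op: str, batch: int, isl: int) -> float:
    # Pretend this does an expensive interpolation over collected data.
    return batch * isl * 0.001

# Repeated calls with the same arguments are served from the cache:
query_latency_ms("gemm", 8, 1024)
query_latency_ms("gemm", 8, 1024)
```

During a search that evaluates thousands of candidate configurations, the same (op, batch, ISL) keys recur constantly, which is why this kind of memoization pays off.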
2. New Modeling Capabilities
- Power Estimation: Added support for estimating power consumption of configurations (by @kaim-eng in #153).
- Workload Distribution: Introduced a 'power_law' option for workload distribution in the CLI and prefill modeling (by @xutizhou in #147, #134).
- Hybrid Modeling: Added support for hybrid modeling scenarios (by @tianhaox in #125).
- Latency Targets: Users can now set E2E latency as a target metric (by @tianhaox in #145).
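The 'power_law' workload option models heavy-tailed request lengths instead of a single fixed ISL. One common way to realize this is inverse-transform sampling from a truncated power law; the sketch below is illustrative only (parameter names are assumptions, and it assumes alpha != 1):

```python
# Sketch of sampling request lengths from a truncated power law
# p(x) ~ x^(-alpha) on [min_len, max_len]. Illustrative only; not
# aiconfigurator's actual 'power_law' implementation. Assumes alpha != 1.
import random

def sample_power_law_length(alpha: float, min_len: int, max_len: int,
                            rng: random.Random) -> int:
    """Inverse-transform sample, rounded to an integer token count."""
    u = rng.random()
    a = min_len ** (1 - alpha)
    b = max_len ** (1 - alpha)
    x = (a + u * (b - a)) ** (1 / (1 - alpha))
    return int(round(x))
```

With alpha around 2, most sampled lengths are short while a small fraction are very long, which is the traffic shape this distribution is meant to capture.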
3. Framework and Hardware Support
- vLLM Support: Added MoE support for vLLM (by @ilyasher in #139) and generator support (by @Ethan-ES in #144).
- SGLang Support: Added support for WideEP TP attention modeling (by @AichenF in #143), L40S data (non-WideEP) (by @venkywonka in #165), and generator support (by @Ethan-ES in #144).
- DeepSeek: Replaced DeepSeek MLP with GEMM for better performance (by @AichenF in #155).
4. User Interface
- Profiler UI: Introduced a new Profiler UI for better visualization and analysis (by @Harrilee in #117).
- UI Updates: Relocated GPU cost references and updated profiling components (by @Harrilee in #167).
5. Build, CI and Test
- Testing Framework: Added a comprehensive support matrix testing framework (by @Harrilee in #126).
- Maintenance: Added a CODEOWNERS file for better repository management (by @Arsene12358 in #109).
Bug Fixes
- SGLang Fixes: Addressed vulnerabilities in the collector (#108), aligned GEMM quantization methods (#122), and fixed attention collection for the regular path (#123).
- MoE & Model Fixes: Fixed MoE memory issues and NVFP4 GEMM for TRT-LLM 1.x (#131), removed generation repeat attention (#148), and updated workload distribution logic for MoE/DeepSeek models (#146).
- CLI & Compatibility: Fixed CLI for GB200 with TP > 4 (#137), improved Python compatibility by using Union instead of | (#158), and relaxed Pydantic requirements (#161, #162).
- General Fixes: Fixed team name parsing (#130), updated custom_allreduce file locations (#156, #160), and removed PII from error stack traces (#166).
Documentation
- Added design documentation for Power Law distribution (by @YijiaZhao in #119, #129).
- Updated documentation to mention vLLM and SGLang support (by @jasonqinzhou in #159).
New Contributors
- @xueh-nv made their first contribution in #133
- @Harrilee made their first contribution in #117
- @gangmuk made their first contribution in #158
- @dmitry-tokarev-nv made their first contribution in #161
- @venkywonka made their first contribution in #165
- @kaim-eng made their first contribution in #153
- @bcfre made their first contribution in #175
Full Changelog: v0.4.0...v0.5.0
AIConfigurator Release v0.4.0
AIConfigurator 0.4.0
AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments. AIConfigurator 0.4.0 adds extensive support for the SGLang backend, covering both the DeepSeek WideEP path and the regular path, with dense and MoE model support. We also added dense model support for the vLLM backend. With this release, AIConfigurator supports all three major backends: TensorRT-LLM, SGLang, and vLLM.
Release Highlights
AIConfigurator 0.4.0 significantly expands backend support, achieving coverage for all three major backends. This release introduces support for L40S GPUs, Qwen3 30B A3B MOE models, and direct HuggingFace model loading via --hf_id.
Additionally, it adds prefix cache modeling support to simulate workloads with system prompts or prefix cache hits, and unifies SGLang paths for better maintainability.
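Prefix cache modeling reduces the prefill work that has to be simulated when requests share a cached prefix (e.g. a common system prompt). A toy sketch of the accounting, with hypothetical names:

```python
# Toy sketch of prefix-cache accounting: when a fraction `hit_rate` of
# requests reuse a cached prefix of `prefix_len` tokens, those tokens
# need no prefill compute. Names are illustrative, not aiconfigurator's.
def expected_prefill_tokens(isl: int, prefix_len: int, hit_rate: float) -> float:
    """Expected number of tokens actually computed during prefill."""
    assert 0 <= prefix_len <= isl and 0.0 <= hit_rate <= 1.0
    return isl - hit_rate * prefix_len
```

Even this simple model shows why prefix caching matters for TTFT: a 4096-token prompt with a 1024-token cached system prompt and a 50% hit rate only needs ~3584 tokens of prefill compute on average.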
Features and Improvements
1. New Hardware Support
2. Framework Support
- Added SGLang attention collector (by @Atream in #73)
- Enhanced allreduce data collector to enable data collection for vLLM backend (by @Arsene12358 in #87)
- Added SGLang disagg support (by @jasonqinzhou in #84)
- Added SGLang agg support (by @jasonqinzhou in #93)
- Added vLLM disagg support (by @ilyasher in #89)
- Added vLLM agg support (by @ilyasher in #98)
- Unified SGLang WideEP and regular paths (by @tianhaox in #99)
3. Expanded Model Support
- Supported using --hf_id as an alternative to --model (by @simone-chen in #86)
- Added Qwen3 30B A3B MOE model support (by @jasonqinzhou in #58)
4. Modeling and Improvements
- Added prefix length modeling support (by @tianhaox in #77)
- Added version subcommand (by @jasonqinzhou in #72)
5. Build, CI and Test
- Added linting and formatting with Ruff, created a developer guide (by @anish-shanbhag in #65)
- Added A100 to e2e test (by @simone-chen in #64)
Bug Fixes
- Added supported systems to CLI help (by @jasonqinzhou in #63)
- Fixed MLP context state (by @AichenF in #78)
- Moved Gradio to optional dependencies (by @Arsene12358 in #90)
- Fixed LLAMA2_7B and LLAMA2_13B errors (by @ilyasher in #97)
- Fixed webapp compatibility with SGLang and vLLM (by @tianhaox in #100)
- Fixed collector minor problems (by @tianhaox in #101)
- Enhanced log file collection with Path and error handling (by @xutizhou in #92)
Documentation
- Updated README to include A100 SXM in support matrix (by @simone-chen in #62)
- Added git lfs pull step before install from source code to download full data files (by @cr7258 in #69)
- Added more A100 docs (by @jasonqinzhou in #67)
New Contributors
AIConfigurator v0.3.0
AIConfigurator 0.3.0
AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments such as those using NVIDIA H100, H200, GB200, B200, A100, or future hardware with the Dynamo backend.
Currently AIConfigurator supports NVIDIA TensorRT-LLM as the primary inference engine, with limited support for SGLang.
Release Highlights
AIConfigurator 0.3.0 introduces significant expansion in hardware support, framework compatibility, and model coverage. This release adds support for multiple new GPU architectures, introduces SGLang framework integration, and expands the model library with new Qwen3 variants and GPT-OSS models.
Features and Improvements
1. New Hardware Support
- Added GB200 GPU support (by @YijiaZhao in #32)
- Added B200 GPU support with TensorRT-LLM 1.0.0rc6 data (by @tianhaox in #36)
- Added A100 GPU support (by @simone-chen in #55)
2. New Framework Support: SGLang and Wide-EP
Note: SGLang support is currently limited and experimental.
- Added SGLang GEMM collector and performance data (by @Atream in #28)
- Added SGLang MLA-BMM collector and performance data (by @Atream in #29)
- Added SGLang MLA collector and performance data (by @Atream in #31)
- Added SGLang fused MoE Triton collector (by @Atream in #39)
- Added support for disaggregated DeepSeek in SGLang (by @AichenF in #54)
3. Expanded Model Support
- Added several Qwen3 models (by @tianhaox in #30)
- Added GPT-OSS support in AIConfigurator SDK (by @Arsene12358 in #56)
4. Configuration Generation and Evaluation
- Refactored generator as a standalone module for improved modularity (by @Ethan-ES in #40)
- Added new CLI and SDK support for presets in search space configuration (by @tianhaox in #44)
- Added AIPerf integration for performance evaluation (by @Ethan-ES in #57)
- Improved aggregated and disaggregated modeling and performance (by @tianhaox in #45)
5. Collector Improvements
- Enhanced collector to support data collection for windowed attention and additional MoE configurations (by @Arsene12358 in #33)
Bug Fixes
- Fixed LICENSE file (by @saturley-hall in #21)
- Added allowed path workspace configuration (by @tianhaox in #23)
- Updated MoE tuning logic (by @YijiaZhao in #19)
- Updated Gradio version for compatibility (by @saturley-hall in #35)
- Improved error handling for database loading failures (by @tianhaox in #37, #38)
- Enhanced Kubernetes support with corresponding documentation (by @Ethan-ES in #50)
- Changed NVIDIA SMI command from -lgc to -ac (by @LyleLuo in #49)
- Excluded FP8 from MLA generation post-processing test cases for Ampere architecture (by @simone-chen in #52)
- Fixed TensorRT-LLM 1.0.0 collector compatibility (by @tianhaox in #48)
- Improved tensor initialization to occur directly on device (by @ilyasher in #51)
- Enabled SDK tests in CI pipeline (by @ilyasher in #46)
Documentation
- Added guidance for adding new models (by @tianhaox in #26)
- Added NVIDIA SMI clock locking script to README (by @jasonqinzhou in #47)
- Added git LFS pull step to installation instructions for downloading full data files (by @saturley-hall in #71)
- Enhanced A100 documentation (by @saturley-hall in #70)
New Contributors
- @Arsene12358 made their first contribution in #33
- @ilyasher made their first contribution in #41
- @biswapanda made their first contribution in #42
- @LyleLuo made their first contribution in #49
- @AichenF made their first contribution in #54
For the complete list of changes, see the full changelog.
AIConfigurator Release v0.2.0
AIConfigurator 0.2.0
AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments such as those using NVIDIA H100, H200, or future hardware with the Dynamo backend.
Currently AIConfigurator supports NVIDIA TensorRT-LLM as the inference engine.
Release Highlights
AIConfigurator 0.2.0 brings several new features, improvements, and important fixes to enhance configuration workflows and automation.
Features and Improvements
1. Automation
2. Collector improvement
- Mixture-of-Experts (MoE) collector now supports autotuning for improved efficiency (by @YijiaZhao in #11)
3. Dynamo upgrade
Bug Fixes
- Switched to using torch flow collector and added more default memory configuration options (by @tianhaox in #7)
- Improved performance alignment logic and reliability (by @tianhaox in #10)
- Enhanced mixture-of-experts (MoE) support: added power law handling and improved solver calculation for generative attention (by @tianhaox in #15)
- Added safe directory creation to mitigate security risk and clarified error handling (by @tianhaox in #16)
Documentation
- Improved README (https://github.com/ai-dynamo/aiconfigurator/blob/main/README.md) for clarity and precision (by @nealvaidya in #9)
New Contributors
- @nealvaidya made their first contribution in #9
- @Ethan-ES made their first contribution in #13
For the complete list of changes, see the full changelog.
v0.1.1
What's Changed
🚀 Features & Improvements
- feat: power_law_moe collector and webapp by @YijiaZhao in #2
🐛 Bug Fixes
- fix: update project name, version, system data support matrix by @tianhaox in #3
- fix: Harrison/fix spdx headers by @saturley-hall in #6
New Contributors
- @YijiaZhao made their first contribution in #2
- @tianhaox made their first contribution in #3
- @saturley-hall made their first contribution in #6
Full Changelog: v0.1.0...v0.1.1
v0.1.0 Initial release of AIConfigurator
AIConfigurator is a tool designed for Dynamo to optimize disaggregated serving for generative AI models. It automatically finds optimal deployment configurations by searching thousands of candidates in tens of seconds, helping you achieve better throughput and latency in disaggregated serving.
Major Features
- Automated Configuration Search: Search across thousands of deployment configurations to find the optimal one for both disaggregated and aggregated systems, and make an intelligent choice between disaggregated and aggregated deployment
- SLA-based Optimization: Optimize under TTFT (Time-To-First-Token) and TPOT (Time-Per-Output-Token) constraints to address the throughput@latency problem
- Dynamo Integration: Seamless integration with Dynamo by automatic generation of deployment configurations
- Multi-framework Support: Compatible with NVIDIA TensorRT-LLM backend with extensible architecture for other frameworks (coming soon)
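The SLA-based search above can be pictured as a filter-then-maximize step: discard candidates whose modeled TTFT or TPOT violates the constraints, then pick the best throughput per GPU. A minimal sketch, with candidate fields that are assumptions for illustration:

```python
# Sketch of throughput@latency selection: filter by TTFT/TPOT SLAs,
# then maximize per-GPU throughput. Dict keys are illustrative
# assumptions, not aiconfigurator's real candidate schema.
def best_config(candidates, ttft_sla_ms, tpot_sla_ms):
    feasible = [c for c in candidates
                if c["ttft_ms"] <= ttft_sla_ms and c["tpot_ms"] <= tpot_sla_ms]
    return max(feasible,
               key=lambda c: c["tokens_per_s"] / c["num_gpus"],
               default=None)
```

Because only feasibility and a single scalar objective are involved, thousands of candidates can be scored in seconds, which is what makes the exhaustive search practical.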
Model and System Support
- Comprehensive Model Support (model families):
  - GPT
  - LLAMA (2, 3)
  - MoE
  - QWEN
  - DEEPSEEK_V3
  - NEMOTRON
- System Support: H200 SXM and H100 SXM
User Interfaces
- Command Line Interface (Recommended): Simple CLI with 3 basic arguments for quick start and configuration generation
- Web Application: Interactive web interface for advanced configuration tuning and visualization