Releases: ai-dynamo/aiconfigurator

AIConfigurator Release v0.6.0

12 Feb 19:08
a813bd6

Release v0.6.0

This release focuses on collector upgrades, new/updated performance datasets (H100/H200/B200/Blackwell), and more robust config generation + CI automation.

Highlights

Collector upgrades + compatibility (SGLang/VLLM)

SGLang non-wideep collector upgraded to 0.5.6 (compatible with 0.5.5) (#176)
VLLM bumped to 0.12.0 (#181)
VLLM MLA collector updated for v0.12.0 (#197)

New attention/MLA collection + fixes

Added MLA attention collectors for VLLM (#177)
Fixed 1.2.0rc5 MLA + all-reduce generation (#196)

Blackwell / B200 enablement + datasets

Non-wideep SGLang collector Blackwell support (#218)
Added B200 TRTLLM 1.2.0rc5 data (#202)
Added B200 SGLang 0.5.6.post2 (no wideep) data (#223)
Fixed head dimension handling when not collecting Blackwell data (#236)

Performance DB refresh (H100/H200) + data cleanup

Removed old 0.20.0 DB and added new data from 1.2.0rc5 (H100 & H200) (#198)
Added new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Added new performance data for SGLang 0.5.6.post2 (#200, #201)
Cleaned incomplete/old datasets (VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2) (#204)
Updated H200 SGLang DB (#235)

More reliable generation + automation

“Lowest latency under SLA” support (#182)
Config/task/perf DB made more error-proof (+ L40S custom all-reduce data) (#183)
Added hf_token support in generated configs (#230)
Auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: improved daily support matrix workflow automation/comparisons (#247)
Added cherry-pick workflow (#205)
Cherry-pick: add k8s_hf_home option (#305)
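The "lowest latency under SLA" support mentioned above can be pictured with a small sketch: among candidate configurations that meet the TTFT/TPOT SLA, pick the one with the lowest end-to-end latency. The data and field names below are illustrative assumptions, not aiconfigurator's actual schema or API.

```python
# Hypothetical candidate configs; field names are illustrative,
# not aiconfigurator's actual schema.
candidates = [
    {"name": "tp4_pp1", "ttft_ms": 180.0, "tpot_ms": 9.5},
    {"name": "tp2_pp2", "ttft_ms": 350.0, "tpot_ms": 7.0},
    {"name": "tp8_pp1", "ttft_ms": 120.0, "tpot_ms": 12.0},
]

def lowest_latency_under_sla(candidates, ttft_sla_ms, tpot_sla_ms, osl):
    """Among configs meeting both SLAs, return the one with the lowest
    end-to-end latency: TTFT + TPOT * (output_len - 1)."""
    feasible = [c for c in candidates
                if c["ttft_ms"] <= ttft_sla_ms and c["tpot_ms"] <= tpot_sla_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["ttft_ms"] + c["tpot_ms"] * (osl - 1))

best = lowest_latency_under_sla(candidates, ttft_sla_ms=300, tpot_sla_ms=10, osl=100)
```

Only `tp4_pp1` satisfies both constraints here, so it is selected; with several feasible configs, the E2E formula breaks the tie.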

What's Changed

🚀 Features & Improvements
Upgrade SGLang non-wideep collector to 0.5.6 (compatible with 0.5.5) (#176)
Rename and simplify power-law functions for DeepEP MoE (#174)
Add MLA attention collectors for VLLM (#177)
Bump VLLM to 0.12.0 (#181)
Support “lowest latency under SLA” (#182)
Support 1-GPU collector (#185)
Make perf DB and task config more error-proof; add L40S SGLang custom all-reduce data (#183)
Delete 0.20.0 database and add new data from 1.2.0rc5 (H100 & H200) (#198)
Add new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Add new performance data for SGLang 0.5.6.post2 (#200)
Add new data for SGLang 0.5.6.post2 on H200 (#201)
Make VLLM MLA collector compatible with v0.12.0 (#197)
Add B200 TRTLLM 1.2.0rc5 data (#202)
Refactor wideep collectors for collect.py framework with multiprocess support (#188)
Create cherry-pick.yml (#205)
SGLang non-wideep collector: Blackwell support (#218)
Add B200 SGLang 0.5.6.post2 data without wideep (#223)
Refactor tests and add marks for better management (#224)
Add hf_token support in AIC generated config (#230)
Collector: auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: update daily support matrix workflow to enhance automation and comparison features (#247)
Cherry-pick: add k8s_hf_home option (#305)

🐛 Bug Fixes
Fix FP8 block GEMM collector (#171)
Use TTFT to filter prefill candidates (#169)
MoE args and workload distribution fallback (#168)
Delete wideep MLP for SGLang; improve DB/op query returns; fix collector repeat handling (#170)
Update DeepEP interface for SGLang 0.5.6+ compatibility (#172)
Use model_family for checks instead of model_name (#186)
Fix broken SGLang wideep deepseek path (#195)
Fix 1.2.0rc5 MLA and all-reduce generation (#196)
Delete incomplete data for VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2 (#204)
Fix config generator missing MoE parallel config when using huggingface_id (#193)
Fix eval FileNotFoundError for service_mode=disagg output path (#194)
Add common code owners to avoid blocking merge (#225)
Update copyright date to 2025–2026 (#220)
Remove nvfp4 shape restriction (#221)
Fix automation pipeline bug (#217)
Fix ISL=1 and smaller local heads (#222)
Support matrix: update CSV + fix daily workflow (#226)
Default cache_transceiver_config.backend to DEFAULT (#231)
AIC eval: support replica > 1 (#234)
Include --max-model-len and --max-num-batched-tokens in VLLM run.sh (#238)
Update H200 SGLang database (#235)
Fix config generator for multiple replicas (#232)
Improve generator MoE parallelism for different backends (#237)
Add generator doc (#241)
Enable hybrid TP/DP/EP mode in wideep SGLang (#229)
Add w4a16_mxfp4 MoE data and set proper moe_quant_mode default for gpt-oss (#240)
Correct v_head_dim and head_dim_total when not collecting data for Blackwell (#236)
Fix multinode disagg config generator for GB200 (#242)
Fix TRTLLM tp=moe_tp × moe_ep behavior (#248)
CI: use self-hosted runners to avoid GitHub runner OOM (#252)
Add SGLang enable-mix-chunk for generator (#257)
Fix SGLang enable mixed chunk (#258)
Support matrix update (#270)
Update generator doc + allow graceful CLI exit when lacking DB data (#286)
Align generator run script with dynamo 0.8.0 (#283)
Use nixl as default disagg transfer backend for SGLang 0.5.6.post2 + allow CLI override (#287)
Fix VLLM/SGLang k8s template missing k8s_model_cache param (#285)
Move PVC support from frontend to workers for SGLang backend (#292)
Docs/guide updates on dynamo deployment + remove dynamoNamespace field (#300, #299)
Handle SGLang L40S missing data gracefully (#306)

AIConfigurator Release v0.5.0.post0

21 Jan 21:14
1de1400

AIConfigurator 0.5.0.post0

AIConfigurator 0.5.0.post0 is a patch release that updates container image compatibility and fixes copyright headers.

Release Highlights

This is a maintenance release for AIConfigurator 0.5.0 that ensures compatibility with Dynamo container image 0.8.0.

Changes

  1. Dynamo Container Compatibility: Updated AIConfigurator 0.5.0 to use the matched Dynamo container image 0.8.0 (#262)
  2. Copyright Update: Updated copyright date to 2025-2026 to pass CI checks (#264)

Full Changelog: v0.5.0...v0.5.0.post0

AIConfigurator Release v0.5.0

15 Jan 23:05
f178c8a

AIConfigurator 0.5.0

AIConfigurator 0.5.0 brings significant performance optimizations, expands backend support for vLLM and SGLang, and introduces new modeling capabilities including Power Estimation and Power Law workload distribution. This release also adds comprehensive support matrix testing.

Release Highlights

This version focuses on performance efficiency with optimizations to the generation engine and database lookups. New hardware data support includes L40S for SGLang, and we have expanded MoE (Mixture of Experts) support to the vLLM backend. Additionally, users can now target End-to-End (E2E) latency and estimate power consumption.

Features and Improvements

1. Performance Optimizations

  • Engine Optimization: Optimized the implementation of run_generation and num_gpu lookups for faster execution (by @anish-shanbhag in #113, #114).
  • Efficient Data Handling: Replaced dataframes with dictionaries for batch operations in InferenceSummary generation and added caching for repeated queries to improve speed (by @anish-shanbhag in #115, #128).

2. New Modeling Capabilities

  • Power Estimation: Added support for estimating power consumption of configurations (by @kaim-eng in #153).
  • Workload Distribution: Introduced a 'power_law' option for workload distribution in the CLI and prefill modeling (by @xutizhou in #147, #134).
  • Hybrid Modeling: Added support for hybrid modeling scenarios (by @tianhaox in #125).
  • Latency Targets: Users can now set E2E latency as a target metric (by @tianhaox in #145).
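The 'power_law' workload distribution introduced above can be sketched with a small inverse-CDF sampler: many short requests plus a heavy tail of long ones. The function name, defaults, and truncated-Pareto form below are illustrative assumptions, not aiconfigurator's actual implementation.

```python
import random

def sample_power_law_lengths(n, alpha=2.0, l_min=64, l_max=8192, seed=0):
    """Draw n sequence lengths from a power law (Pareto) truncated to
    [l_min, l_max] via inverse-CDF sampling.  All parameters are
    illustrative defaults, not aiconfigurator's real ones."""
    rng = random.Random(seed)
    a = 1.0 - alpha                      # exponent of the CDF terms
    lo, hi = l_min ** a, l_max ** a
    lengths = []
    for _ in range(n):
        u = rng.random()
        # Inverse CDF of density proportional to x^(-alpha) on [l_min, l_max].
        lengths.append(round((lo + u * (hi - lo)) ** (1.0 / a)))
    return lengths
```

With `alpha=2.0`, most sampled lengths land near `l_min` while a small fraction stretches toward `l_max`, which is the shape such a workload model is meant to capture.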

3. Framework and Hardware Support

4. User Interface

  • Profiler UI: Introduced a new Profiler UI for better visualization and analysis (by @Harrilee in #117).
  • UI Updates: Relocated GPU cost references and updated profiling components (by @Harrilee in #167).

5. Build, CI and Test

  • Testing Framework: Added a comprehensive support matrix testing framework (by @Harrilee in #126).
  • Maintenance: Added a CODEOWNERS file for better repository management (by @Arsene12358 in #109).

Bug Fixes

  • SGLang Fixes: Addressed vulnerabilities in the collector (#108), aligned GEMM quantization methods (#122), and fixed attention collection for the regular path (#123).
  • MoE & Model Fixes: Fixed MoE memory issues and NVFP4 GEMM for TRT-LLM 1.x (#131), removed generation repeat attention (#148), and updated workload distribution logic for MoE/DeepSeek models (#146).
  • CLI & Compatibility: Fixed CLI for GB200 with TP > 4 (#137), improved Python compatibility by using Union instead of | (#158), and relaxed Pydantic requirements (#161, #162).
  • General Fixes: Fixed team name parsing (#130), updated custom_allreduce file locations (#156, #160), and removed PII from error stack traces (#166).

Documentation

New Contributors

Full Changelog: v0.4.0...v0.5.0

AIConfigurator Release v0.4.0

24 Nov 17:01
3a4f56d

AIConfigurator 0.4.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments. AIConfigurator 0.4.0 adds extensive support for the SGLang backend, covering both the DeepSeek WideEP path and the regular path, with dense and MoE models. It also adds dense model support for the vLLM backend. With this release, AIConfigurator supports all three major backends: TensorRT-LLM, SGLang, and vLLM.

Release Highlights

AIConfigurator 0.4.0 significantly expands backend support, achieving coverage for all three major backends. This release introduces support for L40S GPUs, Qwen3 30B A3B MOE models, and direct HuggingFace model loading via --hf_id.

Additionally, it adds prefix cache modeling support to simulate workloads with system prompts or prefix cache hits, and unifies SGLang paths for better maintainability.
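The prefix cache modeling added here can be illustrated with a first-order sketch: if a fraction of the prompt hits the prefix cache, only the uncached suffix needs prefill compute, so TTFT shrinks roughly in proportion. The function names and the linear cost model below are illustrative assumptions, not aiconfigurator's exact model.

```python
def effective_prefill_tokens(isl, prefix_hit_rate):
    """With a prefix cache, only the uncached suffix of the prompt must be
    prefilled.  Hypothetical first-order model for illustration."""
    assert 0.0 <= prefix_hit_rate <= 1.0
    return max(1, round(isl * (1.0 - prefix_hit_rate)))

def estimate_ttft_ms(isl, prefix_hit_rate, ms_per_prefill_token):
    # To first order, TTFT scales with the tokens actually prefilled.
    return effective_prefill_tokens(isl, prefix_hit_rate) * ms_per_prefill_token
```

For example, a 4096-token prompt with a 75% prefix hit rate behaves like a 1024-token prefill under this model, simulating a workload with a long shared system prompt.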

Features and Improvements

1. New Hardware Support

2. Framework Support

3. Expanded Model Support

4. Modeling and Improvements

5. Build, CI and Test

Bug Fixes

Documentation

  • Updated README to include A100 SXM in support matrix (by @simone-chen in #62)
  • Added git lfs pull step before install from source code to download full data files (by @cr7258 in #69)
  • Added more A100 docs (by @jasonqinzhou in #67)

New Contributors

AIConfigurator v0.3.0

24 Oct 18:03
8025c3b

AIConfigurator 0.3.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments such as those using NVIDIA H100, H200, GB200, B200, A100, or future hardware with the Dynamo backend.

Currently AIConfigurator supports NVIDIA TensorRT-LLM as the primary inference engine, with limited support for SGLang.

Release Highlights

AIConfigurator 0.3.0 introduces significant expansion in hardware support, framework compatibility, and model coverage. This release adds support for multiple new GPU architectures, introduces SGLang framework integration, and expands the model library with new Qwen3 variants and GPT-OSS models.

Features and Improvements

1. New Hardware Support

2. New Framework Support: SGLang and Wide-EP

Note: SGLang support is currently limited and experimental.

  • Added SGLang GEMM collector and performance data (by @Atream in #28)
  • Added SGLang MLA-BMM collector and performance data (by @Atream in #29)
  • Added SGLang MLA collector and performance data (by @Atream in #31)
  • Added SGLang fused MoE Triton collector (by @Atream in #39)
  • Added support for disaggregated DeepSeek in SGLang (by @AichenF in #54)

3. Expanded Model Support

4. Configuration Generation and Evaluation

  • Refactored generator as a standalone module for improved modularity (by @Ethan-ES in #40)
  • Added new CLI and SDK support for presets in search space configuration (by @tianhaox in #44)
  • Added AIPerf integration for performance evaluation (by @Ethan-ES in #57)
  • Improved aggregated and disaggregated modeling and performance (by @tianhaox in #45)

5. Collector Improvements

  • Enhanced collector to support data collection for windowed attention and additional MoE configurations (by @Arsene12358 in #33)

Bug Fixes

  • Fixed LICENSE file (by @saturley-hall in #21)
  • Added allowed path workspace configuration (by @tianhaox in #23)
  • Updated MoE tuning logic (by @YijiaZhao in #19)
  • Updated Gradio version for compatibility (by @saturley-hall in #35)
  • Improved error handling for database loading failures (by @tianhaox in #37, #38)
  • Enhanced Kubernetes support with corresponding documentation (by @Ethan-ES in #50)
  • Changed NVIDIA SMI command from -lgc to -ac (by @LyleLuo in #49)
  • Excluded FP8 from MLA generation post-processing test cases for Ampere architecture (by @simone-chen in #52)
  • Fixed TensorRT-LLM 1.0.0 collector compatibility (by @tianhaox in #48)
  • Improved tensor initialization to occur directly on device (by @ilyasher in #51)
  • Enabled SDK tests in CI pipeline (by @ilyasher in #46)

Documentation

New Contributors

For the complete list of changes, see the full changelog.

AIConfigurator Release v0.2.0

18 Sep 19:20
f3d7bba

AIConfigurator 0.2.0

AIConfigurator is a tool that helps users find optimal configurations for deploying LLM inference workloads in distributed, multi-GPU environments such as those using NVIDIA H100, H200, or future hardware with the Dynamo backend.

Currently AIConfigurator supports NVIDIA TensorRT-LLM as the inference engine.

Release Highlights

AIConfigurator 0.2.0 brings several new features, improvements, and important fixes to enhance configuration workflows and automation.

Features and Improvements

1. Automation

  • Added automation evaluation support (by @tianhaox in #5)

2. Collector improvement

  • Mixture-of-Experts collector now supports autotuning for improved efficiency (by @YijiaZhao in #11)

3. Dynamo upgrade

Bug Fixes

  • Switched to using torch flow collector and added more default memory configuration options (by @tianhaox in #7)
  • Improved performance alignment logic and reliability (by @tianhaox in #10)
  • Enhanced mixture-of-experts (MoE) support: added power law handling and improved solver calculation for generative attention (by @tianhaox in #15)
  • Added safe directory creation to mitigate security risk and clarified error handling (by @tianhaox in #16)

Documentation

New Contributors

For the complete list of changes, see the full changelog.

v0.1.1

28 Aug 00:16
1f004eb

What's Changed

🚀 Features & Improvements

  • feat: power_law_moe collector and webapp by @YijiaZhao in #2

🐛 Bug Fixes

  • fix: update project name, version, system data support matrix by @tianhaox in #3
  • fix: Harrison/fix spdx headers by @saturley-hall in #6

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0 Initial release of AIConfigurator

12 Aug 19:10
efcae12

AIConfigurator is a tool designed for Dynamo to optimize disaggregated serving for generative AI models. It automatically finds optimal deployment configurations by searching thousands of candidates in tens of seconds, helping you achieve better throughput and latency in disaggregated serving.

Major Features

  • Automated Configuration Search: Searches thousands of deployment configurations to find the best candidate for both disaggregated and aggregated systems, then intelligently chooses between disaggregated and aggregated deployment
  • SLA-based Optimization: Optimize under TTFT (Time-To-First-Token) and TPOT (Time-Per-Output-Token) constraints to address the throughput@latency problem
  • Dynamo Integration: Seamless integration with Dynamo by automatic generation of deployment configurations
  • Multi-framework Support: Compatible with NVIDIA TensorRT-LLM backend with extensible architecture for other frameworks (coming soon)
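The SLA-based search described above amounts to a filter-then-maximize step: keep only candidates meeting the TTFT and TPOT constraints, then pick the highest per-GPU throughput (the throughput@latency objective). The candidate data and field names below are hypothetical, for illustration only.

```python
# Illustrative candidates; fields are hypothetical, not the tool's schema.
candidates = [
    {"config": "disagg_p4d8", "ttft_ms": 250.0, "tpot_ms": 8.0,  "tok_s_gpu": 420.0},
    {"config": "agg_tp8",     "ttft_ms": 600.0, "tpot_ms": 6.5,  "tok_s_gpu": 510.0},
    {"config": "disagg_p2d4", "ttft_ms": 180.0, "tpot_ms": 11.0, "tok_s_gpu": 460.0},
]

def best_throughput_at_latency(candidates, ttft_sla_ms, tpot_sla_ms):
    """throughput@latency in spirit: keep configs meeting both SLAs,
    then maximize per-GPU throughput."""
    feasible = [c for c in candidates
                if c["ttft_ms"] <= ttft_sla_ms and c["tpot_ms"] <= tpot_sla_ms]
    return max(feasible, key=lambda c: c["tok_s_gpu"], default=None)

winner = best_throughput_at_latency(candidates, ttft_sla_ms=300, tpot_sla_ms=10)
```

Note that the raw throughput leader (`agg_tp8`) is rejected for violating the TTFT SLA; this is exactly why optimizing throughput without latency constraints gives misleading answers.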

Model and System Support

  • Comprehensive Model Support:
    • GPT
    • LLAMA (2,3)
    • MoE
    • QWEN
    • DEEPSEEK_V3
    • NEMOTRON model families
  • System Support: H200 SXM and H100 SXM

User Interfaces

  • Command Line Interface (Suggested): Simple CLI with 3 basic arguments for quick start and configuration generation
  • Web Application: Interactive web interface for advanced configuration tuning and visualization