AIConfigurator Release v0.6.0

@pvijayakrish released this 12 Feb 19:08
a813bd6

This release focuses on collector upgrades (SGLang/VLLM), new and updated performance datasets for H100, H200, and B200 (Blackwell), and more robust config generation and CI automation.

Highlights

Collector upgrades + compatibility (SGLang/VLLM)

SGLang non-wideep collector upgraded to 0.5.6 (compatible with 0.5.5) (#176)
VLLM bumped to 0.12.0 (#181)
VLLM MLA collector updated for v0.12.0 (#197)

New attention/MLA collection + fixes

Added MLA attention collectors for VLLM (#177)
Fixed TRTLLM 1.2.0rc5 MLA + all-reduce generation (#196)

Blackwell / B200 enablement + datasets

Non-wideep SGLang collector Blackwell support (#218)
Added B200 TRTLLM 1.2.0rc5 data (#202)
Added B200 SGLang 0.5.6.post2 (no wideep) data (#223)
Fixed head dimension handling when not collecting Blackwell data (#236)

Performance DB refresh (H100/H200) + data cleanup

Removed old 0.20.0 DB and added new data from 1.2.0rc5 (H100 & H200) (#198)
Added new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Added new performance data for SGLang 0.5.6.post2 (#200, #201)
Cleaned incomplete/old datasets (VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2) (#204)
Updated H200 SGLang DB (#235)

More reliable generation + automation

“Lowest latency under SLA” support (#182)
Made config/task/perf DB more error-proof and added L40S custom all-reduce data (#183)
Added hf_token support in generated configs (#230)
Auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: improved daily support matrix workflow automation/comparisons (#247)
Added cherry-pick workflow (#205)
Cherry-pick: add k8s_hf_home option (#305)

What's Changed

🚀 Features & Improvements
Upgrade SGLang non-wideep collector to 0.5.6 (compatible with 0.5.5) (#176)
Rename and simplify power-law functions for DeepEP MoE (#174)
Add MLA attention collectors for VLLM (#177)
Bump VLLM to 0.12.0 (#181)
Support “lowest latency under SLA” (#182)
Support 1-GPU collector (#185)
Make perf DB and task config more error-proof; add L40S SGLang custom all-reduce data (#183)
Delete 0.20.0 database and add new data from 1.2.0rc5 (H100 & H200) (#198)
Add new performance data for VLLM 0.12.0 (H100 & H200) (#199)
Add new performance data for SGLang 0.5.6.post2 (#200)
Add new data for SGLang 0.5.6.post2 on H200 (#201)
Make VLLM MLA collector compatible with v0.12.0 (#197)
Add B200 TRTLLM 1.2.0rc5 data (#202)
Refactor wideep collectors for collect.py framework with multiprocess support (#188)
Create cherry-pick.yml (#205)
SGLang non-wideep collector: Blackwell support (#218)
Add B200 SGLang 0.5.6.post2 data without wideep (#223)
Refactor tests and add marks for better management (#224)
Add hf_token support in AIC generated config (#230)
Collector: auto-download DeepSeek-V3 config from HuggingFace (#227)
CI: update daily support matrix workflow to enhance automation and comparison features (#247)
Cherry-pick: add k8s_hf_home option (#305)

🐛 Bug Fixes
Fix FP8 block GEMM collector (#171)
Use TTFT to filter prefill candidates (#169)
MoE args and workload distribution fallback (#168)
Delete wideep MLP for SGLang; improve DB/op query returns; fix collector repeat handling (#170)
Update DeepEP interface for SGLang 0.5.6+ compatibility (#172)
Use model_family for checks instead of model_name (#186)
Fix broken SGLang wideep deepseek path (#195)
Fix 1.2.0rc5 MLA and all-reduce generation (#196)
Delete incomplete data for VLLM 0.11.0, SGLang 0.5.1.post1, TRTLLM 1.2.0rc2 (#204)
Fix config generator missing MoE parallel config when using huggingface_id (#193)
Fix eval FileNotFoundError for service_mode=disagg output path (#194)
Add common code owners to avoid blocking merge (#225)
Update copyright date to 2025–2026 (#220)
Remove nvfp4 shape restriction (#221)
Fix automation pipeline bug (#217)
Fix ISL=1 and smaller local heads (#222)
Support matrix: update CSV + fix daily workflow (#226)
Default cache_transceiver_config.backend to DEFAULT (#231)
AIC eval: support replica > 1 (#234)
Include --max-model-len and --max-num-batched-tokens in VLLM run.sh (#238)
Update H200 SGLang database (#235)
Fix config generator for multiple replicas (#232)
Improve generator MoE parallelism for different backends (#237)
Add generator doc (#241)
Enable hybrid TP/DP/EP mode in wideep SGLang (#229)
Add w4a16_mxfp4 MoE data and set proper moe_quant_mode default for gpt-oss (#240)
Correct v_head_dim and head_dim_total when not collecting data for Blackwell (#236)
Fix multinode disagg config generator for GB200 (#242)
Fix TRTLLM tp=moe_tp × moe_ep behavior (#248)
CI: use self-hosted runners to avoid GitHub runner OOM (#252)
Add SGLang enable-mix-chunk for generator (#257)
Fix SGLang enable mixed chunk (#258)
Support matrix update (#270)
Update generator doc + allow graceful CLI exit when lacking DB data (#286)
Align generator run script with dynamo 0.8.0 (#283)
Use nixl as default disagg transfer backend for SGLang 0.5.6.post2 + allow CLI override (#287)
Fix VLLM/SGLang k8s template missing k8s_model_cache param (#285)
Move PVC support from frontend to workers for SGLang backend (#292)
Docs/guide updates on dynamo deployment + remove dynamoNamespace field (#300, #299)
Handle SGLang L40S missing data gracefully (#306)
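
For context on #238: `--max-model-len` and `--max-num-batched-tokens` are standard vLLM serve flags that bound context length and the scheduler's per-step token budget. A hypothetical sketch of a generated `run.sh` fragment carrying them — the model name, parallelism, and values here are illustrative, not what AIConfigurator actually emits:

```shell
# Illustrative run.sh fragment (model and values are placeholders).
# Caps the context window and the per-iteration batched-token budget
# so the generated config's memory/latency estimates hold at runtime.
vllm serve deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --max-model-len 8192 \
  --max-num-batched-tokens 8192
```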

📚 Documentation
Triton install doc for w4a16_mxfp4 MoE kernel collection (#239)
Add nccl-test path to avoid missing all_gather_perf (#233)
Update SGLang version docs to 0.5.6.post2 (#228)
Add AIC arXiv paper to README (#244)
Add citation section (#251)

New Contributors
@hhzhang16 made their first contribution in #180
@Elaine4CY made their first contribution in #217
@panpan0000 made their first contribution in #233

Full Changelog: v0.5.0.post0...v0.6.0