feat: add --max-concurrency to limit global concurrency search space by Arsene12358 · Pull Request #505 · ai-dynamo/aiconfigurator

Arsene12358 · 2026-03-03T13:15:34Z

Summary

- Add a `--max-concurrency` CLI flag, Python API parameter, and YAML experiment key to let users cap the global concurrency (total concurrent requests across all DP ranks / workers) considered during AIConfigurator's Pareto sweep.
- Agg mode: caps the per-engine batch size so that `batch_size * pp_size * attention_dp_size <= max_concurrency`, avoiding unnecessary sweep iterations. Parallel configs where even `batch_size=1` would exceed the limit are skipped entirely.
- Disagg mode: filters out replica compositions where `per_worker_concurrency * num_decode_workers > max_concurrency` during the rate-matching step.
- Semantics are consistent with aiperf's `--concurrency` flag: the value represents the total number of in-flight requests for the entire deployment, not per DP rank.
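
For illustration, the two rules above can be sketched as plain functions. This is a minimal sketch, not the code in this PR: the function names `effective_max_batch_size` and `disagg_composition_allowed` and the parameter `sweep_max_batch_size` are hypothetical, and only the two inequalities come from the summary.

```python
from typing import Optional


def effective_max_batch_size(
    sweep_max_batch_size: int,
    pp_size: int,
    attention_dp_size: int,
    max_concurrency: Optional[int],
) -> Optional[int]:
    """Hypothetical helper: capped per-engine batch size, or None to skip the config."""
    if max_concurrency is None:
        return sweep_max_batch_size  # no cap requested, keep the original sweep bound
    # Largest batch_size with batch_size * pp_size * attention_dp_size <= max_concurrency.
    cap = max_concurrency // (pp_size * attention_dp_size)
    if cap < 1:
        return None  # even batch_size=1 would exceed the limit
    return min(sweep_max_batch_size, cap)


def disagg_composition_allowed(
    per_worker_concurrency: int,
    num_decode_workers: int,
    max_concurrency: Optional[int],
) -> bool:
    """Hypothetical helper: keep a replica composition only if it fits the global limit."""
    if max_concurrency is None:
        return True
    return per_worker_concurrency * num_decode_workers <= max_concurrency
```

With `max_concurrency=64`, `pp_size=2`, and `attention_dp_size=4`, the agg cap works out to `64 // 8 = 8`; with `max_concurrency=4` the same parallel config would be skipped.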

Changes

| File | What changed |
| --- | --- |
| `sdk/task.py` | `TaskContext` and `TaskConfig` gain `max_concurrency`; `TaskRunner` forwards it to `agg_pareto` / `disagg_pareto` |
| `sdk/pareto_analysis.py` | `agg_pareto()` computes an effective `max_batch_size` per parallel config; `disagg_pareto()` threads it through to the session |
| `sdk/inference_session.py` | `find_best_disagg_result_under_constraints()` filters compositions exceeding the limit |
| `cli/main.py` | `--max-concurrency` arg added to default mode; recognized in YAML experiment configs |
| `cli/api.py` | `cli_default()` accepts `max_concurrency` kwarg |
| `cli/example.yaml` | Documents the new key |
| `tests/unit/sdk/task/test_task.py` | 5 new tests: storage, default, agg forwarding, disagg forwarding, None default |
| `tests/unit/cli/test_argument_parsing.py` | 2 new tests: default None, integer parsing |
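
As a rough illustration of the behavior the two argument-parsing tests describe (default `None`, integer parsing), a standalone `argparse` sketch might look like the following. The `build_parser` function, the `prog` name, and the help text are assumptions for this example and are not taken from `cli/main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the flag discussed in this PR."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument(
        "--max-concurrency",
        type=int,       # parsed as an integer
        default=None,   # None means no global concurrency cap
        help="Cap on total in-flight requests across the whole deployment.",
    )
    return parser


# argparse maps --max-concurrency to the attribute max_concurrency.
args = build_parser().parse_args(["--max-concurrency", "64"])
print(args.max_concurrency)
```

Leaving the default as `None` rather than a large integer keeps "no limit" distinguishable from any explicit value, which matches the "default None" test listed above.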

copy-pr-bot · 2026-03-03T13:15:38Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Allow users to cap the global concurrency (total concurrent requests across all DP ranks / workers) considered during the Pareto sweep.

Agg mode: the per-engine batch size sweep is capped so that batch_size * pp_size * attention_dp_size <= max_concurrency.

Disagg mode: replica compositions whose per_worker_concurrency * num_decode_workers > max_concurrency are filtered out during rate matching.

Exposed via:
- CLI: --max-concurrency <int>
- Python API: cli_default(..., max_concurrency=N)
- YAML experiment config: max_concurrency: N

Signed-off-by: Yimingl <yimingl@nvidia.com>

…tests

- to_yaml() now includes max_concurrency when set (was silently dropped)
- Autoscale path (pick_autoscale) now filters by max_concurrency
- Validate max_concurrency >= 1 in TaskConfig (raises ValueError)
- Add INFO-level log when max_concurrency constraint is active
- Update find_best_disagg_result_under_constraints docstring
- Add tests: validation rejects 0/-5, to_yaml round-trip with/without

Signed-off-by: Yimingl <yimingl@nvidia.com>
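
The validation and serialization behavior listed in this commit could look roughly like the sketch below. `SketchTaskConfig`, its `serving_mode` field, and `to_dict()` are hypothetical stand-ins (the real class is `TaskConfig` with a `to_yaml()` method whose internals are not shown here); only the `>= 1` check, the `ValueError`, and the "include only when set" rule come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchTaskConfig:
    """Hypothetical stand-in for TaskConfig, reduced to the fields needed here."""

    serving_mode: str = "agg"  # illustrative field, not from the PR
    max_concurrency: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject non-positive limits; None means "no limit" and is allowed.
        if self.max_concurrency is not None and self.max_concurrency < 1:
            raise ValueError(
                f"max_concurrency must be >= 1, got {self.max_concurrency}"
            )

    def to_dict(self) -> Dict[str, Any]:
        # Emit max_concurrency only when set, so unset configs serialize unchanged.
        data: Dict[str, Any] = {"serving_mode": self.serving_mode}
        if self.max_concurrency is not None:
            data["max_concurrency"] = self.max_concurrency
        return data
```

Under these assumptions, `SketchTaskConfig(max_concurrency=0)` and `SketchTaskConfig(max_concurrency=-5)` both raise `ValueError`, mirroring the rejected values named in the commit.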

Signed-off-by: Yimingl <yimingl@nvidia.com>

Arsene12358 · 2026-03-23T13:06:09Z

closing, will design a new version

github-actions bot added the feat label Mar 3, 2026

Arsene12358 added 2 commits March 3, 2026 21:21

Arsene12358 force-pushed the feat/max-concurrency branch from 360f9c9 to dcf8615 Compare March 3, 2026 13:22

style: fix ruff formatting in pareto_analysis and test_argument_parsing

cdcbe21

Signed-off-by: Yimingl <yimingl@nvidia.com>

Arsene12358 closed this Mar 23, 2026

Arsene12358 wants to merge 3 commits into ai-dynamo:main from Arsene12358:feat/max-concurrency
