[AutoDeploy]: Improve MoE performance for TP>1 #8232

@nzmora-nvidia

Description

🚀 The feature, motivation and pitch

Autotuning was already enabled here: #8120
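For context, a minimal sketch of the autotuning idea referenced above (hypothetical code, not the actual change in #8120): time each candidate implementation once per input shape, cache the winner, and dispatch to it on subsequent calls.

```python
# Hypothetical autotuning sketch; not TRT-LLM's implementation.
import time
import torch

_best_impl = {}  # cache: (shape, dtype) key -> fastest candidate


def autotuned_matmul(a: torch.Tensor, b: torch.Tensor, candidates):
    """Profile each candidate once for this shape, then reuse the fastest."""
    key = (tuple(a.shape), tuple(b.shape), a.dtype)
    if key not in _best_impl:
        timings = []
        for fn in candidates:
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            fn(a, b)
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            timings.append((time.perf_counter() - t0, fn))
        # Cache the fastest implementation for this (shape, dtype) key.
        _best_impl[key] = min(timings, key=lambda t: t[0])[1]
    return _best_impl[key](a, b)


# Usage: autotuned_matmul(x, w, [torch.matmul, lambda a, b: a @ b])
```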

For TP=1, the performance gap is ~8% on ToT.
For multiple GPUs (TP>1) we are still seeing a ~50% gap compared to the manual TRT-LLM path (down from the previous 300%-400%).
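To make the TP>1 cost concrete, here is a minimal, hypothetical sketch (assuming Megatron-style sharding, not AutoDeploy's actual code) of how a single MoE expert FFN is typically split across tensor-parallel ranks; the per-expert all-reduce plus per-expert kernel-launch overhead is a plausible source of this kind of gap.

```python
# Hypothetical sketch of a TP-sharded MoE expert; not TRT-LLM's code.
import torch
import torch.distributed as dist
import torch.nn.functional as F


class TPShardedExpert(torch.nn.Module):
    """One MoE expert FFN with its intermediate dimension split across TP ranks."""

    def __init__(self, hidden: int, intermediate: int, tp_size: int):
        super().__init__()
        assert intermediate % tp_size == 0
        shard = intermediate // tp_size
        self.w1 = torch.nn.Linear(hidden, shard, bias=False)  # column-parallel
        self.w2 = torch.nn.Linear(shard, hidden, bias=False)  # row-parallel
        self.tp_size = tp_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each rank computes a partial output from its weight shard.
        y = self.w2(F.silu(self.w1(x)))
        if self.tp_size > 1 and dist.is_initialized():
            # Partial sums are combined across ranks. This collective runs once
            # per expert forward, so its cost scales with expert count and TP size.
            dist.all_reduce(y)
        return y
```

Run under torchrun with an initialized process group, every expert forward pays that all-reduce; fusing or batching expert GEMMs and overlapping the collectives is the usual way hand-tuned kernels close this kind of gap.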

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

AutoDeploy — <NV> AutoDeploy Backend
Scale-out — <NV> Multi-GPU and distributed inference scaling issues, tensor/pipeline/data parallelism

Status

Rejected
