[Feature]: AutoDeploy: develop generalized pattern matcher for bmm-style MoE #10048

@lucaslie

🚀 The feature, motivation and pitch

#9556 introduces a graph-traversal style pattern matcher that is hard to maintain and is currently only tested on, and hard-coded for, Llama4.

Let's develop a pattern matcher in the style of the attention pattern matcher (see https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/transform/library/attention.py) that can flexibly match multiple patterns and is easier to maintain and extend.
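
To make the contrast with hand-written graph traversal concrete, here is a minimal, library-free sketch of the declarative idea: each MoE variant is registered as a small pattern, and one generic matcher handles all of them. Everything in it is hypothetical and for illustration only: the tuple-based expression format, `Var`, `register_pattern`, `rewrite`, and the op names `fused_moe_mlp` / `fused_moe_gated_mlp` are not AutoDeploy APIs, and a real implementation would operate on `torch.fx` graphs rather than nested tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Wildcard placeholder that binds to any subexpression (hypothetical helper)."""
    name: str


# Hypothetical registry: (pattern, fused op name, order of bound arguments).
PATTERNS = []


def register_pattern(pattern, fused_op, arg_names):
    """Add one pattern variant; supporting a new model means adding one entry."""
    PATTERNS.append((pattern, fused_op, arg_names))


def match(pattern, expr, bindings):
    """Return True if expr matches pattern, recording what each Var bound to."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            # A repeated Var must bind to the same subexpression every time.
            return bindings[pattern.name] == expr
        bindings[pattern.name] = expr
        return True
    if isinstance(pattern, tuple):
        return (
            isinstance(expr, tuple)
            and len(pattern) == len(expr)
            and all(match(p, e, bindings) for p, e in zip(pattern, expr))
        )
    return pattern == expr


def rewrite(expr):
    """Rewrite bottom-up, replacing a node with the first registered pattern that matches."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite(e) for e in expr)
    for pattern, fused_op, arg_names in PATTERNS:
        bindings = {}
        if match(pattern, expr, bindings):
            return (fused_op, *(bindings[n] for n in arg_names))
    return expr


X, W1, W2 = Var("x"), Var("w1"), Var("w2")
WG, WU, WD = Var("w_gate"), Var("w_up"), Var("w_down")

# Variant A (assumed for illustration): bmm -> silu -> bmm.
register_pattern(
    ("bmm", ("silu", ("bmm", X, W1)), W2),
    "fused_moe_mlp",
    ("x", "w1", "w2"),
)
# Variant B (assumed for illustration): gated MLP with separate gate/up bmm ops.
register_pattern(
    ("bmm", ("mul", ("silu", ("bmm", X, WG)), ("bmm", X, WU)), WD),
    "fused_moe_gated_mlp",
    ("x", "w_gate", "w_up", "w_down"),
)

plain = ("add", ("bmm", ("silu", ("bmm", "tokens", "w_a")), "w_b"), "residual")
gated = ("bmm", ("mul", ("silu", ("bmm", "tokens", "wg")), ("bmm", "tokens", "wu")), "wd")

plain_out = rewrite(plain)
gated_out = rewrite(gated)
```

The design point is that the matcher itself never mentions a specific model: covering another bmm-style MoE layout is a matter of registering one more pattern rather than extending traversal logic.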

Metadata

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • feature request: New feature or request. This includes new model, dtype, functionality support

Projects

Status

Ready

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
