[Feature]: AutoDeploy: develop generalized pattern matcher for bmm-style MoE #10048

### 🚀 The feature, motivation and pitch

#9556 introduces a graph-traversal-style pattern matcher that is hard to maintain and is currently tested only on, and hard-coded to, Llama4.

Let's develop a pattern matcher in the style of the attention pattern matcher (see https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/transform/library/attention.py) that can flexibly match multiple patterns and is easier to maintain and extend.
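To make the request concrete, here is a minimal, framework-free sketch of the "declare patterns, then match" idea, as opposed to hand-written graph traversal. Everything in it is hypothetical and for illustration only: `Node`, `match`, `MOE_PATTERNS`, and `find_moe` are not TensorRT-LLM or `torch.fx` APIs, and the two example patterns are simplified stand-ins rather than the actual Llama4 graph. A real implementation would presumably build on the same infrastructure the attention matcher uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a graph node (hypothetical; not torch.fx.Node)."""
    op: str
    inputs: Tuple["Node", ...] = ()


# A pattern is either a placeholder name (str) that matches any node,
# or a tuple (op, *sub_patterns) that must match the node's op and inputs.
Pattern = Union[str, tuple]


def match(pattern: Pattern, node: Node,
          bindings: Optional[Dict[str, Node]] = None) -> Optional[Dict[str, Node]]:
    """Return placeholder bindings if `pattern` matches at `node`, else None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str):
        # A placeholder used twice must bind to the same node both times.
        bound = bindings.get(pattern)
        if bound is not None and bound != node:
            return None
        bindings[pattern] = node
        return bindings
    op, *subs = pattern
    if node.op != op or len(subs) != len(node.inputs):
        return None
    for sub, inp in zip(subs, node.inputs):
        bindings = match(sub, inp, bindings)
        if bindings is None:
            return None
    return bindings


# Supporting a new MoE variant means adding an entry here, not new traversal code.
# Both patterns are illustrative assumptions, not the real model graphs.
MOE_PATTERNS: Dict[str, Pattern] = {
    "gated_silu": ("bmm",
                   ("mul", ("silu", ("bmm", "x", "w_gate")), ("bmm", "x", "w_up")),
                   "w_down"),
    "plain_relu": ("bmm", ("relu", ("bmm", "x", "w_up")), "w_down"),
}


def find_moe(node: Node) -> Optional[Tuple[str, Dict[str, Node]]]:
    """Try every registered pattern at `node`; return the first match."""
    for name, pattern in MOE_PATTERNS.items():
        bindings = match(pattern, node)
        if bindings is not None:
            return name, bindings
    return None


# Usage: build a small gated-SiLU expert subgraph and match it.
x, wg, wu, wd = (Node(n) for n in ("x", "w_gate", "w_up", "w_down"))
gated = Node("bmm", (
    Node("mul", (Node("silu", (Node("bmm", (x, wg)),)), Node("bmm", (x, wu)))),
    wd,
))
result = find_moe(gated)
```

The point of the sketch is the design choice: each supported MoE layout is a small declarative entry, so covering models beyond Llama4 would not require touching the matching logic itself.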

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
