-
Notifications
You must be signed in to change notification settings - Fork 2k
Open
Labels
AutoDeploy<NV> AutoDeploy Backend<NV> AutoDeploy Backendfeature requestNew feature or request. This includes new model, dtype, functionality supportNew feature or request. This includes new model, dtype, functionality support
Description
🚀 The feature, motivation and pitch
#9556 introduces a graph-traversal style pattern matcher that is hard to maintain and currently only tested / hard-coded to Llama4.
Let's develop a pattern matcher in the style of the attention pattern matcher (see https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/transform/library/attention.py) that enables flexible matching for multiple patterns and is more easily maintainable and extendable.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
Metadata
Metadata
Assignees
Labels
AutoDeploy<NV> AutoDeploy Backend<NV> AutoDeploy Backendfeature requestNew feature or request. This includes new model, dtype, functionality supportNew feature or request. This includes new model, dtype, functionality support
Type
Projects
Status
Ready