Closed
Labels
- AutoDeploy: <NV> AutoDeploy Backend
- Customized kernels: <NV> Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
- bug: Something isn't working
- feature request: New feature or request. This includes new model, dtype, functionality support
- triaged: Issue has been triaged by maintainers
Description
🚀 The feature, motivation and pitch
The auto-tuner runs when AutoDeploy (AD) records CUDA graphs, and the tactic selected for each shape is cached.
The operator should not enter the auto-tuner context explicitly; that is the responsibility of the AD engine.
TRTLLM's nvfp4_gemm_runner is responsible for auto-tuning and caching a tactic, and it uses the cached tactic during inference, but only when it is not running inside an auto-tuner context.
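To make the proposed split of responsibilities concrete, here is a minimal toy sketch in plain Python. It does not use TensorRT-LLM's actual API: `ToyAutoTuner`, `toy_autotune`, `toy_gemm_op`, `toy_cost`, and `engine_capture` are hypothetical names, and the fallback to a default tactic for an uncached shape is an assumption made for illustration. The point it shows is that the operator only asks the tuner for a tactic, while the engine alone decides when tuning is active.

```python
from contextlib import contextmanager


class ToyAutoTuner:
    """Hypothetical stand-in for an auto-tuner (not TensorRT-LLM's API)."""

    def __init__(self):
        self.is_tuning = False
        self.cache = {}          # maps input shape -> chosen tactic
        self.profile_count = 0   # how many shapes were actually profiled

    def choose_tactic(self, shape, candidates, cost):
        # A cached tactic is always reused, whether or not tuning is active.
        if shape in self.cache:
            return self.cache[shape]
        # Profile and cache only while tuning is active.
        if self.is_tuning:
            self.profile_count += 1
            best = min(candidates, key=lambda tactic: cost(shape, tactic))
            self.cache[shape] = best
            return best
        # Assumption: outside tuning, an uncached shape gets a default tactic.
        return candidates[0]


TUNER = ToyAutoTuner()


@contextmanager
def toy_autotune(tuner=TUNER):
    """Tuning context; in this sketch only the engine enters it."""
    previous = tuner.is_tuning
    tuner.is_tuning = True
    try:
        yield
    finally:
        tuner.is_tuning = previous


def toy_cost(shape, tactic):
    """Made-up cost model so that different shapes prefer different tactics."""
    m, n = shape
    return abs((m * n) % 3 - tactic)


def toy_gemm_op(shape):
    """Operator: requests a tactic but never enters the tuning context itself."""
    return TUNER.choose_tactic(shape, candidates=[0, 1, 2], cost=toy_cost)


def engine_capture(shapes):
    """Engine: wraps the per-shape capture pass in the tuning context."""
    with toy_autotune():
        return {shape: toy_gemm_op(shape) for shape in shapes}
```

Calling `engine_capture` on the shapes of interest fills the cache once; later calls to `toy_gemm_op` outside the context reuse the cached tactic without profiling again.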
Status
Done