Closed as not planned
Labels
feature request, stale, waiting for feedback
Description
On 0.20.0, I got an error when trying to use speculative decoding because of an unsupported parameter. Even after plumbing that parameter through, I still get an error:
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] Assertion failed: No available XQA kernels are found for speculative decoding mode. (/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/common/attentionOp.cpp:2027)
How can the kernels be added for gemma3 models?