speculative decoding in gemma3 #6067

@tonyay163

On 0.20.0, enabling speculative decoding first failed because of an unsupported parameter. Even after plumbing that parameter through, it still fails with this error:

terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] Assertion failed: No available XQA kernels are found for speculative decoding mode. (/home/jenkins/agent/workspace/LLM/main/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/common/attentionOp.cpp:2027)

How can the kernels be added for gemma3 models?
