[Bug]: Gemma3 RoPE is broken for engine workflow #10058

@shivghai

Description

System Info

Debian 12
256GB RAM
TRT-LLM: 0.16.0
Running on an H100 80GB

Who can help?

@kaiyux @byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  • Build a TRT-LLM engine for Gemma 3
  • Run inference with a very long sequence (>>1024 tokens, i.e. much longer than the local window attention size of 1024)
  • Observe that outputs are bad

An example of a long sequence could be a prompt like: `Can you repeat this exact paragraph: "<long sequence>"`; this will break with the currently built Gemma3 engine.
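For context on why the local window size matters here: Gemma 3 interleaves sliding-window attention layers with global attention layers, and the two layer types are trained with different RoPE configurations. One plausible failure mode (a reading of this report, not something the issue states explicitly) is an engine applying a single RoPE setup to every layer, so that sliding-window layers at positions past the window see rotation angles they were never trained on. A minimal sketch of the difference; the theta and scaling values below are illustrative assumptions, not taken from this issue:

```python
def rope_inv_freq(head_dim: int, theta: float, linear_scale: float = 1.0):
    """Standard RoPE inverse frequencies; linear scaling divides positions,
    which is equivalent to dividing the frequencies."""
    return [
        1.0 / (theta ** (2 * i / head_dim)) / linear_scale
        for i in range(head_dim // 2)
    ]

# Hypothetical per-layer-type configs (assumed values, for illustration only):
local_freqs = rope_inv_freq(256, 10_000.0)            # sliding-window layers
global_freqs = rope_inv_freq(256, 1_000_000.0, 8.0)   # global layers

# Using one config for all layers changes the rotation angles the
# sliding-window layers see; the divergence is largest at big positions:
pos = 2048  # well past a 1024-token local window
angle_local = pos * local_freqs[1]  # angle the local layer was trained with
angle_wrong = pos * global_freqs[1]  # angle it would get from the global config
```

The point of the sketch is only that the two frequency tables differ, so a single shared RoPE table cannot be correct for both layer types at long sequence lengths.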

Expected behavior

  • Outputs should match HF (Hugging Face transformers)

Actual behavior

  • Outputs do not match HF; generations are bad once the input exceeds the local attention window

Additional notes

I have a fix I'd like to contribute: #9961

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

No one assigned

Labels

    Customized kernels<NV>: Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
    bug: Something isn't working
