Closed
Labels: Customized kernels <NV>, bug
Description
System Info
Debian 12
256GB RAM
TRT-LLM: 0.16.0
Running on an H100 80GB
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Build a TensorRT-LLM engine for Gemma 3
- Run inference with a very long sequence (>>1024 tokens, i.e., well beyond the local window attention size of 1024)
- Observe that the outputs are incorrect

An example of a long prompt would be: "Can you repeat this exact paragraph: '<long paragraph>'"; this will break with the currently built Gemma 3 engine.
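For concreteness, here is a minimal repro sketch using TRT-LLM's high-level `LLM` API. The checkpoint ID, prompt construction, and sampling settings are assumptions for illustration, not the exact setup from this report:

```python
# Hypothetical repro sketch. The checkpoint ID, prompt length, and sampling
# settings are assumptions; any Gemma 3 variant with 1024-token sliding-window
# layers should exercise the same code path.
from tensorrt_llm import LLM, SamplingParams

# Build/load an engine from a Hugging Face checkpoint. Gemma 3 interleaves
# global-attention layers with local sliding-window-attention layers
# (window size 1024), which is the path this bug exercises.
llm = LLM(model="google/gemma-3-1b-it")  # assumed checkpoint

# Construct a prompt that is well past the 1024-token local window.
paragraph = "The quick brown fox jumps over the lazy dog. " * 300  # ~3000 tokens
prompt = f'Can you repeat this exact paragraph: "{paragraph}"'

outputs = llm.generate([prompt], SamplingParams(max_tokens=128, temperature=0.0))
# On affected builds, the echoed paragraph degrades once generation depends on
# tokens that fell outside the local window.
print(outputs[0].outputs[0].text)
```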
Expected behavior
- Outputs should match the Hugging Face (HF) reference implementation (see the sketch below)
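A reference output can be generated with Hugging Face transformers on the same prompt; as above, the checkpoint ID is an assumption:

```python
# Hypothetical HF reference run for the same prompt; checkpoint ID is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

paragraph = "The quick brown fox jumps over the lazy dog. " * 300
prompt = f'Can you repeat this exact paragraph: "{paragraph}"'

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens; this is the expected (correct) echo.
print(tokenizer.decode(generated[0, inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```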
Actual behavior
- Outputs do not match HF; generations degrade once the input exceeds the 1024-token local attention window
Additional notes
I have a fix I'd like to contribute - #9961
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.