[Regression] Low QWEN3 30B B200 and H100 performance with dropless #1998

@erhoo82

Description

QWEN3 30B runs 2X slower on H100 and B200 with the dropless schedule than with the token-drop schedule.

Put simply, performance in 25.11 is 2X slower than in 25.09.
We need to fix this in 26.02.
