sequence packing for gpt-oss #1782

@jordane95

Description

Is your feature request related to a problem? Please describe.
I want to use sequence packing for better efficiency when fine-tuning gpt-oss, but I hit an assertion error.

Describe the solution you'd like
I would like to know why sequence packing is currently not supported for gpt-oss, and if there is no fundamental blocker, I would like sequence packing support to be added for gpt-oss.

Describe alternatives you've considered
I tried removing the assert. The log then shows that no dot product attention backend is available for the provided inputs. After adding some debug output, I found that all attention backends are disabled because softmax_type = 'learnable' is combined with qkv_format = 'thd'.
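
For reference, a minimal sketch of the failing configuration is below. This is an assumption-based repro, not a confirmed one: the argument names (in particular softmax_type) and the exact forward signature may differ across Transformer Engine versions.

```python
# Minimal sketch (assumptions, not a confirmed repro): exercise Transformer Engine's
# DotProductAttention with packed sequences (qkv_format='thd') together with the
# gpt-oss sink softmax (softmax_type='learnable'), the combination reported above.
import torch
import transformer_engine.pytorch as te

heads, head_dim = 8, 64

attn = te.DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    qkv_format="thd",                # packed "total-tokens x heads x dim" layout
    attn_mask_type="padding_causal",
    softmax_type="learnable",        # assumed name of the knob behind the debug output above
)

# Two packed sequences of lengths 3 and 5 -> total_tokens = 8.
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")
q = torch.randn(8, heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# With this combination, backend selection reportedly finds no usable
# dot product attention backend and the call fails.
out = attn(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_kv=cu_seqlens,
    max_seqlen_q=5,
    max_seqlen_kv=5,
)
```

As far as I know, the per-backend selection reasons can also be printed by running with NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=2, which may help confirm which constraint rules out each backend.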

