
gpt-oss sft megatron does not support sequence packing #1685


Description

@jordane95

Describe the bug

When sequence_packing=true is set and the Megatron backend is used for fine-tuning gpt-oss, the debug info says no attention backend is available.
With sequence_packing=False, only UnfusedAttention is available: FlashAttention is disabled because softmax_type = learnable, and FusedAttention is disabled because no backend supports the provided input.
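
For reference, a minimal sketch of how to surface this backend-selection reasoning, assuming the messages above come from Transformer Engine's attention debug logging (NVTE_DEBUG / NVTE_DEBUG_LEVEL must be set before TE is imported):

```python
# Sketch: enable Transformer Engine's attention-backend selection logs.
# Assumes the debug lines quoted above are emitted by TE.
import os

os.environ["NVTE_DEBUG"] = "1"        # turn on TE debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # level 2 prints why each backend is enabled/disabled

import transformer_engine  # noqa: E402  (import only after the env vars are set)
```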

This differs from fine-tuning in megatron-bridge, which supports FusedAttention with learnable softmax.

Update: the difference in enabled backends is due to different cuDNN versions in the Docker images. The latest nemo-rl nano image has cuDNN version 91002, while the nemo image has cuDNN version 91310.
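
To confirm which cuDNN build a given container ships, one can query it through PyTorch:

```python
import torch

# Prints the cuDNN version as a single integer,
# e.g. 91002 (nemo-rl nano image) vs. 91310 (nemo image) per the update above.
print(torch.backends.cudnn.version())
```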

Steps/Code to reproduce bug

Please list minimal steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.
