Describe the bug
When sequence_packing=true is set and the Megatron backend is used for fine-tuning gpt-oss, the debug info says no attention backend is available.
If sequence_packing=False, only UnfusedAttention is available: FlashAttention is disabled for softmax_type = learnable, and FusedAttention is disabled because no backend supports the provided input.
This differs from fine-tuning in megatron-bridge, which supports FusedAttention with learnable softmax.
Update: the difference in enabled backends comes from different cuDNN versions in the Docker images. The latest nemo-rl nano image has cuDNN version 91002, while the nemo image has cuDNN version 91310.
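For reference, a minimal sketch of how the cuDNN version in each image can be checked, and (assuming TransformerEngine's NVTE_DEBUG / NVTE_DEBUG_LEVEL environment variables apply here) how the attention-backend selection log can be surfaced:

```python
import os

# Assumption: TransformerEngine reads these at import time, so they must be
# set before transformer_engine (or the training entrypoint) is imported.
os.environ["NVTE_DEBUG"] = "1"        # assumed flag: enable TE debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # assumed flag: verbose backend-selection info

import torch

# Print the cuDNN version the container actually loads; the report above shows
# 91002 in the nemo-rl nano image vs 91310 in the nemo image.
print("torch:", torch.__version__)
print("cuDNN:", torch.backends.cudnn.version())
```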
Steps/Code to reproduce bug
Please list minimal steps or code snippet for us to be able to reproduce the bug.
A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.