NeMo-RL DPO OOM issues with Nemotron-3-Nano-30B #1922

@slic33

Description

Describe the bug

Ran NeMo-RL DPO on Nemotron-3-Nano-30B (BF16) with a custom {prompt, chosen, rejected} dataset and consistently hit OOM during DTensorPolicyWorker init.

Setup: single node, 4×A100-80GB (Brev), TP=4, CPU offload, activation checkpointing, long context (~3.2-3.4k tokens).

Open questions:

- Is 4×80GB expected to be insufficient for this recipe? (A rough memory estimate is sketched below.)
- Is there a known working DPO config for Nemotron-3-Nano-30B?
- Is DPO + LoRA supported?
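
For the first open question, here is a back-of-envelope estimate of why 4×80GB is tight for full-parameter DPO of a ~30B model. This is a minimal sketch, not NeMo-RL's actual allocation: the parameter count, BF16 gradients, FP32 Adam states and master weights, the frozen BF16 DPO reference model, and even sharding across TP=4 are all assumptions, and activations and CPU offload are ignored.

```python
# Back-of-envelope per-GPU memory for full-parameter DPO of a ~30B model.
# Assumptions (not from the NeMo-RL docs): 30e9 params, BF16 weights and grads,
# FP32 Adam states plus an FP32 master copy, a frozen BF16 DPO reference model,
# everything sharded evenly across TP=4, activations and CUDA overhead ignored.

PARAMS = 30e9
TP = 4
BF16, FP32 = 2, 4  # bytes per element

policy_weights  = PARAMS * BF16        # trainable policy
reference_model = PARAMS * BF16        # frozen DPO reference
gradients       = PARAMS * BF16
adam_states     = PARAMS * FP32 * 2    # exp_avg + exp_avg_sq
fp32_master     = PARAMS * FP32        # mixed-precision master weights

total_bytes = (policy_weights + reference_model + gradients
               + adam_states + fp32_master)

print(f"total  : {total_bytes / 2**30:.0f} GiB")        # ~503 GiB
print(f"per GPU: {total_bytes / TP / 2**30:.0f} GiB "   # ~126 GiB vs. 80 GiB
      f"(TP={TP}, activations excluded)")
```

If these assumptions are roughly right, the static footprint alone exceeds the 80 GiB available per GPU, independent of context length, unless optimizer states are sharded or offloaded more aggressively than this config achieves.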

Expected behavior

A working DPO configuration for Nemotron-3-Nano-30B, documented memory requirements, and clarification of whether DPO + LoRA is supported.
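
On the memory-requirements point, the same kind of estimate shows why DPO + LoRA would change the picture. This is again a sketch under stated assumptions (adapter size, byte counts per state, even sharding across TP=4), not a statement about what NeMo-RL actually allocates:

```python
# Same back-of-envelope arithmetic, but with LoRA: the 30B base model and the
# DPO reference stay frozen in BF16, and only a small adapter carries gradients
# and Adam states. The adapter size below is an assumption (it depends on rank
# and target modules); whether NeMo-RL supports LoRA for DPO is the open question.

PARAMS = 30e9            # assumed frozen base-model parameters
ADAPTER_PARAMS = 200e6   # assumed LoRA adapter parameters
TP = 4
BF16, FP32 = 2, 4        # bytes per element

frozen_policy    = PARAMS * BF16
frozen_reference = PARAMS * BF16
# adapter: BF16 weights + BF16 grads + two FP32 Adam states + FP32 master copy
adapter_training = ADAPTER_PARAMS * (BF16 + BF16 + 2 * FP32 + FP32)

total_bytes = frozen_policy + frozen_reference + adapter_training
print(f"per GPU: {total_bytes / TP / 2**30:.0f} GiB "   # ~29 GiB vs. ~126 GiB above
      f"(TP={TP}, activations excluded)")
```

If those numbers are in the right ballpark, a LoRA variant of this recipe would leave substantial headroom on 4×80GB even with long-context activations, which is why a definitive answer on LoRA + DPO support would be valuable here.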
