NeMo-RL DPO OOM issues with Nemotron-3-Nano-30B

**Describe the bug**

Running NeMo-RL DPO on Nemotron-3-Nano-30B (BF16) with a custom {prompt, chosen, rejected} dataset consistently hits an out-of-memory (OOM) error during DTensorPolicyWorker initialization.

Setup:

- Single node, 4×A100-80GB (Brev)
- TP=4
- CPU offload enabled
- Activation checkpointing enabled
- Long context (~3.2-3.4k tokens)

Open questions:

- Is 4×80GB expected to be insufficient for this recipe?
- Is there a known working DPO config for Nemotron-3-Nano-30B?
- Is DPO + LoRA supported?
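For context on the 4×80GB question, here is a rough back-of-envelope sketch of model-state memory per GPU. It is not a measurement of what NeMo-RL actually allocates. The assumptions are mine: 30e9 parameters (taken from the model name), full fine-tuning with BF16 weights and gradients, Adam with FP32 master weights and two FP32 moments (12 bytes per parameter), a frozen BF16 reference model for DPO, and all state sharded evenly across the 4 GPUs. Activations, CPU offload, and framework overhead are ignored, and `estimate_gb_per_gpu` is a hypothetical helper, not a NeMo-RL API.

```python
def estimate_gb_per_gpu(
    num_params: int,
    num_gpus: int,
    weight_bytes: int = 2,       # BF16 weights (assumption)
    grad_bytes: int = 2,         # BF16 gradients (assumption)
    optimizer_bytes: int = 12,   # FP32 master copy + Adam m and v (assumption)
    include_reference: bool = True,  # DPO keeps a frozen reference model
) -> dict:
    """Rough per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    Assumes every tensor is sharded evenly across GPUs and ignores
    activations, CPU offload, and framework overhead.
    """
    parts = {
        "policy_weights": num_params * weight_bytes,
        "reference_weights": num_params * weight_bytes if include_reference else 0,
        "gradients": num_params * grad_bytes,
        "optimizer_state": num_params * optimizer_bytes,
    }
    per_gpu = {name: size / num_gpus / 1e9 for name, size in parts.items()}
    per_gpu["total"] = sum(per_gpu.values())
    return per_gpu


if __name__ == "__main__":
    estimate = estimate_gb_per_gpu(num_params=30_000_000_000, num_gpus=4)
    for name, gb in estimate.items():
        print(f"{name:>18}: {gb:6.1f} GB per GPU")
```

Under these assumptions the optimizer state alone comes to about 90 GB per GPU and the total to about 135 GB per GPU, which exceeds 80 GB before any activations are counted. CPU offload moves part of that state to host memory, so the real GPU footprint depends on what is offloaded and when. Setting `grad_bytes` and `optimizer_bytes` close to zero approximates a run where only a small adapter is trained, which is why the LoRA question above matters.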

**Expected behavior**

A working DPO configuration for Nemotron-3-Nano-30B should be provided, along with documentation of the memory requirements and of whether LoRA + DPO is supported.


NeMo-RL DPO OOM issues with Nemotron-3-Nano-30B #1922
