Describe the bug
Ran NeMo-RL DPO on Nemotron-3-Nano-30B (BF16) with a custom {prompt, chosen, rejected} dataset and consistently hit an OOM during DTensorPolicyWorker init.

Setup: single node, 4×A100-80GB (Brev), TP=4, CPU offload, activation checkpointing, long context (~3.2k to 3.4k tokens).

Open questions:
Is 4×80GB expected to be insufficient for this recipe?
Is there a known working DPO config for Nemotron-3-Nano-30B?
Is DPO + LoRA supported?
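To put the first question in context, here is a rough back-of-envelope estimate of per-GPU model state for full-parameter DPO. This is a generic rule of thumb, not a description of how NeMo-RL actually allocates memory. It assumes mixed-precision AdamW (BF16 weights and gradients, plus FP32 master weights and two FP32 Adam moments, about 16 bytes/param), a frozen BF16 reference model (2 bytes/param), and an even split across all GPUs. The 30e9 parameter count is also an assumption taken from the model name. The function name `dpo_state_gib_per_gpu` is hypothetical. Activations, CUDA context, and allocator fragmentation are ignored, so real usage will be higher.

```python
"""Hypothetical back-of-envelope memory estimate for full-parameter DPO.

Assumptions (generic rule of thumb, NOT NeMo-RL internals):
- mixed-precision AdamW: BF16 weights + BF16 grads + FP32 master + FP32 m, v
- a separate frozen BF16 reference model
- all state sharded evenly across GPUs
- activations, CUDA context and fragmentation are ignored
"""

GIB = 1024**3


def dpo_state_gib_per_gpu(num_params: float, num_gpus: int,
                          offload_optimizer: bool = False) -> float:
    """Return estimated GiB of model/optimizer state per GPU."""
    policy_weights = 2 * num_params       # BF16 trainable policy weights
    policy_grads = 2 * num_params         # BF16 gradients
    master_and_moments = 12 * num_params  # FP32 master copy + Adam m and v
    reference = 2 * num_params            # frozen BF16 reference policy
    total = policy_weights + policy_grads + reference
    if not offload_optimizer:
        # Without offload, optimizer state also stays on the GPUs.
        total += master_and_moments
    return total / num_gpus / GIB


if __name__ == "__main__":
    n_params = 30e9  # assumed from the model name; check the real checkpoint
    for offload in (False, True):
        gib = dpo_state_gib_per_gpu(n_params, num_gpus=4, offload_optimizer=offload)
        print(f"offload_optimizer={offload}: ~{gib:.1f} GiB per GPU")
```

Under these assumptions the estimate is roughly 126 GiB per GPU without optimizer offload and roughly 42 GiB with it, before any activations for ~3.2k-token sequences. If the actual recipe behaves anything like this, 4×80GB would only fit with offload working as intended, which is part of what this issue is asking to confirm.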
Expected behavior
Provide a working DPO configuration for Nemotron-3-Nano-30B, and document its memory requirements and whether LoRA is supported with DPO.