How long did pretraining take? #114
Open
Labels: enhancement (New feature or request)
Description
I could not find any mention of training time in the tech report. Could you share it? Something like "On X B200 GPUs, training Nemotron 3 Super 120B-A12B took Y GPU hours at Z precision.", or "Pretraining throughput on one B200 GPU was X tokens per second." (if you want to keep networking overhead a secret).