How long did pretraining take? #114

@99991

I could not find any mention of training time in the tech report. Could you share it? Something like "On X B200 GPUs, training took Y GPU hours for Nemotron 3 Super 120B-A12B with Z precision." or "Pretraining throughput for one B200 GPU was X tokens per second." (if you want to keep networking overhead a secret).


Labels: enhancement (New feature or request)
