Add FP8 support to nano-v3 branch #1704

@Rexhaif

Description

Is your feature request related to a problem? Please describe.
When attempting to train (SFT) a base version of Nemotron Nano v3 30B-A3B, I've encountered this error:
AssertionError: FP8 block scaled GEMM requires Hopper and CUDA >= 12.9.
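
For reference, here is a minimal way to check whether the environment actually meets those stated requirements (a sketch using standard PyTorch calls, not the library's own detection logic):

```python
# Quick environment sanity check for the stated requirements:
# Hopper = compute capability 9.x, and CUDA >= 12.9.
import torch

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor} (Hopper is 9.x)")
print(f"CUDA version PyTorch was built with: {torch.version.cuda}")
```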

I assume this might be because the nano-v3 branch does not include the specific commit that addresses this issue (as mentioned in the docs). Any chance it could be added there?

Describe the solution you'd like
Adding that commit to the nano-v3 branch.

Describe alternatives you've considered

  • I've tried using the main branch directly, but it looks like the pinned Megatron-LM version is not compatible with Nemotron Nano v3; it always fails with: num_query_groups (2) must be a multiple of tensor_model_parallel_size (4).

The current latest version of Megatron-LM seems to use a different check that should work with Nemotron, but I haven't had time to test it (see the sketch below).
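
For illustration, the difference between the two checks presumably boils down to something like the sketch below. This is not Megatron-LM's actual code; the function names are hypothetical, and the relaxed rule is my reading of how newer versions handle tensor-parallel sizes larger than the number of query groups:

```python
# Illustrative sketch of the two validation strategies (hypothetical
# functions; variable names taken from the error message above).

def old_check(num_query_groups: int, tensor_model_parallel_size: int) -> None:
    # Pinned Megatron-LM: every TP rank must own at least one whole KV group.
    assert num_query_groups % tensor_model_parallel_size == 0, (
        f"num_query_groups ({num_query_groups}) must be a multiple of "
        f"tensor_model_parallel_size ({tensor_model_parallel_size})."
    )

def new_check(num_query_groups: int, tensor_model_parallel_size: int) -> None:
    # Newer Megatron-LM (presumably): when TP exceeds the number of KV
    # groups, groups can be replicated across ranks, so the requirement
    # relaxes to "one side divides the other".
    if num_query_groups >= tensor_model_parallel_size:
        assert num_query_groups % tensor_model_parallel_size == 0
    else:
        assert tensor_model_parallel_size % num_query_groups == 0

# Nemotron Nano v3 reportedly has 2 query groups, so TP=4 fails the old
# check (2 % 4 != 0) but would pass the relaxed one (4 % 2 == 0).
try:
    old_check(2, 4)
except AssertionError as e:
    print(e)   # matches the reported failure
new_check(2, 4)  # passes under the relaxed rule
```

In the meantime, a possible workaround might be to pick a tensor_model_parallel_size that divides the number of query groups (e.g. TP=1 or TP=2 here), though that obviously changes the parallelism layout.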
