Add FP8 support to nano-v3 branch #1704

@Rexhaif

Description

Is your feature request related to a problem? Please describe.
When attempting to train (SFT) a base version of Nemotron Nano v3 30B-A3B, I've encountered this error:
AssertionError: FP8 block scaled GEMM requires Hopper and CUDA >= 12.9.
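
For reference, here is a minimal way to check whether the environment actually meets those stated requirements (a sketch using standard PyTorch calls, not the library's own detection logic):

```python
# Quick environment sanity check for the stated requirements:
# Hopper = compute capability 9.x, and CUDA >= 12.9.
import torch

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor} (Hopper is 9.x)")
print(f"CUDA version PyTorch was built with: {torch.version.cuda}")
```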

I assume this might be because the nano-v3 branch does not include the specific commit that addresses this issue (as mentioned in the docs). Any chance it could be added there?

Describe the solution you'd like
Adding that commit to the nano-v3 branch.

Describe alternatives you've considered

  • I've tried using the main branch directly, but it looks like the pinned Megatron-LM version is not compatible with Nemotron Nano v3; it always fails with: num_query_groups (2) must be a multiple of tensor_model_parallel_size (4).

The current latest version of Megatron-LM seems to use a different check that should work with Nemotron, but I haven't had time to test it (see the sketch below).
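
For illustration, the difference between the two checks presumably boils down to something like the sketch below. This is not Megatron-LM's actual code; the function names are hypothetical, and the relaxed rule is my reading of how newer versions handle tensor-parallel sizes larger than the number of query groups:

```python
# Illustrative sketch of the two validation strategies (hypothetical
# functions; variable names taken from the error message above).

def old_check(num_query_groups: int, tensor_model_parallel_size: int) -> None:
    # Pinned Megatron-LM: every TP rank must own at least one whole KV group.
    assert num_query_groups % tensor_model_parallel_size == 0, (
        f"num_query_groups ({num_query_groups}) must be a multiple of "
        f"tensor_model_parallel_size ({tensor_model_parallel_size})."
    )

def new_check(num_query_groups: int, tensor_model_parallel_size: int) -> None:
    # Newer Megatron-LM (presumably): when TP exceeds the number of KV
    # groups, groups can be replicated across ranks, so the requirement
    # relaxes to "one side divides the other".
    if num_query_groups >= tensor_model_parallel_size:
        assert num_query_groups % tensor_model_parallel_size == 0
    else:
        assert tensor_model_parallel_size % num_query_groups == 0

# Nemotron Nano v3 reportedly has 2 query groups, so TP=4 fails the old
# check (2 % 4 != 0) but would pass the relaxed one (4 % 2 == 0).
try:
    old_check(2, 4)
except AssertionError as e:
    print(e)   # matches the reported failure
new_check(2, 4)  # passes under the relaxed rule
```

In the meantime, a possible workaround might be to pick a tensor_model_parallel_size that divides the number of query groups (e.g. TP=1 or TP=2 here), though that obviously changes the parallelism layout.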
