NVFP4 Selective Fwd / Bwd

The [NVFP4 Pretraining paper](https://arxiv.org/pdf/2509.25149) observes that when there is a loss gap between `NVFP4` and a higher-precision baseline, the gap can be closed by switching to higher precision during the last ~10% of training (see Appendix D).

<img width="683" height="412" alt="Image" src="https://github.com/user-attachments/assets/3125e80a-b145-4a00-8b13-ebff5e5ebff4" />

Moreover, they observe that the majority of quantization error results from the forward pass.

What is the recommended way to switch to higher precision *only* for the forward pass (`FProp`) while continuing to use `NVFP4` for the backward pass?
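
To make the request concrete, here is a minimal sketch of the behavior being asked for, written as a per-GEMM precision policy. Everything in it is hypothetical and for illustration only: `GemmPrecision`, `precision_for_step`, the `"bf16"` / `"nvfp4"` labels, and the 10% default are assumptions, not part of any library API. It only describes which precision each GEMM should use at a given step; it does not perform any quantization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmPrecision:
    """Hypothetical container: precision label for each GEMM of a linear layer."""
    fprop: str  # forward pass GEMM
    dgrad: str  # backward GEMM for the input gradient
    wgrad: str  # backward GEMM for the weight gradient


def precision_for_step(step: int, total_steps: int,
                       tail_fraction: float = 0.10,
                       high: str = "bf16") -> GemmPrecision:
    """Return the desired per-GEMM precision at a given training step.

    Assumption: the forward pass switches to `high` for the final
    `tail_fraction` of training, while both backward GEMMs stay in NVFP4.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in [0, 1]")

    # Number of steps at the end of training that use the higher precision.
    tail_steps = round(total_steps * tail_fraction)
    in_tail = step >= total_steps - tail_steps

    fprop = high if in_tail else "nvfp4"
    return GemmPrecision(fprop=fprop, dgrad="nvfp4", wgrad="nvfp4")
```

For example, with `total_steps=1000` the policy would keep all three GEMMs in `NVFP4` through step 899 and request the higher precision for `FProp` only from step 900 onward. The open question is how to express such a policy with the supported recipes.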


NVFP4 Selective Fwd / Bwd #2569
