The NVFP4 pretraining paper observes that, when NVFP4 training shows a loss gap relative to a higher-precision baseline, the gap can be closed by switching to higher precision for roughly the last 10% of training (see Appendix D).
The paper also observes that most of the quantization error comes from the forward pass.
What is the recommended way to switch only the forward GEMMs (FProp) to higher precision while keeping NVFP4 for the backward pass?
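To make the requested schedule concrete, here is a minimal, library-agnostic sketch of the per-GEMM precision plan being asked about. It assumes a linear layer's work splits into the usual three GEMMs (fprop, dgrad, wgrad) and that "the last ~10%" is measured in optimizer steps. `GemmPrecision`, `precision_for_step`, and the strings `"nvfp4"`/`"bf16"` are hypothetical placeholders, not Transformer Engine identifiers. Mapping this plan onto TE's recipe objects is the open question in this issue.

```python
from dataclasses import dataclass


# Hypothetical per-GEMM precision plan for one linear layer.
# The strings are illustrative labels, not Transformer Engine API values.
@dataclass(frozen=True)
class GemmPrecision:
    fprop: str  # forward GEMM
    dgrad: str  # backward GEMM producing the input gradient
    wgrad: str  # backward GEMM producing the weight gradient


def precision_for_step(step: int, total_steps: int,
                       switch_fraction: float = 0.1) -> GemmPrecision:
    """Use NVFP4 everywhere until the final `switch_fraction` of training,
    then move only the forward GEMM to BF16 and keep backward in NVFP4."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # round() guards against float error in total_steps * switch_fraction.
    switch_step = total_steps - round(total_steps * switch_fraction)
    if step < switch_step:
        return GemmPrecision(fprop="nvfp4", dgrad="nvfp4", wgrad="nvfp4")
    return GemmPrecision(fprop="bf16", dgrad="nvfp4", wgrad="nvfp4")


if __name__ == "__main__":
    for s in (0, 899, 900, 999):
        print(s, precision_for_step(s, total_steps=1000))
```

With `total_steps=1000`, steps 0 to 899 run fully in NVFP4 and steps 900 to 999 run FProp in BF16 while DGrad and WGrad stay in NVFP4. A training loop would only need to reconfigure its layers once, at the step where the returned plan changes.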