examples/nemo_run/qat/README.md
1 addition & 1 deletion

@@ -40,7 +40,7 @@ graph TD;
 
 ## Results
 
-QAT of Qwen3-8B NVFP4 recovers most of the accuracy on the MMLU benchmark after NVFP4 PTQ. We finetune the Qwen3-8B NVFP4 checkpoint for 200 steps with a learning rate of 1e-5 and global batch size of 512.
+QAT of Qwen3-8B NVFP4 recovers most of the accuracy on the MMLU benchmark after NVFP4 PTQ. We finetune the Qwen3-8B NVFP4 checkpoint for 200 steps with a learning rate of 1e-5 and global batch size of 512 on one node of 8 x H100 GPUs.
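
For a rough sense of scale, the stated settings (200 steps, global batch size 512, 8 GPUs) can be turned into sample counts. The sketch below is only illustrative arithmetic: it assumes pure data parallelism across the 8 GPUs, which the README does not state, and the helper names are hypothetical rather than part of NeMo-Run or Model Optimizer.

```python
# Values stated in the README change above.
TRAIN_STEPS = 200
GLOBAL_BATCH_SIZE = 512
NUM_GPUS = 8  # one node of 8 x H100

# Assumption (not from the README): pure data parallelism,
# so each GPU holds a full model replica.
DATA_PARALLEL_SIZE = NUM_GPUS


def samples_seen(steps: int, global_batch_size: int) -> int:
    """Total training samples consumed over the whole run."""
    return steps * global_batch_size


def per_replica_batch(global_batch_size: int, data_parallel_size: int) -> int:
    """Samples each data-parallel replica processes per optimizer step."""
    if global_batch_size % data_parallel_size != 0:
        raise ValueError(
            "global batch size must be divisible by the data-parallel size"
        )
    return global_batch_size // data_parallel_size


total_samples = samples_seen(TRAIN_STEPS, GLOBAL_BATCH_SIZE)
replica_batch = per_replica_batch(GLOBAL_BATCH_SIZE, DATA_PARALLEL_SIZE)

print(total_samples)  # 102400
print(replica_batch)  # 64
```

If the run actually uses tensor or pipeline parallelism, the data-parallel size is smaller than 8 and the per-replica share grows accordingly; the total of 102,400 samples does not change.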