2 files changed, +4 -2 lines changed
@@ -48,6 +48,8 @@ QAT of Qwen3-8B NVFP4 recovers most of the accuracy on the MMLU benchmark after
 | Qwen3-8B NVFP4 | 70.3 |
 | Qwen3-8B NVFP4 after QAT | 72.8 |
 
+The exported checkpoint is also much smaller: 6.4 GB, compared with 16.4 GB for the original BF16 checkpoint.
+
 ## Usage
 
 ### Prerequisites
@@ -140,7 +140,7 @@ def get_args():
         action="store_true",
         default=False,
     )
-    parser.add_argument("--tensor_parallelism", type=int, default=1)
+    parser.add_argument("--tensor_parallelism", type=int, default=2)
     parser.add_argument("--pipeline_parallelism", type=int, default=1)
     return parser.parse_args()
 
@@ -375,7 +375,7 @@ def main(args):
     SEQUENCE_LENGTH = 4096
     MBS = 1
     GBS = 512
-    TRAIN_STEPS = 400
+    TRAIN_STEPS = 200
     VAL_INTERVAL = 50
     # # # # # # # # # # # # # # # # # # # # # #
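
To make the effect of the two script changes concrete, the following sketch works through the arithmetic implied by the constants in the hunks above. It is not part of the changed script: `training_budget`, `grad_accum_steps`, and the 8-GPU world size are hypothetical names and values chosen for illustration, and the token count is an upper bound that assumes every sample fills the full `SEQUENCE_LENGTH`. The gradient-accumulation formula is the usual relationship for data-parallel training (global batch = micro batch × data-parallel size × accumulation steps), assumed here rather than read from the script.

```python
# Constants copied from the diff above.
SEQUENCE_LENGTH = 4096
MBS = 1    # micro batch size
GBS = 512  # global batch size


def training_budget(train_steps: int) -> tuple[int, int]:
    """Return (samples, max_tokens) processed over `train_steps` optimizer steps.

    max_tokens is an upper bound: it assumes every sample is exactly
    SEQUENCE_LENGTH tokens long.
    """
    samples = GBS * train_steps
    return samples, samples * SEQUENCE_LENGTH


def grad_accum_steps(
    world_size: int, tensor_parallelism: int, pipeline_parallelism: int = 1
) -> int:
    """Micro-batches accumulated per rank for one optimizer step.

    Hypothetical helper: assumes the standard relation
    GBS = MBS * data_parallel_size * accumulation_steps.
    """
    model_parallel = tensor_parallelism * pipeline_parallelism
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    data_parallel = world_size // model_parallel
    if GBS % (MBS * data_parallel) != 0:
        raise ValueError("GBS must be divisible by MBS * data_parallel_size")
    return GBS // (MBS * data_parallel)


if __name__ == "__main__":
    for steps in (400, 200):  # old and new TRAIN_STEPS
        samples, tokens = training_budget(steps)
        print(f"TRAIN_STEPS={steps}: {samples:,} samples, up to {tokens:,} tokens")

    WORLD_SIZE = 8  # hypothetical GPU count, not taken from the diff
    for tp in (1, 2):  # old and new --tensor_parallelism default
        print(f"TP={tp}: {grad_accum_steps(WORLD_SIZE, tp)} accumulation steps per rank")
```

Under these assumptions, halving `TRAIN_STEPS` halves the number of samples seen during QAT, while raising the default tensor parallelism from 1 to 2 halves the data-parallel size on a fixed GPU count and so doubles the accumulation steps per optimizer step.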