
Commit e6f3ed7

add results to readme
Signed-off-by: Jennifer Chen <[email protected]>
1 parent 51d14be commit e6f3ed7

File tree

1 file changed: 14 additions, 0 deletions


examples/nemo_run/qat/README.md

Lines changed: 14 additions & 0 deletions
@@ -38,6 +38,16 @@ graph TD;
05_train-->07_export_hf;
```

## Results

QAT of Qwen3-8B NVFP4 recovers most of the MMLU accuracy lost after NVFP4 PTQ. We fine-tune the Qwen3-8B NVFP4 checkpoint for 200 steps with a learning rate of 1e-5 and a global batch size of 512; a recipe sketch of these settings follows the table.

| Model                    | MMLU 5% |
|--------------------------|---------|
| Qwen3-8B FP16            | 73.8    |
| Qwen3-8B NVFP4           | 70.3    |
| Qwen3-8B NVFP4 after QAT | 72.8    |

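These hyperparameters map onto the NeMo 2.0 recipe API roughly as follows. This is a minimal sketch under assumptions: the QAT flow script sets these values itself, and the `llm.qwen3_8b` handle is inferred from this example's `qwen3_8b` recipe name.

```python
# Sketch only: how the finetuning hyperparameters would be set via the
# NeMo 2.0 recipe API. The QAT flow script configures this for you; the
# `llm.qwen3_8b` handle is an assumption based on the qwen3_8b recipe name.
from nemo.collections import llm

recipe = llm.qwen3_8b.finetune_recipe(name="qat_qwen3_8b", num_nodes=1, num_gpus_per_node=8)
recipe.trainer.max_steps = 200        # fine-tune for 200 steps
recipe.optim.config.lr = 1e-5         # learning rate 1e-5
recipe.data.global_batch_size = 512   # global batch size 512
```
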
## Usage

### Prerequisites
@@ -92,6 +102,10 @@ The default configuration works on 1 node with 4 H100 GPUs for PTQ and 8 H100 GP
- **Model**: Qwen3-8B
- **Recipe**: qwen3_8b

### Common Errors
Depending on how much memory your GPUs have, you may encounter an out-of-memory (OOM) error. If that happens, add the `--tensor_parallelism` or `--pipeline_parallelism` flag (e.g. `--tensor_parallelism 2`); see the example below.
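
For example (a hypothetical invocation; the script name is illustrative, so substitute the launch command from the Usage section):

```bash
# Sketch only: shard the model across 2 GPUs with tensor parallelism to
# reduce per-GPU memory pressure. The script name is an assumption.
python nemo_qat_flow.py --tensor_parallelism 2
```
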
### Custom Chat Template

By default the script uses the model/tokenizer's chat template, which may not contain the `{% generation %}` and `{% endgeneration %}` tags around the assistant tokens that are needed to generate the assistant loss mask (see [this PR](https://github.com/huggingface/transformers/pull/30650)). To provide a path to a custom chat template, use the `--chat-template <my_template.txt>` flag.
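
As an illustration of what the tags do, the sketch below (assuming the Hugging Face `transformers` tokenizer API; the template fragment is illustrative, not the shipped Qwen3 template) shows how `{% generation %}` blocks let the tokenizer return an assistant token mask:

```python
# Sketch only: a chat template must wrap assistant tokens in
# {% generation %} ... {% endgeneration %} for the tokenizer to return an
# assistant loss mask. The template fragment below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'assistant' %}"
    "<|im_start|>assistant\n"
    "{% generation %}{{ message['content'] }}{% endgeneration %}<|im_end|>\n"
    "{% else %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endif %}"
    "{% endfor %}"
)

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]
out = tokenizer.apply_chat_template(
    messages,
    chat_template=chat_template,
    return_dict=True,
    return_assistant_tokens_mask=True,
)
# 1s mark the assistant's tokens; everything else is masked out of the loss.
print(out["assistant_masks"])
```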
