`examples/nemo_run/qat/README.md`

## Overview

This directory contains an end-to-end QAT Simplified Flow example using NeMo for model training. It supports both QAT with cross-entropy loss and QAD (quantization-aware distillation) with knowledge-distillation loss between the BF16 teacher and quantized student models.

## Flow Stages

Currently the Simplified Flow runs the following steps in order:

1. Process Nvidia/OpenScience data (if `--data-path` is not specified)
1. Import NeMo BF16 model checkpoint and evaluate 5% of MMLU on the BF16 checkpoint
1. PTQ the model and evaluate 5% of MMLU on the PTQ checkpoint
1. SFT (finetune) the model
1. Evaluate 5% of MMLU on the SFT checkpoint
1. Export model to Unified checkpoint (HuggingFace) format in lower precision

```mermaid
graph TD;
Data-->SFT;
Import-->EvalBF16["Evaluate BF16"];
Import-->PTQ;
PTQ-->EvalPTQ["Evaluate PTQ"];
PTQ-->SFT;
SFT-->EvalSFT["Evaluate SFT"];
SFT-->ExportSFT["Export SFT"];
```

## Supported models

Locally this script currently supports models that can be trained on 1 node with 8 x 80GB GPUs. On Slurm you can configure the number of nodes/GPUs for training and PTQ with the following flags: `--train-nodes`, `--train-gpus`, `--ptq-gpus` (see the sketch after the model list below).

The default configuration works on 1 node with 4 H100 GPUs for PTQ and 8 H100 GPUs for training with the following model:

- **Model**: Qwen3-8B
- **Recipe**: qwen3_8b
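
As a concrete illustration, a Slurm launch that overrides the default resource allocation could look like the sketch below. The resource flags are the ones documented above; `--use-slurm` and the experiment name are assumptions about the script's CLI rather than confirmed options.

```bash
# Hedged sketch: --train-nodes/--train-gpus/--ptq-gpus are the flags documented above;
# --use-slurm is an assumed switch for selecting the Slurm execution path.
python qat/nemo_qat_flow.py \
    --use-slurm \
    --train-nodes 1 \
    --train-gpus 8 \
    --ptq-gpus 4
```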

### Prerequisites

You can run the example either locally or on a Slurm cluster.

To run the example locally, launch a [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) with version 25.07 or higher using Docker or on a Slurm interactive node. Mount your cloned `modelopt` repository to the container by adding these mount flags to your Docker/Slurm command: `-v <modelopt-path>:/workspace/modelopt -v <modelopt-path>/modelopt:/usr/local/lib/python3.12/dist-packages/modelopt`.
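
As an example, a local Docker launch with those mounts might look like the following sketch. The two `-v` mounts come from the text above; the image tag (derived from the 25.07 version requirement) and the `--gpus all` flag are assumptions.

```bash
# Hedged sketch of a local container launch; the nvcr.io/nvidia/nemo:25.07 tag
# and --gpus all are assumptions, the two -v mounts are the ones required above.
docker run --gpus all -it --rm \
    -v <modelopt-path>:/workspace/modelopt \
    -v <modelopt-path>/modelopt:/usr/local/lib/python3.12/dist-packages/modelopt \
    nvcr.io/nvidia/nemo:25.07
```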

To run the example on Slurm, edit the `SLURM_CONFIG` at the bottom of `nemo_qat_flow.py` with the appropriate credentials, container, cluster name (host), and container mounts. Make sure you are mounting the NeMo and Megatron-LM repositories above in the Slurm cluster and that you've checked out the correct commits.
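
For instance, pinning the two repositories before mounting them could follow this sketch; the commit placeholders are intentionally left unfilled, since the required commits are specified elsewhere in the repository.

```bash
# Hypothetical setup: clone both repositories and check out the pinned commits
# (replace <nemo-commit> and <megatron-commit> with the commits the flow expects).
git clone https://github.com/NVIDIA/NeMo.git
git -C NeMo checkout <nemo-commit>
git clone https://github.com/NVIDIA/Megatron-LM.git
git -C Megatron-LM checkout <megatron-commit>
```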

### Dataset limitations

The current QAT recipe has been tuned for the Qwen3-8B model to improve accuracy on the MMLU benchmark after PTQ degradation. QAT/QAD results are highly dependent on the specific model, dataset, and hyperparameters. There is no guarantee that the same dataset will recover the accuracy of the PTQ model. Feel free to try your own model and dataset combinations and test which combination works best.

### Running the Flow Locally

After launching the NeMo container with the specified mounts, follow these examples to run the flow locally.

#### QAT

From the `nemo_run` folder, launch the example with the `qat/nemo_qat_flow.py` script. To use a different model than the default (Qwen3-8B), add the `--model-name <hf-model-name> --finetune-recipe <recipe-name>` flags, using the model's HuggingFace name and the NeMo recipe names listed [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes). To provide your own custom dataset, use the `--data-path` flag; otherwise the default [NVIDIA OpenScience](https://huggingface.co/datasets/nvidia/OpenScience) dataset will be used.
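
A minimal launch combining these flags is sketched below; the experiment name is illustrative, and the bracketed values are placeholders to substitute.

```bash
# Example launch using the flags documented above. Replace the <...> placeholders
# with your HuggingFace model name, NeMo recipe name, and dataset path.
python qat/nemo_qat_flow.py \
    --experiment qat_experiment \
    --model-name <hf-model-name> \
    --finetune-recipe <recipe-name> \
    --data-path <path-to-dataset>
```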