
Commit 60bf784

update readme
Signed-off-by: Jennifer Chen <[email protected]>
1 parent b4dbaac

File tree

1 file changed: +26 -11 lines changed

examples/nemo_run/qat/README.md

Lines changed: 26 additions & 11 deletions
@@ -2,20 +2,35 @@

## Overview

-This directory also contains an end-to-end NeMo QAT Simplified Flow example, which supports both QAT with cross-entropy loss and QAD (quantization-aware distillation) with knowledge-distillation loss between the full-precision teacher and quantized student models.
+This directory contains an end-to-end QAT Simplified Flow example using NeMo for model training. It supports both QAT with cross-entropy loss and QAD (quantization-aware distillation) with knowledge-distillation loss between the BF16 teacher and quantized student models.

## Flow Stages

Currently the Simplified Flow runs the following steps in order:

1. Process Nvidia/OpenScience data (if `--data-path` is not specified)
-1. Import NeMo model checkpoint
-1. PTQ the model
+1. Import NeMo BF16 model checkpoint and evaluate 5% of MMLU on the BF16 checkpoint
+1. PTQ the model and evaluate 5% of MMLU on the PTQ checkpoint
1. SFT (finetune) the model
-1. Export model to Unified checkpoint (HuggingFace) format
+1. Evaluate 5% of MMLU on the SFT checkpoint
+1. Export model to Unified checkpoint (HuggingFace) format in lower precision
+
+```mermaid
+graph TD;
+    Data-->SFT;
+    Import-->EvalBF16["Evaluate BF16"];
+    Import-->PTQ;
+    PTQ-->EvalPTQ["Evaluate PTQ"];
+    PTQ-->SFT;
+    SFT-->EvalSFT["Evaluate SFT"];
+    SFT-->ExportSFT["Export SFT"];
+```

## Supported models

-Currently supports models that can be trained on 1 node with 8 x 80GB GPUs. The default configuration uses:
+Locally, this script currently supports models that can be trained on 1 node with 8 x 80GB GPUs. On Slurm you can configure the number of nodes/GPUs for training and PTQ with the flags `--train-nodes`, `--train-gpus`, and `--ptq-gpus` (see the example after the model list below).
+
+The default configuration works on 1 node with 4 H100 GPUs for PTQ and 8 H100 GPUs for training with the following model:

- **Model**: Qwen3-8B
- **Recipe**: qwen3_8b
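
For example, a Slurm launch that overrides the default node and GPU counts might look like this (the counts shown are purely illustrative):

```bash
python qat/nemo_qat_flow.py --experiment qat_experiment \
    --train-nodes 2 --train-gpus 8 --ptq-gpus 4
```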
@@ -24,7 +39,7 @@ Currently supports models that can be trained on 1 node with 8 x 80GB GPUs. The

### Prerequisites

-You can run the example either locally (if your server has GPUs) or on a Slurm cluster.
+You can run the example either locally or on a Slurm cluster.

To run the example locally, launch a [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) with version 25.07 or higher using Docker or on a Slurm interactive node. Mount your cloned `modelopt` repository to the container by adding this mount flag to your Docker/Slurm command: `-v <modelopt-path>:/workspace/modelopt -v <modelopt-path>/modelopt:/usr/local/lib/python3.12/dist-packages/modelopt`.
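
As a concrete sketch, a local Docker launch with those mounts could look like the following (the image tag and host paths are assumptions to adapt to your setup):

```bash
docker run --gpus all -it --rm \
    -v /path/to/modelopt:/workspace/modelopt \
    -v /path/to/modelopt/modelopt:/usr/local/lib/python3.12/dist-packages/modelopt \
    nvcr.io/nvidia/nemo:25.07 bash
```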

@@ -34,19 +49,21 @@ To run SFT properly you may also need to clone NeMo at the respective commits, a

To run the example on Slurm, edit the `SLURM_CONFIG` at the bottom of `nemo_qat_flow.py` with the appropriate credentials, container, cluster name (host), and container mounts. Make sure you are mounting the NeMo and Megatron-LM repositories above in the Slurm cluster and that you've checked out the correct commits.

+### Dataset limitations
+The current QAT recipe has been tuned for the Qwen3-8B model to improve accuracy on the MMLU benchmark after PTQ degradation. QAT/QAD results are highly dependent on the specific model, dataset, and hyperparameters. There is no guarantee that the same dataset will recover the accuracy of the PTQ model. Feel free to try your own model and dataset combinations and test which combination works best.
+
### Running the Flow Locally

After launching the NeMo container with the specified mounts, follow these examples to run the flow locally.

#### QAT

-From the `nemo_run` folder, launch the example with `python qat/nemo_qat_flow.py --experiment qat_experiment`. To use a different model than the default model (Qwen3-8B), you can add the `--model-name <hf-model-name> --finetune-recipe <recipe-name>` flags and use the model's HuggingFace name and NeMo recipe names listed [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes). To provide your own custom dataset, use the `--data-path` flag, otherwise the default [NVIDIA OpenScience](https://huggingface.co/datasets/nvidia/OpenScience) dataset will be used.
+From the `nemo_run` folder, launch the example with the `qat/nemo_qat_flow.py` script. To use a different model than the default (Qwen3-8B), add the `--model-name <hf-model-name> --finetune-recipe <recipe-name>` flags, using the model's HuggingFace name and the NeMo recipe names listed [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes). To provide your own custom dataset, use the `--data-path` flag; otherwise the default [NVIDIA OpenScience](https://huggingface.co/datasets/nvidia/OpenScience) dataset will be used.

To perform QAT, run:

```bash
-python qat/nemo_qat_flow.py \
-    --experiment qat_experiment
+python qat/nemo_qat_flow.py --log-dir /my/log/dir --experiment qat_experiment
```
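
To combine the model, recipe, and dataset flags described above, a customized run might look like the following (the model name, recipe, and data path are illustrative placeholders; pick a recipe from the linked list):

```bash
python qat/nemo_qat_flow.py \
    --model-name meta-llama/Llama-3.1-8B \
    --finetune-recipe llama31_8b \
    --data-path /path/to/custom_dataset \
    --log-dir /my/log/dir \
    --experiment qat_experiment
```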

> **_NOTE:_** To enable KV cache quantization, add `--enable-kv-cache` and specify qformat using `--kv-cache-qformat <fp8, nvfp4>`.
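
For example, a QAT run with FP8 KV cache quantization (one of the two supported qformats) might look like:

```bash
python qat/nemo_qat_flow.py --log-dir /my/log/dir --experiment qat_experiment \
    --enable-kv-cache --kv-cache-qformat fp8
```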
@@ -58,9 +75,7 @@ In order to train using QAD, launch the example with `python qat/nemo_qat_flow.p
To perform QAD training, run:

```bash
-python qat/nemo_qat_flow.py \
-    --distill \
-    --experiment qad_experiment
+python qat/nemo_qat_flow.py --distill --log-dir /my/log/dir --experiment qad_experiment
```

### Running the Flow on Slurm
