Commit 08c1aa1

lint

Signed-off-by: Jennifer Chen <[email protected]>
1 parent 1ef040f

2 files changed: +3 -6 lines changed

examples/nemo_run/common/process_openscience.py
Lines changed: 0 additions & 1 deletion

@@ -14,7 +14,6 @@
 # limitations under the License.

 import argparse
-import json
 import os
 from pathlib import Path

examples/nemo_run/qat/README.md
Lines changed: 3 additions & 5 deletions

@@ -12,7 +12,7 @@

 This directory contains an end-to-end QAT Simplified Flow example using NeMo for model training. It supports both QAT with cross-entropy loss and QAD (quantization-aware distillation) with knowledge-distillation loss between the BF16 teacher and quantized student models.

-After PTQ (post-training quantization), the quantized model may
+After PTQ (post-training quantization), the quantized model may

 ## Flow Stages
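
The hunk above mentions QAD's knowledge-distillation loss between the BF16 teacher and the quantized student. As a rough sketch of that idea only (not the actual NeMo/ModelOpt implementation; the function name and signature here are hypothetical), a temperature-scaled KL divergence over the two models' logits could look like:

```
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Soften both distributions with the same temperature, then measure
    # how far the student's distribution is from the teacher's.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" gives the standard KL normalization; the T^2 factor keeps
    # gradient magnitudes comparable across temperature settings.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```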

@@ -36,7 +36,6 @@ graph TD;
 05_train-->07_export_hf;
 ```

-
 ## Usage

 ### Prerequisites
@@ -49,11 +48,11 @@ To run the example locally, launch a [NeMo container](https://catalog.ngc.nvidia
 - `git clone https://github.com/NVIDIA-NeMo/NeMo.git && cd NeMo && git checkout ddcb75f`

 Example docker command:
+
 ```
 docker run -v /home/user/:/home/user/ -v /home/user/NeMo:/opt/NeMo -v /home/user/TensorRT-Model-Optimizer/modelopt/:/usr/local/lib/python3.12/dist-packages/modelopt --gpus all -it --shm-size 20g --rm nvcr.io/nvidia/nemo:25.07 bash
 ```

-
 ### Running the Flow Locally

 After launching the NeMo container with the specified mounts, follow these examples to run the flow locally.
@@ -80,7 +79,6 @@ To perform QAD training, run:
 python qat/nemo_qat_flow.py --distill --log-dir /my/log/dir --experiment qad_experiment
 ```

-
 ## Supported models

 Locally this script currently supports models that can be trained on 1 node with 8 x 80GB GPUs. On Slurm you can configure the number of nodes/gpus for training and PTQ with the following flags: `--train-nodes`, `--train-gpus`, `--ptq-gpus`.
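
For illustration, a hypothetical Slurm-oriented invocation overriding those defaults might look like the following (the node/GPU counts and paths are made up for the example; only the flag names come from the README):

```
python qat/nemo_qat_flow.py --train-nodes 2 --train-gpus 8 --ptq-gpus 4 \
    --log-dir /my/log/dir --experiment qat_experiment
```
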
@@ -90,10 +88,10 @@ The default configuration works on 1 node with 4 H100 GPUs for PTQ and 8 H100 GP
 - **Model**: Qwen3-8B
 - **Recipe**: qwen3_8b

-
 ### Custom Chat Template

 By default the script will use the model/tokenizer's chat template, which may not contain the `{% generation %}` and `{% endgeneration %}` tags around the assistant tokens that are needed to generate the assistant loss mask (see [this PR](https://github.com/huggingface/transformers/pull/30650)). To provide a path to a custom chat template, use the `--chat-template <my_template.txt>` flag.

 ### Dataset limitations
+
 The current QAT recipe has been tuned for the Qwen3-8B model to improve accuracy on the MMLU benchmark after PTQ degradation. QAT/QAD results are highly dependent on the specific model, dataset, and hyperparameters. There is no guarantee that the same dataset will recover the accuracy of the PTQ model. Feel free to try your own model and dataset combinations and test which combination works best.
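
As an illustration of the `{% generation %}` tags discussed in the hunk above, a minimal custom chat template might wrap only the assistant content in the generation markers, so the assistant loss mask covers exactly those tokens. The role-delimiter tokens below are placeholders, not Qwen3's actual special tokens:

```
{%- for message in messages %}
{%- if message['role'] == 'assistant' %}
<|assistant|>{% generation %}{{ message['content'] }}{% endgeneration %}<|end|>
{%- else %}
<|{{ message['role'] }}|>{{ message['content'] }}<|end|>
{%- endif %}
{%- endfor %}
```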
