
Conversation

farshadghodsian
Contributor

@farshadghodsian farshadghodsian commented Aug 29, 2025

What does this PR do?

Type of change: New example

Overview:
Adding a QAT Jupyter notebook example that walks users through applying Quantization Aware Training (QAT) to an LLM, Meta's Llama-3.1-8b, and serving it via the TensorRT-LLM Docker container.

Usage

See the new QAT walkthrough notebook in ./examples/llm_qat/notebooks for usage instructions.

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • End-to-end Quantization Aware Training walkthrough for large models: NVFP4 calibration, quantization, QAT training, checkpointing, export, and TensorRT-LLM deployment with example inference.
  • Documentation

    • Step-by-step notebook covering prerequisites, model/dataset setup, training/calibration/quantization workflow, sample outputs, config notes, and Docker-based deployment/serving with an example request.
  • Chores

    • Added notebook dependencies: ipywidgets, nvidia-modelopt[all], and trl.
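
For orientation, a minimal sketch of the flow the notebook covers (not the notebook's exact code; the model name, calibration loop, and export path are placeholders, and the ModelOpt calls assume the nvidia-modelopt APIs discussed in the review below):

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # placeholder; see the review discussion about the model used in the notebook
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(m):
    # run a small calibration subset through the model (see the calibration discussion below)
    ...

# 1. Insert NVFP4 quantizer nodes and calibrate
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# 2. Fine-tune with the quantizers in place (QAT), e.g. via TRL's SFTTrainer (omitted here)

# 3. Export a Hugging Face-style checkpoint that TensorRT-LLM can serve
export_hf_checkpoint(model, export_dir="qat_nvfp4_hf")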

@farshadghodsian farshadghodsian requested a review from a team as a code owner August 29, 2025 20:55

copy-pr-bot bot commented Aug 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@farshadghodsian farshadghodsian force-pushed the QAT-Walkthrough-Notebook branch 2 times, most recently from 5d1bf5a to e587a06 Compare August 29, 2025 20:56
@kevalmorabia97 kevalmorabia97 requested review from realAsma and removed request for Edwardf0t1 August 30, 2025 06:12
@farshadghodsian farshadghodsian force-pushed the QAT-Walkthrough-Notebook branch 4 times, most recently from 1b37baa to 01b1a65 Compare September 3, 2025 23:17
@farshadghodsian farshadghodsian requested review from a team as code owners September 3, 2025 23:17
@farshadghodsian farshadghodsian force-pushed the QAT-Walkthrough-Notebook branch 2 times, most recently from 374d9a6 to a531dcb Compare September 5, 2025 22:03

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (4)
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb (4)

346-355: Pass tokenizer via tokenizer= or rely on auto-detection; don’t use processing_class for tokenizers.

Keeps behavior aligned with TRL expectations and avoids subtle tokenization issues.

 trainer = SFTTrainer(
     model=model,
     args=training_args,
     train_dataset=dataset[script_args.dataset_train_split],
     eval_dataset=dataset[script_args.dataset_test_split],
-    processing_class=tokenizer,
+    tokenizer=tokenizer,  # or remove entirely to rely on auto-detection
 )

26-41: Fix “Dependancies” typos and align dependency list with requirements.

Multiple typos and omissions (datasets/accelerate/peft). Also tweak the path sentence.

-## Installing Prerequisites and Dependancies
+## Installing Prerequisites and Dependencies
-If you haven't already, install the required dependencies for this notebook. Key dependancies include:
+If you haven't already, install the required dependencies for this notebook. Key dependencies include:
 - nvidia-modelopt
 - torch
 - transformers
-- jupyterlab
+- datasets
+- accelerate
+- peft
+- jupyterlab
-
-This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependancies.
+This repository contains `examples/llm_qat/notebooks/requirements.txt` to install all required dependencies.

374-391: Calibration forward loop should be eval+no-grad and move tensors to model device.

Prevents unnecessary grad tracking and device mismatch errors.

 def forward_loop(model):
-    for data in data_loader:
-        model(**data)
+    model.eval()
+    # best-effort device selection
+    try:
+        device = next(model.parameters()).device
+    except StopIteration:
+        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    with torch.inference_mode():
+        for data in data_loader:
+            data = {k: (v.to(device) if hasattr(v, "to") else v) for k, v in data.items()}
+            _ = model(**data)
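
For completeness, the data_loader used above might be built from a small eval subset along these lines (the column name and sample count are assumptions, not the notebook's exact code):

# Assumed: `dataset` and `tokenizer` are already loaded as earlier in the notebook.
calib_texts = dataset["test"]["text"][:64]  # small eval subset; the "text" column is an assumption
data_loader = [
    tokenizer(t, return_tensors="pt", truncation=True, max_length=512)
    for t in calib_texts
]  # any iterable of kwargs dicts works with the forward_loop above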

16-19: QAT is training-time (not post-training) + grammar fix.

Clarify definition and fix subject-verb and tense issues.

-**Quantization Aware Training (QAT)** is a method that learn the effects of quantization during neural network post-training to preserve accuracy when deploying models in very-low-precision formats. QAT inserts quantizer nodes into the computational graph, mimicking the rounding and clamping operations that occur during actual quantization. This allows the model to adapt its weights and activations to mitigate accuracy loss.
-
-This notebook demonstrates how to apply Quantization Aware Training (QAT) to an LLM, Qwen3-8b in this example, with NVIDIA's TensorRT Model Optimizer (ModelOpt) QAT toolkit. We walk through downloading and loading the model, calibrates on a small eval subset, applying NVFP4 quantization and finally deploying the quantized model to TensorRT-LLM.
+**Quantization Aware Training (QAT)** simulates quantization during training (not post‑training) so the model adapts to low‑precision rounding and clamping, preserving accuracy at deployment.
+
+This notebook applies QAT to Qwen/Qwen3‑8B using NVIDIA’s TensorRT Model Optimizer (ModelOpt) QAT toolkit. We walk through downloading and loading the model, calibrating on a small eval subset, applying NVFP4 quantization, and finally deploying the quantized model to TensorRT‑LLM.
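
To make "simulates quantization during training" concrete, a toy fake-quantization round trip (illustrative only; ModelOpt's inserted quantizer nodes do this, plus straight-through gradients, internally):

import torch

def fake_quantize(w, num_bits=4):
    # quantize-dequantize in the forward pass so training "sees" rounding/clamping error
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().amax() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

w = torch.randn(4, 4)
print((w - fake_quantize(w)).abs().max())  # the error QAT teaches the model to tolerate
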
🧹 Nitpick comments (5)
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb (5)

611-617: Make the Docker run command more portable.

Mount the notebook requirements dir explicitly; also keep the image tag as a placeholder to avoid encouraging RCs.

-docker run --rm --ipc=host -it \
+docker run --rm --ipc=host -it \
   --ulimit stack=67108864   --ulimit memlock=-1 \
   --gpus all   -p 8000:8000   -e TRTLLM_ENABLE_PDL=1 \
   -v ~/.cache:/root/.cache:rw --name tensorrt_llm \
   -v $(pwd)/qwen3-8b-qat-multilingual-reasoner/:/app/tensorrt_llm/qat \
-  nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2  /bin/bash
+  nvcr.io/nvidia/tensorrt-llm/release:<LATEST_TAG> /bin/bash

713-718: Parameterize tensor/pipeline parallelism or document GPU requirement.

--tp_size 8 will fail on machines with <8 visible GPUs.

-trtllm-serve /app/tensorrt_llm/saved_models_checkpoint-450_nvfp4_hf/  \
-  --max_batch_size 1 --max_num_tokens 1024 \
-  --max_seq_len 4096 --tp_size 8 --pp_size 1 \
+trtllm-serve /app/tensorrt_llm/saved_models_checkpoint-450_nvfp4_hf/  \
+  --max_batch_size 1 --max_num_tokens 1024 \
+  --max_seq_len 4096 --tp_size ${TP_SIZE:-1} --pp_size ${PP_SIZE:-1} \
   --host 0.0.0.0 --port 8000 \
   --kv_cache_free_gpu_memory_fraction 0.95
# Note: set TP_SIZE/PP_SIZE according to available GPUs and engine build.

774-786: Align “model” field in curl with the served model name.

Reduce confusion by matching the folder name (or use “default” if the server ignores it).

-curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
-    "model": "Qwen3/qwen3-8b-qat-multilingual-reasoner",
+curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
+    "model": "saved_models_checkpoint-450_nvfp4_hf",
     "messages": [
         {
             "role": "user",
             "content": "What is NVIDIAs advantage for inference?"
         }
     ],
     "max_tokens": 1024,
     "top_p": 0.9
 }' -w "\n"
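
If unsure what name the server registered, one could query the models endpoint first (assuming trtllm-serve exposes the standard OpenAI-compatible route):

import json, urllib.request

# Assumed: the server started in the docker/trtllm-serve steps above is reachable on localhost:8000.
with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))  # use the returned "id" as the "model" field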

205-212: Optional: pass use_fast=True when available.

Tokenizer perf can improve with fast tokenizers.

-tokenizer = AutoTokenizer.from_pretrained(
-    model_args.model_name_or_path,
-)
+tokenizer = AutoTokenizer.from_pretrained(
+    model_args.model_name_or_path,
+    use_fast=True,
+)

401-411: Quantization call: clarify that some configs don’t need calibration.

Minor doc tweak to reduce confusion about when forward_loop is used.

-mtq.quantize(model, quantization_config, forward_loop)
+mtq.quantize(model, quantization_config, forward_loop)  # pass forward_loop only for configs that require calibration
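
As a quick sanity check after the quantize call, the inserted quantizers can be listed (assuming the ModelOpt summary utility is available in the pinned version):

import modelopt.torch.quantization as mtq

# Print which modules received quantizers and their configured formats
mtq.print_quant_summary(model)
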
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 374d9a6 and a531dcb.

📒 Files selected for processing (2)
  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb (1 hunks)
  • examples/llm_qat/notebooks/requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_qat/notebooks/requirements.txt
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-05T21:48:21.342Z
Learnt from: farshadghodsian
PR: NVIDIA/TensorRT-Model-Optimizer#278
File: examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb:346-355
Timestamp: 2025-09-05T21:48:21.342Z
Learning: SFTTrainer from the TRL library can automatically detect and use tokenizers when they are already present in the model directory or model configuration, making explicit tokenizer parameter specification optional in such cases.

Applied to files:

  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
🔇 Additional comments (1)
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb (1)

286-301: Use correct SFTConfig parameter names
Replace the invalid max_length with max_seq_length and eval_strategy with evaluation_strategy in your SFTConfig call.

 training_args = SFTConfig(
     output_dir="qwen3-8b-qat-multilingual-reasoner",
     num_train_epochs=1,
     learning_rate=2e-5,
     per_device_train_batch_size=1,
     per_device_eval_batch_size=1,
     gradient_accumulation_steps=2,
-    max_length=4096,
+    max_seq_length=4096,
     warmup_ratio=0.03,
-    eval_strategy="steps",
+    evaluation_strategy="steps",
     eval_on_start=True,
     logging_steps=50,
     save_steps=450,
     eval_steps=50,
     save_total_limit=2,
 )

(max_seq_length is the supported truncation parameter in SFTConfig) (huggingface.co)
(use evaluation_strategy to set evaluation intervals) (huggingface.co)

@kevalmorabia97
Collaborator

@farshadghodsian your commits are still not verified with an ssh key. Please refer to the steps here: https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=contributing-ov-file#%EF%B8%8F-signing-your-work

@farshadghodsian
Contributor Author

> @farshadghodsian your commits are still not verified with an ssh key. Please refer to the steps here: https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=contributing-ov-file#%EF%B8%8F-signing-your-work

I see my issue now. I forgot to add my signing key to my GitHub account. My commits should now be verified. ✅

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) September 8, 2025 16:50

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.88%. Comparing base (358b0c6) to head (4926fa7).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #278   +/-   ##
=======================================
  Coverage   73.88%   73.88%           
=======================================
  Files         172      172           
  Lines       17444    17444           
=======================================
  Hits        12888    12888           
  Misses       4556     4556           

☔ View full report in Codecov by Sentry.

@kevalmorabia97
Collaborator

Code quality checks are failing:
https://github.com/NVIDIA/TensorRT-Model-Optimizer/actions/runs/17557846268/job/49867165107?pr=278
You can check CONTRIBUTING.md for steps to fix this

@farshadghodsian farshadghodsian force-pushed the QAT-Walkthrough-Notebook branch 5 times, most recently from 5ca089e to 0a95934 Compare September 10, 2025 21:38
@farshadghodsian farshadghodsian force-pushed the QAT-Walkthrough-Notebook branch 3 times, most recently from 2bc96c7 to 638b8dd Compare September 10, 2025 22:23
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) September 11, 2025 03:37
@kevalmorabia97
Collaborator

/ok to test 4926fa7

@kevalmorabia97 kevalmorabia97 merged commit 76e8ce2 into NVIDIA:main Sep 11, 2025
22 checks passed
benchislett pushed a commit that referenced this pull request Sep 15, 2025