Commit 14dad2b

changelog update

Signed-off-by: Suguna Velury <[email protected]>

Parent: f99e2fb

2 files changed: +4 −0 lines changed

CHANGELOG.rst

Lines changed: 2 additions & 0 deletions

```diff
@@ -35,10 +35,12 @@ Model Optimizer Changelog (Linux)
 - Add support for ``torch.compile`` and benchmarking in ``examples/diffusers/quantization/diffusion_trt.py``.
 - Enabled native ModelOpt quantization support for FP8 and NVFP4 formats in SGLang. See `SGLang quantization documentation <https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/quantization.md#using-nvidia-modelopt>`_ for more details.
 - Added modelopt quantized checkpoints in vLLM/SGLang CI/CD pipelines (PRs are under review).
+- Add support for exporting a QLoRA checkpoint finetuned using ModelOpt.
 
 **Documentation**
 
 - Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
+- Added an example for exporting a QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details.
 
 0.37 (2025-10-08)
 ^^^^^^^^^^^^^^^^^
```

examples/llm_qat/README.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -360,6 +360,8 @@ To deploy with vLLM, run the following command. For more details about QLoRA dep
 vllm serve llama3-fp4-qlora-hf/base_model --enable-lora --lora-modules adapter=llama3-fp4-qlora-hf --port 8000 --tokenizer llama3-fp4-qlora-hf
 ```
 
+> _Note: We currently do not support the export option for QLoRA models generated using FSDP2._
+>
 ## Pre-Quantized Checkpoints
 
 - Ready-to-deploy checkpoints \[[🤗 Hugging Face - Nvidia TensorRT Model Optimizer Collection](https://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer)\]
````
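Once the `vllm serve` command above is running, the exported QLoRA adapter can be exercised through vLLM's OpenAI-compatible `/v1/completions` endpoint. A minimal sketch of the request payload is shown below; the prompt, port, and `max_tokens` value are illustrative assumptions, and `"adapter"` matches the module name registered via `--lora-modules` in the serve command.

```python
import json

# Request payload for vLLM's OpenAI-compatible /v1/completions endpoint.
# "adapter" routes the request through the QLoRA adapter registered with
# --lora-modules; prompt and max_tokens are illustrative.
payload = {
    "model": "adapter",
    "prompt": "Explain FP4 quantization in one sentence.",
    "max_tokens": 64,
}
body = json.dumps(payload)

# Send it against the running server, e.g.:
#   curl http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" -d "$BODY"
```

Querying the base model instead is done by setting `"model"` to the served base-model name rather than the adapter name.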
