Commit d02bc95

modify changelog and codeowners
Signed-off-by: Huizi Mao <[email protected]>
1 parent: 99e6f76

3 files changed: 6 additions, 0 deletions


.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions

@@ -51,3 +51,4 @@ modelopt/torch/utils @NVIDIA/modelopt-torch-utils-codeowners
 /examples/speculative_decoding @NVIDIA/modelopt-torch-speculative-codeowners
 /examples/vlm_ptq @NVIDIA/modelopt-examples-vlm-codeowners
 /examples/windows @NVIDIA/modelopt-windows-codeowners
+/examples/vllm_serve @NVIDIA/modelopt-examples-llm_ptq-codeowners

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@ Model Optimizer Changelog (Linux)
 **New Features**
 
 - Add flag ``op_types_to_exclude_fp16`` in ONNX quantization to exclude ops from being converted to FP16/BF16. Alternatively, for custom TensorRT ops, this can also be done by indicating ``'fp32'`` precision in ``trt_plugins_precision``.
+- Support PTQ and fakequant in vLLM for fast evaluation of arbitrary quantization formats. See ``examples/vllm_serve`` for more details.
 
 0.37 (2025-09-xx)
 ^^^^^^^^^^^^^^^^^
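
As a side note on the first changelog entry above, here is a minimal sketch of how the `op_types_to_exclude_fp16` flag might be passed. This assumes the flag is exposed as a keyword argument of `modelopt.onnx.quantization.quantize` (the changelog names only the flag, not the call site); the model paths and op types below are placeholders, not taken from this commit.

```python
# Hypothetical sketch: keep selected op types out of the FP16/BF16 conversion
# during ONNX quantization. Assumes `op_types_to_exclude_fp16` is accepted as a
# keyword argument; verify against the current modelopt.onnx.quantization API.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",          # placeholder input model
    quantize_mode="int8",
    high_precision_dtype="fp16",     # convert the rest of the graph to FP16
    op_types_to_exclude_fp16=["Resize", "Celu"],  # example op types, not from the source
    output_path="model.quant.onnx",  # placeholder output path
)
```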

examples/vllm_serve/README.md

Lines changed: 4 additions & 0 deletions

@@ -54,3 +54,7 @@ python convert_amax_hf2vllm.py -i <amax.pth> -o <vllm_amax.pth>
 ```
 
 Step 2: add `<vllm_amax.pth>` to `quant_config` in `vllm_serve_fakequant.py`
+
+## Known Problems
+
+1. AWQ is not yet supported in vLLM.
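
For context on Step 2 above, a hypothetical illustration of what wiring the converted amax file into `quant_config` inside `vllm_serve_fakequant.py` could look like. The key names (`quant_format`, `amax_file_path`) are invented for illustration; the actual structure should be taken from the script itself.

```python
# Hypothetical illustration only: the real quant_config lives in
# examples/vllm_serve/vllm_serve_fakequant.py and its keys are not shown in this
# diff. The key names below are invented.
quant_config = {
    "quant_format": "nvfp4",            # assumed name for the quantization format field
    "amax_file_path": "vllm_amax.pth",  # the <vllm_amax.pth> produced in Step 1
}
```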
