Commit d02bc95

modify changelog and codeowners
Signed-off-by: Huizi Mao <[email protected]>
1 parent: 99e6f76

3 files changed: 6 additions, 0 deletions


.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions

@@ -51,3 +51,4 @@ modelopt/torch/utils @NVIDIA/modelopt-torch-utils-codeowners
 /examples/speculative_decoding @NVIDIA/modelopt-torch-speculative-codeowners
 /examples/vlm_ptq @NVIDIA/modelopt-examples-vlm-codeowners
 /examples/windows @NVIDIA/modelopt-windows-codeowners
+/examples/vllm_serve @NVIDIA/modelopt-examples-llm_ptq-codeowners

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions

@@ -9,6 +9,7 @@ Model Optimizer Changelog (Linux)
 **New Features**
 
 - Add flag ``op_types_to_exclude_fp16`` in ONNX quantization to exclude ops from being converted to FP16/BF16. Alternatively, for custom TensorRT ops, this can also be done by indicating ``'fp32'`` precision in ``trt_plugins_precision``.
+- Support PTQ and fakequant in vLLM for fast evaluation of arbitrary quantization formats. See ``examples/vllm_serve`` for more details.
 
 0.37 (2025-09-xx)
 ^^^^^^^^^^^^^^^^^
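
As a side note on the first changelog entry above, here is a minimal sketch of how the `op_types_to_exclude_fp16` flag might be passed. This assumes the flag is exposed as a keyword argument of `modelopt.onnx.quantization.quantize` (the changelog names only the flag, not the call site); the model paths and op types below are placeholders, not taken from this commit.

```python
# Hypothetical sketch: keep selected op types out of the FP16/BF16 conversion
# during ONNX quantization. Assumes `op_types_to_exclude_fp16` is accepted as a
# keyword argument; verify against the current modelopt.onnx.quantization API.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",          # placeholder input model
    quantize_mode="int8",
    high_precision_dtype="fp16",     # convert the rest of the graph to FP16
    op_types_to_exclude_fp16=["Resize", "Celu"],  # example op types, not from the source
    output_path="model.quant.onnx",  # placeholder output path
)
```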

examples/vllm_serve/README.md

Lines changed: 4 additions & 0 deletions

@@ -54,3 +54,7 @@ python convert_amax_hf2vllm.py -i <amax.pth> -o <vllm_amax.pth>
 ```
 
 Step 2: add `<vllm_amax.pth>` to `quant_config` in `vllm_serve_fakequant.py`
+
+## Known Problems
+
+1. AWQ is not yet supported in vLLM.
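
For context on Step 2 above, a hypothetical illustration of what wiring the converted amax file into `quant_config` inside `vllm_serve_fakequant.py` could look like. The key names (`quant_format`, `amax_file_path`) are invented for illustration; the actual structure should be taken from the script itself.

```python
# Hypothetical illustration only: the real quant_config lives in
# examples/vllm_serve/vllm_serve_fakequant.py and its keys are not shown in this
# diff. The key names below are invented.
quant_config = {
    "quant_format": "nvfp4",            # assumed name for the quantization format field
    "amax_file_path": "vllm_amax.pth",  # the <vllm_amax.pth> produced in Step 1
}
```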
