Commit 1b55630

cjluo-nv authored and kevalmorabia97 committed
Reinstate int8_sq support for vlm_example. (#333)
Signed-off-by: Chenjie Luo <[email protected]>
1 parent a78b2e1 commit 1b55630

File tree

2 files changed: +4 -3 lines


CHANGELOG.rst

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@ Model Optimizer Changelog (Linux)
 - Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strong typing. Use ``engine_precision`` instead.
 - Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. The ``build`` and ``benchmark`` tasks are removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
 - ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
-- ``int8_sq`` quantization format is deprecated from ``examples/vlm_ptq`` following the switch to TensorRT-LLM's torch backend. Please refer to previous releases if this quantization format is needed.
 - Deprecated ``examples/vlm_eval`` as it depends on the deprecated TRT-LLM's TRT backend.

 **New Features**

examples/vlm_ptq/scripts/huggingface_example.sh

Lines changed: 4 additions & 2 deletions
@@ -35,10 +35,10 @@ if [ -z "$MODEL_PATH" ]; then
 fi

 case $QFORMAT in
-    fp8|int4_awq|w4a8_awq|nvfp4)
+    fp8|int8_sq|int4_awq|w4a8_awq|nvfp4)
         ;;
     *)
-        echo "Unknown quant argument: Expected one of: [fp8, int4_awq, w4a8_awq, nvfp4]" >&2
+        echo "Unknown quant argument: Expected one of: [fp8, int8_sq, int4_awq, w4a8_awq, nvfp4]" >&2
         exit 1
 esac

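For context, the hunk above is a plain shell whitelist: any $QFORMAT value outside the supported set is rejected before the script does any real work. A minimal self-contained sketch of the same pattern (the argument handling here is illustrative, not part of the actual script):

#!/usr/bin/env bash
# Validate the requested quantization format against the supported set.
QFORMAT="${1:?usage: $0 <qformat>}"
case $QFORMAT in
    fp8|int8_sq|int4_awq|w4a8_awq|nvfp4)
        echo "Using quantization format: $QFORMAT"
        ;;
    *)
        echo "Unknown quant argument: Expected one of: [fp8, int8_sq, int4_awq, w4a8_awq, nvfp4]" >&2
        exit 1
        ;;
esac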
@@ -95,6 +95,8 @@ if [[ $TASKS =~ "quant" ]] || [[ ! -d "$SAVE_PATH" ]] || [[ ! $(ls -A $SAVE_PATH
     --qformat=$QFORMAT \
     --calib_size=$CALIB_SIZE \
     --batch_size=$CALIB_BATCH_SIZE \
+    --inference_tensor_parallel=$TP \
+    --inference_pipeline_parallel=$PP \
     $PTQ_ARGS
 else
     echo "Quantized model config $MODEL_CONFIG exists, skipping the quantization stage"
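The second hunk forwards the tensor- and pipeline-parallel sizes into the quantization stage. As a rough sketch, assuming the hunk sits inside a call to an hf_ptq.py entry point that also takes checkpoint and export paths (the entry-point name and the first two flags below are assumptions from context, not shown in this diff), the full command after this change would look roughly like:

# Sketch only: the entry-point name and the first two flags are assumed;
# the remaining flags mirror the diff above.
python hf_ptq.py \
    --pyt_ckpt_path=$MODEL_PATH \
    --export_path=$SAVE_PATH \
    --qformat=$QFORMAT \
    --calib_size=$CALIB_SIZE \
    --batch_size=$CALIB_BATCH_SIZE \
    --inference_tensor_parallel=$TP \
    --inference_pipeline_parallel=$PP \
    $PTQ_ARGS

With TP=1 and PP=1 (single-GPU settings) the new flags are effectively defaults; they matter when the quantized checkpoint is intended for multi-GPU deployment.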
