CHANGELOG.rst (1 addition, 1 deletion)
@@ -6,9 +6,9 @@ Model Optimizer Changelog (Linux)

**Deprecations**

- Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strong typing. Use ``engine_precision`` instead.
- TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Support for the ``build`` and ``benchmark`` tasks is removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
- ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
- ``int8_sq`` quantization format is deprecated in ``examples/vlm_ptq`` following the switch to TensorRT-LLM's torch backend. Please refer to previous releases if this quantization format is needed.
- ``examples/vlm_eval`` as it depends on the deprecated TRT-LLM's TRT backend.
if [[ ! " fp8 nvfp4 bf16 fp16 " =~ " ${QFORMAT} " ]]; then
    echo "Quant $QFORMAT specified. Please read TensorRT-LLM quantization support matrix https://nvidia.github.io/TensorRT-LLM/features/quantization.html#quantization-in-tensorrt-llm and use TensorRT-LLM for deployment. Checkpoint export_path: $SAVE_PATH"
    exit 0
fi
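The guard above uses a common bash idiom for whole-word membership: both the list of supported formats and the candidate value are wrapped in spaces, so the quoted `=~` match only succeeds on complete tokens. A minimal sketch of the pattern (the function name and outputs here are illustrative, not part of the original script):

```shell
#!/usr/bin/env bash
# Sketch of the whole-word membership test used in the guard above.
# Padding both sides with spaces means "fp4" is rejected even though
# it appears as a substring inside "nvfp4".
supported=" fp8 nvfp4 bf16 fp16 "

check_qformat() {
    local qformat="$1"
    # A quoted right-hand side of =~ is matched literally, not as a regex.
    if [[ ! "$supported" =~ " ${qformat} " ]]; then
        echo "unsupported"
    else
        echo "supported"
    fi
}

check_qformat fp8    # prints "supported"
check_qformat fp4    # prints "unsupported" (no whole-word match)
```

The surrounding spaces are what make this safe: a bare substring test such as `[[ "$supported" == *"$qformat"* ]]` would wrongly accept `fp4` because of `nvfp4`.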
@@ -238,6 +225,8 @@ if [[ $TASKS =~ "lm_eval" ]]; then

    pip install -r requirements.txt
+
    echo "Using the following config: max output $BUILD_MAX_OUTPUT_LEN max batch $BUILD_MAX_BATCH_SIZE"