CHANGELOG.rst (1 addition, 1 deletion)
@@ -6,7 +6,7 @@ Model Optimizer Changelog (Linux)
**Deprecations**
- - TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``.
+ - TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Support for the ``build`` and ``benchmark`` tasks is removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
- ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
- ``examples/vlm_eval`` as it depends on the deprecated TRT-LLM's TRT backend.
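Since the deprecation note above points to `trtllm-bench` for performance evaluation, a minimal sketch of such a run is shown below. The subcommand, flag names, and paths are assumptions to be checked against the TensorRT-LLM documentation, not a command taken from this PR.

```bash
# Hedged sketch: throughput benchmark of an exported checkpoint with trtllm-bench.
# The checkpoint path and dataset file are placeholders; verify the subcommand
# and flag names against the TensorRT-LLM docs before use.
trtllm-bench --model /path/to/exported_hf_checkpoint \
  throughput --dataset /path/to/requests.jsonl
```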
The above example performs `AutoQuantize`, where the layers that are less sensitive to quantization accuracy are quantized with `w4a8_awq` (specified by `--quant w4a8_awq`) and the more sensitive layers are kept unquantized, so that the effective number of bits is 4.8 (specified by `--auto_quantize_bits 4.8`).
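As a concrete illustration of those two flags, a run might look like the following sketch; the script name, model card, and flag spellings are assumptions based on the `examples/llm_ptq` layout rather than an exact command from this diff.

```bash
# Hedged sketch: AutoQuantize to an effective 4.8 bits per weight.
# Script name, model card, and flag spellings are assumed, not verified here.
cd examples/llm_ptq
bash scripts/huggingface_example.sh \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --quant w4a8_awq \
    --auto_quantize_bits 4.8 \
    --tasks quant
```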
- The example scripts above also have an additional flag `--tasks`, where the actual tasks run in the script can be customized. The allowed tasks are `build,mmlu,benchmark,lm_eval,livecodebench` specified in the script [parser](./scripts/parser.sh). The tasks combo can be specified with a comma-separated task list. Some tasks like mmlu can take a long time to run. To run lm_eval tasks, please also specify the `--lm_eval_tasks` flag with comma separated lm_eval tasks [here](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks).
+ The example scripts above also have an additional flag `--tasks`, with which the tasks actually run by the script can be customized. The allowed tasks are `quant,mmlu,lm_eval,livecodebench`, as specified in the script [parser](./scripts/parser.sh). A combination of tasks can be given as a comma-separated list. Some tasks, such as mmlu, can take a long time to run. To run lm_eval tasks, please also specify the `--lm_eval_tasks` flag with a comma-separated list of lm_eval tasks from [here](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks).
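For example, combining the quantization step with lm_eval evaluation could look like the sketch below; the flag spellings follow the description above, and the lm_eval task names are illustrative assumptions.

```bash
# Hedged sketch: quantize and then run selected lm_eval tasks in one pass.
# The task names after --lm_eval_tasks are illustrative; pick any tasks
# supported by the lm-evaluation-harness.
bash scripts/huggingface_example.sh \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --quant fp8 \
    --tasks quant,lm_eval \
    --lm_eval_tasks mmlu,gsm8k
```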
> *If a GPU out-of-memory error is reported when running the scripts, please try editing the scripts to reduce the max batch size and save GPU memory.*
echo"Quant $QFORMATnot supported with the TensorRT-LLM torch llmapi. Allowed values are: fp8, nvfp4, bf16, fp16, int4_awq, w4a8_awq"
+ echo "Quant $QFORMAT specified. Please read the TensorRT-LLM quantization support matrix https://nvidia.github.io/TensorRT-LLM/features/quantization.html#quantization-in-tensorrt-llm and use TensorRT-LLM for deployment. Checkpoint export_path: $SAVE_PATH"
exit 0
fi
@@ -315,15 +313,15 @@ if [[ $TASKS =~ "livecodebench" || $TASKS =~ "simple_eval" ]]; then
pushd ../llm_eval/
if [[ $TASKS =~ "livecodebench" ]]; then
- bash run_livecodebench.sh $MODEL_FULL_NAME $BUILD_MAX_BATCH_SIZE $BUILD_MAX_OUTPUT_LEN $PORT | tee $SAVE_PATH/livecodebench.txt
+ bash run_livecodebench.sh $MODEL_NAME $BUILD_MAX_BATCH_SIZE $BUILD_MAX_OUTPUT_LEN $PORT | tee $SAVE_PATH/livecodebench.txt