
Commit 8845aba

Refactor engine_dir to checkpoint_dir in PTQ examples.
Signed-off-by: Chenjie Luo <[email protected]>
1 parent 26c203a · commit 8845aba

11 files changed: +46 additions, −192 deletions


CHANGELOG.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -8,7 +8,7 @@ Model Optimizer Changelog (Linux)
 
 - Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM or TensorRT docker image directly or refer to the `installation guide <https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html>`_ for more details.
 - Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strongly typing. Use ``engine_precision`` instead.
-- Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
+- Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. ``engine_dir`` is replaced with ``checkpoint_dir`` in ``examples/llm_ptq`` and ``examples/vlm_ptq``. For performance evaluation, please use ``trtllm-bench`` directly.
 - ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
 - Deprecated ``examples/vlm_eval`` as it depends on the deprecated TRT-LLM's TRT backend.
 
```
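
This changelog entry amounts to a breaking rename for anyone driving the PTQ example scripts programmatically. As a rough illustration only, not part of this commit (the two key names are the only grounded detail here), downstream code could bridge old call sites like this:

```python
# Hypothetical migration shim for downstream code, not part of this commit:
# accept the deprecated `engine_dir` key but forward it as `checkpoint_dir`.
import warnings


def normalize_ptq_kwargs(kwargs: dict) -> dict:
    if "engine_dir" in kwargs and "checkpoint_dir" not in kwargs:
        warnings.warn(
            "`engine_dir` is deprecated in the PTQ examples; use `checkpoint_dir`.",
            DeprecationWarning,
            stacklevel=2,
        )
        kwargs["checkpoint_dir"] = kwargs.pop("engine_dir")
    return kwargs


print(normalize_ptq_kwargs({"engine_dir": "/ckpt/llama3_fp8"}))
# {'checkpoint_dir': '/ckpt/llama3_fp8'}
```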

examples/llm_eval/README.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -93,7 +93,7 @@ If `trust_remote_code` needs to be true, please append the command with the `--t
 ### TensorRT-LLM
 
 ```sh
-python lm_eval_tensorrt_llm.py --model trt-llm --model_args tokenizer=<HF model folder>,engine_dir=<Quantized checkpoint dir> --tasks <comma separated tasks> --batch_size <engine batch size>
+python lm_eval_tensorrt_llm.py --model trt-llm --model_args tokenizer=<HF model folder>,checkpoint_dir=<Quantized checkpoint dir> --tasks <comma separated tasks> --batch_size <max batch size>
 ```
 
 ## MMLU
@@ -137,10 +137,10 @@ python mmlu.py --model_name causal --model_path <HF model folder or model card>
 python mmlu.py --model_name causal --model_path <HF model folder or model card> --quant_cfg $MODELOPT_QUANT_CFG_TO_SEARCH --auto_quantize_bits $EFFECTIVE_BITS --batch_size 4
 ```
 
-### Evaluate the TensorRT-LLM engine
+### Evaluate with TensorRT-LLM
 
 ```bash
-python mmlu.py --model_name causal --model_path <HF model folder or model card> --engine_dir <Quantized checkpoint dir>
+python mmlu.py --model_name causal --model_path <HF model folder or model card> --checkpoint_dir <Quantized checkpoint dir>
 ```
 
 ## MT-Bench
@@ -160,7 +160,7 @@ bash run_fastchat.sh -h <HF model folder or model card>
 bash run_fastchat.sh -h <HF model folder or model card> --quant_cfg MODELOPT_QUANT_CFG
 ```
 
-### Evaluate the TensorRT-LLM engine
+### Evaluate with TensorRT-LLM
 
 ```bash
 bash run_fastchat.sh -h <HF model folder or model card> <Quantized checkpoint dir>
````
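
For context on the renamed `--model_args` key: lm-evaluation-harness forwards that comma-separated string to the model wrapper as keyword arguments, so after this commit the key must be `checkpoint_dir`, not `engine_dir`. A minimal sketch of that parsing, assuming only the `key=value,key=value` convention shown in the command above (the paths are placeholders):

```python
# Minimal sketch of how a `--model_args` string becomes constructor kwargs.
# Only the key=value, comma-separated convention is assumed here.
def parse_model_args(model_args: str) -> dict[str, str]:
    return dict(pair.split("=", 1) for pair in model_args.split(","))


kwargs = parse_model_args("tokenizer=/models/llama3,checkpoint_dir=/ckpt/llama3_fp8")
assert "checkpoint_dir" in kwargs and "engine_dir" not in kwargs
```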

examples/llm_eval/gen_model_answer.py

Lines changed: 12 additions & 12 deletions

```diff
@@ -118,7 +118,7 @@ def run_eval(
     max_gpu_memory,
     dtype,
     revision,
-    engine_dir,
+    checkpoint_dir,
     nim_model,
     args,
 ):
@@ -150,7 +150,7 @@ def run_eval(
             revision=revision,
             top_p=top_p,
             temperature=temperature,
-            engine_dir=engine_dir,
+            checkpoint_dir=checkpoint_dir,
             nim_model=nim_model,
         )
         for i in range(0, len(questions), chunk_size)
@@ -174,25 +174,25 @@ def get_model_answers(
     revision,
     top_p=None,
     temperature=None,
-    engine_dir=None,
+    checkpoint_dir=None,
     nim_model=None,
 ):
     # Model Optimizer modification
-    if engine_dir:
+    if checkpoint_dir:
         tokenizer = get_tokenizer(model_path, trust_remote_code=args.trust_remote_code)
-        if engine_dir:
+        if checkpoint_dir:
             # get model type
-            last_part = os.path.basename(engine_dir)
+            last_part = os.path.basename(checkpoint_dir)
             model_type = last_part.split("_")[0]
             # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
             if model_type == "qwen":
                 tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
                 tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
 
             assert LLM is not None, "tensorrt_llm APIs could not be imported."
-            model = LLM(engine_dir, tokenizer=tokenizer)
+            model = LLM(checkpoint_dir, tokenizer=tokenizer)
         else:
-            raise ValueError("engine_dir is required for TensorRT LLM inference.")
+            raise ValueError("checkpoint_dir is required for TensorRT LLM inference.")
     elif not nim_model:
         model, _ = load_model(
             model_path,
@@ -259,7 +259,7 @@ def get_model_answers(
 
         # some models may error out when generating long outputs
         try:
-            if not engine_dir:
+            if not checkpoint_dir:
                 output_ids = model.generate(
                     torch.as_tensor(input_ids).cuda(),
                     do_sample=do_sample,
@@ -427,9 +427,9 @@ def reorg_answer_file(answer_file):
        help="The model revision to load.",
    )
    parser.add_argument(
-        "--engine-dir",
+        "--checkpoint-dir",
        type=str,
-        help="The path to the TensorRT LLM engine directory.",
+        help="The path to the model checkpoint directory.",
    )
    parser.add_argument(
        "--nim-model",
@@ -502,7 +502,7 @@ def reorg_answer_file(answer_file):
         max_gpu_memory=args.max_gpu_memory,
         dtype=str_to_torch_dtype(args.dtype),
         revision=args.revision,
-        engine_dir=args.engine_dir,
+        checkpoint_dir=args.checkpoint_dir,
         nim_model=args.nim_model,
         args=args,
     )
```
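
The renamed branch above still derives the model type from the directory's basename. A self-contained sketch of that convention, assuming, as the script does, that checkpoint directories are named `<model_type>_<suffix>`; the example paths below are hypothetical:

```python
import os


def model_type_from_checkpoint_dir(checkpoint_dir: str) -> str:
    # Basename of "/ckpt/qwen_fp8" is "qwen_fp8"; the prefix before "_" is the type.
    return os.path.basename(checkpoint_dir).split("_")[0]


assert model_type_from_checkpoint_dir("/ckpt/qwen_fp8") == "qwen"  # triggers the qwen tokenizer fix-up
assert model_type_from_checkpoint_dir("/ckpt/llama3_int4_awq") == "llama3"
```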

examples/llm_eval/lm_eval_tensorrt_llm.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -42,7 +42,7 @@ class TRTLLM(TemplateAPI):
     def __init__(
         self,
         tokenizer: str,
-        engine_dir: str,
+        checkpoint_dir: str,
         batch_size: int = 1,
         **kwargs,
     ):
@@ -56,11 +56,11 @@ def __init__(
         if self.tokenizer.pad_token_id is None:
             self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
 
-        assert isinstance(engine_dir, str)
+        assert isinstance(checkpoint_dir, str)
 
-        self.llm = LLM(checkpoint_dir=engine_dir, tokenizer=self.tokenizer)
+        self.llm = LLM(checkpoint_dir=checkpoint_dir, tokenizer=self.tokenizer)
         self.max_length = self.llm.max_seq_len - 1
-        logger.info("Loaded TRT-LLM engine")
+        logger.info("Loaded TRT-LLM")
 
     def model_call(
         self,
```
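
A hypothetical smoke test of the updated constructor signature; the class and its parameters come from the diff above, while the paths and batch size are placeholders, and actually running this requires a TensorRT-LLM installation plus a real quantized checkpoint:

```python
from lm_eval_tensorrt_llm import TRTLLM

# `checkpoint_dir` replaces the old `engine_dir` keyword end to end.
lm = TRTLLM(
    tokenizer="/models/llama3",         # HF tokenizer folder (placeholder)
    checkpoint_dir="/ckpt/llama3_fp8",  # quantized checkpoint (placeholder)
    batch_size=8,
)
```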

examples/llm_eval/mmlu.py

Lines changed: 5 additions & 3 deletions

```diff
@@ -252,9 +252,9 @@ def main(
     mto.enable_huggingface_checkpointing()
     model_path = kwargs["model_path"]
     tokenizer = get_tokenizer(model_path, trust_remote_code=kwargs.get("trust_remote_code", False))
-    if kwargs.get("engine_dir"):
+    if kwargs.get("checkpoint_dir"):
         # get model type
-        last_part = os.path.basename(kwargs["engine_dir"])
+        last_part = os.path.basename(kwargs["checkpoint_dir"])
         model_type = last_part.split("_")[0]
         # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
         if model_type == "qwen":
@@ -264,7 +264,9 @@ def main(
         assert LLM is not None, "tensorrt_llm APIs could not be imported."
         medusa_choices = kwargs.get("medusa_choices")
         model = LLM(
-            checkpoint_dir=kwargs["engine_dir"], tokenizer=tokenizer, medusa_choices=medusa_choices
+            checkpoint_dir=kwargs["checkpoint_dir"],
+            tokenizer=tokenizer,
+            medusa_choices=medusa_choices,
         )
     else:
         model = select_model(
```
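
The control flow after the rename is easy to restate: TRT-LLM is used only when a `checkpoint_dir` is supplied, otherwise the script falls back to `select_model`. A self-contained sketch of just that gating (the backend labels are illustrative, not names from the script):

```python
def pick_backend(kwargs: dict) -> str:
    # Mirrors `if kwargs.get("checkpoint_dir"): ... else: select_model(...)`.
    return "trt-llm" if kwargs.get("checkpoint_dir") else "huggingface"


assert pick_backend({"checkpoint_dir": "/ckpt/qwen_fp8"}) == "trt-llm"
assert pick_backend({"model_path": "Qwen/Qwen2-7B"}) == "huggingface"
```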

examples/llm_eval/run_fastchat.sh

Lines changed: 10 additions & 10 deletions

```diff
@@ -20,18 +20,18 @@
 # If you are using NIM, ensure that you export the NIM API key using:
 # export OPENAI_API_KEY=<NIM_API_KEY>
 #
-# Usage: bash run_fastchat.sh -h <HF model folder or model card> -e <engine_dir> -n <NIM model model card>
+# Usage: bash run_fastchat.sh -h <HF model folder or model card> -e <checkpoint_dir> -n <NIM model model card>
 # model_name: The HuggingFace handle or folder of the model to evaluate.
-# engine_dir: The directory where the TRT-LLM engine is stored.
+# checkpoint_dir: The directory where the checkpoint is stored.
 # nim_model_name: The handle of the NIM model to be used for evaluation.
 #
 # Example commands:
 #
 # Evaluate "meta-llama/Meta-Llama-3-8B-Instruct" HF model:
 # bash run_fastchat.sh -h meta-llama/Meta-Llama-3-8B-Instruct
 #
-# Evaluate "meta-llama/Meta-Llama-3-8B-Instruct" HF model with TRT-LLM engine:
-# bash run_fastchat.sh -h meta-llama/Meta-Llama-3-8B-Instruct -e /path/to/engine_dir
+# Evaluate "meta-llama/Meta-Llama-3-8B-Instruct" HF model with TRT-LLM:
+# bash run_fastchat.sh -h meta-llama/Meta-Llama-3-8B-Instruct -e /path/to/checkpoint_dir
 #
 # Evaluate "meta-llama/Meta-Llama-3-8B-Instruct" HF model with NIM:
 # bash run_fastchat.sh -h meta-llama/Meta-Llama-3-8B-Instruct -n meta-llama/Meta-Llama-3-8B-Instruct
@@ -41,7 +41,7 @@ set -e
 set -x
 
 hf_model_name=""
-engine_dir=""
+checkpoint_dir=""
 nim_model_name=""
 answer_file=""
 quant_cfg=""
@@ -56,9 +56,9 @@ while [[ "$1" != "" ]]; do
             shift
             hf_model_name=$1
             ;;
-        -e | --engine_dir )
+        -e | --checkpoint_dir )
             shift
-            engine_dir=$1
+            checkpoint_dir=$1
             ;;
         -n | --nim_model_name )
             shift
@@ -96,8 +96,8 @@ if [ "$hf_model_name" == "" ]; then
     exit 1
 fi
 
-if [ "$engine_dir" != "" ]; then
-    engine_dir=" --engine-dir $engine_dir "
+if [ "$checkpoint_dir" != "" ]; then
+    checkpoint_dir=" --checkpoint-dir $checkpoint_dir "
 fi
 
 if [ "$nim_model_name" != "" ]; then
@@ -143,7 +143,7 @@ PYTHONPATH=FastChat:$PYTHONPATH python gen_model_answer.py \
     --model-id $hf_model_name \
     --temperature 0.0001 \
     --top-p 0.0001 \
-    $engine_dir \
+    $checkpoint_dir \
     $nim_model_name \
     $answer_file \
     $quant_args
```
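
One subtlety in the shell wiring above: the script's `-e/--checkpoint_dir` option is forwarded to `gen_model_answer.py` as `--checkpoint-dir`, which argparse then exposes as `args.checkpoint_dir`. A short sketch of that hand-off, with a placeholder path:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-dir", type=str,
                    help="The path to the model checkpoint directory.")
args = parser.parse_args(["--checkpoint-dir", "/ckpt/llama3_fp8"])
print(args.checkpoint_dir)  # argparse maps the dash in the flag to an underscore
```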

examples/llm_ptq/example_utils.py

Lines changed: 0 additions & 10 deletions

```diff
@@ -36,16 +36,6 @@ def is_speculative(hf_config):
     )
 
 
-def get_mode_type_from_engine_dir(engine_dir_str):
-    # Split the path by '/' and get the last part
-    last_part = os.path.basename(engine_dir_str)
-
-    # Split the last part by '_' and get the first segment
-    model_type = last_part.split("_")[0]
-
-    return model_type
-
-
 def get_tokenizer(ckpt_path, trust_remote_code=False, **kwargs):
     print(f"Initializing tokenizer from {ckpt_path}")
 
```

examples/llm_ptq/run_tensorrt_llm.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@
1313
# See the License for the specific language governing permissions and
1414
# limitations under the License.
1515

16-
"""An example script to run the tensorrt_llm engine."""
16+
"""An example script to run the tensorrt_llm inference."""
1717

1818
import argparse
1919

@@ -28,7 +28,7 @@ def parse_arguments():
2828
parser = argparse.ArgumentParser()
2929
parser.add_argument("--tokenizer", type=str, default="")
3030
parser.add_argument("--max_output_len", type=int, default=100)
31-
parser.add_argument("--engine_dir", type=str, default="/tmp/modelopt")
31+
parser.add_argument("--checkpoint_dir", type=str)
3232
parser.add_argument(
3333
"--input_texts",
3434
type=str,
@@ -49,8 +49,8 @@ def parse_arguments():
4949

5050
def run(args):
5151
if not args.tokenizer:
52-
# Assume the tokenizer files are saved in the engine_dr.
53-
args.tokenizer = args.engine_dir
52+
# Assume the tokenizer files are saved in the checkpoint_dir.
53+
args.tokenizer = args.checkpoint_dir
5454

5555
if isinstance(args.tokenizer, PreTrainedTokenizerBase):
5656
tokenizer = args.tokenizer
@@ -66,7 +66,7 @@ def run(args):
6666

6767
print("TensorRT-LLM example outputs:")
6868

69-
llm = LLM(args.engine_dir, tokenizer=tokenizer, max_batch_size=len(input_texts))
69+
llm = LLM(args.checkpoint_dir, tokenizer=tokenizer, max_batch_size=len(input_texts))
7070
torch.cuda.cudart().cudaProfilerStart()
7171
outputs = llm.generate_text(input_texts, args.max_output_len)
7272
torch.cuda.cudart().cudaProfilerStop()
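
Worth noting: the old `--engine_dir` flag had `default="/tmp/modelopt"`, while the new `--checkpoint_dir` has no default, so the script now relies on the caller to pass one (as `huggingface_example.sh` below does). A sketch of the tokenizer fallback under that assumption:

```python
def resolve_tokenizer(tokenizer: str, checkpoint_dir: str | None) -> str | None:
    # Mirrors run(): with no --tokenizer, tokenizer files are read from the
    # checkpoint directory; with no --checkpoint_dir either, nothing is left to use.
    return tokenizer or checkpoint_dir


assert resolve_tokenizer("", "/ckpt/llama3_fp8") == "/ckpt/llama3_fp8"
assert resolve_tokenizer("", None) is None  # new failure mode absent a default
```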

examples/llm_ptq/scripts/huggingface_example.sh

Lines changed: 4 additions & 4 deletions

```diff
@@ -158,7 +158,7 @@ if [[ $TASKS =~ "quant" ]] || [[ ! -d "$SAVE_PATH" ]] || [[ ! $(ls -A $SAVE_PATH
         echo "Quantized model config $MODEL_CONFIG exists, skipping the quantization stage"
     fi
 
-    # for enc-dec model, users need to refer TRT-LLM example to build engines and deployment
+    # for enc-dec model, users need to refer TRT-LLM example for deployment
     if [[ -f "$SAVE_PATH/encoder/config.json" && -f "$SAVE_PATH/decoder/config.json" && ! -f $MODEL_CONFIG ]]; then
         echo "Please continue to deployment with the TRT-LLM enc_dec example, https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/enc_dec. Checkpoint export_path: $SAVE_PATH"
         exit 0
@@ -187,7 +187,7 @@ if [[ $TASKS =~ "quant" ]] || [[ ! -d "$SAVE_PATH" ]] || [[ ! $(ls -A $SAVE_PATH
         RUN_ARGS+=" --trust_remote_code "
     fi
 
-    python run_tensorrt_llm.py --engine_dir=$SAVE_PATH $RUN_ARGS
+    python run_tensorrt_llm.py --checkpoint_dir=$SAVE_PATH $RUN_ARGS
 fi
 
 if [[ -d "${MODEL_PATH}" ]]; then
@@ -229,7 +229,7 @@ if [[ $TASKS =~ "lm_eval" ]]; then
 
     python lm_eval_tensorrt_llm.py \
         --model trt-llm \
-        --model_args tokenizer=$MODEL_PATH,engine_dir=$SAVE_PATH,max_gen_toks=$BUILD_MAX_OUTPUT_LEN \
+        --model_args tokenizer=$MODEL_PATH,checkpoint_dir=$SAVE_PATH,max_gen_toks=$BUILD_MAX_OUTPUT_LEN \
         --tasks $LM_EVAL_TASKS \
         --batch_size $BUILD_MAX_BATCH_SIZE $lm_eval_flags | tee $LM_EVAL_RESULT
 
@@ -259,7 +259,7 @@ if [[ $TASKS =~ "mmlu" ]]; then
     python mmlu.py \
         --model_name causal \
         --model_path $MODEL_ABS_PATH \
-        --engine_dir $SAVE_PATH \
+        --checkpoint_dir $SAVE_PATH \
         --data_dir $MMLU_DATA_PATH | tee $MMLU_RESULT
     popd
 
```
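
For completeness, the lm_eval task now composes its `--model_args` string with the renamed key. A sketch of the string the script builds, with placeholder values standing in for the shell variables:

```python
# Placeholders for $MODEL_PATH, $SAVE_PATH, and $BUILD_MAX_OUTPUT_LEN.
model_path = "/models/llama3"
save_path = "/ckpt/llama3_fp8"
build_max_output_len = 1024

model_args = f"tokenizer={model_path},checkpoint_dir={save_path},max_gen_toks={build_max_output_len}"
print(model_args)
```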

examples/vlm_ptq/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -56,7 +56,7 @@ Please refer to the [llm_ptq/README.md](../llm_ptq/README.md#current-out-of-the-
5656

5757
Please refer to the [llm_ptq/README.md](../llm_ptq/README.md) about the details of model quantization.
5858

59-
The following scripts provide an all-in-one and step-by-step model quantization example for Llava, VILA, Phi-3-vision and Qwen2.5-VL models. The quantization format and the number of GPUs will be supplied as inputs to these scripts. By default, we build the engine for the fp8 format and 1 GPU.
59+
The following scripts provide an all-in-one and step-by-step model quantization example for the supported Hugging Face multi-modal models. The quantization format and the number of GPUs will be supplied as inputs to these scripts.
6060

6161
### Hugging Face Example [Script](./scripts/huggingface_example.sh)
6262
