Commit be355d1

Update doc

Signed-off-by: Chenjie Luo <[email protected]>

1 parent: 888a89e

File tree

3 files changed (+14 −5 lines)


examples/llm_eval/README.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -93,7 +93,7 @@ If `trust_remote_code` needs to be true, please append the command with the `--t
 ### TensorRT-LLM
 
 ```sh
-python lm_eval_tensorrt_llm.py --model trt-llm --model_args tokenizer=<HF model folder>,engine_dir=<TRT LLM engine dir> --tasks <comma separated tasks> --batch_size <engine batch size>
+python lm_eval_tensorrt_llm.py --model trt-llm --model_args tokenizer=<HF model folder>,engine_dir=<Quantized checkpoint dir> --tasks <comma separated tasks> --batch_size <engine batch size>
 ```
 
 ## MMLU
@@ -140,7 +140,7 @@ python mmlu.py --model_name causal --model_path <HF model folder or model card>
 ### Evaluate the TensorRT-LLM engine
 
 ```bash
-python mmlu.py --model_name causal --model_path <HF model folder or model card> --engine_dir <built TensorRT-LLM folder>
+python mmlu.py --model_name causal --model_path <HF model folder or model card> --engine_dir <Quantized checkpoint dir>
 ```
 
 ## MT-Bench
@@ -163,7 +163,7 @@ bash run_fastchat.sh -h <HF model folder or model card> --quant_cfg MODELOPT_QUA
 ### Evaluate the TensorRT-LLM engine
 
 ```bash
-bash run_fastchat.sh -h <HF model folder or model card> <built TensorRT-LLM folder>
+bash run_fastchat.sh -h <HF model folder or model card> <Quantized checkpoint dir>
 ```
 
 ### Judging the responses
````

examples/llm_ptq/hf_ptq.py

Lines changed: 10 additions & 0 deletions

````diff
@@ -696,6 +696,13 @@ def output_decode(generated_ids, input_shape):
         choices=KV_QUANT_CFG_CHOICES.keys(),
         help="Specify KV cache quantization format, default to fp8 if not provided",
     )
+    parser.add_argument(
+        "--export_fmt",
+        required=False,
+        default="hf",
+        choices=["tensorrt_llm", "hf"],
+        help="Deprecated. Please avoid using this argument.",
+    )
     parser.add_argument(
         "--trust_remote_code",
         help="Set trust_remote_code for Huggingface models and tokenizers",
@@ -749,6 +756,9 @@ def output_decode(generated_ids, input_shape):
 
     args = parser.parse_args()
 
+    if args.export_fmt != "hf":
+        warnings.warn("Deprecated. --export_fmt will be ignored.")
+
     args.dataset = args.dataset.split(",") if args.dataset else None
     args.calib_size = [int(num_sample) for num_sample in args.calib_size.split(",")]
     main(args)
````
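The hf_ptq.py change above soft-deprecates `--export_fmt`: the flag is still accepted so existing command lines do not break, but any non-default value emits a warning and is otherwise ignored. A minimal, self-contained sketch of this argparse deprecation pattern (the `build_parser` and `parse_args` helper names are illustrative, not from the commit):

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PTQ export (sketch)")
    # Keep the deprecated flag registered so old invocations still parse,
    # with the same default/choices/help text as in the commit.
    parser.add_argument(
        "--export_fmt",
        required=False,
        default="hf",
        choices=["tensorrt_llm", "hf"],
        help="Deprecated. Please avoid using this argument.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Warn only when the caller actually passed a non-default value;
    # the value is then ignored by the rest of the pipeline.
    if args.export_fmt != "hf":
        warnings.warn("Deprecated. --export_fmt will be ignored.")
    return args


if __name__ == "__main__":
    parse_args()
```

Keeping `choices` and the default intact means the deprecation is purely advisory: scripts that pass `--export_fmt tensorrt_llm` keep working, and the warning gives users a release cycle to drop the flag before it is removed outright.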

examples/llm_sparsity/README.md

Lines changed: 1 addition & 2 deletions

````diff
@@ -148,5 +148,4 @@ python export_trtllm_ckpt.py --model_name_or_path meta-llama/Llama-2-7b-hf \
 
 ## Build TensorRT-LLM Engine
 
-For guidance on how to build TensorRT-LLM engines, please refer to [link](../llm_ptq/README.md#TensorRT-LLM-Engine-Build).
-To validate the built TensorRT-LLM engines, please follow the instructions at [link](../llm_ptq/README.md#TensorRT-LLM-Engine-Validation).
+For guidance on how to build TensorRT-LLM engines, please refer to [link](https://nvidia.github.io/TensorRT-LLM/commands/trtllm-build.html#trtllm-build) and use the `--weight_sparsity` flag.
````

0 commit comments
