Commit f5f91ab

Updated README

Signed-off-by: Suguna Velury <[email protected]>

1 parent: b81b4de

1 file changed (+4 −4 lines)

examples/llm_qat/README.md

````diff
@@ -303,12 +303,10 @@ See more details on running LLM evaluation benchmarks [here](../llm_eval/README.
 
 The final model after QAT is similar in architecture to that of the PTQ model. The QAT model simply has updated weights as compared to the PTQ model. It can be deployed to TensorRT-LLM (TRTLLM) or to TensorRT just like a regular **ModelOpt** PTQ model if the quantization format is supported for deployment.
 
-To run the QAT model with TRTLLM, run:
+To run the QAT model with vLLM/TRTLLM, run:
 
 ```sh
-cd ../llm_ptq
-
-./scripts/huggingface_example.sh --model ../llm_qat/llama3-qat --quant w4a8_awq
+python export.py --pyt_ckpt_path llama3-qat --export_path llama3-qat-deploy
 ```
 
 Note: The QAT checkpoint for `w4a8_awq` config can be created by using `--quant_cfg W4A8_AWQ_BETA_CFG` in [QAT example](#end-to-end-qat-example).
````
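The exported directory is then handed to the serving stack. As a hedged sketch only (not a command from this README): assuming `llama3-qat-deploy`, the `--export_path` above, is a vLLM-loadable Hugging Face-style checkpoint, a quick smoke test with vLLM's OpenAI-compatible server could look like:

```sh
# Hedged sketch: serve the exported QAT checkpoint with vLLM's
# OpenAI-compatible server. Assumes llama3-qat-deploy is the directory
# produced by export.py above and is vLLM-loadable.
python -m vllm.entrypoints.openai.api_server --model llama3-qat-deploy --port 8000

# From a second shell, issue a test completion request:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3-qat-deploy", "prompt": "Hello", "max_tokens": 32}'
```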
````diff
@@ -345,6 +343,8 @@ To perform QLoRA training, run:
     --lora True
 ```
 
+## QLoRA deployment
+
 After performing QLoRA training, the final checkpoint can be exported for deployment with vLLM using the following command.
 
 ```sh
````
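The export command itself is truncated in this view, so the exact invocation is not shown. Purely as an illustration of one plausible serving path (an assumption, not something this commit states), a LoRA-style checkpoint could be mounted onto its base model using vLLM's LoRA support; both paths below are hypothetical placeholders:

```sh
# Hedged sketch: serve a base model with a LoRA adapter in vLLM.
# meta-llama/Meta-Llama-3-8B-Instruct and llama3-qlora-deploy are
# hypothetical placeholders, not paths from this README.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --enable-lora \
  --lora-modules qlora-adapter=llama3-qlora-deploy
```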
