Commit 5342ca9

QiJune authored and dominicshanshan committed
[TRTLLM-9092][doc] Add a pre-quantized example in quick start guide (#9223)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
1 parent 9eeb0b6 commit 5342ca9

File tree

1 file changed: +7 −0 lines changed

docs/source/quick-start-guide.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -24,6 +24,13 @@ To start the server, you can run a command like the following example inside a D
 trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
 ```
 
+You may also deploy pre-quantized models to improve performance.
+Ensure your GPU supports FP8 quantization before running the following:
+
+```bash
+trtllm-serve "nvidia/Qwen3-8B-FP8"
+```
+
 ```{note}
 If you are running `trtllm-serve` inside a Docker container, you have two options for sending API requests:
 1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.
````
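The added text asks readers to confirm FP8 support before serving the quantized checkpoint. As a hedged sketch (not part of this commit): hardware FP8 requires an NVIDIA GPU with compute capability 8.9 or newer (Ada Lovelace, Hopper, and later). The helper below, `supports_fp8`, is a hypothetical name that checks the string printed by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```shell
# Sketch (not from this commit): succeed if a compute-capability string,
# e.g. from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`,
# is high enough for FP8, i.e. 8.9+ (Ada Lovelace, Hopper, or newer).
supports_fp8() {
    major=${1%%.*}   # part before the dot, e.g. "8" from "8.9"
    minor=${1##*.}   # part after the dot, e.g. "9" from "8.9"
    [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

supports_fp8 "8.9" && echo "FP8 supported"
```

For example, an Ampere GPU reporting `8.6` fails this check, while Ada (`8.9`) and Hopper (`9.0`) pass.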

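The note in the hunk above discusses sending API requests to the running server. As an illustration (an assumption, not part of this commit), `trtllm-serve` exposes an OpenAI-compatible endpoint, by default on port 8000; the snippet builds a chat-completion body and shows the `curl` call that would send it:

```shell
# Sketch (assumption, not from this commit): a chat-completion request body
# for the OpenAI-compatible API that trtllm-serve exposes (default port 8000).
body='{"model": "nvidia/Qwen3-8B-FP8",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 32}'
echo "$body"
# To actually send it once the server is up:
#   curl -s http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
```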