Commit 5342ca9

QiJune authored and dominicshanshan committed
[TRTLLM-9092][doc] Add a pre-quantized example in quick start guide (#9223)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
1 parent 9eeb0b6 commit 5342ca9

File tree

1 file changed: +7 −0 lines changed

docs/source/quick-start-guide.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -24,6 +24,13 @@ To start the server, you can run a command like the following example inside a D
 trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
 ```
 
+You may also deploy pre-quantized models to improve performance.
+Ensure your GPU supports FP8 quantization before running the following:
+
+```bash
+trtllm-serve "nvidia/Qwen3-8B-FP8"
+```
+
 ```{note}
 If you are running `trtllm-serve` inside a Docker container, you have two options for sending API requests:
 1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.
````
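The added text asks readers to confirm FP8 support before serving the quantized checkpoint. As a hedged sketch (not part of this commit): hardware FP8 requires an NVIDIA GPU with compute capability 8.9 or newer (Ada Lovelace, Hopper, and later). The helper below, `supports_fp8`, is a hypothetical name that checks the string printed by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```shell
# Sketch (not from this commit): succeed if a compute-capability string,
# e.g. from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`,
# is high enough for FP8, i.e. 8.9+ (Ada Lovelace, Hopper, or newer).
supports_fp8() {
    major=${1%%.*}   # part before the dot, e.g. "8" from "8.9"
    minor=${1##*.}   # part after the dot, e.g. "9" from "8.9"
    [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

supports_fp8 "8.9" && echo "FP8 supported"
```

For example, an Ampere GPU reporting `8.6` fails this check, while Ada (`8.9`) and Hopper (`9.0`) pass.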

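The note in the hunk above discusses sending API requests to the running server. As an illustration (an assumption, not part of this commit), `trtllm-serve` exposes an OpenAI-compatible endpoint, by default on port 8000; the snippet builds a chat-completion body and shows the `curl` call that would send it:

```shell
# Sketch (assumption, not from this commit): a chat-completion request body
# for the OpenAI-compatible API that trtllm-serve exposes (default port 8000).
body='{"model": "nvidia/Qwen3-8B-FP8",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 32}'
echo "$body"
# To actually send it once the server is up:
#   curl -s http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
```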