Commit e66bd01

Update 3-run-inference-and-serve.md
1 parent d9fca9d commit e66bd01

File tree

1 file changed: +3 additions, −1 deletion


content/learning-paths/servers-and-cloud-computing/vllm-acceleration/3-run-inference-and-serve.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -147,4 +147,6 @@ Extend your workflow to other models on Hugging Face that are compatible with vL
 
 You can quantize and serve them using the same `quantize_vllm_models.py` recipe, just update the model name.
 
-* **Connect a chat client:** Link your server with OpenAI-compatible UIs like [Open WebUI](https://github.com/open-webui/open-webui)
+**Connect a chat client:** Link your server with OpenAI-compatible UIs like [Open WebUI](https://github.com/open-webui/open-webui)
+
+You can continue exploring how Arm’s efficiency, oneDNN+ACL acceleration, and vLLM’s dynamic batching combine to deliver fast, sustainable, and scalable AI inference on modern Arm architectures.
```
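The changed lines point readers at OpenAI-compatible chat clients such as Open WebUI. As a minimal sketch of what that compatibility means in practice — assuming vLLM's default serve port of 8000 and a hypothetical served model id — this is the shape of the request body such a client would POST to the server's `/v1/chat/completions` route:

```python
import json

# Assumed default endpoint for a locally running `vllm serve` instance.
BASE_URL = "http://localhost:8000/v1"

# Hypothetical payload: the "model" field must match the id of the model
# you actually served (e.g. the quantized model from the recipe above).
payload = {
    "model": "my-quantized-model",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize dynamic batching in one sentence."}
    ],
    "max_tokens": 64,
}

# Any OpenAI-compatible client can send this, e.g. with curl:
#   curl -s $BASE_URL/chat/completions \
#        -H 'Content-Type: application/json' \
#        -d "$(cat payload.json)"
print(json.dumps(payload, indent=2))
```

Because the wire format is the OpenAI chat-completions schema, UIs like Open WebUI only need the base URL of your server to start chatting with the model.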
