Commit fdc3c72

Update _index.md

1 parent 6b13787


content/learning-paths/servers-and-cloud-computing/vllm-acceleration/_index.md

Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,10 @@
 ---
 title: High throughput LLM serving using vLLM on Arm Servers
 
+draft: true
+cascade:
+  draft: true
+
 minutes_to_complete: 60
 
 who_is_this_for: This learning path is for software developers and AI engineers who want to build an optimized vLLM for Arm servers, quantize models to INT4, and serve them through an OpenAI‑compatible API.
@@ -53,8 +57,7 @@ further_reading:
         link: /learning-paths/servers-and-cloud-computing/vllm/
         type: website
 
-### Notes
-This path focuses on CPU inference on Arm servers using an optimized vLLM build with oneDNN and the Arm Compute Library (ACL), 4‑bit quantization accelerated by Arm KleidiAI microkernels, and OpenAI‑compatible serving. You can apply these steps to many LLMs; the examples use `deepseek-ai/DeepSeek-V2-Lite` for concreteness. As vLLM's CPU support matures, manual builds will be replaced by a simple `pip install` flow.
+
 
 ### FIXED, DO NOT MODIFY
 # ================================================================================
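For reference, the top of the front matter after this change reads roughly as follows; this is a sketch reconstructed from the diff above, with the rest of the fields omitted:

---
title: High throughput LLM serving using vLLM on Arm Servers

draft: true
cascade:
  draft: true

minutes_to_complete: 60
# ... remaining front matter unchanged

In Hugo, `draft: true` keeps a page out of production builds, and `cascade` propagates the listed front matter values to every descendant page, so this commit hides the entire vllm-acceleration learning path, not just its index page. Drafts can still be previewed locally with `hugo server -D`.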
