content/learning-paths/servers-and-cloud-computing/vllm-acceleration/_index.md (13 additions, 17 deletions)
```diff
@@ -1,28 +1,24 @@
 ---
-title: Optimized LLM Inference with vLLM on Arm-Based Servers
-
-draft: true
-cascade:
-  draft: true
+title: Accelerate vLLM inference on Azure Cobalt 100 virtual machines
 
 minutes_to_complete: 60
 
-who_is_this_for: This learning path is designed for software developers and AI engineers who want to build and optimize vLLM for Arm-based servers, quantize large language models (LLMs) to INT4, serve them efficiently through an OpenAI-compatible API, and benchmark model accuracy using the LM Evaluation Harness.
+who_is_this_for: This is an introductory topic for developers interested in building and optimizing vLLM for Arm-based servers. This Learning Path shows you how to quantize large language models (LLMs) to INT4, serve them efficiently using an OpenAI-compatible API, and benchmark model accuracy with the LM Evaluation Harness.
 
 learning_objectives:
-- Build an optimized vLLM for aarch64 with oneDNN and the Arm Compute Library (ACL).
-- Set up all runtime dependencies including PyTorch, llmcompressor, and Arm-optimized libraries.
-- Quantize an LLM (DeepSeek-V2-Lite) to 4-bit integer (INT4) precision.
-- Run and serve both quantized and BF16 (non-quantized) variants using vLLM.
-- Use OpenAI-compatible endpoints and understand sequence and batch limits.
-- Evaluate accuracy using the LM Evaluation Harness on BF16 and INT4 models with vLLM.
+- Build an optimized vLLM for aarch64 with oneDNN and the Arm Compute Library (ACL)
+- Set up all runtime dependencies including PyTorch, llmcompressor, and Arm-optimized libraries
+- Quantize an LLM (DeepSeek-V2-Lite) to 4-bit integer (INT4) precision
+- Run and serve both quantized and BF16 (non-quantized) variants using vLLM
+- Use OpenAI-compatible endpoints and understand sequence and batch limits
+- Evaluate accuracy using the LM Evaluation Harness on BF16 and INT4 models with vLLM
 
 prerequisites:
-- An Arm-based Linux server (Ubuntu 22.04+ recommended) with a minimum of 32 vCPUs, 64 GB RAM, and 64 GB free disk space.
-- Python 3.12 and basic familiarity with Hugging Face Transformers and quantization.
+- An Arm-based Linux server (Ubuntu 22.04+ recommended) with a minimum of 32 vCPUs, 64 GB RAM, and 64 GB free disk space
+- Python 3.12 and basic familiarity with Hugging Face Transformers and quantization
```
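The front matter above mentions serving models through vLLM's OpenAI-compatible API. As a minimal sketch of what such a client request looks like, the snippet below builds the JSON body for a chat-completion call to a locally served vLLM instance. The base URL, port, and model name are illustrative assumptions (vLLM's default serving port is 8000), not values taken from this diff:

```python
import json

# Assumed values -- adjust to match your vLLM launch command.
BASE_URL = "http://localhost:8000/v1"       # default vLLM OpenAI-compatible port
MODEL = "deepseek-ai/DeepSeek-V2-Lite"      # example model from this Learning Path

def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for POST {BASE_URL}/chat/completions."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Request length is bounded by the server's configured sequence limits.
        "max_tokens": max_tokens,
    }

body = build_chat_request("What is INT4 quantization?")
print(json.dumps(body, indent=2))
```

Any OpenAI-compatible client (for example, the `openai` Python package pointed at `BASE_URL`) can send this payload unchanged, which is what makes the vLLM endpoint a drop-in replacement.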