Commit 80b9f8e

Merge pull request #2432 from chrismoroney/cmoroney-vllm-on-arm-last-reviewed-10-2025
Build and Run vLLM on Arm Servers LP - update dtype
2 parents b962a17 + bea94d7 commit 80b9f8e

File tree

2 files changed: +5 −2 lines changed


content/learning-paths/servers-and-cloud-computing/vllm/vllm-run.md

Lines changed: 4 additions & 1 deletion

```diff
@@ -41,11 +41,14 @@ prompts = [
     "Write a hello world program in Rust",
 ]
 
+# Modify model here
+MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
+
 # Create a sampling params object.
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # Create an LLM.
-llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", dtype="bfloat16")
+llm = LLM(model=MODEL, dtype="bfloat16")
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
```
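The diff above lifts the hardcoded model name into a single `MODEL` constant so readers only edit one line to swap models. A minimal sketch of taking that idea one step further, so the model can be changed without editing the script at all; the `VLLM_MODEL` environment variable here is a hypothetical name for illustration, not something defined by the commit or by vLLM:

```python
import os

# Default model from the commit's MODEL constant.
DEFAULT_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"

def resolve_model(env=None):
    """Return the model name, preferring an environment override.

    Checks the (hypothetical) VLLM_MODEL environment variable and
    falls back to DEFAULT_MODEL when it is unset.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_MODEL", DEFAULT_MODEL)

# The resolved name would then be passed straight through, e.g.:
#   llm = LLM(model=resolve_model(), dtype="bfloat16")
```

With this pattern, running `VLLM_MODEL=some/other-model python script.py` picks a different model while the unmodified script keeps the commit's default.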

content/learning-paths/servers-and-cloud-computing/vllm/vllm-setup.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Before you begin
 
-To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage.
+To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 24.04 LTS with at least 8 cores, 16GB of RAM, and 50GB of disk storage. The instructions have been tested on an AWS Graviton3 m7g.2xlarge instance.
 
 ## What is vLLM?
 
```

0 commit comments