rh-waterford-et/vllm-cpu-inference

Overview

This guide provides step-by-step instructions for compiling vLLM from source on bare-metal, CPU-only AMD EPYC servers.

N.B. this is specifically for Fedora/RHEL-based systems (using dnf).

It includes instructions for building with ZenDNN support.

For Debian builds please follow this guide

1. Prerequisites (System Dependencies)

First, install all necessary system-level packages, including development tools, compilers, and required libraries.

# Install essential build tools

sudo dnf install git cmake make autoconf binutils gcc gcc-c++ pkgconf-pkg-config

# Install required libraries for vLLM and Python

sudo dnf install numactl numactl-devel python3-devel
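Before moving on, you can sanity-check that the toolchain actually landed on your PATH (a minimal sketch; adjust the tool list to your needs — note that the `gcc-c++` package provides the `g++` binary):

```shell
# Report any build tool that did not end up on PATH
for tool in git cmake make gcc g++ pkg-config; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```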

2. Environment Setup

We will use uv for Python package and virtual environment management.

2.1. Install uv

# Install the uv Python package manager:

curl -LsSf https://astral.sh/uv/install.sh | sh

Note: You may need to restart your shell or source your .bashrc/.zshrc file after this step.
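To confirm the installer worked without restarting your shell, you can check for the binary directly (a sketch; `~/.local/bin` is the installer's default location and may differ on your system):

```shell
# uv installs to ~/.local/bin by default
if command -v uv >/dev/null 2>&1; then
  uv --version
else
  echo "uv not on PATH yet; restart the shell or source your rc file"
fi
```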

2.2. Create & Activate Virtual Environment

Create a new virtual environment using Python 3.12 and activate it.

# Create the environment

uv venv --python 3.12 --seed

# Activate the environment

source .venv/bin/activate
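A quick way to confirm the venv's interpreter is the one now active (assuming the environment was created as above):

```shell
# Should print a path inside .venv and report Python 3.12.x
which python
python --version
```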

3. Build and Install Dependencies

This process requires manually building libkineto and installing a specific ZenDNN-enabled version of PyTorch.

3.1. Install libkineto

# Check out the repo and its submodules

git clone --recursive https://github.com/pytorch/kineto.git

cd kineto/libkineto

# Build libkineto with cmake

mkdir build && cd build

cmake ..

make

sudo make install

# Return to your original project directory

cd ../../..
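By default `make install` places the library under /usr/local; a quick check that the install succeeded (the exact lib directory varies by distro):

```shell
# Look for the installed libkineto artifacts
ls /usr/local/lib*/libkineto* 2>/dev/null || echo "libkineto not found under /usr/local"
```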

3.2. Install ZenDNN-enabled PyTorch

# Uninstall any existing zentorch

uv pip uninstall zentorch

# Install CPU-only PyTorch (v2.9.1)

uv pip install torch==2.9.1 --index-url https://download.pytorch.org/whl/cpu

# Install ZenDNN

uv pip install zentorch==5.1.0

4. Build vLLM from Source

Now, we can clone and build the vLLM project itself. (N.B. match the vLLM tag to your ZenDNN release: for ZenDNN 5.1.0 we need vLLM 0.9.2, not the v0.13.0 tag checked out below.)

# Clone the vLLM repository

git clone https://github.com/vllm-project/vllm.git vllm_source

cd vllm_source

git checkout v0.13.0

# Install vLLM build-time and CPU runtime dependencies

uv pip install -r requirements/cpu-build.txt --torch-backend cpu --index-strategy unsafe-best-match

uv pip install -r requirements/cpu.txt --torch-backend cpu --index-strategy unsafe-best-match

# Build vLLM, targeting the CPU

VLLM_TARGET_DEVICE=cpu uv pip install . --no-build-isolation

# If the build fails with an error that "aimv2" is already in use, pin transformers:

uv pip install "transformers<4.54.0"
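Once the build finishes, you can verify the package is importable and check which version was installed (a quick sketch):

```shell
# Import vLLM from the active venv and print its version
python -c 'import vllm; print(vllm.__version__)'
```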

5. Runtime Configuration

Before running the vLLM server, you must export several environment variables to configure ZenDNN and vLLM performance.

# ZenDNN settings

export TORCHINDUCTOR_FREEZING=0 
export ZENTORCH_LINEAR=1 
export USE_ZENDNN_MATMUL_DIRECT=1 
export USE_ZENDNN_SDPA_MATMUL_DIRECT=1 
export ZENDNNL_MATMUL_WEIGHT_CACHE=1 
export ZENDNNL_MATMUL_ALGO=1

# vLLM CPU settings
export VLLM_CPU_KVCACHE_SPACE=90        # GB for KV cache
export VLLM_CPU_OMP_THREADS_BIND=0-95   # CPU cores to use
export HUGGING_FACE_HUB_TOKEN=$(cat ~/.cache/huggingface/token)
export VLLM_PLUGINS="torch==2.9.1"
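The values above (90 GB KV cache, cores 0-95) are tuned for one particular machine. As a sketch, you can derive comparable values from the host instead; the binding here simply spans every core, whereas on multi-socket EPYC systems you may prefer to bind to a single NUMA node:

```shell
# Bind OpenMP threads to every core on the host
CORES=$(nproc)
export VLLM_CPU_OMP_THREADS_BIND="0-$((CORES - 1))"

# Reserve roughly half of system RAM (in GB) for the KV cache
TOTAL_GB=$(free -g | awk '/^Mem:/ {print $2}')
export VLLM_CPU_KVCACHE_SPACE=$((TOTAL_GB / 2))

echo "bind=$VLLM_CPU_OMP_THREADS_BIND kvcache=${VLLM_CPU_KVCACHE_SPACE}GB"
```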

6. Usage Example

Finally, we can run the vLLM server. Ensure your environment variables (from Step 5) are set in your current shell.

N.B. there is a script provided in this repo (scripts/launch.sh) that sets all the environment variables and launches vLLM.

# parameters provided by AMD engineers
# serve:
vllm serve meta-llama/Llama-3.2-1B-Instruct --dtype=bfloat16 --trust_remote_code --host 0.0.0.0 --port 8000 --max-log-len 0 --max-num-seqs 256 --enable-chunked-prefill --enable-prefix-caching

# test:
vllm bench serve --dataset-name random --model meta-llama/Llama-3.2-1B-Instruct --host 0.0.0.0 --port 8000 --num-prompts 100 --random-prefix-len 512 --random-input-len 512 --random-output-len 512

7. Use GuideLLM for benchmarking

guidellm benchmark --target http://<host>/v1 --model meta-llama/Llama-3.2-1B-Instruct --data "prompt_tokens=128,output_tokens=256" --rate-type sweep --max-seconds 90

8. Curl Test

curl -X POST http://<route-url>/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "<prompt>"
      }
    ]
  }'
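To pull just the assistant's reply out of the JSON response, you can pipe it through a small Python one-liner (a sketch; <route-url> and <prompt> are placeholders as above):

```shell
# POST a chat completion and print only the generated message content
curl -s -X POST http://<route-url>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-1B-Instruct", "messages": [{"role": "user", "content": "<prompt>"}]}' \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```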

9. Fine tuning (optional)

Execute the following commands

# interrupt noise elimination
sudo systemctl disable --now irqbalance

# frequency stability
sudo systemctl enable --now tuned
sudo tuned-adm profile throughput-performance

10. Adding libtcmalloc and libomp libraries (optional)

Install libomp

sudo dnf install llvm-toolset

For libtcmalloc you will need to build from source

git clone https://github.com/gperftools/gperftools.git
cd gperftools/
./configure
make
sudo make install

Export the libraries

export LD_PRELOAD=/usr/local/lib/libtcmalloc_minimal.so.4:/usr/lib64/libomp.so:$LD_PRELOAD
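Since these paths differ between distros and build prefixes, it's worth confirming both libraries exist before exporting LD_PRELOAD (a missing path there produces a loader warning on every process start):

```shell
# Warn about any preload library that is not actually present
for lib in /usr/local/lib/libtcmalloc_minimal.so.4 /usr/lib64/libomp.so; do
  [ -e "$lib" ] || echo "warning: $lib not found"
done
```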

About

Artifacts to deploy a vLLM (CPU only) on OpenShift