# Installation
There are three ways to run vLLM Hardware Plugin for Intel® Gaudi®:
- Using Docker Compose: The easiest method; it requires no image building and is supported only in releases 1.22 and later on Ubuntu. For more information and detailed instructions, see the Quick Start guide.
- Using a Dockerfile: Allows building a container with the Intel Gaudi software suite using the provided Dockerfiles, either Ubuntu-based or UBI-based.
- Building from source: Lets you install and run vLLM directly on your Intel Gaudi machine by building from source, either as a standard installation or as an enhanced setup with NIXL.
This guide explains how to run vLLM Hardware Plugin for Intel Gaudi from source and using a Dockerfile.
Before you start, ensure that your environment meets the following requirements:
- Python 3.10
- Intel® Gaudi® 2 or 3 AI accelerator
- Intel® Gaudi® software version {{ VERSION }} or later
Additionally, ensure that the Gaudi execution environment is properly set up. If it is not, complete the setup by following the instructions in the Gaudi Installation Guide.
vLLM Hardware Plugin for Intel Gaudi provides two Dockerfile options:
- Ubuntu-based Dockerfile: A setup for Ubuntu systems.
- UBI-based Dockerfile: A setup for Red Hat Enterprise Linux (RHEL) Universal Base Image (UBI) 9 environments.
Choose the option that matches your deployment environment and requirements.
Use the following commands to set up the container with the latest Intel Gaudi software suite release using the Ubuntu-based Dockerfile.
```console
$ docker build -f .cd/Dockerfile.ubuntu.pytorch.vllm -t vllm-hpu-env .
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --entrypoint='' --rm vllm-hpu-env
```
!!! tip
    If you encounter the error `docker: Error response from daemon: Unknown runtime specified habana.`, refer to the "Install Optional Packages" section of Install Driver and Software and the "Configure Container Runtime" section of Docker Installation. Make sure the habanalabs-container-runtime package is installed and the habana container runtime is registered.
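As a quick sanity check before running the container, the following sketch looks for the habana runtime in the Docker daemon configuration. It assumes the standard daemon config path; adjust it if your daemon is configured elsewhere.

```shell
# Sketch: check whether the "habana" runtime is registered with the Docker
# daemon. /etc/docker/daemon.json is the standard config location; adjust if
# your daemon reads its configuration from somewhere else.
runtime_registered() {
  # $1: runtime name; returns 0 if the name appears in the daemon config
  grep -q "\"$1\"" /etc/docker/daemon.json 2>/dev/null
}

if runtime_registered habana; then
  echo "habana runtime registered"
else
  echo "habana runtime not found; reinstall habanalabs-container-runtime"
fi
```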
To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.
The UBI-based Dockerfile is designed for RHEL UBI 9 environments and provides extensive customization through build arguments.
To build with default settings, run the following command from the repository root:
```console
$ docker build -f .cd/Dockerfile.rhel.ubi.vllm -t vllm-gaudi:ubi .
```
The Dockerfile supports the following build arguments:
| Build argument | Default value | Description |
|---|---|---|
| ARTIFACTORY_URL | vault.habana.ai | Intel Gaudi software repository URL. |
| SYNAPSE_VERSION | 1.22.2 | Intel Gaudi software suite version. |
| SYNAPSE_REVISION | 32 | Specific revision of the software suite. |
| BASE_NAME | rhel9.6 | Base RHEL UBI image version. |
| PT_VERSION | 2.7.1 | PyTorch version. |
| TORCH_TYPE | upstream | PyTorch distribution type. |
| VLLM_GAUDI_COMMIT | main | vLLM Hardware Plugin for Intel Gaudi commit or branch. |
| VLLM_PROJECT_COMMIT | (empty) | Specific vLLM project commit. |
To override build arguments, use the --build-arg flag, as in this example:
```console
$ docker build -f .cd/Dockerfile.rhel.ubi.vllm -t vllm-gaudi:ubi \
    --build-arg SYNAPSE_VERSION=1.22.2 \
    --build-arg SYNAPSE_REVISION=32 \
    --build-arg PT_VERSION=2.7.1 \
    --build-arg TORCH_TYPE=upstream \
    --build-arg VLLM_GAUDI_COMMIT=main \
    --build-arg VLLM_PROJECT_COMMIT= \
    .
```
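When you override several related arguments, it can help to keep them in shell variables so the image tag and the build arguments stay consistent across rebuilds. A minimal sketch; the tag naming scheme is just a suggestion:

```shell
# Keep version arguments in one place; the tag encodes them for traceability.
SYNAPSE_VERSION=1.22.2
SYNAPSE_REVISION=32
TAG="vllm-gaudi:ubi-${SYNAPSE_VERSION}-${SYNAPSE_REVISION}"

# Print the build command (dry run); drop the echo to actually execute it.
echo docker build -f .cd/Dockerfile.rhel.ubi.vllm -t "$TAG" \
  --build-arg SYNAPSE_VERSION="$SYNAPSE_VERSION" \
  --build-arg SYNAPSE_REVISION="$SYNAPSE_REVISION" .
```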
The UBI Dockerfile includes commented lines to install additional Python packages and copy scripts from .cd/templates, .cd/entrypoints, .cd/server, and .cd/benchmark into /root/scripts/. If you need these packages in your workflow, uncomment the relevant lines in .cd/Dockerfile.rhel.ubi.vllm and adjust the container ENTRYPOINT.
There are two ways to install vLLM Hardware Plugin for Intel Gaudi from source: a standard installation for typical usage, and an enhanced setup with NIXL for optimized performance with large-scale or distributed inference.
- Verify that the Intel Gaudi software was correctly installed.

    ```console
    $ hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
    $ apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
    $ pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
    $ pip list | grep neural # verify that neural-compressor is installed
    ```

    For more information about verification, see System Verification and Final Tests.

- Run the latest Docker image from the Intel Gaudi vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel Gaudi, operating system, and PyTorch. Ensure that these versions are supported, according to the Support Matrix.

    ```bash
    docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
    docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
    ```

    For more information, see the Intel Gaudi documentation.

- Get the last verified vLLM commit. While vLLM Hardware Plugin for Intel Gaudi follows the latest vLLM commits, upstream API updates may introduce compatibility issues. The saved commit has been thoroughly validated.

    ```bash
    git clone https://github.com/vllm-project/vllm-gaudi
    cd vllm-gaudi
    export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
    cd ..
    ```

- Install vLLM using pip or build it from source.

    ```bash
    # Build vLLM from source for empty platform, reusing existing torch installation
    git clone https://github.com/vllm-project/vllm
    cd vllm
    git checkout $VLLM_COMMIT_HASH
    pip install -r <(sed '/^torch/d' requirements/build.txt)
    VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
    cd ..
    ```

- Install vLLM Hardware Plugin for Intel Gaudi from source.

    ```bash
    cd vllm-gaudi
    pip install -e .
    cd ..
    ```
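After the steps above, a quick check that both packages resolve from the current environment can save a failed first run. The import name vllm_gaudi is an assumption about the plugin package; adjust it if the module is named differently.

```shell
# Sketch: verify that the freshly installed packages are importable.
# "vllm_gaudi" is an assumed module name; check with `pip show` if unsure.
check_module() {
  # $1: module name; returns 0 if Python can locate it
  python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)"
}

for mod in vllm vllm_gaudi; do
  check_module "$mod" && echo "$mod: ok" || echo "$mod: NOT FOUND"
done
```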
To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.
Verify that the Intel Gaudi software was correctly installed.
```console
$ hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
$ apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
$ pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
$ pip list | grep neural # verify that neural-compressor is installed
```
For more information about verification, see [System Verification and Final Tests](https://docs.habana.ai/en/latest/Installation_Guide/System_Verification_and_Final_Tests.html).
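The individual checks above can be rolled into a single pass/fail summary. A sketch, with the package list taken from the comments in the commands above (treat it as illustrative, not exhaustive):

```shell
# Sketch: summarize the verification checks in one function.
verify_gaudi_env() {
  local ok=1
  command -v hl-smi >/dev/null 2>&1 || { echo "hl-smi not in PATH"; ok=0; }
  for pkg in habana-torch-plugin habana-torch-dataloader habana-pyhlml habana-media-loader neural-compressor; do
    pip list 2>/dev/null | grep -q "^$pkg" || { echo "missing pip package: $pkg"; ok=0; }
  done
  if [ "$ok" -eq 1 ]; then echo "environment looks complete"; else echo "environment incomplete"; fi
}

verify_gaudi_env
```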
To install vLLM Hardware Plugin for Intel Gaudi and NIXL using a Dockerfile:
```bash
git clone https://github.com/vllm-project/vllm-gaudi
docker build -t ubuntu.pytorch.vllm.nixl.latest \
    -f vllm-gaudi/.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest vllm-gaudi
docker run -it --rm --runtime=habana \
    --name=ubuntu.pytorch.vllm.nixl.latest \
    --network=host \
    -e HABANA_VISIBLE_DEVICES=all \
    ubuntu.pytorch.vllm.nixl.latest /bin/bash
```
- Get the last verified vLLM commit. While vLLM Hardware Plugin for Intel Gaudi follows the latest vLLM commits, upstream API updates may introduce compatibility issues. The saved commit has been thoroughly validated.

    ```bash
    git clone https://github.com/vllm-project/vllm-gaudi
    cd vllm-gaudi
    export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
    ```

- Build vLLM from source for the empty platform, reusing the existing torch installation.

    ```bash
    cd ..
    git clone https://github.com/vllm-project/vllm
    cd vllm
    git checkout $VLLM_COMMIT_HASH
    pip install -r <(sed '/^torch/d' requirements/build.txt)
    VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
    cd ..
    ```

- Install vLLM Hardware Plugin for Intel Gaudi from source.

    ```bash
    cd vllm-gaudi
    pip install -e .
    ```

- Build NIXL.

    ```bash
    python install_nixl.py
    ```
To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.