title
Installation

Installation

There are three ways to run vLLM Hardware Plugin for Intel® Gaudi®:

Using Docker Compose: The easiest method that requires no image building and is supported only in 1.22 and later releases on Ubuntu. For more information and detailed instructions, see the Quick Start guide.
Using a Dockerfile: Allows building a container with the Intel Gaudi software suite using the provided Dockerfiles, either Ubuntu-based or UBI-based.
Building from source: Allows installing and running vLLM directly on your Intel Gaudi machine by building from source. It's supported as a standard installation and an enhanced setup with NIXL.

This guide explains how to run vLLM Hardware Plugin for Intel Gaudi from source and using a Dockerfile.

Requirements

Before you start, ensure that your environment meets the following requirements:

Python 3.10
Intel® Gaudi® 2 or 3 AI accelerator
Intel® Gaudi® software version {{ VERSION }} or later

Additionally, ensure that the Gaudi execution environment is properly set up. If it is not, complete the setup by using the Gaudi Installation Guide instructions.

Running vLLM Hardware Plugin for Intel Gaudi Using Dockerfile

--8<-- [start:docker_quickstart]

vLLM Hardware Plugin for Intel Gaudi provides two Dockerfile options:

Ubuntu-based Dockerfile: A setup for Ubuntu systems.
UBI-based Dockerfile: A setup for Red Hat Enterprise Linux (RHEL) Universal Base Image (UBI) 9 environments.

Choose the option that matches your deployment environment and requirements.

Ubuntu-based Dockerfile

Use the following commands to set up the container with the latest Intel Gaudi software suite release using the Ubuntu-based Dockerfile.

$ docker build -f .cd/Dockerfile.ubuntu.pytorch.vllm -t vllm-hpu-env  .
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --entrypoint='' --rm vllm-hpu-env

!!! tip If you are facing the following error: docker: Error response from daemon: Unknown runtime specified habana., refer to the "Install Optional Packages" section of Install Driver and Software and "Configure Container Runtime" section of Docker Installation. Make sure you have habanalabs-container-runtime package installed and that habana container runtime is registered.

To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.

UBI-based Dockerfile

The UBI-based Dockerfile is designed for RHEL UBI 9 environments and provides extensive customization through build arguments.

To build with default settings, run the following command from the repository root:

$ docker build -f .cd/Dockerfile.rhel.ubi.vllm -t vllm-gaudi:ubi .

The Dockerfile supports the following build arguments:

Build argument	Default value	Description
`ARTIFACTORY_URL`	`vault.habana.ai`	Intel Gaudi software repository URL.
`SYNAPSE_VERSION`	`1.22.2`	Intel Gaudi software suite version.
`SYNAPSE_REVISION`	`32`	Specific revision of the software suite.
`BASE_NAME`	`rhel9.6`	Base RHEL UBI image version.
`PT_VERSION`	`2.7.1`	PyTorch version.
`TORCH_TYPE`	`upstream`	PyTorch distribution type.
`VLLM_GAUDI_COMMIT`	`main`	vLLM Hardware Plugin for Intel Gaudi commit or branch.
`VLLM_PROJECT_COMMIT`	empty	Specific vLLM project commit.

To override build arguments, use the --build-arg flag, as in this example:

    $ docker build -f .cd/Dockerfile.rhel.ubi.vllm -t vllm-gaudi:ubi \
            --build-arg SYNAPSE_VERSION=1.22.2 \
            --build-arg SYNAPSE_REVISION=32 \
            --build-arg PT_VERSION=2.7.1 \
            --build-arg TORCH_TYPE=upstream \
            --build-arg VLLM_GAUDI_COMMIT=main \
            --build-arg VLLM_PROJECT_COMMIT= \
            .

The UBI Dockerfile includes commented lines to install additional Python packages and copy scripts from .cd/templates, .cd/entrypoints, .cd/server, and .cd/benchmark into /root/scripts/. If you need these packages in your workflow, uncomment the relevant lines in .cd/Dockerfile.rhel.ubi.vllm and adjust the container ENTRYPOINT.

--8<-- [end:docker_quickstart]

Building vLLM Hardware Plugin for Intel Gaudi from Source

There are two ways to install vLLM Hardware Plugin for Intel Gaudi from source: a standard installation for typical usage, and an enhanced setup with NIXL for optimized performance with large-scale or distributed inference.

Standard Plugin Deployment

Verify that the Intel Gaudi software was correctly installed.

 $ hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
 $ apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
 $ pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
 $ pip list | grep neural # verify that neural-compressor is installed

For more information about verification, see System Verification and Final Tests.

Run the latest Docker image from the Intel Gaudi vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel Gaudi, operating system, and PyTorch. Ensure that these versions are supported, according to the Support Matrix.

 docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
 docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest

For more information, see the Intel Gaudi documentation.

Get the last verified vLLM commit. While vLLM Hardware Plugin for Intel Gaudi follows the latest vLLM commits, upstream API updates may introduce compatibility issues. The saved commit has been thoroughly validated.
```
 git clone https://github.com/vllm-project/vllm-gaudi
 cd vllm-gaudi
 export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
 cd ..
```

Install vLLM using pip or build it from source.

 # Build vLLM from source for empty platform, reusing existing torch installation
 git clone https://github.com/vllm-project/vllm
 cd vllm
 git checkout $VLLM_COMMIT_HASH
 pip install -r <(sed '/^torch/d' requirements/build.txt)
 VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
 cd ..

Install vLLM Hardware Plugin for Intel Gaudi from source.
```
 cd vllm-gaudi
 pip install -e .
 cd ..
```

To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.

Plugin Deployment with NIXL

Verify that the Intel Gaudi software was correctly installed.

    $ hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
    $ apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
    $ pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
    $ pip list | grep neural # verify that neural-compressor is installed

For more information about verification, see [System Verification and Final Tests](https://docs.habana.ai/en/latest/Installation_Guide/System_Verification_and_Final_Tests.html).

Dockerfile deployment

To install vLLM Hardware Plugin for Intel Gaudi and NIXL using a Dockerfile:

    git clone https://github.com/vllm-project/vllm-gaudi
    docker build -t ubuntu.pytorch.vllm.nixl.latest \
      -f vllm-gaudi/.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest vllm-gaudi
    docker run -it --rm --runtime=habana \
      --name=ubuntu.pytorch.vllm.nixl.latest \
      --network=host \
      -e HABANA_VISIBLE_DEVICES=all \
      ubuntu.pytorch.vllm.nixl.latest /bin/bash

Building Plugin with NIXL using sources

Get the last verified vLLM commit. While vLLM Hardware Plugin for Intel Gaudi follows the latest vLLM commits, upstream API updates may introduce compatibility issues. The saved commit has been thoroughly validated
```
 git clone https://github.com/vllm-project/vllm-gaudi
 cd vllm-gaudi
 export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
```

Build vLLM from source for empty platform, reusing existing torch installation.

 cd ..
 git clone https://github.com/vllm-project/vllm
 cd vllm
 git checkout $VLLM_COMMIT_HASH
 pip install -r <(sed '/^torch/d' requirements/build.txt)
 VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
 cd ..

Install vLLM Hardware Plugin for Intel Gaudi from source.
```
 cd vllm-gaudi
 pip install -e .
```
Build NIXL.
```
 python install_nixl.py
```

To achieve the best performance on HPU, please follow the methods outlined in the Optimizing Training Platform Guide.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Installation

Requirements

Running vLLM Hardware Plugin for Intel Gaudi Using Dockerfile

--8<-- [start:docker_quickstart]

Ubuntu-based Dockerfile

UBI-based Dockerfile

--8<-- [end:docker_quickstart]

Building vLLM Hardware Plugin for Intel Gaudi from Source

Standard Plugin Deployment

Plugin Deployment with NIXL

Dockerfile deployment

Building Plugin with NIXL using sources

FilesExpand file tree

installation.md

Latest commit

History

installation.md

File metadata and controls

Installation

Requirements

Running vLLM Hardware Plugin for Intel Gaudi Using Dockerfile

--8<-- [start:docker_quickstart]

Ubuntu-based Dockerfile

UBI-based Dockerfile

--8<-- [end:docker_quickstart]

Building vLLM Hardware Plugin for Intel Gaudi from Source

Standard Plugin Deployment

Plugin Deployment with NIXL

Dockerfile deployment

Building Plugin with NIXL using sources