
Commit c4ecd71
Update Dockerfiles and documentation for v0.14.1 release (#919)
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
1 parent 3523084 · commit c4ecd71

9 files changed: +36 −17 lines

.cd/Dockerfile.rhel.tenc.pytorch.vllm

Lines changed: 2 additions & 2 deletions
@@ -13,9 +13,9 @@ ARG TORCH_TYPE_SUFFIX
 FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUFFIX}installer-${PT_VERSION}:${REVISION}
 
 # Parameterize commit/branch for vllm-plugin checkout
-ARG VLLM_GAUDI_COMMIT=main
+ARG VLLM_GAUDI_COMMIT=v0.14.1
 # leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 
 ARG BASE_NAME
 ENV BASE_NAME=${BASE_NAME}
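The pinned `ARG` defaults above can still be overridden at build time with `--build-arg`. A hypothetical invocation (the image tag is illustrative, and the leading `echo` makes this a dry run; drop it to actually build):

```shell
# Dry run: echo prints the assembled docker build command instead of running it.
# The --build-arg names come from the Dockerfile above; the -t tag is made up.
echo docker build -f .cd/Dockerfile.rhel.tenc.pytorch.vllm \
  --build-arg VLLM_GAUDI_COMMIT=v0.14.1 \
  --build-arg VLLM_PROJECT_COMMIT=v0.14.1 \
  -t vllm-gaudi-rhel:v0.14.1 .
```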

.cd/Dockerfile.rhel.ubi.vllm

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@ ARG BASE_NAME=rhel9.6
 ARG PT_VERSION=2.9.0
 # can be upstream or fork
 ARG TORCH_TYPE=upstream
-ARG VLLM_GAUDI_COMMIT=main
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_GAUDI_COMMIT=v0.14.1
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 
 # ============================================================================
 # Stage 1: gaudi-base - Base system setup with Habana drivers

.cd/Dockerfile.ubuntu.pytorch.vllm

Lines changed: 2 additions & 3 deletions
@@ -13,9 +13,8 @@ ARG TORCH_TYPE_SUFFIX
 FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUFFIX}installer-${PT_VERSION}:${REVISION}
 
 # Parameterize commit/branch for vllm-project & vllm-gaudi checkout
-ARG VLLM_GAUDI_COMMIT=main
-# leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_GAUDI_COMMIT=v0.14.1
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
 
 RUN apt update && \

.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest

Lines changed: 3 additions & 3 deletions
@@ -14,8 +14,8 @@ FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUF
 
 # Parameterize commit/branch for vllm-project & vllm-gaudi checkout
 # leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
-ARG VLLM_GAUDI_COMMIT=main
+ARG VLLM_PROJECT_COMMIT=v0.14.1
+ARG VLLM_GAUDI_COMMIT=v0.14.1
 
 ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
 
@@ -47,7 +47,7 @@ RUN \
   echo "Using vLLM commit : ${VLLM_PROJECT_COMMIT}"; \
   fi && \
   mkdir -p $VLLM_PATH && \
-  # Clone vllm-project/vllm and use configured or last good commit hash
+  # Clone vllm-project/vllm and use configured or last good commit hash
   git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
   cd $VLLM_PATH && \
   git remote add upstream https://github.com/vllm-project/vllm.git && \
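The `RUN` step above selects a commit with a shell fallback: use `VLLM_PROJECT_COMMIT` when set, otherwise fall back to the recorded last-good commit for vllm-gaudi. Stripped to its core (the hash below is a placeholder, not from this commit), the selection logic is roughly:

```shell
# If VLLM_PROJECT_COMMIT is empty, fall back to a recorded last-good commit;
# otherwise use the pinned value (v0.14.1 in this release).
VLLM_PROJECT_COMMIT=""            # empty means "use the fallback"
LAST_GOOD_COMMIT="deadbeef"       # hypothetical pinned fallback hash
COMMIT="${VLLM_PROJECT_COMMIT:-$LAST_GOOD_COMMIT}"
echo "Using vLLM commit : ${COMMIT}"   # prints the fallback here
```

With `VLLM_PROJECT_COMMIT=v0.14.1`, the same expansion keeps the pinned tag instead.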

README.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ vLLM Hardware Plugin for Intel® Gaudi®
 
 ---
 *Latest News* 🔥
-
+- [2026/02] Version 0.14.1 is now available, built on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces support for Granite 4.0h and Qwen 3 VL models.
 - [2026/01] Version 0.13.0 is now available, built on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance and also supports additional models.
 - [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
 - [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.

docs/getting_started/compatibility_matrix.md

Lines changed: 4 additions & 4 deletions
@@ -5,8 +5,8 @@ title: Compatibility Matrix
 
 The following table detail the supported vLLM versions for Intel® Gaudi® 2 and Intel® Gaudi® 3 AI accelerators.
 
-| Intel Gaudi Software | vLLM v0.10.0 | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.12.0 |
+| Intel Gaudi Software | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.13.0 | vLLM v0.14.1 |
 | :------------------- | :----------: | :----------: | :----------: | :------------: |
-| 1.22.1 | Alfa | ✅ Beta | | |
-| 1.22.2 | | | | |
-| 1.23.0 | | | | In development |
+| 1.22.1 | Beta | | | |
+| 1.22.2 | | | | |
+| 1.23.0 | | | | |

docs/getting_started/installation.md

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@ There are two ways to install vLLM Hardware Plugin for Intel® Gaudi® from sour
 
 2. Run the latest Docker image from the Intel® Gaudi® vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel® Gaudi®, operating system, and PyTorch. Ensure that these versions are supported, according to the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html).
 
-   docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
-   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
+   docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
+   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
 
    For more information, see the [Intel Gaudi documentation](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#pull-prebuilt-containers).
 
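The change replaces hard-coded versions with `{{ VERSION }}` and `{{ PT_VERSION }}` placeholders. Using the concrete values the previous revision of this file carried (1.23.0 and 2.9.0), the substitution expands like this:

```shell
# Expand the templated image name with concrete versions; 1.23.0 and 2.9.0
# are the values the pre-templated file hard-coded, shown for illustration.
VERSION=1.23.0
PT_VERSION=2.9.0
IMAGE="vault.habana.ai/gaudi-docker/${VERSION}/ubuntu24.04/habanalabs/pytorch-installer-${PT_VERSION}:latest"
echo "docker pull ${IMAGE}"
# → docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
```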

docs/getting_started/validated_models.md

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,13 @@ The following configurations have been validated to function with Intel® Gaudi
 | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | BF16 | Gaudi 3 |
 | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 1 | BF16 | Gaudi 3 |
 | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3 |
+| [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) | 1 | BF16, FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking) | 1 | BF16, FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) | 8 | BF16 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8) | 4 | FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) | 8 | BF16 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8) | 4 | FP8 | Gaudi 3 |
+| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 1 | BF16 | Gaudi 3 |
 
 Validation of the following configurations is currently in progress:
 

docs/release_notes.md

Lines changed: 13 additions & 0 deletions
@@ -2,6 +2,19 @@
 
 This document provides an overview of the features, changes, and fixes introduced in each release of the vLLM Hardware Plugin for Intel® Gaudi®.
 
+## 0.14.1
+
+This version is based on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) with support for [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html), and introduces support for the following models on Gaudi 3:
+
+- [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small)
+- [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+- [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct)
+- [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8)
+
 ## 0.13.0
 
 This version is based on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).
