
Commit c4ecd71
Update Dockerfiles and documentation for v0.14.1 release (#919)
Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
1 parent 3523084 · commit c4ecd71

9 files changed: +36 −17 lines

.cd/Dockerfile.rhel.tenc.pytorch.vllm

Lines changed: 2 additions & 2 deletions
@@ -13,9 +13,9 @@ ARG TORCH_TYPE_SUFFIX
 FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUFFIX}installer-${PT_VERSION}:${REVISION}
 
 # Parameterize commit/branch for vllm-plugin checkout
-ARG VLLM_GAUDI_COMMIT=main
+ARG VLLM_GAUDI_COMMIT=v0.14.1
 # leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 
 ARG BASE_NAME
 ENV BASE_NAME=${BASE_NAME}
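The pinned `ARG` defaults above can still be overridden at build time with `--build-arg`. A hypothetical invocation (the image tag is illustrative, and the leading `echo` makes this a dry run; drop it to actually build):

```shell
# Dry run: echo prints the assembled docker build command instead of running it.
# The --build-arg names come from the Dockerfile above; the -t tag is made up.
echo docker build -f .cd/Dockerfile.rhel.tenc.pytorch.vllm \
  --build-arg VLLM_GAUDI_COMMIT=v0.14.1 \
  --build-arg VLLM_PROJECT_COMMIT=v0.14.1 \
  -t vllm-gaudi-rhel:v0.14.1 .
```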

.cd/Dockerfile.rhel.ubi.vllm

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@ ARG BASE_NAME=rhel9.6
 ARG PT_VERSION=2.9.0
 # can be upstream or fork
 ARG TORCH_TYPE=upstream
-ARG VLLM_GAUDI_COMMIT=main
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_GAUDI_COMMIT=v0.14.1
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 
 # ============================================================================
 # Stage 1: gaudi-base - Base system setup with Habana drivers

.cd/Dockerfile.ubuntu.pytorch.vllm

Lines changed: 2 additions & 3 deletions
@@ -13,9 +13,8 @@ ARG TORCH_TYPE_SUFFIX
 FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUFFIX}installer-${PT_VERSION}:${REVISION}
 
 # Parameterize commit/branch for vllm-project & vllm-gaudi checkout
-ARG VLLM_GAUDI_COMMIT=main
-# leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
+ARG VLLM_GAUDI_COMMIT=v0.14.1
+ARG VLLM_PROJECT_COMMIT=v0.14.1
 ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
 
 RUN apt update && \

.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest

Lines changed: 3 additions & 3 deletions
@@ -14,8 +14,8 @@ FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-${TORCH_TYPE_SUF
 
 # Parameterize commit/branch for vllm-project & vllm-gaudi checkout
 # leave empty to use last-good-commit-for-vllm-gaudi
-ARG VLLM_PROJECT_COMMIT=
-ARG VLLM_GAUDI_COMMIT=main
+ARG VLLM_PROJECT_COMMIT=v0.14.1
+ARG VLLM_GAUDI_COMMIT=v0.14.1
 
 ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
 
@@ -47,7 +47,7 @@ RUN \
   echo "Using vLLM commit : ${VLLM_PROJECT_COMMIT}"; \
   fi && \
   mkdir -p $VLLM_PATH && \
-  # Clone vllm-project/vllm and use configured or last good commit hash
+  # Clone vllm-project/vllm and use configured or last good commit hash
   git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
   cd $VLLM_PATH && \
   git remote add upstream https://github.com/vllm-project/vllm.git && \
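The `RUN` step above selects a commit with a shell fallback: use `VLLM_PROJECT_COMMIT` when set, otherwise fall back to the recorded last-good commit for vllm-gaudi. Stripped to its core (the hash below is a placeholder, not from this commit), the selection logic is roughly:

```shell
# If VLLM_PROJECT_COMMIT is empty, fall back to a recorded last-good commit;
# otherwise use the pinned value (v0.14.1 in this release).
VLLM_PROJECT_COMMIT=""            # empty means "use the fallback"
LAST_GOOD_COMMIT="deadbeef"       # hypothetical pinned fallback hash
COMMIT="${VLLM_PROJECT_COMMIT:-$LAST_GOOD_COMMIT}"
echo "Using vLLM commit : ${COMMIT}"   # prints the fallback here
```

With `VLLM_PROJECT_COMMIT=v0.14.1`, the same expansion keeps the pinned tag instead.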

README.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ vLLM Hardware Plugin for Intel® Gaudi®
 
 ---
 *Latest News* 🔥
-
+- [2026/02] Version 0.14.1 is now available, built on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces support for Granite 4.0h and Qwen 3 VL models.
 - [2026/01] Version 0.13.0 is now available, built on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance and also supports additional models.
 - [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
 - [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.

docs/getting_started/compatibility_matrix.md

Lines changed: 4 additions & 4 deletions
@@ -5,8 +5,8 @@ title: Compatibility Matrix
 
 The following table detail the supported vLLM versions for Intel® Gaudi® 2 and Intel® Gaudi® 3 AI accelerators.
 
-| Intel Gaudi Software | vLLM v0.10.0 | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.12.0 |
+| Intel Gaudi Software | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.13.0 | vLLM v0.14.1 |
 | :------------------- | :----------: | :----------: | :----------: | :------------: |
-| 1.22.1 | Alfa | ✅ Beta | | |
-| 1.22.2 | | | | |
-| 1.23.0 | | | | In development |
+| 1.22.1 | Beta | | | |
+| 1.22.2 | | | | |
+| 1.23.0 | | | | |

docs/getting_started/installation.md

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@ There are two ways to install vLLM Hardware Plugin for Intel® Gaudi® from sour
 
 2. Run the latest Docker image from the Intel® Gaudi® vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel® Gaudi®, operating system, and PyTorch. Ensure that these versions are supported, according to the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html).
 
-   docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
-   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
+   docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
+   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
 
    For more information, see the [Intel Gaudi documentation](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#pull-prebuilt-containers).
 
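The change replaces hard-coded versions with `{{ VERSION }}` and `{{ PT_VERSION }}` placeholders. Using the concrete values the previous revision of this file carried (1.23.0 and 2.9.0), the substitution expands like this:

```shell
# Expand the templated image name with concrete versions; 1.23.0 and 2.9.0
# are the values the pre-templated file hard-coded, shown for illustration.
VERSION=1.23.0
PT_VERSION=2.9.0
IMAGE="vault.habana.ai/gaudi-docker/${VERSION}/ubuntu24.04/habanalabs/pytorch-installer-${PT_VERSION}:latest"
echo "docker pull ${IMAGE}"
# → docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
```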

docs/getting_started/validated_models.md

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,13 @@ The following configurations have been validated to function with Intel® Gaudi
 | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | BF16 | Gaudi 3 |
 | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 1 | BF16 | Gaudi 3 |
 | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3 |
+| [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) | 1 | BF16, FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking) | 1 | BF16, FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) | 8 | BF16 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8) | 4 | FP8 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) | 8 | BF16 | Gaudi 3 |
+| [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8) | 4 | FP8 | Gaudi 3 |
+| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 1 | BF16 | Gaudi 3 |
 
 Validation of the following configurations is currently in progress:
 

docs/release_notes.md

Lines changed: 13 additions & 0 deletions
@@ -2,6 +2,19 @@
 
 This document provides an overview of the features, changes, and fixes introduced in each release of the vLLM Hardware Plugin for Intel® Gaudi®.
 
+## 0.14.1
+
+This version is based on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) with support for [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html), and introduces support for the following models on Gaudi 3:
+
+- [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small)
+- [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+- [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct)
+- [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8)
+
 ## 0.13.0
 
 This version is based on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).
