
Commit ed94d42

tjohnson31415 authored and z103cb committed
ci/build/feat: bump vLLM libs to v0.4.2 and other deps in Dockerfile.ubi (#23)
Changes:
- vLLM v0.4.2 was published today; update our build to use pre-built libs from their wheel
- bump other dependencies in the image build (base UBI image, miniforge, flash attention, grpcio-tools, accelerate)
- little cleanup to remove `PYTORCH_` args that are no longer used

Signed-off-by: Travis Johnson <[email protected]>
1 parent 6084d41 · commit ed94d42


Dockerfile.ubi

Lines changed: 6 additions & 9 deletions
@@ -2,11 +2,8 @@
 # docs/source/dev/dockerfile-ubi/dockerfile-ubi.rst

 ## Global Args #################################################################
-ARG BASE_UBI_IMAGE_TAG=9.3-1612
+ARG BASE_UBI_IMAGE_TAG=9.4-949.1714662671
 ARG PYTHON_VERSION=3.11
-ARG PYTORCH_INDEX="https://download.pytorch.org/whl"
-# ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
-ARG PYTORCH_VERSION=2.1.2

 # NOTE: This setting only has an effect when not using prebuilt-wheel kernels
 ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
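Note: both values above are ordinary Dockerfile ARGs, so they can be overridden at build time rather than edited in place. A minimal sketch (the image tag is the commit's new default; the single-arch list and the vllm-ubi tag are illustrative, not part of this commit):

# Override build args on the command line; TORCH_CUDA_ARCH_LIST only
# takes effect when kernels are compiled from source rather than
# installed from the prebuilt wheel.
docker build -f Dockerfile.ubi \
    --build-arg BASE_UBI_IMAGE_TAG=9.4-949.1714662671 \
    --build-arg TORCH_CUDA_ARCH_LIST="8.0" \
    -t vllm-ubi .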
@@ -149,8 +146,8 @@ RUN microdnf install -y \
     && microdnf clean all

 ARG PYTHON_VERSION
-# 0.4.1 is built for CUDA 12.1 and PyTorch 2.1.2
-ARG VLLM_WHEEL_VERSION=0.4.1
+# 0.4.2 is built for CUDA 12.1 and PyTorch 2.3.0
+ARG VLLM_WHEEL_VERSION=0.4.2

 RUN curl -Lo vllm.whl https://github.com/vllm-project/vllm/releases/download/v${VLLM_WHEEL_VERSION}/vllm-${VLLM_WHEEL_VERSION}-cp${PYTHON_VERSION//.}-cp${PYTHON_VERSION//.}-manylinux1_x86_64.whl \
     && unzip vllm.whl \
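The cp${PYTHON_VERSION//.} fragments use bash substring replacement to turn the dotted Python version into a CPython ABI tag, so the curl above resolves to a concrete wheel name. A quick sketch of the expansion with the defaults in this file (bash-specific; ${VAR//.} is not POSIX sh):

PYTHON_VERSION=3.11
VLLM_WHEEL_VERSION=0.4.2
# ${PYTHON_VERSION//.} deletes every "." -> 311
echo "vllm-${VLLM_WHEEL_VERSION}-cp${PYTHON_VERSION//.}-cp${PYTHON_VERSION//.}-manylinux1_x86_64.whl"
# prints: vllm-0.4.2-cp311-cp311-manylinux1_x86_64.whl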
@@ -223,7 +220,7 @@ RUN microdnf install -y git \
 ARG max_jobs=2
 ENV MAX_JOBS=${max_jobs}
 # flash attention version
-ARG flash_attn_version=v2.5.6
+ARG flash_attn_version=v2.5.8
 ENV FLASH_ATTN_VERSION=${flash_attn_version}

 WORKDIR /usr/src/flash-attention-v2
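MAX_JOBS is set here because flash-attention's build honors it to cap parallel compile jobs, keeping memory use bounded on small builders. For reference, a sketch of reproducing the same pinned build outside this image, assuming a CUDA toolchain and torch are already installed:

# flash-attn reads MAX_JOBS from the environment during compilation;
# --no-build-isolation lets it build against the installed torch.
MAX_JOBS=2 pip install flash-attn==2.5.8 --no-build-isolation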
@@ -260,9 +257,9 @@ RUN --mount=type=bind,from=flash-attn-builder,src=/usr/src/flash-attention-v2,ta
 RUN --mount=type=cache,target=/root/.cache/pip \
     pip install \
         # additional dependencies for the TGIS gRPC server
-        grpcio==1.62.1 \
+        grpcio-tools==1.63.0 \
         # additional dependencies for openai api_server
-        accelerate==0.28.0 \
+        accelerate==0.30.0 \
         # hf_transfer for faster HF hub downloads
         hf_transfer==0.1.6
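Of these, hf_transfer is opt-in at runtime: huggingface_hub only routes downloads through it when HF_HUB_ENABLE_HF_TRANSFER is set. A sketch of exercising it in the finished image (the model id is only an example):

export HF_HUB_ENABLE_HF_TRANSFER=1
# Downloads now go through the Rust-based hf_transfer backend.
python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/opt-125m')"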
