
Commit c85b80c

[Docker] Add cuda arch list as build option (#1950)
1 parent 2b98101 commit c85b80c

File tree

2 files changed (+13, -1 lines changed)

Dockerfile

Lines changed: 5 additions & 1 deletion
@@ -30,11 +30,15 @@ COPY requirements.txt requirements.txt
 COPY pyproject.toml pyproject.toml
 COPY vllm/__init__.py vllm/__init__.py
 
+ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
+ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
 # max jobs used by Ninja to build extensions
-ENV MAX_JOBS=$max_jobs
+ARG max_jobs=2
+ENV MAX_JOBS=${max_jobs}
 # number of threads used by nvcc
 ARG nvcc_threads=8
 ENV NVCC_THREADS=$nvcc_threads
+
 RUN python3 setup.py build_ext --inplace
 
 # image to run unit testing suite
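
The new torch_cuda_arch_list build argument feeds TORCH_CUDA_ARCH_LIST, the standard variable PyTorch's extension builder consults to decide which GPU compute capabilities the kernels are compiled for. The default list above spans Volta (7.0) through Hopper (9.0+PTX). As a hedged sketch, not part of this commit, a build narrowed to a single architecture (8.0, i.e. A100-class Ampere, chosen here purely for illustration) could look like:

$ # Illustrative only: compile kernels for compute capability 8.0 (+PTX)
$ # instead of the full default list, which shortens the build
$ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
    --build-arg torch_cuda_arch_list='8.0+PTX' \
    --build-arg max_jobs=8 --build-arg nvcc_threads=2

max_jobs and nvcc_threads are the pre-existing knobs visible in the same hunk; overriding them is optional.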

docs/source/serving/deploying_with_docker.rst

Lines changed: 8 additions & 0 deletions
@@ -31,6 +31,14 @@ You can build and run vLLM from source via the provided dockerfile. To build vLLM:
 
     $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 
+
+.. note::
+
+    By default vLLM will build for all GPU types for widest distribution. If you are just building for the
+    current GPU type the machine is running on, you can add the argument ``--build-arg torch_cuda_arch_list=""``
+    for vLLM to find the current GPU type and build for that.
+
+
 To run vLLM:
 
 .. code-block:: console
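
Per the added note, an empty torch_cuda_arch_list makes the build detect the GPU on the build machine instead of compiling for every supported architecture. A minimal sketch of that invocation (it assumes a GPU is actually visible to the build environment for detection to work):

$ # Let the build detect the locally installed GPU and compile only for it
$ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
    --build-arg torch_cuda_arch_list=""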
