2 files changed: +13 −1 lines changed

@@ -30,11 +30,15 @@ COPY requirements.txt requirements.txt
 COPY pyproject.toml pyproject.toml
 COPY vllm/__init__.py vllm/__init__.py
 
+ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
+ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
 # max jobs used by Ninja to build extensions
-ENV MAX_JOBS=$max_jobs
+ARG max_jobs=2
+ENV MAX_JOBS=${max_jobs}
 # number of threads used by nvcc
 ARG nvcc_threads=8
 ENV NVCC_THREADS=$nvcc_threads
+
 RUN python3 setup.py build_ext --inplace
 
 # image to run unit testing suite
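The new ARG/ENV pairs make the target CUDA architectures and the Ninja job count overridable at build time instead of being hard-coded in the Dockerfile. A minimal sketch of passing them on the command line, reusing the ``vllm-openai`` target and ``vllm/vllm-openai`` tag from the docs change below; the ``'8.0'`` architecture value is only an illustration, not part of this diff:

    $ # build only for compute capability 8.0 and raise the parallel build job count
    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
          --build-arg torch_cuda_arch_list='8.0' \
          --build-arg max_jobs=8 \
          --build-arg nvcc_threads=2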
@@ -31,6 +31,14 @@ You can build and run vLLM from source via the provided dockerfile. To build vLLM:
 
     $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 
+
+.. note::
+
+    By default vLLM will build for all GPU types for widest distribution. If you are just building for the
+    current GPU type the machine is running on, you can add the argument ``--build-arg torch_cuda_arch_list=""``
+    for vLLM to find the current GPU type and build for that.
+
+
 To run vLLM:
 
 .. code-block:: console
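Following the added note, the same build can be restricted to the GPU present on the build machine by passing an empty architecture list; a hedged example, reusing the target and tag from the command above:

    $ # empty arch list: let the build detect the current GPU type and build only for that
    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
          --build-arg torch_cuda_arch_list=""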