Installation
============

- vLLM is a Python library that also contains some C++ and CUDA code.
- This additional code requires compilation on the user's machine.
+ vLLM is a Python library that also contains pre-compiled C++ and CUDA (11.8) binaries.

Requirements
------------

* OS: Linux
- * Python: 3.8 or higher
- * CUDA: 11.0 -- 11.8
+ * Python: 3.8 -- 3.11
* GPU: compute capability 7.0 or higher (e.g., V100, T4, RTX20xx, A100, L4, etc.)

- .. note::
- As of now, vLLM does not support CUDA 12.
- If you are using Hopper or Lovelace GPUs, please use CUDA 11.8 instead of CUDA 12.
-
- .. tip::
- If you have trouble installing vLLM, we recommend using the NVIDIA PyTorch Docker image.
-
- .. code-block:: console
-
- $ # Pull the Docker image with CUDA 11.8.
- $ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3
-
- Inside the Docker container, please execute :code:`pip uninstall torch` before installing vLLM.
-

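The compute-capability floor in the requirements above can be checked mechanically. The sketch below is illustrative only: the GPU-to-capability mapping is hand-written for just the example cards named in the list, with values taken from NVIDIA's published specifications.

```python
# Compute capability (major, minor) for the example GPUs in the list above.
# Illustrative mapping only; values from NVIDIA's CUDA GPU documentation.
COMPUTE_CAPABILITY = {
    "V100": (7, 0),
    "T4": (7, 5),
    "RTX 2080": (7, 5),  # an RTX20xx-series part
    "A100": (8, 0),
    "L4": (8, 9),
}

def meets_requirement(gpu: str, minimum: tuple = (7, 0)) -> bool:
    """Return True if the GPU satisfies vLLM's minimum compute capability."""
    # Tuples compare lexicographically, so (7, 5) >= (7, 0) holds.
    return COMPUTE_CAPABILITY[gpu] >= minimum

print(meets_requirement("V100"))  # True: 7.0 is exactly the minimum
```

At runtime, the capability of the installed GPU can be read with PyTorch's `torch.cuda.get_device_capability()`, which returns the same `(major, minor)` tuple.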
Install with pip
----------------

@@ -40,7 +24,7 @@ You can install vLLM using pip:
$ conda activate myenv

$ # Install vLLM.
- $ pip install vllm # This may take 5-10 minutes.
+ $ pip install vllm


.. _build_from_source:
@@ -55,3 +39,11 @@ You can also build and install vLLM from source:
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ pip install -e . # This may take 5-10 minutes.
+
+ .. tip::
+ If you have trouble building vLLM, we recommend using the NVIDIA PyTorch Docker image.
+
+ .. code-block:: console
+
+ $ # Pull the Docker image with CUDA 11.8.
+ $ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3