
Commit 26c52a5

[Docs] Add CUDA graph support to docs (#2148)
1 parent c3372e8

File tree: 2 files changed (+4, -2 lines)


README.md

Lines changed: 2 additions & 1 deletion
@@ -35,6 +35,7 @@ vLLM is fast with:
 - State-of-the-art serving throughput
 - Efficient management of attention key and value memory with **PagedAttention**
 - Continuous batching of incoming requests
+- Fast model execution with CUDA/HIP graph
 - Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629)
 - Optimized CUDA kernels

@@ -45,7 +46,7 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
-- Support NVIDIA GPUs and AMD GPUs.
+- Support NVIDIA GPUs and AMD GPUs

 vLLM seamlessly supports many Hugging Face models, including the following architectures:
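The new bullet advertises "fast model execution with CUDA/HIP graph". As a rough, hedged illustration of what that technique means, here is a minimal PyTorch sketch of capturing a model's forward pass into a CUDA graph and replaying it to amortize kernel-launch overhead. This is not vLLM's actual implementation; the function `run_with_graph` and its CPU fallback are illustrative assumptions.

```python
# Minimal sketch of CUDA graph capture with PyTorch's public API
# (torch.cuda.CUDAGraph / torch.cuda.graph). Illustrative only.
import torch

def run_with_graph(model, example_input):
    """Capture one forward pass into a CUDA graph, then replay it."""
    if not torch.cuda.is_available():
        # CUDA graphs require a GPU; fall back to eager execution.
        return model(example_input)
    model = model.cuda()
    static_input = example_input.cuda()
    # Warm up on a side stream before capture, as PyTorch requires.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        model(static_input)
    torch.cuda.current_stream().wait_stream(side)
    # Capture: every kernel launched inside this block is recorded.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)
    # Replay re-runs all recorded kernels with a single launch.
    graph.replay()
    return static_output
```

For subsequent requests one would copy new data into `static_input` in place and call `graph.replay()` again, rather than re-capturing; fixed tensor addresses are what make the single-launch replay possible.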

docs/source/index.rst

Lines changed: 2 additions & 1 deletion
@@ -30,6 +30,7 @@ vLLM is fast with:
 * State-of-the-art serving throughput
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
+* Fast model execution with CUDA/HIP graph
 * Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
 * Optimized CUDA kernels

@@ -40,7 +41,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA GPUs and AMD GPUs.
+* Support NVIDIA GPUs and AMD GPUs

 For more information, check out the following:
