Commit b81a6a6

Browse files
authored
[Docs] Add supported quantization methods to docs (#2135)
Parent: 0fbfc4b

2 files changed: 4 additions, 2 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -35,6 +35,7 @@ vLLM is fast with:
 - State-of-the-art serving throughput
 - Efficient management of attention key and value memory with **PagedAttention**
 - Continuous batching of incoming requests
+- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [SqueezeLLM](https://arxiv.org/abs/2306.07629)
 - Optimized CUDA kernels

 vLLM is flexible and easy to use with:
@@ -44,7 +45,7 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
-- Support NVIDIA CUDA and AMD ROCm.
+- Support NVIDIA GPUs and AMD GPUs.

 vLLM seamlessly supports many Hugging Face models, including the following architectures:
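For context, the quantization methods named in this change are selected through vLLM's offline Python API via the `quantization` argument. A minimal sketch, assuming an AWQ-quantized checkpoint is available (the model name below is a placeholder):

```python
# Minimal sketch: offline inference with a quantized model in vLLM.
# The checkpoint name is a placeholder; substitute any AWQ-quantized
# Hugging Face model. "gptq" and "squeezellm" select the other two
# methods listed in the diff above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",               # or "gptq" / "squeezellm"
)

sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is PagedAttention?"], sampling)
print(outputs[0].outputs[0].text)
```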

docs/source/index.rst

Lines changed: 2 additions & 1 deletion
@@ -30,6 +30,7 @@ vLLM is fast with:
 * State-of-the-art serving throughput
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
+* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
 * Optimized CUDA kernels

 vLLM is flexible and easy to use with:
@@ -39,7 +40,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA CUDA and AMD ROCm.
+* Support NVIDIA GPUs and AMD GPUs.

 For more information, check out the following:
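The OpenAI-compatible API server mentioned in this list can combine quantization with tensor parallelism. A hedged sketch of querying it with the openai 0.x Python client, contemporary with this commit; the server is assumed to be already running locally, and the model name is the same placeholder as above:

```python
# Sketch: querying vLLM's OpenAI-compatible server.
# Assumed launch command (placeholder model, 2-way tensor parallelism):
#   python -m vllm.entrypoints.openai.api_server \
#       --model TheBloke/Llama-2-7B-AWQ --quantization awq \
#       --tensor-parallel-size 2
import openai

openai.api_base = "http://localhost:8000/v1"  # vLLM's default port
openai.api_key = "EMPTY"  # the server does not check keys by default

completion = openai.Completion.create(
    model="TheBloke/Llama-2-7B-AWQ",  # must match the served model
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```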
