@@ -58,8 +58,8 @@ vLLM is fast with:
 - Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
 - Continuous batching of incoming requests
 - Fast model execution with CUDA/HIP graph
-- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516),INT4, INT8, and FP8.
-- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
+- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516), INT4, INT8, and FP8
+- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
 - Speculative decoding
 - Chunked prefill

@@ -72,14 +72,14 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism and pipeline parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
-- Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron.
+- Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron
 - Prefix caching support
 - Multi-LoRA support

 vLLM seamlessly supports most popular open-source models on HuggingFace, including:
 - Transformer-like LLMs (e.g., Llama)
 - Mixture-of-Expert LLMs (e.g., Mixtral, Deepseek-V2 and V3)
-- Embedding Models (e.g. E5-Mistral)
+- Embedding Models (e.g., E5-Mistral)
 - Multi-modal LLMs (e.g., LLaVA)

 Find the full list of supported models [here](https://docs.vllm.ai/en/latest/models/supported_models.html).
@@ -162,4 +162,4 @@ If you use vLLM for your research, please cite our [paper](https://arxiv.org/abs

 ## Media Kit

-- If you wish to use vLLM's logo, please refer to [our media kit repo](https://github.com/vllm-project/media-kit).
+- If you wish to use vLLM's logo, please refer to [our media kit repo](https://github.com/vllm-project/media-kit)
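
The feature list above mentions vLLM's OpenAI-compatible API server. As a minimal sketch, assuming a server started locally with `vllm serve` on the default port and an example model name (both are illustrative, not prescribed by this change), it can be queried with the standard `openai` client:

```python
# Minimal sketch: start the server in another terminal, e.g.
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# then point the official OpenAI client at the local endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address; adjust if changed
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "What is PagedAttention?"}],
)
print(response.choices[0].message.content)
```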