
v0.5.1

@github-actions released this 05 Jul 19:47
79d406e

Highlights

  • vLLM now has pipeline parallelism! (#4412, #5408, #6115, #6120). You can now run the API server with --pipeline-parallel-size. This feature is at an early stage; please let us know your feedback.
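As a sketch, a pipeline-parallel launch of the OpenAI-compatible API server might look like the following (the model name and port are illustrative; only the --pipeline-parallel-size flag comes from this release):

```shell
# Serve a model split across 2 pipeline stages (requires at least 2 GPUs).
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --pipeline-parallel-size 2 \
    --port 8000
```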

Model Support

  • Support Gemma 2 (#5908, #6051). Please note that for correctness, Gemma 2 should run with the FlashInfer backend, which supports logits soft cap. The wheels for FlashInfer can be downloaded here
  • Support Jamba (#4115). This is vLLM's first state space model!
  • Support Deepseek-V2 (#4650). Please note that MLA (Multi-head Latent Attention) is not implemented and we are looking for contributions!
  • Vision-language models: added support for Phi3-Vision, dynamic image sizes, and a registry for processing model inputs (#4986, #5276, #5214)
    • Notably, there is a breaking change: all VLM-specific arguments have been removed from the engine APIs, so you no longer need to set them globally via the CLI. Instead of complicated prompt formatting, you now only need to pass <image> in the prompt. See more here
    • There is also a new guide on adding VLMs! We would love your contributions for new models!
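For context on why Gemma 2 needs the FlashInfer backend: the model applies logit soft-capping, a transform of the shape cap * tanh(x / cap) that bounds logits to (-cap, cap). A minimal sketch, with an illustrative cap value (the model's actual cap constants may differ):

```python
import math

def soft_cap(logits, cap=30.0):
    """Soft-cap logits: values are smoothly squashed into (-cap, cap).

    Gemma 2 applies a transform of this shape to attention and final
    logits; cap=30.0 here is illustrative, not the model's exact config.
    """
    return [cap * math.tanh(x / cap) for x in logits]

# Small logits pass through almost unchanged; huge logits saturate near the cap.
capped = soft_cap([0.0, 10.0, 1000.0], cap=30.0)
```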

Hardware Support

Production Service

  • Support for sharded tensorized models (#4990)
  • Continuous streaming of OpenAI response token stats (#5742)
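With usage stats streamed alongside tokens, a client can track token counts as chunks arrive. A minimal sketch of pulling the usage object out of one server-sent-event line; the exact payload layout shown is illustrative, not a guaranteed wire format:

```python
import json

# Hypothetical SSE data line, as an OpenAI-compatible server might stream it.
sse_line = (
    'data: {"id": "cmpl-1", "choices": [{"index": 0, "text": " world"}], '
    '"usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}}'
)

def parse_usage(line):
    """Return the running token-usage stats from one SSE data line, if present."""
    payload = json.loads(line.removeprefix("data: "))
    return payload.get("usage")

usage = parse_usage(sse_line)
```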

Performance

  • Faster distributed communication via shared memory (#5399)
  • Reduced latency in the block manager (#5584)
  • Enhancements to compressed-tensors supporting Marlin, W4A16 (#5435, #5385)
  • Faster FP8 quantize kernel (#5396), FP8 on Ampere (#5975)
  • Option to use FlashInfer for prefill, decode, and CUDA Graph for decode (#4628)
  • Speculative decoding: Draft Model Runner (#5799)
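To illustrate what an FP8 quantize kernel computes: per-tensor FP8 quantization picks a scale from the tensor's max magnitude, then scales and clamps values into the representable range. A pure-Python sketch of that scaling step (a real kernel additionally casts to an 8-bit float; 448.0 is the e4m3 maximum):

```python
def quantize_fp8_e4m3(values, fp8_max=448.0):
    """Simulate the per-tensor scaling step of FP8 (e4m3) quantization.

    Returns the scaled, clamped values and the scale; dequantize by
    multiplying each value by the scale.
    """
    amax = max(abs(v) for v in values) or 1.0  # avoid div-by-zero on all-zero input
    scale = amax / fp8_max
    quantized = [max(-fp8_max, min(fp8_max, v / scale)) for v in values]
    return quantized, scale

q, s = quantize_fp8_e4m3([0.5, -1.0, 2.0])
# The largest-magnitude value lands exactly on the fp8 max; q[i] * s recovers the input.
```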

Development Productivity

  • Post-merge benchmarks are now available at perf.vllm.ai!
  • Addition of A100 in CI environment (#5658)
  • Step towards nightly wheel publication (#5610)

What's Changed

Full Changelog: v0.5.0...v0.5.1