vLLM v0.1.4

@github-actions github-actions released this 25 Aug 03:31
791d79d

Major changes

  • From now on, vLLM ships with pre-built CUDA binaries, so users no longer have to compile vLLM's CUDA kernels on their machines.
  • New models: InternLM, Qwen, Aquila.
  • Optimized the CUDA kernels for paged attention and GELU.
  • Many bug fixes.

What's Changed

New Contributors

Full Changelog: v0.1.3...v0.1.4