
v0.30.4


zcbenz released this 27 Jan 22:27
2f324cc

Highlights

  • Metal: Much faster vector fused grouped-query attention for long context
  • CUDA: Several improvements that speed up LLM inference on the CUDA backend
  • CUDA: Support for dense MoEs
  • CUDA: Better support for consumer GPUs (4090, 5090, RTX 6000, ...)

What's Changed

New Contributors

Full Changelog: v0.30.3...v0.30.4