v0.2.6

@github-actions released this 17 Dec 18:35
671af2b

Major changes

  • Fast model execution with CUDA/HIP graphs
  • W4A16 GPTQ support (thanks to @chu-tianxiang)
  • Fix memory profiling with tensor parallelism
  • Fix *.bin weight loading for Mixtral models

Full Changelog: v0.2.5...v0.2.6