Skip to content

v0.3.3

Compare
Choose a tag to compare
@github-actions github-actions released this 01 Mar 20:58
· 9394 commits to main since this release
82091b8

Major changes

  • StarCoder2 support
  • Performance optimization and LoRA support for Gemma
  • 2/3/8-bit GPTQ support
  • Integrate Marlin Kernels for Int4 GPTQ inference
  • Performance optimization for MoE kernel
  • [Experimental] AWS Inferentia2 support
  • [Experimental] Structured output (JSON, Regex) in OpenAI Server

What's Changed

New Contributors

Full Changelog: v0.3.2...v0.3.3