
0.0.7


Released by github-actions on 28 Sep, 15:44
  • Support SeedOssForCausalLM
  • Support ApertusForCausalLM
  • Support Qwen3NextForCausalLM¹
  • Reduced CPU overhead
  • Fix support for non-AVX2 CPUs
  • Optimized GEMM kernels
  • Faster quantization, especially on Blackwell
  • Quant optimizer utils
  • Much lower overhead from quantized cache
  • Tensor split option for MoE layers with large experts
  • Add recurrent model support to generator
  • Generator now allows allocating pages on the fly
  • Many more improvements and bugfixes

¹ Qwen3-Next currently requires Triton and Flash Linear Attention. causal-conv1d is recommended but not required. A Triton-free implementation is in the works for v0.0.8.
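As a rough pre-flight check before loading a Qwen3-Next model, the snippet below tests whether the packages named in the footnote are importable. It is a sketch, not part of exllamav3: the import names `triton`, `fla` (Flash Linear Attention) and `causal_conv1d` are assumptions about how those packages are typically installed, and the helper `check_qwen3_next_deps` is hypothetical.

```python
import importlib.util

# Assumed import names: "triton" for Triton, "fla" for the
# flash-linear-attention package, "causal_conv1d" for causal-conv1d.
REQUIRED = ("triton", "fla")
RECOMMENDED = ("causal_conv1d",)


def check_qwen3_next_deps():
    """Hypothetical helper: report which Qwen3-Next dependencies are importable."""
    # find_spec returns None when a top-level module cannot be found
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED + RECOMMENDED
    }
    missing_required = [n for n in REQUIRED if not status[n]]
    missing_recommended = [n for n in RECOMMENDED if not status[n]]
    return status, missing_required, missing_recommended


if __name__ == "__main__":
    status, missing_req, missing_rec = check_qwen3_next_deps()
    if missing_req:
        print("Missing required:", ", ".join(missing_req))
    if missing_rec:
        print("Missing recommended (optional):", ", ".join(missing_rec))
    if not missing_req:
        print("Required Qwen3-Next dependencies found")
```

A missing recommended package is reported separately so that it reads as a warning rather than a hard failure, matching the footnote's "recommended but not required".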

Full Changelog: v0.0.6...v0.0.7