🚀 Core Highlights
- Native Kimi-K2-Thinking support via the RAWINT4 method, which lets the CPU and GPU share the same INT4 weights with no separate conversion step.
- AMD BLIS backend for INT8 MoE inference, expanding hardware support beyond Intel AMX.
- AVX-based Kimi-K2 support for CPUs without AMX instructions.
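To check which of the CPU paths above a given machine could use, the sketch below reads the x86 feature flags that Linux exposes in `/proc/cpuinfo`. The flag names (`amx_tile`, `amx_int8`, `avx512f`, `avx2`) are the ones the Linux kernel reports. The function names and the returned labels are hypothetical and for illustration only; they are not KTransformers options, and the project's own backend selection may work differently.

```python
def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the space-separated x86 feature flags, or "" if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                # On x86 Linux the line looks like "flags\t\t: fpu vme ... avx2 ..."
                if line.startswith("flags"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # Not Linux, or the file is not readable
    return ""


def pick_cpu_backend(cpu_flags: str) -> str:
    """Map CPU feature flags to an illustrative (hypothetical) backend label."""
    flags = set(cpu_flags.lower().split())
    # Intel AMX INT8 needs both the tile and the int8 extensions
    if {"amx_tile", "amx_int8"} <= flags:
        return "amx"
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "generic"


if __name__ == "__main__":
    print(pick_cpu_backend(read_linux_cpu_flags()))
```

A machine that reports `avx2` but no `amx_*` flags would fall into the AVX path here, which is the case the AVX-based Kimi-K2 support targets.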
📌 Models, Hardware & Tooling
- Add Qwen3-VL weight conversion and a DeepSeek-V3.2 tutorial.
- Fix OOM in weight conversion, llamafile data race, and AVX2 build issues.
- Add CI pipeline with accuracy and performance tests.
📝 Docs & Community
- Add full KTransformers introduction and AMD BLIS usage guide.
- Add Native Kimi-K2-Thinking tutorial with Claude Code Router integration.
- Update Ascend NPU docs and Python 3.12 support.
🌟 Contributors
Thanks to all contributors who helped ship this release.
Full Changelog: v0.4.2...v0.4.3
CC: @SkqLiao @JimmyPeilinLi @ouqingliang @ovowei @KMSorSMS @poryfly @Azure-Tang @mrhaoxx @DocShotgun @RICHARDNAN @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan