3.2.5
Pre-release
MNN 3.2.5 Release Note
Core Feature Updates
1. Added Support for HQQ Quantization Algorithm
- Integrated HQQ quantization algorithm into MNNConvert tool, which can be enabled via the --hqq parameter
- HQQ quantization supports asymmetric quantization, significantly improving the accuracy of quantized models
- Supports combination with block-wise quantization to further optimize model accuracy
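To make the terms "asymmetric" and "block-wise" concrete, here is a minimal pure-Python sketch of asymmetric min/max quantization applied per block. It is not MNN's implementation and does not include HQQ's optimization step; the function names and the 4-bit default are assumptions chosen for illustration.

```python
from typing import List, Tuple

Block = Tuple[List[int], float, float]  # (codes, scale, minimum)


def quantize_block(values: List[float], bits: int = 4) -> Block:
    """Asymmetric min/max quantization of one block of weights.

    Codes lie in [0, 2**bits - 1]; the block keeps its own scale and
    minimum so the code range covers exactly [min, max] of the block.
    """
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # A constant block has zero range; any positive scale works then.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize_blockwise(weights: List[float], block_size: int,
                       bits: int = 4) -> List[Block]:
    """Split the weights into consecutive blocks and quantize each one."""
    return [quantize_block(weights[i:i + block_size], bits)
            for i in range(0, len(weights), block_size)]


def dequantize_blockwise(blocks: List[Block]) -> List[float]:
    """Map every code back to a float using its block's scale and minimum."""
    out: List[float] = []
    for codes, scale, lo in blocks:
        out.extend(c * scale + lo for c in codes)
    return out


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Two groups of weights with very different offsets.
    w = [0.01 * i for i in range(8)] + [1.0 + 0.01 * i for i in range(8)]
    one_block = dequantize_blockwise(quantize_blockwise(w, block_size=16))
    two_blocks = dequantize_blockwise(quantize_blockwise(w, block_size=8))
    print("single block error:", max_abs_error(w, one_block))
    print("two blocks error:  ", max_abs_error(w, two_blocks))
```

In this toy input the two groups of weights sit at different offsets, so giving each block its own scale and minimum reduces the reconstruction error compared with a single shared range. The cost is one extra scale and minimum stored per block.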
2. Added Support for EAGLE-3 Speculative Decoding Algorithm
- Implemented EAGLE-3 speculative decoding algorithm to improve large language model inference efficiency
- Implemented EagleGeneration class to support draft model-based speculative decoding
- Provided Eagle model export tools supporting export of three components: eagle, eagle_fc, and eagle_d2t
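The general idea behind draft-model speculative decoding can be sketched as a greedy draft-and-verify loop. This is a generic illustration only, not EAGLE-3 itself (which drafts from the target model's internal features) and not MNN's EagleGeneration class; the toy models and all function names below are hypothetical.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding.

    The result matches greedy_decode(target, ...) exactly; the draft
    model only affects how many tokens are accepted per round.
    """
    if draft_len < 1:
        raise ValueError("draft_len must be at least 1")
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(draft_len, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks the proposal position by position.
        #    A real engine would score all positions in one batched pass.
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # equals `guess` when the draft was right
            if guess != expected:
                break  # discard the remaining draft tokens
        tokens.extend(accepted)
    return tokens


def toy_target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large model."""
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else toy_target(ctx)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1], max_new_tokens=10))
```

Each round yields at least one token verified by the target, so the loop always terminates. Any speedup in a real system comes from how often the draft's guesses are accepted.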
3. Enhanced Support for Qwen Series Models
- Fixed inference issues with the Qwen3-Embedding model and optimized its inference
- Added support for Qwen3-VL multimodal large model
- Improved llmexport tool's export support for Qwen series models
Detailed Changes
Model Inference Optimization
- Refactored LLM model loading logic with enhanced error handling in the Llm::load() method
- Optimized KV Cache manager implementation to improve memory management efficiency during inference
- Improved attention mechanism implementation in Metal backend
- Optimized convolution execution efficiency in OpenCL backend
Quantization Tool Improvements
- Integrated HQQ quantizer in WeightQuantAndCoding.cpp for more precise weight quantization
- Optimized quantization parameter configuration logic to automatically set asymmetric quantization when HQQ is enabled
- Fixed bugs in the quantization process, improving quantization stability
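One plausible reason HQQ is paired with asymmetric quantization is that HQQ-style methods refine the zero-point of each weight group, and a zero-point only exists in the asymmetric scheme. The sketch below is a simplified pure-Python rendering of that refinement, loosely following the publicly described half-quadratic update; it is not the code in WeightQuantAndCoding.cpp. All function names and the default values for p, beta, kappa and iters are illustrative assumptions.

```python
from typing import List, Tuple


def shrink_lp(x: float, beta: float, p: float) -> float:
    """Generalized soft-thresholding for an l_p (p < 1) error term."""
    if x == 0.0:
        return 0.0
    magnitude = abs(x) - (abs(x) ** (p - 1.0)) / beta
    return (magnitude if magnitude > 0.0 else 0.0) * (1.0 if x > 0 else -1.0)


def quantize(w: List[float], scale: float, zero: float, levels: int) -> List[int]:
    """Asymmetric quantization: code = clamp(round(w / scale + zero))."""
    return [min(levels, max(0, round(v / scale + zero))) for v in w]


def mean_abs_error(w: List[float], codes: List[int], scale: float, zero: float) -> float:
    """Mean absolute difference between weights and their dequantized codes."""
    return sum(abs(v - (c - zero) * scale) for v, c in zip(w, codes)) / len(w)


def refine_zero_point(w: List[float], bits: int = 4, iters: int = 20, p: float = 0.7,
                      beta: float = 10.0, kappa: float = 1.01) -> Tuple[float, float, float]:
    """Return (scale, zero, error) after refining the zero-point.

    Starts from plain min/max parameters and keeps a new zero-point only
    if it lowers the mean absolute reconstruction error.
    """
    levels = (1 << bits) - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero = -lo / scale  # min/max starting point
    best_zero = zero
    best_err = mean_abs_error(w, quantize(w, scale, zero, levels), scale, zero)
    for _ in range(iters):
        codes = quantize(w, scale, zero, levels)
        recon = [(c - zero) * scale for c in codes]
        # Sparse part of the residual, treated as outliers by the l_p term.
        resid = [shrink_lp(v - r, beta, p) for v, r in zip(w, recon)]
        # Closed-form zero-point update for fixed codes and residual.
        zero = sum(c - (v - e) / scale for c, v, e in zip(codes, w, resid)) / len(w)
        beta *= kappa
        err = mean_abs_error(w, quantize(w, scale, zero, levels), scale, zero)
        if err >= best_err:
            break  # no further improvement; keep the best zero-point so far
        best_zero, best_err = zero, err
    return scale, best_zero, best_err


if __name__ == "__main__":
    weights = [-0.9, -0.2, -0.1, 0.0, 0.05, 0.1, 0.3, 1.4]
    print(refine_zero_point(weights))
```

Because the loop only accepts a zero-point that reduces the error, the result is never worse than the min/max baseline on the same data. How much it improves depends on the weight distribution.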
Model Export Enhancements
- Improved error handling and log output in llmexport tool
- Optimized model export process to improve export stability
- Revised the compression tool documentation and added HQQ quantization usage instructions