
3.2.5

Pre-release


github-actions released this 16 Oct 10:53
9a0546b

MNN 3.2.5 Release Note


Core Feature Updates

1. Added Support for HQQ Quantization Algorithm

  • Integrated the HQQ quantization algorithm into the MNNConvert tool; it can be enabled via the --hqq parameter
  • HQQ quantization supports asymmetric quantization, significantly improving the accuracy of quantized models
  • Can be combined with block-wise quantization to further improve model accuracy
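As noted above, HQQ operates on asymmetric, block-wise weight quantization. The snippet below is a minimal sketch of that base scheme (a per-block min/max scale and zero-point); it omits HQQ's half-quadratic zero-point refinement and is not MNN's implementation. All names here are illustrative.

```python
import numpy as np

def quantize_blockwise(w, bits=4, block=32):
    """Asymmetric min/max quantization over blocks of `block` weights."""
    qmax = (1 << bits) - 1
    w = w.reshape(-1, block)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / qmax
    scale[scale == 0] = 1.0          # guard against constant blocks
    zero = -wmin / scale             # asymmetric zero-point
    q = np.clip(np.round(w / scale + zero), 0, qmax)
    return q, scale, zero

def dequantize_blockwise(q, scale, zero):
    """Recover approximate weights from quantized blocks."""
    return (q - zero) * scale
```

With round-to-nearest, the reconstruction error of each weight is bounded by half its block's scale; HQQ then shrinks this further by optimizing the zero-point per block.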

2. Added Support for EAGLE-3 Speculative Decoding Algorithm

  • Implemented the EAGLE-3 speculative decoding algorithm to improve large language model inference efficiency
  • Implemented the EagleGeneration class, supporting draft-model-based speculative decoding
  • Provided an Eagle model export tool that exports three components: eagle, eagle_fc, and eagle_d2t
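For orientation, the sketch below shows the generic greedy draft-and-verify loop that speculative decoding is built on: a cheap draft model proposes several tokens, and the target model keeps the longest agreeing prefix. This is a toy stand-in, not MNN's EagleGeneration API; EAGLE-3 additionally drafts from the target model's hidden features, which this sketch omits.

```python
def speculative_decode(target_next, draft_next, prompt, gamma=4, max_new=8):
    """Greedy speculative decoding.

    `target_next` / `draft_next` are hypothetical callables mapping a token
    sequence to the next token. The draft proposes `gamma` tokens per round;
    the target verifies them and emits one extra token itself.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft proposes gamma tokens autoregressively.
        draft_seq = list(seq)
        for _ in range(gamma):
            draft_seq.append(draft_next(draft_seq))
        proposals = draft_seq[len(seq):]
        # 2) Target verifies proposals in order; keep the agreeing prefix.
        accepted = 0
        for i, tok in enumerate(proposals):
            if target_next(seq + proposals[:i]) == tok:
                accepted += 1
            else:
                break
        seq += proposals[:accepted]
        # 3) The target always contributes one token of its own.
        seq.append(target_next(seq))
    return seq[len(prompt):][:max_new]
```

When the draft agrees with the target, several tokens are accepted per target step, which is where the inference speedup comes from; the output is identical to plain greedy decoding of the target either way.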

3. Enhanced Support for Qwen Series Models

  • Fixed inference issues with the Qwen3-Embedding model and optimized its inference
  • Added support for the Qwen3-VL multimodal large model
  • Improved the llmexport tool's export support for Qwen-series models

Detailed Changes

Model Inference Optimization

  • Refactored LLM model loading logic with enhanced error handling in the Llm::load() method
  • Optimized KV Cache manager implementation to improve memory management efficiency during inference
  • Improved attention mechanism implementation in Metal backend
  • Optimized convolution execution efficiency in OpenCL backend
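To illustrate the kind of bookkeeping a KV cache manager performs during inference, here is a minimal sliding-window sketch: append the new key/value pair per decoded token and evict the oldest entries once capacity is exceeded. This is a hypothetical illustration, not MNN's KV cache implementation.

```python
class KVCache:
    """Toy fixed-capacity KV cache with oldest-first eviction."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.keys, self.values = [], []

    def append(self, k, v):
        # Store the new entry, then evict the oldest if over capacity.
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.max_len:
            self.keys.pop(0)
            self.values.pop(0)

    def __len__(self):
        return len(self.keys)
```

A production cache would hold tensors in preallocated buffers and reuse memory in place rather than growing Python lists, which is the kind of efficiency the release note refers to.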

Quantization Tool Improvements

  • Integrated HQQ quantizer in WeightQuantAndCoding.cpp for more precise weight quantization
  • Optimized quantization parameter configuration logic to automatically set asymmetric quantization when HQQ is enabled
  • Fixed bugs in the quantization process, improving quantization stability
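The parameter-resolution rule described above can be sketched as follows: enabling HQQ forces asymmetric mode, since HQQ solves for a zero-point. The function name and config keys are illustrative, not MNNConvert's actual configuration API.

```python
def resolve_weight_quant_config(bits=4, block_size=None, hqq=False, asymmetric=False):
    """Hypothetical sketch: HQQ implies asymmetric quantization."""
    if hqq:
        asymmetric = True  # HQQ optimizes a zero-point, so asymmetric mode is required
    return {"bits": bits, "block_size": block_size, "hqq": hqq, "asymmetric": asymmetric}
```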

Model Export Enhancements

  • Improved error handling and log output in the llmexport tool
  • Streamlined the model export process to improve export stability
  • Revised the compression tool documentation, adding HQQ quantization usage instructions