v24.11 Public Major Release
Feat
- Add SVE SoftmaxLayer kernel for BF16
- Provide a stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer (see the sketch after this list)
- Extend static quantization interface for both matmul and convolution operations
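To illustrate what "stateless" means here, below is a minimal C++ sketch. The names `StatelessQuantize`, `TensorArgs`, `validate`, and `run` are assumptions for illustration, not the library's actual signatures; the defining property is that validation is a pure function of its arguments and every tensor is passed in at run time rather than captured as member state, so one operator instance can be shared across threads.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical argument bundle; not an arm_compute type.
struct TensorArgs
{
    const std::vector<float> *src;    // input tensor
    std::vector<uint8_t>     *dst;    // quantized output
    float                     scale;  // quantization scale
    int32_t                   offset; // quantization zero point
};

class StatelessQuantize // illustrative name, not the ACL class
{
public:
    // Depends only on the argument metadata, never on object state.
    static bool validate(const TensorArgs &args)
    {
        return args.src != nullptr && args.dst != nullptr && args.scale != 0.0f;
    }

    // Receives every tensor it touches; the class holds no state.
    static void run(const TensorArgs &args)
    {
        args.dst->resize(args.src->size());
        for (std::size_t i = 0; i < args.src->size(); ++i)
        {
            int32_t q = static_cast<int32_t>((*args.src)[i] / args.scale) + args.offset;
            (*args.dst)[i] = static_cast<uint8_t>(q < 0 ? 0 : (q > 255 ? 255 : q));
        }
    }
};
```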
Fix
- Clarify Third-Party IP licenses
- Check that CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
- Add BF16 support for CpuGemmAssemblyDispatchWrapper
- Detect SVE support on Windows® to run the available kernels
- Add the missing cstdint include that breaks compilation with GCC 15
- Disable -O2 when building for Windows®, as certain compiler versions crash with it
- Make the CPU cast truncate float-to-int conversions instead of rounding them, to be consistent with other ML frameworks (see the example after this list)
- Return an error from validate() in CpuGemmLowpMatrixMultiplyCore when pretransposed A or B is requested, as this is not supported
- Avoid an implicit conversion from __fp16 to arm_compute::bfloat16 that emits illegal instructions on hardware with FP16 but no BF16 support (see the sketch after this list)
- Softmax SME2 kernel selection now correctly detects whether SME2 is supported
- Correct requantization rounding issues in CPU/GPU Quantize
- Correct the scale normalising coefficient in GPU LogSoftmax
- Apply consistent rounding policy in NEReduceMean
- Revert default memory manager for NEQLSTMLayer
- Create default memory manager when none is provided
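For the cast change above, standard C++ behaviour shows the difference between the two policies; this is a standalone illustration, not ACL code:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // static_cast truncates toward zero, which is what most ML
    // frameworks (and now the CPU cast) do for float -> int.
    printf("%d\n", static_cast<int>(2.7f));   // 2  (truncated)
    printf("%d\n", static_cast<int>(-2.7f));  // -2 (truncated toward zero)

    // Rounding to nearest, the previous behaviour, gives different results:
    printf("%ld\n", std::lround(2.7f));       // 3
    printf("%ld\n", std::lround(-2.7f));      // -3
}
```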
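The __fp16 item concerns conversion paths: widening to float first needs only plain FP16-to-FP32 support, whereas a direct FP16-to-BF16 conversion may lower to BF16 instructions the core lacks. A hedged sketch of the idea, assuming an AArch64 toolchain where __fp16 is available and using a hand-rolled bfloat16 stand-in rather than arm_compute::bfloat16:

```cpp
#include <cstdint>
#include <cstring>

// Minimal stand-in for a bfloat16 type (illustrative only).
struct bf16 { uint16_t bits; };

// Safe path: widen to float (any FP16-capable core can do this),
// then take the top 16 bits of the IEEE-754 binary32 representation.
inline bf16 fp16_to_bf16(__fp16 h)
{
    float f = static_cast<float>(h); // FP16 -> FP32: no BF16 instructions needed
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return bf16{static_cast<uint16_t>(u >> 16)}; // truncating FP32 -> BF16
}
```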
Refactor
- Turn duplicated code in the elementwise_binary kernel into templates to reduce code size (see the sketch after this list)
- Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
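As a general illustration of that refactor pattern (hypothetical names, not the actual kernel code): rather than one hand-written loop per operation, the per-element operation becomes a template parameter, so a single loop body is instantiated for add, sub, and so on, with no runtime dispatch cost.

```cpp
#include <cstddef>
#include <cstdio>

// One generic loop replaces N near-identical per-operation loops;
// the functor is inlined at each instantiation.
template <typename T, typename Op>
void elementwise_binary(const T *a, const T *b, T *dst, std::size_t n, Op op)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = op(a[i], b[i]);
}

struct Add { template <typename T> T operator()(T x, T y) const { return x + y; } };
struct Sub { template <typename T> T operator()(T x, T y) const { return x - y; } };

int main()
{
    float a[] = {1.f, 2.f, 3.f}, b[] = {4.f, 5.f, 6.f}, out[3];
    elementwise_binary(a, b, out, 3, Add{}); // the same template serves Sub{}, etc.
    printf("%g %g %g\n", out[0], out[1], out[2]); // 5 7 9
}
```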
Perf
- Use SME instead of SVE for the LUT address-calculation subtractions in the Q8 SoftmaxLayer
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.11/index.xhtml