v24.11 Public Major Release
Feat
- Add SVE SoftmaxLayer kernel for BF16
- Provide a stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer (see the sketch after this list)
- Extend static quantization interface for both matmul and convolution operations
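To illustrate what "stateless" means here, below is a minimal C++ sketch. The names `StatelessQuantize`, `TensorArgs`, `validate`, and `run` are assumptions for illustration, not the library's actual signatures; the defining property is that validation is a pure function of its arguments and every tensor is passed in at run time rather than captured as member state, so one operator instance can be shared across threads.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical argument bundle; not an arm_compute type.
struct TensorArgs
{
    const std::vector<float> *src;    // input tensor
    std::vector<uint8_t>     *dst;    // quantized output
    float                     scale;  // quantization scale
    int32_t                   offset; // quantization zero point
};

class StatelessQuantize // illustrative name, not the ACL class
{
public:
    // Depends only on the argument metadata, never on object state.
    static bool validate(const TensorArgs &args)
    {
        return args.src != nullptr && args.dst != nullptr && args.scale != 0.0f;
    }

    // Receives every tensor it touches; the class holds no state.
    static void run(const TensorArgs &args)
    {
        args.dst->resize(args.src->size());
        for (std::size_t i = 0; i < args.src->size(); ++i)
        {
            int32_t q = static_cast<int32_t>((*args.src)[i] / args.scale) + args.offset;
            (*args.dst)[i] = static_cast<uint8_t>(q < 0 ? 0 : (q > 255 ? 255 : q));
        }
    }
};
```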
Fix
- Clarify Third-Party IP licenses
- Check that CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
- Add BF16 support for CpuGemmAssemblyDispatchWrapper
- Detect SVE support on Windows® to run the available kernels
- Add the missing cstdint include that breaks compilation with GCC 15
- Disable -O2 when building for Windows®, as certain compiler versions crash with it
- Make the CPU cast truncate float-to-int conversions instead of rounding them, to be consistent with other ML frameworks (see the example after this list)
- Return an error from validate() in CpuGemmLowpMatrixMultiplyCore when pretransposed A or B is requested, as this is not supported
- Avoid an implicit conversion from __fp16 to arm_compute::bfloat16 that emits illegal instructions on hardware with FP16 but no BF16 support (see the sketch after this list)
- Softmax SME2 kernel selection now correctly detects whether SME2 is supported
- Correct requantization rounding issues in CPU/GPU Quantize
- Correct the scale normalising coefficient in GPU LogSoftmax
- Apply consistent rounding policy in NEReduceMean
- Revert default memory manager for NEQLSTMLayer
- Create default memory manager when none is provided
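For the cast change above, standard C++ behaviour shows the difference between the two policies; this is a standalone illustration, not ACL code:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // static_cast truncates toward zero, which is what most ML
    // frameworks (and now the CPU cast) do for float -> int.
    printf("%d\n", static_cast<int>(2.7f));   // 2  (truncated)
    printf("%d\n", static_cast<int>(-2.7f));  // -2 (truncated toward zero)

    // Rounding to nearest, the previous behaviour, gives different results:
    printf("%ld\n", std::lround(2.7f));       // 3
    printf("%ld\n", std::lround(-2.7f));      // -3
}
```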
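The __fp16 item concerns conversion paths: widening to float first needs only plain FP16-to-FP32 support, whereas a direct FP16-to-BF16 conversion may lower to BF16 instructions the core lacks. A hedged sketch of the idea, assuming an AArch64 toolchain where __fp16 is available and using a hand-rolled bfloat16 stand-in rather than arm_compute::bfloat16:

```cpp
#include <cstdint>
#include <cstring>

// Minimal stand-in for a bfloat16 type (illustrative only).
struct bf16 { uint16_t bits; };

// Safe path: widen to float (any FP16-capable core can do this),
// then take the top 16 bits of the IEEE-754 binary32 representation.
inline bf16 fp16_to_bf16(__fp16 h)
{
    float f = static_cast<float>(h); // FP16 -> FP32: no BF16 instructions needed
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return bf16{static_cast<uint16_t>(u >> 16)}; // truncating FP32 -> BF16
}
```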
Refactor
- Turn duplicated code in the elementwise_binary kernel into templates to reduce code size (see the sketch after this list)
- Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
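As a general illustration of that refactor pattern (hypothetical names, not the actual kernel code): rather than one hand-written loop per operation, the per-element operation becomes a template parameter, so a single loop body is instantiated for add, sub, and so on, with no runtime dispatch cost.

```cpp
#include <cstddef>
#include <cstdio>

// One generic loop replaces N near-identical per-operation loops;
// the functor is inlined at each instantiation.
template <typename T, typename Op>
void elementwise_binary(const T *a, const T *b, T *dst, std::size_t n, Op op)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = op(a[i], b[i]);
}

struct Add { template <typename T> T operator()(T x, T y) const { return x + y; } };
struct Sub { template <typename T> T operator()(T x, T y) const { return x - y; } };

int main()
{
    float a[] = {1.f, 2.f, 3.f}, b[] = {4.f, 5.f, 6.f}, out[3];
    elementwise_binary(a, b, out, 3, Add{}); // the same template serves Sub{}, etc.
    printf("%g %g %g\n", out[0], out[1], out[2]); // 5 7 9
}
```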
Perf
- Use SME instead of SVE for the LUT address-calculation subtractions in the Q8 SoftmaxLayer
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.11/index.xhtml