1 | 1 | OpenBLAS ChangeLog
| 2 | +==================================================================== |
| 3 | +Version 0.3.27 |
| 4 | + 4-Apr-2024 |
| 5 | + |
| 6 | +general: |
| 7 | +- added initial (generic) support for the CSKY architecture |
| 8 | +- capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating |
| 9 | + underutilized or idle threads |
| 10 | +- sped up multithreaded POTRF on all platforms |
| 11 | +- added extension openblas_set_num_threads_local() that returns the previous thread count |
| 12 | +- re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading |
| 13 | + for workloads that are too small |
| 14 | +- improved the fallback code used when the precompiled number of threads is exceeded, |
| 15 | + and made it callable multiple times during the lifetime of an instance |
| 16 | +- added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC |
| 17 | +- fixed a potential buffer overflow in the interface to the GEMMT kernels |
| 18 | +- fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14 |
| 19 | +- fixed unwanted case sensitivity of the character parameters in ?TRTRS |
| 20 | +- sped up the OpenMP thread management code |
| 21 | +- fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK |
| 22 | +- fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library |
| 23 | +- added a testsuite for the BLAS extensions |
| 24 | +- modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress |
| 25 | + spurious errors |
| 26 | +- added support for building the benchmark collection with CMAKE |
| 27 | +- added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds |
| 28 | + with OpenMP enabled that use clang with gfortran |
| 29 | +- fixed building on systems with uClibc |
| 30 | +- added support for calling ?NRM2 with a negative increment value on all architectures |
| 31 | +- added support for the LLVM18 version of the flang-new compiler |
| 32 | +- fixed handling of the OPENBLAS_LOOPS variable in several benchmarks |
| 33 | +- Integrated fixes from the Reference-LAPACK project: |
| 34 | + - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981) |
| 35 | + |
| 36 | +x86: |
| 37 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 38 | +- fixed GEMM3M functions failing in CMAKE builds |
| 39 | + |
| 40 | +x86-64: |
| 41 | +- removed all instances of sched_yield() on Linux and BSD |
| 42 | +- fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26) |
| 43 | +- fixed GEMM3M functions failing in CMAKE builds |
| 44 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 45 | +- added compiler checks for AVX512BF16 compatibility |
| 46 | +- fixed LLVM compiler options for Sapphire Rapids |
| 47 | +- fixed cpu handling fallbacks for Sapphire Rapids with |
| 48 | + AVX2 disabled in DYNAMIC_ARCH mode |
| 49 | +- fixed extensions SCSUM and DZSUM |
| 50 | +- improved GEMM performance for ZEN targets |
| 51 | + |
| 52 | +arm: |
| 53 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 54 | + |
| 55 | +arm64: |
| 56 | +- added initial support for the Cortex-A76 cpu |
| 57 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 58 | +- fixed default compiler options for gcc (-march and -mtune) |
| 59 | +- added support for ArmCompilerForLinux |
| 60 | +- added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds |
| 61 | +- fixed mishandling of the INTERFACE64 option in CMAKE builds |
| 62 | +- corrected SCSUM kernels (erroneously duplicating SCASUM behaviour) |
| 63 | +- added SVE-enabled kernels for CSUM/ZSUM |
| 64 | +- worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M |
| 65 | + |
| 66 | +power: |
| 67 | +- improved performance of SGEMM on POWER8/9/10 |
| 68 | +- improved performance of DGEMM on POWER10 |
| 69 | +- added support for OpenMP builds with xlc/xlf on AIX |
| 70 | +- improved cpu autodetection for DYNAMIC_ARCH builds on older AIX |
| 71 | +- fixed cpu core counting on AIX |
| 72 | +- added support for building a shared library on AIX |
| 73 | + |
| 74 | +riscv64: |
| 75 | +- added support for the X280 cpu |
| 76 | +- added support for semi-generic RISCV models with vector length 128 or 256 |
| 77 | +- added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers |
| 78 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 79 | +- improved cpu model autodetection |
| 80 | +- fixed corner cases in ?AXPBY for C910V |
| 81 | +- fixed handling of zero increments in ?AXPY kernels for C910V |
| 82 | + |
| 83 | +loongarch64: |
| 84 | +- added optimized kernels for ?AMIN and ?AMAX |
| 85 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 86 | +- fixed handling of corner cases in ?AXPBY |
| 87 | +- fixed computation of SAMIN and DAMIN in LSX mode |
| 88 | +- fixed computation of ?ROT |
| 89 | +- added optimized SSYMV and DSYMV kernels for LSX and LASX mode |
| 90 | +- added optimized CGEMM and ZGEMM kernels for LSX and LASX mode |
| 91 | +- added optimized CGEMV and ZGEMV kernels |
| 92 | + |
| 93 | +mips: |
| 94 | +- fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22) |
| 95 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 96 | +- fixed mishandling of the INTERFACE64 option in CMAKE builds |
| 97 | + |
| 98 | +zarch: |
| 99 | +- fixed handling of NaN and Inf arguments in ZSCAL |
| 100 | +- fixed calculation of ?SUM on Z13 |
| 101 | + |
2 | 102 | ====================================================================
3 | 103 | Version 0.3.26
4 | 104 | 2-Jan-2024