Merge pull request #4605 from martin-frbg/changelog0327

martin-frbg · web-flow · commit 1dcbc4e0bb9b · 2024-04-04T21:35:47.000+02:00
Update Changelog.txt for 0.3.27
diff --git a/Changelog.txt b/Changelog.txt
@@ -1,4 +1,104 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.27
+ 4-Apr-2024
+
+general:
+- added initial (generic) support for the CSKY architecture
+- capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
+  underutilized or idle threads
+- sped up multithreaded POTRF on all platforms
+- added extension openblas_set_num_threads_local() that returns the previous thread count
+- re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
+  for workloads that are too small
+- improved the fallback code used when the precompiled number of threads is exceeded,
+  and made it callable multiple times during the lifetime of an instance
+- added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
+- fixed a potential buffer overflow in the interface to the GEMMT kernels
+- fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
+- fixed unwanted case sensitivity of the character parameters in ?TRTRS
+- sped up the OpenMP thread management code
+- fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
+- fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
+- added a testsuite for the BLAS extensions
+- modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
+  spurious errors
+- added support for building the benchmark collection with CMAKE
+- added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
+  with OpenMP enabled that use clang with gfortran
+- fixed building on systems with uClibc
+- added support for calling ?NRM2 with a negative increment value on all architectures
+- added support for the LLVM18 version of the flang-new compiler
+- fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
+- integrated fixes from the Reference-LAPACK project:
+  - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
+  
+x86:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed GEMM3M functions failing in CMAKE builds
+
+x86-64:
+- removed all instances of sched_yield() on Linux and BSD
+- fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
+- fixed GEMM3M functions failing in CMAKE builds
+- fixed handling of NaN and Inf arguments in ZSCAL
+- added compiler checks for AVX512BF16 compatibility
+- fixed LLVM compiler options for Sapphire Rapids
+- fixed the cpu fallback handling for Sapphire Rapids with AVX2 disabled
+  in DYNAMIC_ARCH mode
+- fixed extensions SCSUM and DZSUM
+- improved GEMM performance for ZEN targets
+
+arm:
+- fixed handling of NaN and Inf arguments in ZSCAL
+
+arm64:
+- added initial support for the Cortex-A76 cpu
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed default compiler options for gcc (-march and -mtune)
+- added support for ArmCompilerForLinux
+- added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+- corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
+- added SVE-enabled kernels for CSUM/ZSUM
+- worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M
+
+power:
+- improved performance of SGEMM on POWER8/9/10
+- improved performance of DGEMM on POWER10
+- added support for OpenMP builds with xlc/xlf on AIX
+- improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
+- fixed cpu core counting on AIX
+- added support for building a shared library on AIX
+
+riscv64:
+- added support for the X280 cpu
+- added support for semi-generic RISCV models with vector length 128 or 256
+- added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
+- fixed handling of NaN and Inf arguments in ZSCAL
+- improved cpu model autodetection
+- fixed corner cases in ?AXPBY for C910V
+- fixed handling of zero increments in ?AXPY kernels for C910V
+
+loongarch64:
+- added optimized kernels for ?AMIN and ?AMAX
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed handling of corner cases in ?AXPBY
+- fixed computation of SAMIN and DAMIN in LSX mode
+- fixed computation of ?ROT
+- added optimized SSYMV and DSYMV kernels for LSX and LASX mode
+- added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
+- added optimized CGEMV and ZGEMV kernels
+
+mips:
+- fixed utilization of MSA on P5600 and related cpus (broken in 0.3.22)
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+
+zarch:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed calculation of ?SUM on Z13
+
 ====================================================================
 Version 0.3.26
  2-Jan-2024
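The 0.3.27 entry on calling ?NRM2 with a negative increment can be illustrated with a small pure-Python sketch. It assumes the Reference BLAS convention for negative strides (the n elements are read backwards, starting at offset (n-1)*|incx|), under which the norm matches the one computed with the corresponding positive stride. The function name `nrm2_strided` is hypothetical and not part of OpenBLAS, and the sketch makes no attempt to model the optimized kernels or their overflow-safe scaling.

```python
import math


def nrm2_strided(n, x, incx):
    """Euclidean norm of n elements of x, taken with stride incx.

    Illustrative sketch only: assumes the Reference BLAS convention that a
    negative incx walks the same n elements in reverse order.
    """
    if n <= 0:
        return 0.0
    if incx == 0:
        # A zero increment is outside the scope of this sketch.
        raise ValueError("incx == 0 is not covered by this sketch")

    # With a negative stride, start at the far end and step backwards.
    start = 0 if incx > 0 else (n - 1) * (-incx)
    values = [x[start + i * incx] for i in range(n)]

    # fsum keeps the accumulation of squares accurate for this simple demo.
    return math.sqrt(math.fsum(v * v for v in values))


x = [3.0, 99.0, 4.0]
forward = nrm2_strided(2, x, 2)    # reads x[0], x[2]
backward = nrm2_strided(2, x, -2)  # reads x[2], x[0]
print(forward, backward)
```

Both calls visit the same two elements, so they return the same norm; only the traversal order differs.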