Commit 1dcbc4e

Merge pull request #4605 from martin-frbg/changelog0327
Update Changelog.txt for 0.3.27
2 parents f5e5109 + c518407 commit 1dcbc4e

1 file changed: +100 -0 lines changed
Changelog.txt

Lines changed: 100 additions & 0 deletions
@@ -1,4 +1,104 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.27
+4-Apr-2024
+
+general:
+- added initial (generic) support for the CSKY architecture
+- capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
+  underutilized or idle threads
+- sped up multithreaded POTRF on all platforms
+- added extension openblas_set_num_threads_local() that returns the previous thread count
+- re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
+  for too small workloads
+- improved the fallback code used when the precompiled number of threads is exceeded,
+  and made it callable multiple times during the lifetime of an instance
+- added CBLAS interfaces for the BLAS extensions ?AMIN,?AMAX, CAXPYC and ZAXPYC
+- fixed a potential buffer overflow in the interface to the GEMMT kernels
+- fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
+- fixed unwanted case sensitivity of the character parameters in ?TRTRS
+- sped up the OpenMP thread management code
+- fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
+- fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
+- added a testsuite for the BLAS extensions
+- modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
+  spurious errors
+- added support for building the benchmark collection with CMAKE
+- added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
+  with OpenMP enabled that use clang with gfortran
+- fixed building on systems with ucLibc
+- added support for calling ?NRM2 with a negative increment value on all architectures
+- added support for the LLVM18 version of the flang-new compiler
+- fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
+- Integrated fixes from the Reference-LAPACK project:
+  - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
+
+x86:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed GEMM3M functions failing in CMAKE builds
+
+x86-64:
+- removed all instances of sched_yield() on Linux and BSD
+- fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
+- fixed GEMM3M functions failing in CMAKE builds
+- fixed handling of NaN and Inf arguments in ZSCAL
+- added compiler checks for AVX512BF16 compatibility
+- fixed LLVM compiler options for Sapphire Rapids
+- fixed cpu handling fallbacks for Sapphire Rapids with
+  disabled AVX2 in DYNAMIC_ARCH mode
+- fixed extensions SCSUM and DZSUM
+- improved GEMM performance for ZEN targets
+
+arm:
+- fixed handling of NaN and Inf arguments in ZSCAL
+
+arm64:
+- added initial support for the Cortex-A76 cpu
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed default compiler options for gcc (-march and -mtune)
+- added support for ArmCompilerForLinux
+- added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+- corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
+- added SVE-enabled kernels for CSUM/ZSUM
+- worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M
+
+power:
+- improved performance of SGEMM on POWER8/9/10
+- improved performance of DGEMM on POWER10
+- added support for OpenMP builds with xlc/xlf on AIX
+- improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
+- fixed cpu core counting on AIX
+- added support for building a shared library on AIX
+
+riscv64:
+- added support for the X280 cpu
+- added support for semi-generic RISCV models with vector length 128 or 256
+- added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
+- fixed handling of NaN and Inf arguments in ZSCAL
+- improved cpu model autodetection
+- fixed corner cases in ?AXPBY for C910V
+- fixed handling of zero increments in ?AXPY kernels for C910V
+
+loongarch64:
+- added optimized kernels for ?AMIN and ?AMAX
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed handling of corner cases in ?AXPBY
+- fixed computation of SAMIN and DAMIN in LSX mode
+- fixed computation of ?ROT
+- added optimized SSYMV and DSYMV kernels for LSX and LASX mode
+- added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
+- added optimized CGEMV and ZGEMV kernels
+
+mips:
+- fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22)
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+
+zarch:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed calculation of ?SUM on Z13
+
 ====================================================================
 Version 0.3.26
 2-Jan-2024
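
The arm64 entry about SCSUM kernels duplicating SCASUM behaviour may be easier to follow with the two reductions side by side. The sketch below is plain Python, not OpenBLAS code: the names `scasum_ref` and `scsum_ref` are hypothetical, and it assumes that the ?SUM extensions add up the real and imaginary parts without taking absolute values, whereas the standard BLAS SCASUM sums |Re| + |Im|.

```python
def scasum_ref(x):
    """Standard BLAS SCASUM semantics: sum of |Re(z)| + |Im(z)|."""
    return sum(abs(z.real) + abs(z.imag) for z in x)


def scsum_ref(x):
    """Assumed SCSUM semantics: sum of Re(z) + Im(z), no absolute values."""
    return sum(z.real + z.imag for z in x)


# Mixed-sign input, so the two reductions give different results.
x = [complex(1.0, -2.0), complex(-3.0, 4.0)]

print(scasum_ref(x))  # 10.0
print(scsum_ref(x))   # 0.0
```

With inputs of mixed sign the two results differ, so under these assumptions a kernel that returned the SCASUM value for SCSUM would be detectably wrong.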
