Commit fc33cbc

Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
2 parents 66da767 + c52a831 commit fc33cbc

1 file changed: +111 -0 lines changed

Changelog.txt

Lines changed: 111 additions & 0 deletions
@@ -1,4 +1,115 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.2
+30-Jul-2018
+
+common:
+* fixes for regressions caused by the rewrite of the thread
+  initialization code in 0.3.1
+
+POWER:
+* fixed cpu autodetection for the BSDs
+
+MIPS64:
+* fixed utest errors in AXPY, DSDOT, ROT and SWAP
+
+x86_64:
+* added autodetection of AMD Ryzen 2
+* fixed build with older versions of MSVC
+
+====================================================================
+Version 0.3.1
+01-Jul-2018
+
+common:
+* rewritten thread initialization code with significantly reduced overhead
+* added CBLAS interfaces to the IxAMIN BLAS extension functions
+* fixed the lapack-test target
+* CMAKE builds now create an OpenBLASConfig.cmake file
+* ZAXPY now uses a single thread for small input sizes
+* the LAPACK code was updated from Reference-LAPACK/lapack#253
+  (fixing LAPACKE interfaces to Aasen's functions)
+
+POWER:
+* corrected CROT and ZROT behaviour with zero INC_X
+
+ARMV7:
+* corrected xDOT behaviour with zero INC_X or INC_Y
+
+x86_64:
+* retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER,
+  this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO
+  (which will still be supported via the slower PRESCOTT kernels when this option is not set)
+* added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows
+  specifying the list of x86_64 targets to include. Any target not on the list will be supported
+  by the Sandybridge or Nehalem kernels if available, or by Prescott.
+* improved SWITCH_RATIO on Haswell for increased GEMM throughput
+* added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
+* added autodetection of Intel Cannon Lake series as Skylake X
+* added a default L2 cache size for hypervisors that return zero here (Chromebook)
+* fixed a name clash with recent Windows10 headers that broke the build with (at least)
+  recent mingw from MSYS2
+* fixed a link error in mixed clang/gfortran builds with OpenMP
+* updated the OSX deployment target to 10.8
+* switched on parallel make for builds on MS Windows by default
+
+x86:
+* fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y
+
+====================================================================
+Version 0.3.0
+23-May-2018
+
+common:
+* fixed some more thread race and locking bugs
+* added preliminary support for calling an OpenMP build of the library from multiple threads
+* removed performance impact of thread locks added in 0.2.20 on OpenMP code
+* general code cleanup
+* optimized DSDOT implementation
+* improved thread distribution for GEMM
+* corrected IMATCOPY/OMATCOPY implementation
+* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
+* cmake build improvements
+* pkgconfig file now contains build options
+* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
+* corrections and improvements for systems with more than 64 cpus
+* LAPACK code updated to 3.8.0 including later fixes
+* added ReLAPACK, a recursive implementation of several LAPACK functions
+* Rewrote ROTMG to handle cases that the netlib code failed to address
+* Disabled (broken) multithreading code for xTRMV
+* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
+* shared memory access failures on startup are now handled more gracefully
+* restored utests from earlier releases (and made them pass on all affected systems)
+
+SPARC:
+* several fixes for cpu autodetection
+
+POWER:
+* corrected vector register overwriting in several Power8 kernels
+* optimized additional BLAS functions
+
+ARM:
+* added support for CortexA53 and A72
+* added autodetection for ThunderX2T99
+* made most optimized kernels the default for generic ARMv8 targets
+
+x86_64:
+* parallelized DDOT kernel for Haswell
+* changed alignment directives in assembly kernels to boost performance on OSX
+* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
+* added support for building on OpenBSD and Dragonfly
+* updated compiler options to work with Intel release 2018
+* support fully optimized build with clang/flang on Microsoft Windows
+* fixed building on AIX
+
+IBM Z:
+* added optimized BLAS 1/2 functions
+
+MIPS:
+* fixed cpu autodetection helper code
+* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
+* added mips64 I6500 cpu
+
 ====================================================================
 Version 0.2.20
 24-Jul-2017
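
Several of the added entries mention corrected behaviour with zero INC_X or INC_Y (xDOT on ARMV7, CROT/ZROT on POWER, SSWAP/DSWAP on x86). As a rough illustration of what that case means, the sketch below is a strided dot product in Python that assumes the reference (netlib) BLAS convention: with a zero increment the index never advances, so the same element is reused for all n terms. The name `ref_dot` is hypothetical and exists only for this example; it is not OpenBLAS kernel code and says nothing about how the fixed kernels are implemented.

```python
def ref_dot(n, x, incx, y, incy):
    """Strided dot product following reference BLAS indexing (0-based here).

    Hypothetical helper for illustration only; not part of OpenBLAS.
    """
    if n <= 0:
        return 0.0
    # With a negative increment the reference code starts at the far end.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    total = 0.0
    for _ in range(n):
        total += x[ix] * y[iy]
        ix += incx  # incx == 0 leaves ix unchanged, so x[0] is reused
        iy += incy
    return total


# Zero INC_X: x[0] is multiplied with every element of y.
zero_inc_result = ref_dot(3, [2.0], 0, [1.0, 2.0, 3.0], 1)
# Unit increments: the ordinary dot product.
unit_inc_result = ref_dot(3, [1.0, 2.0, 3.0], 1, [4.0, 5.0, 6.0], 1)
```

An optimized kernel that unrolls or vectorizes over x can easily get the zero-increment case wrong if it assumes distinct elements, which is one plausible reason such cases need dedicated tests.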
