1 | 1 | OpenBLAS ChangeLog
| 2 | +==================================================================== |
| 3 | +Version 0.3.2 |
| 4 | +30-Jul-2018 |
| 5 | + |
| 6 | +common: |
| 7 | + * fixes for regressions caused by the rewrite of the thread |
| 8 | + initialization code in 0.3.1 |
| 9 | + |
| 10 | +POWER: |
| 11 | + * fixed cpu autodetection for the BSDs |
| 12 | + |
| 13 | +MIPS64: |
| 14 | + * fixed utest errors in AXPY, DSDOT, ROT and SWAP |
| 15 | + |
| 16 | +x86_64: |
| 17 | + * added autodetection of AMD Ryzen 2 |
| 18 | + * fixed build with older versions of MSVC |
| 19 | + |
| 20 | +==================================================================== |
| 21 | +Version 0.3.1 |
| 22 | +01-Jul-2018 |
| 23 | + |
| 24 | +common: |
| 25 | + * rewritten thread initialization code with significantly reduced overhead |
| 26 | + * added CBLAS interfaces to the IxAMIN BLAS extension functions |
| 27 | + * fixed the lapack-test target |
| 28 | + * CMAKE builds now create an OpenBLASConfig.cmake file |
| 29 | + * ZAXPY now uses a single thread for small input sizes |
| 30 | + * the LAPACK code was updated from Reference-LAPACK/lapack#253 |
| 31 | + (fixing LAPACKE interfaces to Aasen's functions) |
| 32 | + |
| 33 | +POWER: |
| 34 | + * corrected CROT and ZROT behaviour with zero INC_X |
| 35 | + |
| 36 | +ARMV7: |
| 37 | + * corrected xDOT behaviour with zero INC_X or INC_Y |
| 38 | + |
| 39 | +x86_64: |
| 40 | + * retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER, |
| 41 | + this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO |
| 42 | + (which will still be supported via the slower PRESCOTT kernels when this option is not set) |
| 43 | + * added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows specifying
| 44 | + the list of x86_64 targets to include. Any target not on the list will be supported
| 45 | + by the Sandybridge or Nehalem kernels if available, or by Prescott. |
| 46 | + * improved SWITCH_RATIO on Haswell for increased GEMM throughput |
| 47 | + * added initial support for Intel Skylake X, including an AVX512 SGEMM kernel |
| 48 | + * added autodetection of Intel Cannon Lake series as Skylake X |
| 49 | + * added a default L2 cache size for hypervisors that return zero here (Chromebook) |
| 50 | + * fixed a name clash with recent Windows10 headers that broke the build with (at least) |
| 51 | + recent mingw from MSYS2 |
| 52 | + * fixed a link error in mixed clang/gfortran builds with OpenMP |
| 53 | + * updated the OSX deployment target to 10.8 |
| 54 | + * switched on parallel make for builds on MS Windows by default |
| 55 | + |
| 56 | +x86: |
| 57 | + * fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y |
| 58 | + |
| 59 | +==================================================================== |
| 60 | +Version 0.3.0 |
| 61 | +23-May-2018
| 62 | + |
| 63 | +common: |
| 64 | + * fixed some more thread race and locking bugs |
| 65 | + * added preliminary support for calling an OpenMP build of the library from multiple threads |
| 66 | + * removed performance impact of thread locks added in 0.2.20 on OpenMP code |
| 67 | + * general code cleanup |
| 68 | + * optimized DSDOT implementation |
| 69 | + * improved thread distribution for GEMM |
| 70 | + * corrected IMATCOPY/OMATCOPY implementation |
| 71 | + * fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations |
| 72 | + * cmake build improvements |
| 73 | + * pkgconfig file now contains build options |
| 74 | + * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build |
| 75 | + * corrections and improvements for systems with more than 64 cpus |
| 76 | + * LAPACK code updated to 3.8.0 including later fixes |
| 77 | + * added ReLAPACK, a recursive implementation of several LAPACK functions |
| 78 | + * rewrote ROTMG to handle cases that the netlib code failed to address
| 79 | + * disabled (broken) multithreading code for xTRMV
| 80 | + * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard |
| 81 | + * shared memory access failures on startup are now handled more gracefully |
| 82 | + * restored utests from earlier releases (and made them pass on all affected systems) |
| 83 | + |
| 84 | +SPARC: |
| 85 | + * several fixes for cpu autodetection |
| 86 | + |
| 87 | +POWER: |
| 88 | + * corrected vector register overwriting in several Power8 kernels |
| 89 | + * optimized additional BLAS functions |
| 90 | + |
| 91 | +ARM: |
| 92 | + * added support for CortexA53 and A72 |
| 93 | + * added autodetection for ThunderX2T99 |
| 94 | + * made most optimized kernels the default for generic ARMv8 targets |
| 95 | + |
| 96 | +x86_64: |
| 97 | + * parallelized DDOT kernel for Haswell |
| 98 | + * changed alignment directives in assembly kernels to boost performance on OSX |
| 99 | + * fixed register handling in the GEMV microkernels (bug exposed by gcc7) |
| 100 | + * added support for building on OpenBSD and Dragonfly |
| 101 | + * updated compiler options to work with Intel release 2018 |
| 102 | + * support fully optimized build with clang/flang on Microsoft Windows |
| 103 | + * fixed building on AIX |
| 104 | + |
| 105 | +IBM Z: |
| 106 | + * added optimized BLAS 1/2 functions |
| 107 | + |
| 108 | +MIPS: |
| 109 | + * fixed cpu autodetection helper code |
| 110 | + * added mips32 1004K cpu (Mediatek MT7621 and similar SoC) |
| 111 | + * added mips64 I6500 cpu |
| 112 | + |
2 | 113 | ====================================================================
3 | 114 | Version 0.2.20
4 | 115 | 24-Jul-2017