| 1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.10 |
| 4 | + 14-Jun-2020 |
| 5 | + |
| 6 | +common: |
| 7 | + * Improved thread locking behaviour in blas_server and parallel getrf |
| 8 | + * Imported bugfix 394 from LAPACK (spurious reference to "XERBL" |
| 9 | + due to overlong lines) |
| 10 | + * Imported bugfix 403 from LAPACK (compile option "recursive" required |
| 11 | + for correctness with Intel and PGI) |
| 12 | + * Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB) |
| 13 | + * Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP) |
| 14 | + * Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that |
| 15 | + could lead to crashes at large matrix sizes |
| 16 | + * Restored internal soname in dynamic libraries on FreeBSD and Dragonfly |
| 17 | + * Added API (openblas_setaffinity) to set the thread affinity on Linux |
| 18 | + * Added initial infrastructure for half-precision floating point |
| 19 | + (bfloat16) support with a generic implementation of SHGEMM |
| 20 | + * Added CMAKE build system support for building the cblas_Xgemm3m |
| 21 | + functions |
| 22 | + * Fixed CMAKE support for building in a path with embedded spaces |
| 23 | + * Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC |
| 24 | + * Fixed GCC version detection in the Makefiles |
| 25 | + * Allowed overriding the names of AR, AS and LD in Makefile builds |
| 26 | + |
| 27 | +POWER: |
| 28 | + * Fixed big-endian POWER8 ELFv2 builds on FreeBSD |
| 29 | + * Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9 |
| 30 | + * Fixed CMAKE build support for POWER9 |
| 31 | + * Fixed a potential race condition in the thread buffer allocation |
| 32 | + * Worked around LAPACK test failures on PPC G4 |
| 33 | + |
| 34 | +MIPS: |
| 35 | + * Fixed a potential race condition in the thread buffer allocation |
| 36 | + * Added support for MIPS 24K/24KE family based on P5600 kernels |
| 37 | + |
| 38 | +MIPS64: |
| 39 | + * Fixed a potential race condition in the thread buffer allocation |
| 40 | + * Added TARGET=GENERIC |
| 41 | + |
| 42 | +ARMV7: |
| 43 | + * Fixed a race condition in the thread buffer allocation |
| 44 | + |
| 45 | +ARMV8: |
| 46 | + * Fixed a race condition in the thread buffer allocation |
| 47 | + * Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA |
| 48 | + * Improved performance of the ThunderX2 DAXPY kernel |
| 49 | + * Added an optimized SGEMM kernel for Cortex A53 |
| 50 | + * Fixed Makefile support for INTERFACE64 (8-byte integer) |
| 51 | + |
| 52 | +x86_64: |
| 53 | + * Fixed a syntax error in the CMAKE setup for SkylakeX |
| 54 | + * Improved performance of STRSM on Haswell, SkylakeX and Ryzen |
| 55 | + * Improved SGEMM performance for workloads with ldc a |
| 56 | + multiple of 1024 |
| 57 | + * Improved DGEMM performance on SkylakeX |
| 58 | + * Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH |
| 59 | + builds created on SkylakeX |
| 60 | + * Removed data alignment requirement in the SSE2 copy kernels |
| 61 | + that could cause spurious crashes |
| 62 | + * Added a workaround for an optimizer bug in AppleClang 11.0.3 |
| 63 | + * Fixed LAPACK test failures due to wrong options for Intel Fortran |
| 64 | + * Fixed compilation and LAPACK test results with recent Flang |
| 65 | + and AMD AOCC |
| 66 | + * Fixed DYNAMIC_ARCH builds with CMAKE on OS X |
| 67 | + * Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max, |
| 68 | + cblas_?sum, cblas_?gemm3m in the shared library on OS X |
| 69 | + * Fixed reporting of cpu name in DYNAMIC_ARCH builds (would sometimes |
| 70 | + show the name of an older generation chip supported by the same kernels) |
| 71 | + |
| 72 | +IBM Z: |
| 73 | + * Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14 |
| 74 | + |
| 2 | 75 | ==================================================================== |
| 3 | 76 | Version 0.3.9 |
| 4 | 77 | 1-Mar-2020 |