|
1 | 1 | OpenBLAS ChangeLog
|
| 2 | +==================================================================== |
| 3 | +Version 0.3.11 |
| 4 | + 17-Oct-2020 |
| 5 | + |
| 6 | + common: |
| 7 | + * API change: |
| 8 | + the newly added BFLOAT16 functions were renamed to use the |
| 9 | + letter "B" instead of "H" to avoid potential confusion with |
| 10 | + the IEEE "half precision float" type, i.e. the 0.3.10 |
| 11 | + SHGEMM is now SBGEMM and the corresponding build option |
| 12 | + was changed from "BUILD_HALF" to "BUILD_BFLOAT16". |
| 13 | + * Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper |
| 14 | + limit for placing temporary arrays on the stack) to be compatible |
| 15 | + with a stack size of 1 MB (as imposed by the Java runtime library) |
| 16 | + * Added mixed-precision dot function SBDOT and utility functions |
| 17 | + shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between |
| 18 | + single or double precision float arrays and bfloat16 arrays |
| 19 | + * Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions |
| 20 | + in lapack.h |
| 21 | + * Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2 |
| 22 | + (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263) |
| 23 | + * Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415) |
| 24 | + * Fixed several bugs in the LAPACK testsuite |
| 25 | + * Improved performance of TRMM and TRSM for certain problem sizes |
| 26 | + * Fixed infinite recursions and workspace miscalculations in ReLAPACK |
| 27 | + * CMAKE builds no longer require pkg-config for creating the .pc file |
| 28 | + * Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as |
| 29 | + enabling these options |
| 30 | + * Fixed detection of gfortran when invoked through an mpi wrapper |
| 31 | + * Improved thread reinitialization performance with OpenMP after a fork |
| 32 | + * Added support for building only the subset of the library required |
| 33 | + for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE |
| 34 | + * Optional function name prefixes and suffixes are now correctly |
| 35 | + reflected in the generated cblas.h |
| 36 | + * Added CMAKE build support for the LAPACK and multithreading tests |
| 37 | + |
| 38 | +POWER: |
| 39 | + * Added optimized support for POWER10 |
| 40 | + * Added support for compiling for POWER8 in 32bit mode |
| 41 | + * Added support for compilation with LLVM/clang |
| 42 | + * Added support for compilation with NVIDIA/PGI compilers |
| 43 | + * Fixed building on big-endian POWER8 |
| 44 | + * Fixed miscompilation of ZDOTC by gcc10 |
| 45 | + * Fixed alignment errors in the POWER8 SAXPY kernel |
| 46 | + * Improved CPU detection on AIX |
| 47 | + * Supported building with older compilers on POWER9 |
| 48 | + |
| 49 | +x86_64: |
| 50 | + * Added support for Intel Cooperlake |
| 51 | + * Added autodetection of AMD Renoir/Matisse/Zen3 cpus |
| 52 | + * Added autodetection of Intel Comet Lake cpus |
| 53 | + * Reimplemented ?sum, ?dot and daxpy using universal intrinsics |
| 54 | + * Reset the fpu state before using the fpu on Windows as a workaround |
| 55 | + for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004) |
| 56 | + * Fixed potentially undefined behaviour in the dot and gemv_t kernels |
| 57 | + * Fixed a potential segmentation fault in DYNAMIC_ARCH builds |
| 58 | + * Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers |
| 59 | + |
| 60 | +ARMV7: |
| 61 | + * Fixed cpu detection on BSD-like systems |
| 62 | + |
| 63 | +ARMV8: |
| 64 | + * Added preliminary support for Apple Vortex cpus |
| 65 | + * Added support for the Cavium ThunderX3T110 cpu |
| 66 | + * Fixed cpu detection on BSD-like systems |
| 67 | + * Fixed compilation in -std=c18 mode |
| 68 | + |
| 69 | + |
| 70 | +IBM Z: |
| 71 | + * Added support for compiling with the clang compiler |
| 72 | + * Improved GEMM performance on Z14 |
| 73 | + |
2 | 74 | ====================================================================
|
3 | 75 | Version 0.3.10
|
4 | 76 | 14-Jun-2020
|
|