|
| 1 | +# ######################################################################## |
| 2 | +# Copyright 2013 Advanced Micro Devices, Inc. |
| 3 | +# |
| 4 | +# Licensed under the Apache License, Version 2.0 (the "License"); |
| 5 | +# you may not use this file except in compliance with the License. |
| 6 | +# You may obtain a copy of the License at |
| 7 | +# |
| 8 | +# http://www.apache.org/licenses/LICENSE-2.0 |
| 9 | +# |
| 10 | +# Unless required by applicable law or agreed to in writing, software |
| 11 | +# distributed under the License is distributed on an "AS IS" BASIS, |
| 12 | +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 13 | +# See the License for the specific language governing permissions and |
| 14 | +# limitations under the License. |
| 15 | +# ######################################################################## |
| 16 | + |
| 17 | +clBLAS Readme |
| 18 | + |
| 19 | +Version: 1.10 |
| 20 | +Release Date: April 2013 |
| 21 | + |
| 22 | +ChangeLog: |
| 23 | +____________ |
| 24 | +Current Version: |
| 25 | +New: |
| 26 | + * New Level 1 routines added (an 'x' implies all 4 precisions) |
| 27 | + xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT, |
| 28 | + CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG, |
| 29 | + SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2, |
| 30 | + SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM |
| 31 | + * Samples have been added for the new functions |
| 32 | + * This release was tested using the 9.012 runtime driver and the 2.8 APP SDK |
| 33 | +Fixed: |
| 34 | + * Failures in *trsm functions with clMAGMA tests |
| 35 | +Known Issues: |
| 36 | + * Failures & hangs in ztrmm, *trsv, *tpsv functions on Southern Islands GPU devices |
| 37 | + * Failures in zgemm functions on Northern Islands GPU devices |
| 38 | + * Failures & hangs are expected to be fixed in upcoming AMD graphics driver versions. |
| 39 | + It is strongly recommended that users keep their graphics driver versions up to date. |
| 40 | + |
| 41 | +____________ |
| 42 | +Version 1.8.291: |
| 43 | +Fixed: |
| 44 | + * Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, |
| 45 | + ctrsv, csymm, cher2, ztrmm on Southern Islands GPU devices. |
| 46 | + * Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk, |
| 47 | + dsyr2k, zsyr2k on Trinity platforms. |
| 48 | +Known Issues: |
| 49 | + * Failures in *trsm functions with clMAGMA tests |
| 50 | + |
| 51 | +____________ |
| 52 | +Version 1.8.269 (Beta, clMAGMA support): |
| 53 | +New: |
| 54 | + * No new routines |
| 55 | + * This release was tested using the 8.961 runtime driver and the 2.6 APP SDK |
| 56 | + |
| 57 | +Known Issues: |
| 58 | + * The clBLASTune executable has been observed to hang on Windows. If |
| 59 | + this happens, abort execution of the tune program; it is not required |
| 60 | + for correct operation of the BLAS routines (as of 8.872). |
| 61 | + * clBLAS can return invalid results on CPU devices (as |
| 62 | + of 8.961). The CPU device is primarily a test/debug device, and GPU |
| 63 | + devices are unaffected. |
| 64 | + * clBLAS can return invalid results for double precision functions (dsyr, |
| 65 | + dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of |
| 66 | + 8.961). |
| 67 | + * clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, |
| 68 | + ctrsv, csymm, cher2, ztrmm) on Southern Islands GPU devices (as of 8.961). |
| 69 | + |
| 70 | +____________ |
| 71 | +Version 1.7 (Beta, clMAGMA support): |
| 72 | +New: |
| 73 | + * New Level 3 routines added (an 'x' implies all 4 precisions) |
| 74 | + CHER2K, ZHER2K |
| 75 | + * New Level 2 routines added (an 'x' implies all 4 precisions) |
| 76 | + xTPMV, xTPSV, SSPMV, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR, |
| 77 | + SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV, |
| 78 | + xTBMV, xTBSV |
| 79 | + * Samples have been added for the new functions, but are not fully tested |
| 80 | + * This release was tested using the 8.951 runtime driver and the 2.6 APP SDK |
| 81 | + * Note that documentation is incomplete for the new functions |
| 82 | + |
| 83 | +Known Issues: |
| 84 | + * The clBLASTune executable has been observed to hang on Windows. If |
| 85 | + this happens, abort execution of the tune program; it is not required |
| 86 | + for correct operation of the BLAS routines (as of 8.872). |
| 87 | + * clBLAS can return invalid results on CPU devices that support AVX (as |
| 88 | + of 8.951). CPU devices that support up to SSE3 are unaffected. The |
| 89 | + CPU device is primarily a test/debug device, and GPU devices are |
| 90 | + unaffected. |
| 91 | + * clBLAS can return invalid results for double precision functions (dsyr, |
| 92 | + dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of |
| 93 | + 8.951). |
| 94 | + * clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk, |
| 95 | + ssyr2k, ztrmm) on Southern Islands GPU devices (as of 8.951). |
| 96 | + |
| 97 | +____________ |
| 98 | +Version 1.6: |
| 99 | +New: |
| 100 | + * New Level 3 routines added (an 'x' implies all 4 precisions) |
| 101 | + CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM |
| 102 | + * New Level 2 routines added (an 'x' implies all 4 precisions) |
| 103 | + CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU, |
| 104 | + CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2 |
| 105 | + * For all the original functions prior to 1.6, a new API has been introduced |
| 106 | + with an *Ex suffix. These extended APIs add new parameters that allow |
| 107 | + users to specify an offset to a matrix argument. This allows efficient |
| 108 | + sub-matrix indexing within a clBLAS routine without requiring expensive |
| 109 | + sub-matrix copy operations. |
| 110 | + * Samples have been added for the new functions |
| 111 | + * Preview: Support for AMD Radeon(TM) HD7000 series GPUs |
| 112 | + * This release was tested using the 8.92 runtime driver and the 2.6 APP SDK |
| 113 | + |
| 114 | +Known Issues: |
| 115 | + * The clBLASTune executable has been observed to hang on Windows. If this |
| 116 | + happens, abort execution of the tune program; it is not required for |
| 117 | + correct operation of the BLAS routines (as of 8.872). |
| 118 | + * The CPU device for clBLAS is not functioning for this release (as of |
| 119 | + 8.872). The CPU device is primarily a test/debug device, and GPU |
| 120 | + devices are unaffected. |
| 121 | + |
| 122 | +____________ |
| 123 | +Version 1.4: |
| 124 | +New: |
| 125 | + * New Level 3 routines added |
| 126 | + SSYRK, DSYRK, SSYR2K, DSYR2K |
| 127 | + * New Level 2 routines added |
| 128 | + SGEMV, DGEMV, SSYMV, DSYMV |
| 129 | + * The image support functions (clblasAddScratchImage, |
| 130 | + clblasRemoveScratchImage) have been deprecated. Images are no |
| 131 | + longer required for the highest performance. |
| 132 | + * InstallShield is now used for APPML libraries. The default install |
| 133 | + location has changed from c:\amd\clBLAS to |
| 134 | + C:\Program Files (x86)\AMD\clBLAS. It is recommended that previous |
| 135 | + versions of clBLAS are uninstalled first. |
| 136 | + * Samples have been added for the new functions |
| 137 | + * This release was tested using the 8.872 runtime driver and the 2.5 APP SDK |
| 138 | + |
| 139 | +Known Issues: |
| 140 | + * The clBLASTune executable has been observed to hang on Windows. If this |
| 141 | + happens, abort execution of the tune program; it is not required for |
| 142 | + correct operation of the BLAS routines (as of 8.872). |
| 143 | + * The CPU device for clBLAS is not functioning for this release (as of |
| 144 | + 8.872). The CPU device is primarily a test/debug device, and GPU |
| 145 | + devices are unaffected. |
| 146 | + |
| 147 | + |
| 148 | +____________ |
| 149 | +Version 1.2: |
| 150 | + * The library now supports both 32- and 64-bit Windows and Linux operating |
| 151 | + systems. |
| 152 | + * xTRSM routines are available in 1.2. |
| 153 | + * clBLAS routines return clblasStatus error codes, instead of native |
| 154 | + OpenCL error codes |
| 155 | + |
| 156 | +Fixed: |
| 157 | + * xTRMM routines were not properly handling implicit unit diagonal |
| 158 | + elements and implicit off-diagonal zero values specified by the BLAS |
| 159 | + parameters SIDE, UPLO and DIAG. |
| 160 | + * Possible crash with CPU device on 32-bit systems. |
| 161 | + * clblasDgemm routine returned an invalid event as its last argument. |
| 162 | + * clBLAS routines return clblasStatus error codes, instead of |
| 163 | + native OpenCL error codes. |
| 164 | + |
| 165 | +Known Issues: |
| 166 | + * The clBLASTune executable has been observed to hang on Windows. If this |
| 167 | + happens, abort execution of the tune program; it is not required for |
| 168 | + correct operation of the BLAS routines (as of 8.872). |
| 169 | + * The CPU device for clBLAS is not functioning for this release (as of |
| 170 | + 8.872). The CPU device is primarily a test/debug device, and GPU |
| 171 | + devices are unaffected. |
| 172 | + |
| 173 | +____________________ |
| 174 | +Version 1.0: |
| 175 | + * Initial release |
| 176 | + |
| 177 | +Known Issues: |
| 178 | + * Available only on Linux64. |
| 179 | + * xTRMM routines were not properly handling implicit unit diagonal elements |
| 180 | + and implicit off-diagonal zero values specified by the BLAS parameters |
| 181 | + SIDE, UPLO and DIAG |
| 182 | + * clblasDgemm returned an invalid event as its last argument |
| 183 | + |
| 184 | +_____________ |
| 185 | +Building the Samples: |
| 186 | + |
| 187 | +To install the Linux versions of clBLAS, uncompress the initial download, then |
| 188 | +execute the install script. |
| 189 | + |
| 190 | +For example: |
| 191 | + |
| 192 | + tar -xf clBLAS-${version}-Linux.tar.gz |
| 193 | + - This extracts three files into the current directory, one being an |
| 194 | + executable bash script. |
| 195 | + |
| 196 | + sudo mkdir /opt/clBLAS-${version} |
| 197 | + - This pre-creates the install directory with proper permissions |
| 198 | + in /opt if it is to be installed there. (This is the default.) |
| 199 | + |
| 200 | + ./install-clBLAS-${version}.sh |
| 201 | + - This prints an EULA and uncompresses files into the chosen install |
| 202 | + directory. |
| 203 | + |
| 204 | + cd ${installDir}/bin64 |
| 205 | + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir} |
| 206 | + - Be sure to export library dependencies to resolve all external |
| 207 | + linkages to the client program; you can create a bash script to |
| 208 | + help automate this procedure. |
| 209 | + |
| 210 | + ./example_sgemm |
| 211 | + - Run a simple client; one example is provided for each supported |
| 212 | + main BLAS function family. |
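The LD_LIBRARY_PATH step above suggests a bash script to automate the setup. A minimal sketch of one follows. The function name `clblas_env` is hypothetical and not part of the clBLAS package; the two arguments stand in for ${OpenCLLibDir} and ${clBLASLibDir}, whose actual locations depend on the local install.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with clBLAS): appends the OpenCL and clBLAS
# library directories to LD_LIBRARY_PATH for the current shell.
# Usage: clblas_env <OpenCLLibDir> <clBLASLibDir>
clblas_env() {
    if [ "$#" -ne 2 ]; then
        echo "usage: clblas_env <OpenCLLibDir> <clBLASLibDir>" >&2
        return 2
    fi
    local opencl_lib_dir="$1"
    local clblas_lib_dir="$2"
    # ${VAR:+...} expands only when VAR is set and non-empty, so an unset
    # LD_LIBRARY_PATH does not leave a leading ':' (an empty entry, which the
    # dynamic loader treats as the current directory).
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${opencl_lib_dir}:${clblas_lib_dir}"
}
```

After sourcing the file in the current shell, one possible invocation from ${installDir}/bin64 is:

    clblas_env "${OpenCLLibDir}" "${clBLASLibDir}" && ./example_sgemm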
| 213 | + |
| 214 | +The sample program does not ship with native build files; instead, a CMake |
| 215 | +file is shipped, and the user generates a native build file for their system. |
| 216 | + |
| 217 | +For example: |
| 218 | + |
| 219 | + cd ${installDir} |
| 220 | + |
| 221 | + mkdir samplesBin/ |
| 222 | + - This creates a sister directory to the samples directory that |
| 223 | + houses the native makefiles and the generated files from the |
| 224 | + build. |
| 225 | + |
| 226 | + cd samplesBin/ |
| 227 | + ccmake ../samples/ |
| 228 | + - ccmake is a curses-based cmake program; it takes a parameter |
| 229 | + that specifies the location of the source code to compile. |
| 230 | + - Hit 'c' to configure for the platform; ensure that the |
| 231 | + dependencies to external libraries are satisfied, including |
| 232 | + paths to 'ATI Stream SDK'. |
| 233 | + - After dependencies are satisfied, hit 'c' again to finalize |
| 234 | + configuration. Then, hit 'g' to generate a makefile and |
| 235 | + exit ccmake. |
| 236 | + |
| 237 | + make help |
| 238 | + - Look at the options available for make. |
| 239 | + |
| 240 | + make |
| 241 | + - Build the sample client program. |
| 242 | + |
| 243 | + ./example_sgemm |
| 244 | + - Run a simple client; one example is provided for each supported main |
| 245 | + BLAS function family. |
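Where the curses interface is not available, the same configure-and-build sequence can probably be driven non-interactively with plain cmake. The sketch below assumes only what the steps above state: the samples directory contains a CMake file, and the build goes into a separate directory. The function name `build_samples` is hypothetical. Any dependency paths that ccmake would prompt for would have to be passed to cmake as -D cache entries; their names are not given in this README, so none are shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: out-of-source build of the clBLAS samples.
# Usage: build_samples <samples source dir> <build dir>
build_samples() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_samples <samples source dir> <build dir>" >&2
        return 2
    fi
    local src_dir build_dir
    # Resolve the source directory to an absolute path so it remains valid
    # after changing into the build directory.
    src_dir="$(cd "$1" 2>/dev/null && pwd)" || {
        echo "samples directory not found: $1" >&2
        return 1
    }
    build_dir="$2"
    if [ ! -f "${src_dir}/CMakeLists.txt" ]; then
        echo "no CMakeLists.txt in ${src_dir}" >&2
        return 1
    fi
    mkdir -p "${build_dir}" || return 1
    # Subshell: the caller's working directory is left unchanged.
    ( cd "${build_dir}" && cmake "${src_dir}" && make )
}
```

One possible invocation, mirroring the directories used above:

    build_samples "${installDir}/samples" "${installDir}/samplesBin"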
| 246 | +_______________________________________________________________________________ |
| 247 | +(C) 2010-2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD |
| 248 | +Arrow logo, ATI, the ATI logo, Radeon, FireStream, FireGL, Catalyst, and |
| 249 | +combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft |
| 250 | +(R), Windows, and Windows Vista (R) are registered trademarks of Microsoft |
| 251 | +Corporation in the U.S. and/or other jurisdictions. OpenCL and the OpenCL logo |
| 252 | +are trademarks of Apple Inc. used by permission by Khronos. Other names are for |
| 253 | +informational purposes only and may be trademarks of their respective owners. |
| 254 | + |
| 255 | +The contents of this document are provided in connection with Advanced Micro |
| 256 | +Devices, Inc. ("AMD") products. AMD makes no representations or warranties with |
| 257 | +respect to the accuracy or completeness of the contents of this publication and |
| 258 | +reserves the right to make changes to specifications and product descriptions |
| 259 | +at any time without notice. The information contained herein may be of a |
| 260 | +preliminary or advance nature and is subject to change without notice. No |
| 261 | +license, whether express, implied, arising by estoppel or otherwise, to any |
| 262 | +intellectual property rights is granted by this publication. Except as set forth |
| 263 | +in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability |
| 264 | +whatsoever, and disclaims any express or implied warranty, relating to its |
| 265 | +products including, but not limited to, the implied warranty of |
| 266 | +merchantability, fitness for a particular purpose, or infringement of any |
| 267 | +intellectual property right. |
| 268 | + |
| 269 | +AMD's products are not designed, intended, authorized or warranted for use as |
| 270 | +components in systems intended for surgical implant into the body, or in other |
| 271 | +applications intended to support or sustain life, or in any other application |
| 272 | +in which the failure of AMD's product could create a situation where personal |
| 273 | +injury, death, or severe property or environmental damage may occur. AMD |
| 274 | +reserves the right to discontinue or make changes to its products at any time |
| 275 | +without notice. |
| 276 | +_______________________________________________________________________________ |