Commit db35dbd
Author: Kent Knox
Initial check-in of open source clBLAS code
0 parents

540 files changed (+214287 additions, -0 deletions)

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

540 files changed

+214287
-0
lines changed

.gitattributes

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Auto detect text files and perform LF normalization
* text=auto

# Custom for Visual Studio
*.cs diff=csharp
*.sln merge=union
*.csproj merge=union
*.vbproj merge=union
*.fsproj merge=union
*.dbproj merge=union

# Standard to msysgit
*.doc diff=astextplain
*.DOC diff=astextplain
*.docx diff=astextplain
*.DOCX diff=astextplain
*.dot diff=astextplain
*.DOT diff=astextplain
*.pdf diff=astextplain
*.PDF diff=astextplain
*.rtf diff=astextplain
*.RTF diff=astextplain

.gitignore

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Generated kernel template files
*.clT
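
Stock Git can show how the `.gitattributes` and `.gitignore` rules above resolve for a particular file, through `git check-attr` and `git check-ignore`. The helper below is a minimal sketch. The function name `explain_path` is hypothetical and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with clBLAS): report which attributes and
# ignore rules apply to a path. Source this file, then run it inside a checkout.
explain_path() {
  local path="$1"
  # Print the text/diff/merge attributes resolved from .gitattributes,
  # e.g. "src/Foo.cs: diff: csharp" or "src/Foo.cs: merge: unspecified".
  git check-attr text diff merge -- "$path"
  # Print the matching ignore rule as "<file>:<line>:<pattern><TAB><path>".
  # check-ignore exits 1 when nothing matches, so report that explicitly.
  git check-ignore -v "$path" || echo "$path: not ignored"
}
```

Run inside a checkout that contains these two files, `explain_path src/Foo.cs` should report `diff: csharp` for the path. `explain_path kernel.clT` should point at the `*.clT` rule in `.gitignore`.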

CHANGELOG

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################

clBLAS Readme

Version: 1.10
Release Date: April 2013

ChangeLog:
____________
Current Version:
New:
* New Level 1 routines added (an 'x' implies all 4 precisions)
  xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT,
  CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG,
  SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2,
  SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM
* Samples have been added for the new functions
* This release was tested using the 9.012 runtime driver and the 2.8 APP SDK
Fixed:
* Failures in *trsm functions with clMAGMA tests
Known Issues:
* Failures & hangs in ztrmm, *trsv, *tpsv functions on Southern Islands GPU devices
* Failures in zgemm functions on Northern Islands GPU devices
* Failures & hangs are expected to be fixed in upcoming AMD graphics driver versions.
  It is strongly recommended that users keep their graphics driver versions up to date.

____________
Version 1.8.291:
Fixed:
* Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
  ctrsv, csymm, cher2, ztrmm on Southern Islands GPU devices.
* Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk,
  dsyr2k, zsyr2k on Trinity platforms.
Known Issues:
* Failures in *trsm functions with clMAGMA tests

____________
Version 1.8.269 (Beta, clMAGMA support):
New:
* No new routines
* This release was tested using the 8.961 runtime driver and the 2.6 APP SDK

Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If
  this happens, abort execution of the tune program; it is not required
  for correct operation of the BLAS routines (as of 8.872).
* clBLAS can return invalid results on CPU devices (as
  of 8.961). The CPU device is primarily a test/debug device, and GPU
  devices are unaffected.
* clBLAS can return invalid results for double precision functions (dsyr,
  dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
  8.961).
* clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
  ctrsv, csymm, cher2, ztrmm) on Southern Islands GPU devices (as of 8.961).

____________
Version 1.7 (Beta, clMAGMA support):
New:
* New Level 3 routines added (an 'x' implies all 4 precisions)
  CHER2K, ZHER2K
* New Level 2 routines added (an 'x' implies all 4 precisions)
  xTPMV, xTPSV, SSPMV, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR,
  SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV,
  xTBMV, xTBSV
* Samples have been added for the new functions, but are not fully tested
* This release was tested using the 8.951 runtime driver and the 2.6 APP SDK
* Note that documentation is incomplete for the new functions

Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If
  this happens, abort execution of the tune program; it is not required
  for correct operation of the BLAS routines (as of 8.872).
* clBLAS can return invalid results on CPU devices that support AVX (as
  of 8.951). CPU devices that support up to SSE3 are unaffected. The
  CPU device is primarily a test/debug device, and GPU devices are
  unaffected.
* clBLAS can return invalid results for double precision functions (dsyr,
  dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
  8.951).
* clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk,
  ssyr2k, ztrmm) on Southern Islands GPU devices (as of 8.951).

____________
Version 1.6:
New:
* New Level 3 routines added (an 'x' implies all 4 precisions)
  CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM
* New Level 2 routines added (an 'x' implies all 4 precisions)
  CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU,
  CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2
* For all the original functions prior to 1.6, a new API has been introduced
  with an *Ex suffix. These extended APIs add new parameters that allow
  users to specify an offset to a matrix argument. This allows efficient
  sub-matrix indexing within a clBLAS routine without requiring expensive
  sub-matrix copy operations.
* Samples have been added for the new functions
* Preview: Support for AMD Radeon HD7000 series GPUs
* This release was tested using the 8.92 runtime driver and the 2.6 APP SDK

Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
  happens, abort execution of the tune program; it is not required for
  correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
  8.872). The CPU device is primarily a test/debug device, and GPU
  devices are unaffected.

____________
Version 1.4:
New:
* New Level 3 routines added
  SSYRK, DSYRK, SSYR2K, DSYR2K
* New Level 2 routines added
  SGEMV, DGEMV, SSYMV, DSYMV
* The image support functions (clblasAddScratchImage,
  clblasRemoveScratchImage) have been deprecated. Images are no
  longer required for the highest performance.
* InstallShield is now used for APPML libraries. The default install
  location has changed from c:\amd\clBLAS to
  C:\Program Files (x86)\AMD\clBLAS. It is recommended that previous
  versions of clBLAS be uninstalled first.
* Samples have been added for the new functions
* This release was tested using the 8.872 runtime driver and the 2.5 APP SDK

Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
  happens, abort execution of the tune program; it is not required for
  correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
  8.872). The CPU device is primarily a test/debug device, and GPU
  devices are unaffected.


____________
Version 1.2:
* The library now supports both 32- and 64-bit Windows and Linux operating
  systems.
* xTRSM routines are available in 1.2.
* clBLAS routines return clblasStatus error codes, instead of native
  OpenCL error codes

Fixed:
* xTRMM routines were not properly handling implicit unit diagonal
  elements and implicit off-diagonal zero values specified by the BLAS
  parameters SIDE, UPLO and DIAG.
* Possible crash with CPU device on 32-bit systems.
* The clblasDgemm routine returned an invalid event as its last argument.
* clBLAS routines return clblasStatus error codes, instead of
  native OpenCL error codes.

Known Issues:
* The clBLASTune executable has been observed to hang on Windows. If this
  happens, abort execution of the tune program; it is not required for
  correct operation of the BLAS routines (as of 8.872).
* The CPU device for clBLAS is not functioning for this release (as of
  8.872). The CPU device is primarily a test/debug device, and GPU
  devices are unaffected.

____________________
Version 1.0:
* Initial release

Known Issues:
* Available only on Linux64.
* xTRMM routines were not properly handling implicit unit diagonal elements
  and implicit off-diagonal zero values specified by the BLAS parameters
  SIDE, UPLO and DIAG
* clblasDgemm returned an invalid event as its last argument

_____________
Building the Samples:

To install the Linux versions of clBLAS, uncompress the initial download, then
execute the install script.

For example:

    tar -xf clBLAS-${version}-Linux.tar.gz
      - This installs three files into the local directory, one being an
        executable bash script.

    sudo mkdir /opt/clBLAS-${version}
      - This pre-creates the install directory with proper permissions
        in /opt if it is to be installed there. (This is the default.)

    ./install-clBLAS-${version}.sh
      - This prints an EULA and uncompresses files into the chosen install
        directory.

    cd ${installDir}/bin64
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir}
      - Be sure to export library dependencies to resolve all external
        linkages to the client program; you can create a bash script to
        help automate this procedure.

    ./example_sgemm
      - Run a simple client; one example is provided for each supported
        main BLAS function family.
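
The note above suggests wrapping the export step in a bash script. Below is a minimal sketch. It assumes you already know where the OpenCL runtime and the clBLAS libraries live (`${OpenCLLibDir}` and `${clBLASLibDir}` above). The function name `clblas_env` is hypothetical and does not ship with clBLAS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the clBLAS package): append the OpenCL and
# clBLAS library directories to LD_LIBRARY_PATH so the sample executables can
# resolve their shared libraries.
# Usage: source this file, then run: clblas_env <OpenCLLibDir> <clBLASLibDir>
clblas_env() {
  if [ "$#" -ne 2 ]; then
    echo "usage: clblas_env <OpenCLLibDir> <clBLASLibDir>" >&2
    return 1
  fi
  local dir
  for dir in "$1" "$2"; do
    # Refuse directories that do not exist, to catch typos early.
    if [ ! -d "$dir" ]; then
      echo "clblas_env: no such directory: $dir" >&2
      return 1
    fi
    # Append only if the directory is not already on the search path; the
    # ${var:+...} form avoids an empty leading entry when the variable is unset.
    case ":${LD_LIBRARY_PATH:-}:" in
      *":$dir:"*) ;;
      *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
    esac
  done
  export LD_LIBRARY_PATH
}
```

To use it, source the file and call `clblas_env "${OpenCLLibDir}" "${clBLASLibDir}"` before `./example_sgemm`. It has the same effect as the `export` line above, except that it rejects missing directories and does not add the same directory twice when run repeatedly.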

The sample program does not ship with native build files; instead, a CMake
file is shipped, and the user generates a native build file for their system.

For example:

    cd ${installDir}

    mkdir samplesBin/
      - This creates a sister directory to the samples directory that
        houses the native makefiles and the generated files from the
        build.

    cd samplesBin/
    ccmake ../samples/
      - ccmake is a curses-based cmake program; it takes a parameter
        that specifies the location of the source code to compile.
      - Hit 'c' to configure for the platform; ensure that the
        dependencies to external libraries are satisfied, including
        paths to 'ATI Stream SDK'.
      - After dependencies are satisfied, hit 'c' again to finalize
        configuration. Then, hit 'g' to generate a makefile and
        exit ccmake.

    make help
      - Look at the options available for make.

    make
      - Build the sample client program.

    ./example_sgemm
      - Run a simple client; one example is provided for each supported main
        BLAS function family.
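
The interactive ccmake session above can also be scripted, for example on a build server. Plain `cmake` configures and generates in a single invocation. Any cache entry that ccmake would let you edit can be passed on the command line as `-D<variable>=<value>`. The variable names for the OpenCL/Stream SDK paths depend on the samples' CMake files, so none are assumed here. The function name `build_samples` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a non-interactive equivalent of the ccmake steps above.
# Usage: build_samples <installDir> [extra cmake arguments, e.g. -D<VAR>=<path>]
build_samples() {
  local install_dir="$1"
  shift
  # Out-of-source build directory next to samples/, as in the manual steps.
  mkdir -p "$install_dir/samplesBin" || return 1
  (
    cd "$install_dir/samplesBin" || exit 1
    # Configure and generate makefiles in one pass (ccmake's 'c' then 'g');
    # remaining arguments are forwarded to cmake unchanged.
    cmake "$@" ../samples/ || exit 1
    # Build the sample client programs.
    make
  )
}
```

For example, `build_samples "${installDir}"` reproduces the mkdir/ccmake/make sequence above. The build runs in a subshell, so your current directory is left unchanged.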
_______________________________________________________________________________
(C) 2010-2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD
Arrow logo, ATI, the ATI logo, Radeon, FireStream, FireGL, Catalyst, and
combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft
(R), Windows, and Windows Vista (R) are registered trademarks of Microsoft
Corporation in the U.S. and/or other jurisdictions. OpenCL and the OpenCL logo
are trademarks of Apple Inc. used by permission by Khronos. Other names are for
informational purposes only and may be trademarks of their respective owners.

The contents of this document are provided in connection with Advanced Micro
Devices, Inc. ("AMD") products. AMD makes no representations or warranties with
respect to the accuracy or completeness of the contents of this publication and
reserves the right to make changes to specifications and product descriptions
at any time without notice. The information contained herein may be of a
preliminary or advance nature and is subject to change without notice. No
license, whether express, implied, arising by estoppel or otherwise, to any
intellectual property rights is granted by this publication. Except as set forth
in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability
whatsoever, and disclaims any express or implied warranty, relating to its
products including, but not limited to, the implied warranty of
merchantability, fitness for a particular purpose, or infringement of any
intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as
components in systems intended for surgical implant into the body, or in other
applications intended to support or sustain life, or in any other application
in which the failure of AMD's product could create a situation where personal
injury, death, or severe property or environmental damage may occur. AMD
reserves the right to discontinue or make changes to its products at any time
without notice.
_______________________________________________________________________________

CONTRIBUTING.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
## Contributor guidelines

Contributing code to this project is intended to be lightweight and intuitive for users familiar with GitHub, in order to actively encourage contributions; nevertheless, a process is documented and should be followed to prevent chaos, confusion and despair.

## The mechanics of contributing code
To contribute code to this project, a contributor must have a valid and current [GitHub account](https://help.github.com/articles/set-up-git). Given an account:
* The potential contributor forks this project into his/her account following the traditional [forking](https://help.github.com/articles/fork-a-repo) model native to GitHub
* After forking, the contributor [clones their repository](https://help.github.com/articles/create-a-repo) locally on their machine
* Code is developed and checked into the contributor's local repository. These commits are eventually pushed to the contributor's fork on GitHub
* The contributor then issues a [pull request](https://help.github.com/articles/using-pull-requests) against the **develop** branch of this repository, following the [git flow](http://nvie.com/posts/a-successful-git-branching-model/) workflow, which is well suited to working with GitHub
* A [git extension](https://github.com/nvie/gitflow) has been developed to ease the use of the 'git flow' methodology, but it requires manual installation by the user. Refer to the project's wiki
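
The fork, clone and pull-request steps above can be sketched with plain git commands. This is a minimal sketch, not a prescribed procedure. The remote name `upstream`, the function name `start_feature`, and the example URLs and branch name are illustrative assumptions. The pull request itself is still opened on GitHub.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone your fork, register the original repository as
# "upstream", and start a feature branch from upstream's develop branch.
# Usage: start_feature <fork_url> <upstream_url> <workdir> <branch>
start_feature() {
  local fork_url="$1" upstream_url="$2" workdir="$3" branch="$4"
  # Clone your fork; git names this remote "origin".
  git clone --quiet "$fork_url" "$workdir" || return 1
  cd "$workdir" || return 1
  # Register the original repository so develop can be kept current.
  git remote add upstream "$upstream_url" || return 1
  git fetch --quiet upstream || return 1
  # Base the new work on develop, the branch pull requests must target.
  git checkout --quiet -b "$branch" upstream/develop
}
```

A typical session runs `start_feature https://github.com/<you>/clBLAS.git https://github.com/<owner>/clBLAS.git clBLAS my-feature`, commits the changes, pushes them with `git push -u origin my-feature`, and then opens the pull request against **develop** on GitHub. Keeping the original repository as a separate remote also makes it easy to pick up later changes with `git fetch upstream` followed by a merge or rebase onto `upstream/develop`.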

At this point, GitHub will notify the repository maintainers that a pull request is pending against their repository. A code review should be completed within a few days, depending on the scope of the submitted code, and the code will be accepted, rejected, or commented on with further feedback.

## Code submission guidelines
We want to ensure that the project code base maintains a level of quality over time, so that future contributors find it as easy to jump into the code as it hopefully is today. As such, pull requests should:
* respect the project license: clMath is licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0). If you are not already familiar with it, please review the license before issuing a pull request. We intend this project to be open to external contributors, and we encourage developers to contribute back code that they believe will provide value to the overall community. We will interpret an explicit pull request back to this repository as an implicit acknowledgement from the contributor that they wish to share the code with the community under the terms of the Apache License v2.0.
* follow the [code style guidelines]( ) of the project as posted to the project wiki. Unfortunately, no unifying code guideline was defined between the BLAS and FFT projects, but code submissions should not mix styles within an individual file. We have since defined and posted a code style guideline for the projects, and we expect the code to transition slowly to the new guidelines over time
* separate check-ins that modify a file's style from those that add, change or delete code
* target the **develop** branch in the repository
* ensure that the [code properly builds](https://github.com/kknox/clBLAS/wiki/Build)
* not break existing test cases
* we encourage contributors to [run the test-short](https://github.com/kknox/clBLAS/wiki/Testing) suite of tests on their end before issuing the pull request
* if possible, upload the test results associated with the pull request to a personal [gist repository](https://gist.github.com/) and insert a link to the test results in the pull request so that collaborators can browse the results
* if no test results are provided with the pull request, official collaborators will run the test suite on their test machines against the patch before accepting the pull request
* if we detect failing test cases, we will request that the code associated with the pull request be fixed before the pull request is merged
* if new functionality is introduced with the pull request, sufficient test cases should be added to verify that the new functionality is correct
* new tests should integrate with the existing [googletest framework](https://code.google.com/p/googletest/wiki/Primer) tests located in the src/tests directory of the repo
* if the collaborators feel the new tests do not provide sufficient coverage, they will leave feedback on the pull request with suggestions on how to improve the tests before the pull request is merged

Pull requests will be reviewed by the set of collaborators assigned to the repository. Pull requests may be accepted or declined, or a conversation may start on the pull request thread with feedback. If the pull request is trivial and all the submission guidelines defined above are honored, the pull request may be accepted without delay. If the pull request is good but the guidelines defined above are not followed, the collaborators may leave feedback on the pull request and engage in a conversation with the contributor about what they can do to improve it. At any time, collaborators may decline a pull request if they decide the contribution is not appropriate for the project, or if feedback from reviewers on a pull request is not addressed within a reasonable amount of time.

## Is it possible to become an official collaborator of the repository?
Yes. We hope to promote trusted members of the community who have proven themselves competent and who request to take on the extra responsibility of being official collaborators of the project. When an individual requests to become an official collaborator, current project collaborators will browse through the history of the requester's prior pull requests and vote amongst themselves on whether the requester should be promoted to collaborator. These individuals will then have the right to approve or decline pull requests and help shape the direction the project takes. It is worth noting that on GitHub everybody has read-only access to the source and everybody can issue a pull request to contribute to the project. Being a repository collaborator additionally allows you to manage other people's pull requests.
