Commit 3f032e7

Author: Timmy

merge develop branch to master branch. Bump master branch version number to 2.6

2 parents a6b3f9d + 5005205 commit 3f032e7

134 files changed: +48875 -1264 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -17,3 +17,6 @@
 
 # Generated kernel template files
 *.clT
+
+# flags.txt file
+*flags.txt

.travis.yml

Lines changed: 20 additions & 2 deletions
@@ -5,16 +5,34 @@ compiler:
 
 before_install:
 - sudo apt-get update -qq
-- sudo apt-get install -qq fglrx opencl-headers libboost-program-options-dev
+- sudo apt-get install -qq fglrx libboost-program-options-dev
 # Uncomment below to help verify the installs above work
 # - ls -la /usr/lib/libboost*
 # - ls -la /usr/include/boost
 
 before_script:
 - cd ${TRAVIS_BUILD_DIR}
+# download OpenCL 1.2 header files since Travis CI only provides 1.1
+- mkdir -p OpenCLInclude/CL
+- cd OpenCLInclude/CL
+#- wget -r --no-parent -nH --cut-dirs=4 --reject="index.html*" https://www.khronos.org/registry/cl/api/1.2/
+- wget https://www.khronos.org/registry/cl/api/1.2/cl.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl.hpp
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_d3d10.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_d3d11.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_dx9_media_sharing.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_egl.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_ext.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_gl.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_gl_ext.h
+- wget https://www.khronos.org/registry/cl/api/1.2/cl_platform.h
+- wget https://www.khronos.org/registry/cl/api/1.2/opencl.h
+- ls
+- pwd
+- cd ../..
 - mkdir -p bin/clBLAS
 - cd bin/clBLAS
-- cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TEST=OFF -DBUILD_CLIENT=ON -DCMAKE_INSTALL_PREFIX:PATH=$PWD/package ../../src
+- cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TEST=OFF -DBUILD_CLIENT=ON -DOPENCL_INCLUDE_DIRS:PATH=$PWD/../../OpenCLInclude -DCMAKE_INSTALL_PREFIX:PATH=$PWD/package ../../src
 
 script:
 - make install

LICENSE

Lines changed: 0 additions & 25 deletions
@@ -175,28 +175,3 @@
 of your accepting any such warranty or additional liability.
 
 END OF TERMS AND CONDITIONS
-
-APPENDIX: How to apply the Apache License to your work.
-
-To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
-replaced with your own identifying information. (Don't include
-the brackets!) The text should be enclosed in the appropriate
-comment syntax for the file format. We also recommend that a
-file or class name and description of purpose be included on the
-same "printed page" as the copyright notice for easier
-identification within third-party archives.
-
-Copyright [yyyy] [name of copyright owner]
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.

README.md

Lines changed: 34 additions & 31 deletions
@@ -20,6 +20,24 @@ library does generate and enqueue optimized OpenCL kernels, relieving
 the user from the task of writing, optimizing and maintaining kernel
 code themselves.
 
+## clBLAS update notes 04/2015
+- A subset of GEMM and TRSM can be off-line compiled for Hawaii, Bonaire and Tahiti devices at compile-time. This feature
+eliminates the overhead of calling clBuildProgram() at run-time.
+- Off-line compilation can be done with the OpenCL 1.1, OpenCL 1.2 and OpenCL 2.0 runtimes. However, for better
+performance OpenCL 2.0 is recommended. The library user can select "OCL_VERSION" in CMake to choose the OpenCL version
+the library is built for. It is the library user's responsibility to ensure compatible hardware and driver.
+- Added a flags_public.txt file that contains the OpenCL compiler flags used by off-line compilation. The flags_public.txt
+file will only be loaded when OCL_VERSION is 2.0.
+- The user can off-line compile for one or more supported devices by selecting
+OCL_OFFLINE_BUILD_BONAIRE_KERNEL
+OCL_OFFLINE_BUILD_HAWII_KERNEL
+OCL_OFFLINE_BUILD_TAHITI_KERNEL.
+However, compiling for more than one device at a time might result in running out of heap memory. Thus, compiling for
+one device at a time is recommended.
+- The user may also supply a specific OpenCL compiler path with OCL_COMPILER_DIR; otherwise the library will load the default OpenCL compiler.
+- The minimum driver requirement for off-line compilation is 14.502.
+
+
 ## clBLAS library user documentation
 
 [Library and API documentation][] for developers is available online as
@@ -48,15 +66,12 @@ how to contribute code to this open source project. The code in the
 be made against the /develop branch.
 
 ## License
-
-The source for clBLAS is licensed under the [Apache License, Version
-2.0][]
+The source for clBLAS is licensed under the [Apache License, Version 2.0]( http://www.apache.org/licenses/LICENSE-2.0 )
 
 ## Example
+The simple example below shows how to use clBLAS to compute an OpenCL accelerated SGEMM
 
-The simple example below shows how to use clBLAS to compute an OpenCL
-accelerated SGEMM
-
+```c
 #include <sys/types.h>
 #include <stdio.h>
 
@@ -170,42 +185,30 @@ accelerated SGEMM
 
 return ret;
 }
+```
 
 ## Build dependencies
-
 ### Library for Windows
-
-- Windows® 7/8
-
-- Visual Studio 2010 SP1, 2012
-
-- An OpenCL SDK, such as APP SDK 2.9
-
-- Latest CMake
+* Windows® 7/8
+* Visual Studio 2010 SP1, 2012
+* An OpenCL SDK, such as APP SDK 2.8
+* Latest CMake
 
 ### Library for Linux
-
-- GCC 4.6 and onwards
-
-- An OpenCL SDK, such as APP SDK 2.9
-
-- Latest CMake
+* GCC 4.6 and onwards
+* An OpenCL SDK, such as APP SDK 2.9
+* Latest CMake
 
 ### Library for Mac OSX
-
-- Recommended to generate Unix makefiles with cmake
+* Recommended to generate Unix makefiles with cmake
 
 ### Test infrastructure
-
-- Googletest v1.6
-
-- ACML on windows/linux; Accelerate on Mac OSX
-
-- Latest Boost
+* Googletest v1.6
+* ACML on windows/linux; Accelerate on Mac OSX
+* Latest Boost
 
 ### Performance infrastructure
-
-- Python
+* Python
 
 [Library and API documentation]: http://clmathlibraries.github.io/clBLAS/
 [[email protected]]: https://groups.google.com/forum/#!forum/clmath

doc/README-BinaryCacheOnDisk.txt

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+S. Chauveau
+CAPS Entreprise
+clBLAS Project
+------------------------------
+April 30, 2014
+
+
+The implementation of a binary cache for CL programs can be found in
+files src/include/binary_lookup.h and src/library/blas/generic/binary_lookup.cc
+
+The cache is currently disabled by default. It can be enabled by
+setting the environment variable 'CLBLAS_CACHE_PATH' to the directory
+containing the cache entries.
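Editor's sketch (not part of the commit): the opt-in behaviour described above could be modelled roughly as follows. The helper names `cacheDirFrom`, `cacheDirFromEnvironment` and `cacheEnabled` are hypothetical, and this is not the code found in binary_lookup.cc; the only thing taken from the text is the variable name `CLBLAS_CACHE_PATH`. It assumes an unset or empty variable means the cache stays disabled.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of clBLAS): turn the raw value of the
// environment variable into a cache directory. An unset (null) or empty
// value means the cache stays disabled, signalled by an empty string.
std::string cacheDirFrom(const char* rawValue)
{
    if (rawValue == nullptr || *rawValue == '\0')
        return std::string();
    return std::string(rawValue);
}

// Reads CLBLAS_CACHE_PATH, the variable named in the text above.
std::string cacheDirFromEnvironment()
{
    return cacheDirFrom(std::getenv("CLBLAS_CACHE_PATH"));
}

// The cache is considered enabled only when a directory was supplied.
bool cacheEnabled(const std::string& cacheDir)
{
    return !cacheDir.empty();
}
```

Splitting the parsing (`cacheDirFrom`) from the environment read keeps the decision logic checkable without touching the process environment.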
+
+In the code itself, accesses to the cache are controlled by the
+BinaryLookup class. A typical cache query looks as follows:
+
+(1) Create a local instance of BinaryLookup
+
+(2) Specify the additional characteristics (i.e. variants) of the
+requested program. That information combined with the program name
+and the OpenCL context and device shall form a unique signature
+for the binary program.
+
+(3) Perform the effective search by calling the 'found' method
+
+(4a) If the search was successful then cl_program can be retrieved
+by a call to the 'getProgram' method
+
+(4b) If the search was not successful then a cl_program
+must be created and populated in the cache by a call
+to the 'setProgram' method.
+
+(5) Destroy the BinaryLookup local instance.
+
+
+So in practice a typical query looks as follows:
+
+cl_program program ;
+
+// The program name is part of the signature and shall be unique
+const char * program_name = "... my unique program name ... " ;
+
+BinaryLookup bl(context, device, program_name);
+
+// Specify some additional information used to build a
+// unique signature for that cache entry
+
+bl.variantInt( vectorSize );
+bl.variantInt( hasBorder );
+...
+
+// Perform the query
+if ( bl.found() )
+{
+    // Success! use the cl_program retrieved from the cache
+    program = bl.getProgram();
+}
+else
+{
+    // Failure! we need to build the program
+    program = build_my_program(context,device,vectorSize,...) ;
+    // and inform the lookup object of the program
+    bl.setProgram(program);
+    // and finally populate the cache
+    bl.populateCache();
+}
+
+// The BinaryLookup shall now be destroyed
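Editor's sketch (not part of the commit): steps (1) to (5) can be mimicked without OpenCL. Everything here is an assumption made for illustration: `ToyLookup` is a hypothetical stand-in for BinaryLookup, strings replace `cl_program`, the device is a plain name, and the cache is an in-memory map rather than files on disk. Only the method names `variantInt`, `found`, `getProgram`, `setProgram` and `populateCache` are borrowed from the text.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Stand-in for a compiled cl_program; a real cache would store binaries.
typedef std::string ToyProgram;

// Hypothetical in-memory analogue of the lookup object described above.
class ToyLookup
{
public:
    ToyLookup(std::map<std::string, ToyProgram>& cache,
              const std::string& deviceName,
              const std::string& programName)
        : cache_(cache)
    {
        signature_ << deviceName << '/' << programName;
    }

    // Each variant value is appended to the signature.
    void variantInt(int value) { signature_ << '/' << value; }

    // Performs the search; on success the program is kept for getProgram().
    bool found()
    {
        std::map<std::string, ToyProgram>::const_iterator it =
            cache_.find(signature_.str());
        if (it == cache_.end())
            return false;
        program_ = it->second;
        return true;
    }

    const ToyProgram& getProgram() const { return program_; }
    void setProgram(const ToyProgram& program) { program_ = program; }
    void populateCache() { cache_[signature_.str()] = program_; }
    std::string signature() const { return signature_.str(); }

private:
    std::map<std::string, ToyProgram>& cache_;
    std::ostringstream signature_;
    ToyProgram program_;
};

// Steps (1) to (5): look the program up, build and store it on a miss.
ToyProgram lookupOrBuild(std::map<std::string, ToyProgram>& cache,
                         int vectorSize, int hasBorder, int& buildCount)
{
    ToyLookup bl(cache, "toy-device", "toy-kernel");   // (1)
    bl.variantInt(vectorSize);                         // (2)
    bl.variantInt(hasBorder);
    if (bl.found())                                    // (3)
        return bl.getProgram();                        // (4a)
    ++buildCount;                                      // (4b) pretend to build
    bl.setProgram("binary-for-" + bl.signature());
    bl.populateCache();
    return bl.getProgram();
}                                                      // (5) bl is destroyed
```

Calling `lookupOrBuild` twice with the same variants builds once and then hits the cache; changing a variant produces a different signature and therefore a second build.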

doc/README-FunctorConcepts.txt

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+S. Chauveau
+CAPS Entreprise
+April 30, 2014
+
+The Functor concept was introduced in clBLAS to simplify the creation
+of specialized versions for dedicated architectures.
+
+The original system, referred to as the 'Solver' system in this document,
+is very centralized and not flexible enough to insert customized kernels.
+
+The Functor
+===========
+
+A functor is simply a C++ object that provides an implementation of
+a function. In the current case, that function is one of the BLAS calls
+implemented in OpenCL.
+
+The base class of all functors is clblasFunctor
+- see src/library/blas/functor/include/functor.h
+- see src/library/blas/functor/functor.cc
+
+That class does not provide much by itself but it is supposed to be derived
+once for each BLAS function to be implemented.
+
+For instance the clblasSgemmFunctor class will be the base class of all
+functors providing a generic or specific implementation of SGEMM.
+
+A generic functor is one that is applicable to all possible arguments of the
+function it implements. In most cases, there will be at least one generic
+functor that will simply call the existing Solver-based implementation of the
+function. For SGEMM, that is the class clblasSgemmFunctorFallback.
+
+A specific functor is one that is applicable to only a subset of the possible
+arguments of the function it implements. For instance, a SGEMM functor could
+only implement it for matrices of a given block size or only for square
+matrices or only for a specific device architecture (e.g. AMD Hawaii), etc.
38+
The Functor Selector
39+
====================
40+
41+
Multiple generic and specific functors may be available to implement each
42+
clBLAS call. The selection of the proper functor is delegated to the class
43+
clblasFunctorSelector whose default implementation typically returns the
44+
fallback functors.
45+
46+
- see src/library/blas/functor/include/functor_selector.h
47+
- see src/library/blas/functor/functor_selector.cc
48+
49+
So clblasFunctorSelector provides a large set of virtual selection methods.
50+
Typically, a method to select a specific functor will be provided for each
51+
supported BLAS function. Another method may be provided to select a generic
52+
functor but that is not mandatory.
53+
54+
The default implementation of clblasFunctorSelector is typically that the
55+
specific selector is redirected to the generic one returning the fallback
56+
functor (so using the existing Solver-based implementation).
57+
58+
59+
The class clblasFunctorSelector is supposed to be derived once for each
60+
supported architecture (e.g. Hawai, Tahiti, ...) and a single global instance
61+
of each of those derived classes shall be created. This is important because
62+
those instances register themselves in a global data structure that is later
63+
used to find the proper clblasFunctorSelector according to the architecture
64+
(see clblasFunctorSelector::find() )
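Editor's sketch (not part of the commit): the self-registration pattern described above might look like this. `ToySelector`, `ToyHawaiiSelector` and the string-returning selection methods are hypothetical simplifications; the real clblasFunctorSelector keys its registry on device information and returns functor objects, which this sketch does not attempt to reproduce.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical selector base class: the default selection returns the
// fallback, and each derived instance registers itself under an
// architecture name.
class ToySelector
{
public:
    virtual ~ToySelector() {}

    // Default behaviour: the specific selection redirects to the generic one.
    virtual std::string selectSgemmSpecific() const { return selectSgemmGeneric(); }
    virtual std::string selectSgemmGeneric() const { return "fallback"; }

    // Find the selector registered for an architecture, or the default one.
    static const ToySelector& find(const std::string& arch)
    {
        std::map<std::string, const ToySelector*>::const_iterator it =
            registry().find(arch);
        return it != registry().end() ? *it->second : defaultSelector();
    }

protected:
    ToySelector() {}
    explicit ToySelector(const std::string& arch) { registry()[arch] = this; }

private:
    // Function-local statics avoid static initialization order problems
    // when global selector instances register themselves.
    static std::map<std::string, const ToySelector*>& registry()
    {
        static std::map<std::string, const ToySelector*> instances;
        return instances;
    }
    static const ToySelector& defaultSelector()
    {
        static const ToySelector fallbackOnly;
        return fallbackOnly;
    }
};

// One derived selector per architecture, with a single global instance.
class ToyHawaiiSelector : public ToySelector
{
public:
    ToyHawaiiSelector() : ToySelector("Hawaii") {}
    std::string selectSgemmSpecific() const { return "hawaii-sgemm"; }
};

static ToyHawaiiSelector hawaiiSelectorInstance;
```

An unknown architecture falls through to the default selector, so callers always get a usable (fallback) choice.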
+
+
+Functor Management & Cache
+==========================
+
+Each functor contains a reference counter that, when it reaches zero, causes
+the functor to be destroyed. See the members clblasFunctor::retain() and
+clblasFunctor::release().
+
+Of course, to be efficient, functors must be reusable between BLAS calls so
+some mechanisms must be implemented to manage the functors.
+
+Some functors, such as the fallback functors, are independent of the
+arguments and of the OpenCL context & device. Those can typically be
+implemented using a single global instance that will never be destroyed.
+
+Other functors, such as those that manage a cl_program internally, are
+dependent on the OpenCL context & device and sometimes on some arguments.
+They need to be stored in caches using some information as keys.
+
+In the current implementation, we propose that each functor class shall
+implement its own private cache. Such functors shall not be created directly
+using their constructor but via a dedicated 'provide' function (the name 'provide'
+is not mandatory) that will take care of managing the internal cache.
+
+The template class clblasFunctorCache<F> is provided as a simple
+implementation of a cache of functors of type F. Use of that cache is not a
+mandatory part of the functor design. Other strategies could be to keep a
+single instance of the functor and implement a cache for the cl_program or to
+implement a global cache shared by multiple functor classes.
+
+
+
+
+
+
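Editor's sketch (not part of the commit): reference counting combined with a per-class cache could be arranged roughly as below. `ToyFunctor`, `ToyFunctorCache` and `ToyProgramFunctor` are hypothetical; integers stand in for `cl_context` and `cl_device_id`, `provide` is a method of the cache here rather than a function of the functor class, and the code is not thread-safe. It is not the interface of clblasFunctorCache<F>.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical reference-counted functor; it deletes itself at zero.
class ToyFunctor
{
public:
    ToyFunctor() : refcount_(1) { ++liveCount(); }
    void retain() { ++refcount_; }
    void release()
    {
        if (--refcount_ == 0)
            delete this;
    }
    // Number of functors currently alive, for illustration only.
    static int& liveCount()
    {
        static int count = 0;
        return count;
    }

protected:
    virtual ~ToyFunctor() { --liveCount(); }

private:
    int refcount_;
};

// Stand-ins for cl_context and cl_device_id.
typedef int ToyContext;
typedef int ToyDevice;

// Minimal cache of functors of type F keyed by (context, device).
template <typename F>
class ToyFunctorCache
{
public:
    typedef std::pair<ToyContext, ToyDevice> Key;

    ~ToyFunctorCache()
    {
        for (typename std::map<Key, F*>::iterator it = entries_.begin();
             it != entries_.end(); ++it)
            it->second->release();   // drop the reference held by the cache
    }

    // Returns a retained functor, creating it on the first request.
    F* provide(ToyContext ctx, ToyDevice dev)
    {
        Key key(ctx, dev);
        typename std::map<Key, F*>::iterator it = entries_.find(key);
        if (it == entries_.end())
            it = entries_.insert(std::make_pair(key, new F(ctx, dev))).first;
        it->second->retain();        // the caller owns this extra reference
        return it->second;
    }

private:
    std::map<Key, F*> entries_;
};

// Example functor that would own a program built for one context and device.
class ToyProgramFunctor : public ToyFunctor
{
public:
    ToyProgramFunctor(ToyContext ctx, ToyDevice dev) : ctx_(ctx), dev_(dev) {}
    ToyContext context() const { return ctx_; }
    ToyDevice device() const { return dev_; }

private:
    ToyContext ctx_;
    ToyDevice dev_;
};
```

Each caller releases the reference it received from `provide`; the functor itself is only destroyed once the cache also drops its own reference.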
