Commit 2c5ab03

Merge pull request #139 from guacamoleo/develop
updating README for AutoGemm
2 parents b8ed4fd + 4b34283 commit 2c5ab03

1 file changed: +8 -18 lines changed

README.md

Lines changed: 8 additions & 18 deletions
@@ -20,30 +20,20 @@ library does generate and enqueue optimized OpenCL kernels, relieving
 the user from the task of writing, optimizing and maintaining kernel
 code themselves.
 
-## clBLAS update notes 04/2015
-- A subset of GEMM and TRSM can be off-line compiled for Hawaii, Bonaire and Tahiti device at compile-time. This feature
-eliminates the overhead of calling clBuildProgram() at run-time.
-- Off-line compilation can be done with OpenCL 1.1, OpenCL 1.2 and OpenCl 2.0 runtime. However, for better
-performance OpenCL 2.0 is recommended. Library user can select "OCL_VERSION" from CMake to ensure the library with
-OpenCL version. It is library user's responsibility to ensure compatible hardware and driver.
-- Added flags_public.txt file that contains OpenCL compiler flags used by off-line compilation. The flags_public.txt
-will only be loaded when OCL_VERSION is 2.0.
-- User can off-line compile one or more supported device by selecting
-OCL_OFFLINE_BUILD_BONAIRE_KERNEL
-OCL_OFFLINE_BUILD_HAWII_KERNEL
-OCL_OFFLINE_BUILD_TAHITI_KERNEL.
-However, compile for more than one device at a time might result in running out of heap memory. Thus, compile for
-one device at a time is recommended.
-- User may also supply specific OpenCL compiler path with OCL_COMPILER_DIR or the library will load default OpenCL compiler.
-- The minimum driver requirement for off-line compilation is 14.502.
-
+## clBLAS update notes 09/2015
+
+- Introducing [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm)
+- clBLAS's Gemm implementation has been comprehensively overhauled to use AutoGemm. AutoGemm is a suite of python scripts which generate optimized kernels and kernel selection logic, for all precisions, transposes, tile sizes and so on.
+- CMake is configured to use AutoGemm for clBLAS so the build and usage experience of Gemm remains unchanged (only performance and maintainability has been improved). Kernel sources are generated at build time (not runtime) and can be configured within CMake to be pre-compiled at build time.
+- clBLAS users with unique Gemm requirements can customize AutoGemm to their needs (such as non-default tile sizes for very small or very skinny matrices); see [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm) documentation for details.
+
 
 ## clBLAS library user documentation
 
 [Library and API documentation][] for developers is available online as
 a GitHub Pages website
 
-### Google Groups
+## Google Groups
 
 Two mailing lists have been created for the clMath projects:
 