+- clBLAS's Gemm implementation has been comprehensively overhauled to use AutoGemm. AutoGemm is a suite of Python scripts that generate optimized kernels and kernel selection logic for all precisions, transposes, tile sizes and so on.
+- CMake is configured to use AutoGemm for clBLAS, so the build and usage experience of Gemm remains unchanged (only performance and maintainability have been improved). Kernel sources are generated at build time (not runtime) and can be configured within CMake to be pre-compiled at build time.
+- clBLAS users with unique Gemm requirements can customize AutoGemm to their needs (such as non-default tile sizes for very small or very skinny matrices); see the [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm) documentation for details.
+

 ## clBLAS library user documentation

 [Library and API documentation][] for developers is available online as
 a GitHub Pages website

-###Google Groups
+## Google Groups

 Two mailing lists have been created for the clMath projects:

@@ -108,10 +101,10 @@ The simple example below shows how to use clBLAS to compute an OpenCL accelerate
 static const cl_float beta = 20;

 static cl_float C[M*N] = {
-    11, 12, 13,
-    21, 22, 23,
-    31, 32, 33,
-    41, 42, 43,
+    11, 12, 13,
+    21, 22, 23,
+    31, 32, 33,
+    41, 42, 43,
 };
 static const size_t ldc = N;        /* i.e. ldc = N */

@@ -155,13 +148,13 @@ The simple example below shows how to use clBLAS to compute an OpenCL accelerate
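For readers unfamiliar with the `ldc` parameter in the example above, the following is a minimal host-side sketch of what a row-major, non-transposed single-precision Gemm computes: `C = alpha * A * B + beta * C`. It is not clBLAS code and does not use OpenCL. The function name `sgemm_ref_row_major` and its argument order are assumptions made for illustration only. The leading dimensions (`lda`, `ldb`, `ldc`) are the number of elements between the starts of consecutive rows, which is why the README example can set `ldc = N` for a densely packed `M x N` matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of clBLAS).
 * Computes C = alpha * A * B + beta * C for row-major matrices
 * with no transposes:
 *   A is m x k with row stride lda,
 *   B is k x n with row stride ldb,
 *   C is m x n with row stride ldc. */
void sgemm_ref_row_major(size_t m, size_t n, size_t k,
                         float alpha, const float *a, size_t lda,
                         const float *b, size_t ldb,
                         float beta, float *c, size_t ldc)
{
    /* In row-major storage a row stride must cover at least one full row. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            /* Element (i, j) of C lives at offset i * ldc + j. */
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

A routine like this can serve as a slow but simple cross-check for results read back from the device on small inputs. Passing `ldc` larger than `n` addresses a sub-matrix of a wider buffer, and the padding elements are left untouched.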