Commit 64d0ba3

Author: Timmy
Commit message: add w9100 performance
1 parent 63ca259 commit 64d0ba3

9 files changed (+1123 / -0 lines)

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
################################
#                              #
#   Benchmarking Methodology   #
#                              #
################################

############
# Hardware #
############
W9100

############
# Software #
############
CentOS 6.6
clBLAS 2.6.0
driver 14.502

############
# Settings #
############
GPU clocks: set to the maximum level using a proprietary tool, though public alternatives exist
clBLAS:
  m=n=k=lda=ldb=ldc (for simplicity)
  alpha=beta=1
  GEMMs were column-major with op(A)=N (no transpose) and op(B)=T (transpose)
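As a reference for what the N,T setting computes, here is a plain-Python sketch of a column-major GEMM with op(A)=A and op(B)=B^T, i.e. C = alpha*A*B^T + beta*C. It is not the clBLAS API (in clBLAS this corresponds to calling clblasDgemm/clblasSgemm with clblasColumnMajor, clblasNoTrans, clblasTrans); the function name `gemm_nt_colmajor` is hypothetical and the triple loop is only meant to show indexing, not performance.

```python
from typing import List


def gemm_nt_colmajor(m: int, n: int, k: int, alpha: float,
                     a: List[float], lda: int,
                     b: List[float], ldb: int,
                     beta: float, c: List[float], ldc: int) -> List[float]:
    """Reference C = alpha * A * B^T + beta * C on column-major flat lists.

    A is m x k, element (i, p) at a[i + p*lda].
    B is stored n x k, element (j, p) at b[j + p*ldb], so op(B) = B^T is k x n.
    C is m x n, element (i, j) at c[i + j*ldc]; it is updated in place.
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[j + p * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# m=n=k=lda=ldb=ldc=2 and alpha=beta=1, mirroring the settings above.
A = [1.0, 3.0, 2.0, 4.0]  # [[1, 2], [3, 4]] in column-major order
B = [5.0, 7.0, 6.0, 8.0]  # [[5, 6], [7, 8]] in column-major order
C = [1.0, 1.0, 1.0, 1.0]
print(gemm_nt_colmajor(2, 2, 2, 1.0, A, 2, B, 2, 1.0, C, 2))  # [18.0, 40.0, 24.0, 54.0]
```

With beta=1 the existing contents of C are accumulated into the result, which is why every entry is one larger than the plain product A*B^T.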

############
# Sampling #
############
For each data point, we took 10 samples. Each sample consisted of 10 gemm calls followed by a single wait for completion. Samples more than one standard deviation from the mean were discarded as outliers (in practice this was rarely, if ever, needed). Before the 10 samples were taken, one warm-up sample was executed; it was not included in the statistics.
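The sampling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness that produced these numbers: `run_sample` is a hypothetical callable that enqueues 10 gemm calls and waits for them, timing uses Python's host clock, and "beyond 1 standard deviation" is interpreted as more than one population standard deviation from the sample mean.

```python
import statistics
import time
from typing import Callable, List


def filtered_mean(samples: List[float]) -> float:
    """Mean after dropping samples more than one population standard
    deviation away from the mean (assumed interpretation of the outlier rule)."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    kept = [s for s in samples if abs(s - mean) <= std]
    return statistics.mean(kept) if kept else mean


def time_per_gemm(run_sample: Callable[[], None],
                  n_samples: int = 10,
                  gemms_per_sample: int = 10) -> float:
    """Average host time in seconds per gemm call.

    run_sample (hypothetical) enqueues gemms_per_sample gemm calls and
    then blocks until all of them have finished.
    """
    run_sample()  # warm-up sample, excluded from the statistics

    samples: List[float] = []
    for _ in range(n_samples):
        start = time.perf_counter()
        run_sample()
        samples.append(time.perf_counter() - start)

    # Each sample covers gemms_per_sample calls: divide to get time per call.
    return filtered_mean(samples) / gemms_per_sample
```

For example, `filtered_mean([1.0] * 9 + [10.0])` drops the single 10.0 outlier and returns 1.0.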

GFlop/s was calculated as the flop count divided by the average host time per kernel (the host time for 10 kernels divided by 10):
  (2*m*n*k flops) / (host time for 10 kernels / 10)   // real data
  (8*m*n*k flops) / (host time for 10 kernels / 10)   // complex data
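The formula above can be written as a small helper. This sketch assumes the host time is measured in seconds, so the result is divided by 1e9 to turn flop/s into GFlop/s; the name `gemm_gflops` and its parameters are hypothetical.

```python
def gemm_gflops(m: int, n: int, k: int, host_seconds_for_10: float,
                complex_data: bool = False, kernels: int = 10) -> float:
    """GFlop/s = flops / (host time for 10 kernels / 10), scaled to 1e9."""
    # 2*m*n*k flops for real data, 8*m*n*k for complex data.
    flops = (8 if complex_data else 2) * m * n * k
    seconds_per_kernel = host_seconds_for_10 / kernels
    return flops / seconds_per_kernel / 1e9  # flop/s -> GFlop/s


print(gemm_gflops(1000, 1000, 1000, 0.01))  # about 2000.0 (1 ms per kernel)
```

Run in reverse, and assuming the rows below were computed this way, the 5760 dgemm point (1938.05 GFlop/s) corresponds to 2*5760^3 / 1938.05e9, roughly 0.197 s per kernel.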

doc/performance/clBLAS_2.6.0/W9100/clblas_sgemmNT_w9100_14502.csv

Lines changed: 181 additions & 0 deletions
Large diffs are not rendered by default.

doc/performance/clBLAS_2.6.0/W9100/dgemm_32.csv

Lines changed: 181 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS
96,96,96,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,68.3722
192,192,192,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,354.426
288,288,288,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,661.531
384,384,384,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,671.407
480,480,480,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,809.931
576,576,576,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1002.17
672,672,672,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,964.788
768,768,768,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1243.32
864,864,864,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1569.26
960,960,960,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1314.69
1056,1056,1056,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1583.11
1152,1152,1152,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1491.83
1248,1248,1248,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1743.47
1344,1344,1344,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1644.05
1440,1440,1440,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1603.53
1536,1536,1536,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1761.43
1632,1632,1632,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1768.79
1728,1728,1728,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1748.98
1824,1824,1824,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1745.31
1920,1920,1920,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1731.84
2016,2016,2016,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1761.08
2112,2112,2112,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1904.14
2208,2208,2208,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1792.97
2304,2304,2304,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1788.94
2400,2400,2400,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1835.3
2496,2496,2496,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1859.23
2592,2592,2592,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1886.85
2688,2688,2688,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1907.93
2784,2784,2784,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1862.47
2880,2880,2880,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1892.57
2976,2976,2976,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1928.16
3072,3072,3072,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1036.59
3168,3168,3168,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1925.25
3264,3264,3264,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1900.1
3360,3360,3360,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1935.78
3456,3456,3456,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1910.43
3552,3552,3552,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1903.26
3648,3648,3648,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1939.59
3744,3744,3744,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1929.17
3840,3840,3840,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1791.19
3936,3936,3936,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1918.13
4032,4032,4032,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1917.51
4128,4128,4128,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1915.76
4224,4224,4224,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1934.22
4320,4320,4320,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1922.69
4416,4416,4416,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1925.57
4512,4512,4512,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1930.22
4608,4608,4608,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,610.825
4704,4704,4704,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1941.67
4800,4800,4800,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1949.36
4896,4896,4896,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1933.11
4992,4992,4992,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1934.95
5088,5088,5088,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1950.66
5184,5184,5184,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1938.98
5280,5280,5280,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1946.98
5376,5376,5376,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1009.04
5472,5472,5472,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1943.35
5568,5568,5568,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1942.79
5664,5664,5664,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1937.59
5760,5760,5760,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1938.05
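The CSV files share one header, so they can be loaded with Python's standard `csv` module. The sketch below parses three rows copied verbatim from the dgemm table above and reports the best-performing size; the helper names `load_points` and `peak` are hypothetical and only the `m` and `GFLOPS` columns are used.

```python
import csv
import io
from typing import List, Tuple

# Three rows copied from the dgemm (N,T) table above.
SAMPLE = "\n".join([
    "m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS",
    "96,96,96,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,68.3722",
    "4608,4608,4608,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,610.825",
    "5088,5088,5088,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,w9100,1950.66",
])


def load_points(text: str) -> List[Tuple[int, float]]:
    """Return (m, GFLOPS) pairs from a benchmark CSV with the header above."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["m"]), float(row["GFLOPS"])) for row in reader]


def peak(points: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Size with the highest measured GFLOPS."""
    return max(points, key=lambda p: p[1])


print(peak(load_points(SAMPLE)))  # (5088, 1950.66)
```

Applied to the full 60-row dgemm table, the same code also reports (5088, 1950.66) as the peak.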
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS
192,192,192,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,9.2894
384,384,384,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,54.8031
576,576,576,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,139.601
768,768,768,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,255.809
960,960,960,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,408.175
1152,1152,1152,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,527.893
1344,1344,1344,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,664.403
1536,1536,1536,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,464.17
1728,1728,1728,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,838.67
1920,1920,1920,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,915.902
2112,2112,2112,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1037.9
2304,2304,2304,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,994.425
2496,2496,2496,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1080.46
2688,2688,2688,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1132.82
2880,2880,2880,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1167.03
3072,3072,3072,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,974.311
3264,3264,3264,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1213.33
3456,3456,3456,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1238.22
3648,3648,3648,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1247.78
3840,3840,3840,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1206.61
4032,4032,4032,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1280.78
4224,4224,4224,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1286.87
4416,4416,4416,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1285.17
4608,4608,4608,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1212.33
4800,4800,4800,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1315.21
4992,4992,4992,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1322.38
5184,5184,5184,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1340.63
5376,5376,5376,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1313.7
5568,5568,5568,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1355.15
5760,5760,5760,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,w9100_dtrsm_14502,1363.73
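The dtrsm rows use side=right, uplo=upper, diag=unit and transa=none. Under the usual BLAS TRSM convention this means solving X*A = alpha*B for X, where A is upper triangular with an implicit unit diagonal and B is overwritten by X. The plain-Python sketch below shows that computation on column-major storage; it is an illustrative reference, not clBLAS's implementation, and the name `trsm_right_upper_unit` is hypothetical.

```python
from typing import List


def trsm_right_upper_unit(m: int, n: int, alpha: float,
                          a: List[float], lda: int,
                          b: List[float], ldb: int) -> List[float]:
    """Solve X * A = alpha * B in place (B is overwritten by X).

    Column-major storage: A is n x n upper triangular with an implicit unit
    diagonal (A[j, j] and the strictly lower part are never read); B and X
    are m x n.
    """
    for j in range(n):              # solve columns left to right
        for i in range(m):
            val = alpha * b[i + j * ldb]
            for p in range(j):      # columns p < j of b already hold X
                val -= b[i + p * ldb] * a[p + j * lda]
            b[i + j * ldb] = val    # unit diagonal: no division needed
    return b


A = [1.0, 0.0, 2.0, 1.0]  # [[1, 2], [0, 1]] in column-major order
print(trsm_right_upper_unit(1, 2, 1.0, A, 2, [3.0, 10.0], 1))  # [3.0, 4.0]
```

Multiplying back, [3, 4] * [[1, 2], [0, 1]] = [3, 10], which recovers the original right-hand side.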
