Commit 6311c6b

adding performance data
1 parent f3471bf commit 6311c6b

17 files changed: +2119 -0 lines changed
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
################################
#                              #
#   Benchmarking Methodology   #
#                              #
################################

############
# Hardware #
############
S9150

############
# Software #
############
CentOS 6.6
clBLAS 2.6.0
driver 14.502

############
# Settings #
############
gpu clocks: set to max level using a proprietary tool, though public alternatives exist
clBLAS:
    m=n=k=lda=ldb=ldc (for simplicity)
    alpha=beta=1
    gemms were column-major, op(A,B)=N,T

############
# Sampling #
############
For each data point, we took 10 samples. Each sample consists of 10 gemm calls followed by a wait. Samples more than 1 standard deviation from the mean were removed as outliers (in practice this was rarely, if ever, necessary). Before the 10 samples were run, one warm-up sample was executed but not included in the statistics.
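
The sampling procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, not the benchmark client used to produce the data: `run_gemm` and `wait` are hypothetical callables standing in for the gemm enqueue and the blocking wait, and the use of the population standard deviation around the mean is an assumption, since the text does not say which estimator was used.

```python
import statistics
import time


def collect_samples(run_gemm, wait, num_samples=10, calls_per_sample=10):
    """Return per-sample host times in seconds, after one discarded warm-up.

    run_gemm and wait are hypothetical callables: run_gemm enqueues one gemm,
    wait blocks until all enqueued work has finished.
    """
    def one_sample():
        start = time.perf_counter()
        for _ in range(calls_per_sample):
            run_gemm()
        wait()  # a single wait after the batch of calls
        return time.perf_counter() - start

    one_sample()  # warm-up sample, not included in the statistics
    return [one_sample() for _ in range(num_samples)]


def drop_outliers(samples, num_stddevs=1.0):
    """Keep samples within num_stddevs standard deviations of the mean.

    Assumption: population standard deviation, measured around the mean.
    """
    mean = statistics.mean(samples)
    stddev = statistics.pstdev(samples)
    return [s for s in samples if abs(s - mean) <= num_stddevs * stddev]
```

The remaining samples would then be averaged to give the host time used in the GFlop/s calculation.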

GFlop/s was calculated as
    (2*m*n*k flops) / (host time for 10 kernels / 10) // real data
    (8*m*n*k flops) / (host time for 10 kernels / 10) // complex data
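
As a worked form of the formula above, the following sketch computes GFlop/s from the problem size and the measured host time. The function name and parameters are illustrative, and the host time is assumed to be in seconds.

```python
def gflops(m, n, k, host_time_for_10_kernels, complex_data=False, kernels=10):
    """GFlop/s for one gemm, given the host time (seconds, assumed) for a batch.

    Real data counts 2*m*n*k flops per gemm, complex data 8*m*n*k.
    """
    flops_per_gemm = (8 if complex_data else 2) * m * n * k
    seconds_per_gemm = host_time_for_10_kernels / kernels
    return flops_per_gemm / seconds_per_gemm / 1e9
```

Read in reverse, a real-data gemm with m=n=k=5760 reported at about 1887 GFlop/s corresponds to roughly 0.2 seconds of host time per call.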

doc/performance/clBLAS_2.6.0/S9150/dgemm_32.csv

Lines changed: 181 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS
96,96,96,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,62.8587
192,192,192,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,290.018
288,288,288,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,592.678
384,384,384,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,611.778
480,480,480,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,751.024
576,576,576,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,937.285
672,672,672,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,909.679
768,768,768,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1177.76
864,864,864,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1494.86
960,960,960,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1261.75
1056,1056,1056,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1521.25
1152,1152,1152,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1434.59
1248,1248,1248,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1677.97
1344,1344,1344,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1585.51
1440,1440,1440,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1547.27
1536,1536,1536,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1702.19
1632,1632,1632,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1712.63
1728,1728,1728,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1691.28
1824,1824,1824,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1688.98
1920,1920,1920,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1679.2
2016,2016,2016,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1706.24
2112,2112,2112,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1846.23
2208,2208,2208,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1738.61
2304,2304,2304,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1735.26
2400,2400,2400,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1780.01
2496,2496,2496,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1803.69
2592,2592,2592,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1829.92
2688,2688,2688,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1852.93
2784,2784,2784,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1807.14
2880,2880,2880,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1837.23
2976,2976,2976,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1870.96
3072,3072,3072,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1153.64
3168,3168,3168,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1869.52
3264,3264,3264,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1844.1
3360,3360,3360,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1880.59
3456,3456,3456,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1855.69
3552,3552,3552,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1847.63
3648,3648,3648,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1883.89
3744,3744,3744,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1874.28
3840,3840,3840,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1824.94
3936,3936,3936,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1863.14
4032,4032,4032,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1861.9
4128,4128,4128,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1860.46
4224,4224,4224,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1878.57
4320,4320,4320,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1867.23
4416,4416,4416,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1869.6
4512,4512,4512,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1874.17
4608,4608,4608,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,622.126
4704,4704,4704,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1885.94
4800,4800,4800,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1893.31
4896,4896,4896,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1877.67
4992,4992,4992,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1879.83
5088,5088,5088,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1894.9
5184,5184,5184,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1883.3
5280,5280,5280,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1892.44
5376,5376,5376,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1057.62
5472,5472,5472,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1890.31
5568,5568,5568,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1887.85
5664,5664,5664,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1882.64
5760,5760,5760,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1887.03
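
A few rows in the dgemm table above (m = 3072, 4608, and 5376) report much lower GFLOPS than their neighbours. One way to surface such rows when reading a file in this format is sketched below. The function name and the 0.75 threshold are arbitrary choices for illustration, and the sketch only assumes the `m` and `GFLOPS` columns shown in the header; it makes no claim about why the drops occur.

```python
import csv
import io


def find_dips(csv_text, threshold=0.75):
    """Return (m, GFLOPS) for rows that fall below threshold * the previous row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dips = []
    for prev, cur in zip(rows, rows[1:]):
        if float(cur["GFLOPS"]) < threshold * float(prev["GFLOPS"]):
            dips.append((int(cur["m"]), float(cur["GFLOPS"])))
    return dips


# Three consecutive rows copied from the table above, with the same header.
SAMPLE = """m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS
2976,2976,2976,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1870.96
3072,3072,3072,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1153.64
3168,3168,3168,0,0,0,0,0,0,1.0,1.0,column,none,transpose,left,upper,unit,dgemm,gpu,clblas,S9150,1869.52
"""

print(find_dips(SAMPLE))  # prints [(3072, 1153.64)]
```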
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
m,n,k,lda,ldb,ldc,offa,offb,offc,alpha,beta,order,transa,transb,side,uplo,diag,function,device,library,label,GFLOPS
192,192,192,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,8.9202
384,384,384,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,46.185
576,576,576,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,126.686
768,768,768,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,235.366
960,960,960,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,375.406
1152,1152,1152,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,475.497
1344,1344,1344,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,599.527
1536,1536,1536,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,437.835
1728,1728,1728,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,778.815
1920,1920,1920,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,845.844
2112,2112,2112,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,969.624
2304,2304,2304,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,943.48
2496,2496,2496,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1026.58
2688,2688,2688,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1074.56
2880,2880,2880,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1102.6
3072,3072,3072,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,848.076
3264,3264,3264,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1010.06
3456,3456,3456,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1034.51
3648,3648,3648,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1059.02
3840,3840,3840,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1037.95
4032,4032,4032,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1103.8
4224,4224,4224,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1109.83
4416,4416,4416,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1096.15
4608,4608,4608,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1055.28
4800,4800,4800,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1140.07
4992,4992,4992,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1152.31
5184,5184,5184,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1165.47
5376,5376,5376,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1152.36
5568,5568,5568,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1193.66
5760,5760,5760,0,0,0,0,0,0,1.0,1.0,column,none,none,right,upper,unit,dtrsm,gpu,clblas,s9150_dtrsm_14502,1199.05
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@

# sgemm AMD vs NVIDIA
python ../../../../src/scripts/perf/plotPerformance.py \
  -d peak_sp.csv \
  -d ../../cuBLAS_7.0/Tesla_K40/peak_sp.csv \
  -d sgemm_96.csv \
  -d ../../cuBLAS_7.0/Tesla_K40/sgemm.csv \
  -x sizem --x_axis_label "m,n,k" \
  -y gflops --y_axis_label "GFlop/s" \
  --x_axis_scale linear \
  --plot label \
  --title "sgemm S9150 vs K40" --outputfile sgemm_S9150_K40.png

# dgemm AMD vs NVIDIA
python ../../../../src/scripts/perf/plotPerformance.py \
  -d peak_dp.csv \
  -d ../../cuBLAS_7.0/Tesla_K40/peak_dp.csv \
  -d dgemm_96.csv \
  -d ../../cuBLAS_7.0/Tesla_K40/dgemm.csv \
  -x sizem --x_axis_label "m,n,k" \
  -y gflops --y_axis_label "GFlop/s" \
  --x_axis_scale linear \
  --plot label \
  --title "dgemm S9150 vs K40" --outputfile dgemm_S9150_K40.png
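
The two invocations above differ only in their data files, title, and output file. If more comparisons were added, the argument list could be generated rather than copied. The helper below is a hypothetical sketch: it uses only the flags that appear in the script above and does not reflect anything else about the interface of `plotPerformance.py`.

```python
def plot_command(data_files, title, output_file,
                 script="../../../../src/scripts/perf/plotPerformance.py"):
    """Build the argv list for one plot, mirroring the flags used in the script above."""
    cmd = ["python", script]
    for path in data_files:
        cmd += ["-d", path]  # one -d per data file, in plotting order
    cmd += [
        "-x", "sizem", "--x_axis_label", "m,n,k",
        "-y", "gflops", "--y_axis_label", "GFlop/s",
        "--x_axis_scale", "linear",
        "--plot", "label",
        "--title", title, "--outputfile", output_file,
    ]
    return cmd


dgemm_cmd = plot_command(
    ["peak_dp.csv", "../../cuBLAS_7.0/Tesla_K40/peak_dp.csv",
     "dgemm_96.csv", "../../cuBLAS_7.0/Tesla_K40/dgemm.csv"],
    title="dgemm S9150 vs K40",
    output_file="dgemm_S9150_K40.png",
)
```

The resulting list could be passed to `subprocess.run(dgemm_cmd, check=True)` from the same directory as the script above.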
