|
| 1 | +========================================================================================================================= |
| 2 | +Time to run matmul_8x8x8_col_row 100000000 times = 0.769876 seconds (1.330084e+11 flops) |
| 3 | +Time to run matmul_8x8x8_col_col 100000000 times = 0.890074 seconds (1.150467e+11 flops) |
| 4 | +Time to run matmul_8x8x8_col_row_just_loads 100000000 times = 0.398886 seconds (2.567147e+11 flops) |
| 5 | +Time to run matmul_8x8x8_col_row_with_loads 100000000 times = 0.776930 seconds (1.318008e+11 flops) |
| 6 | +Time to run matmul_8x8x8_col_col_with_loads 100000000 times = 0.957839 seconds (1.069073e+11 flops) |
| 7 | +Time to run matmul_8x8x8_col_row_with_loads_and_stores 100000000 times = 1.532985 seconds (6.679780e+10 flops) |
| 8 | +Time to run matmul_8x8x8_col_col_with_loads_and_stores 100000000 times = 1.828421 seconds (5.600461e+10 flops) |
| 9 | +Time to run matmul_8x8x16_col_row_with_loads_and_stores 100000000 times = 2.342064 seconds (8.744423e+10 flops) |
| 10 | +Time to run matmul_8x8x24_col_row_with_loads_and_stores 100000000 times = 3.108504 seconds (9.882569e+10 flops) |
| 11 | +Time to run matmul_8x8x32_col_row_with_loads_and_stores 100000000 times = 3.872548 seconds (1.057701e+11 flops) |
| 12 | +Time to run matmul_8x8x40_col_row_with_loads_and_stores 100000000 times = 4.644594 seconds (1.102357e+11 flops) |
| 13 | +Time to run matmul_8x8x48_col_row_with_loads_and_stores 100000000 times = 5.396578 seconds (1.138499e+11 flops) |
| 14 | +Time to run matmul_8x8x56_col_row_with_loads_and_stores 100000000 times = 6.150837 seconds (1.165370e+11 flops) |
| 15 | +Time to run matmul_8x8x64_col_row_with_loads_and_stores 100000000 times = 6.908037 seconds (1.185865e+11 flops) |
| 16 | +Time to run matmul_8x8x64_col_col_with_loads_and_stores 100000000 times = 8.524576 seconds (9.609862e+10 flops) |
| 17 | +Time to run matmul_8x8x64_col_col_with_loads_and_stores_store_B 100000000 times = 8.815192 seconds (9.293048e+10 flops) |
| 18 | +Time to run matmul_16x8x64_col_col_with_loads_and_stores 100000000 times = 15.768924 seconds (1.039006e+11 flops) |
| 19 | +Time to run matmul_24x8x64_col_col_with_loads_and_stores 100000000 times = 23.329507 seconds (1.053430e+11 flops) |
| 20 | +Time to run matmul_32x8x64_col_col_with_loads_and_stores 100000000 times = 30.236062 seconds (1.083739e+11 flops) |
| 21 | +Time to run matmul_40x8x64_col_col_with_loads_and_stores 100000000 times = 37.139175 seconds (1.102879e+11 flops) |
| 22 | +Time to run matmul_48x8x64_col_col_with_loads_and_stores 100000000 times = 44.042653 seconds (1.116009e+11 flops) |
| 23 | +Time to run matmul_56x8x64_col_col_with_loads_and_stores 100000000 times = 50.933473 seconds (1.125861e+11 flops) |
| 24 | +Time to run matmul_64x8x64_col_col_with_loads_and_stores 100000000 times = 57.852084 seconds (1.132820e+11 flops) |
| 25 | + |
| 26 | + Performance counter stats for 'numactl -C 8 ./matmul': |
| 27 | + |
| 28 | + 316,780.18 msec task-clock:u # 0.999 CPUs utilized |
| 29 | + 0 context-switches:u # 0.000 /sec |
| 30 | + 0 cpu-migrations:u # 0.000 /sec |
| 31 | + 11,873 page-faults:u # 37.480 /sec |
| 32 | + 3,756,244,221,231 instructions:u # 2.77 insn per cycle (38.45%) |
| 33 | + 1,357,548,589,797 cycles:u # 4.285 GHz (46.15%) |
| 34 | + 6,281,174,371 branches:u # 19.828 M/sec (46.16%) |
| 35 | + 5,360,488 branch-misses:u # 0.09% of all branches (46.15%) |
| 36 | + 1,151,249,509,740 L1-dcache-loads:u # 3.634 G/sec (38.47%) |
| 37 | + 7,050,668 L1-dcache-load-misses:u # 0.00% of all L1-dcache accesses (15.39%) |
| 38 | + 7,049,902 LLC-loads:u # 22.255 K/sec (15.39%) |
| 39 | + 3,763 LLC-load-misses:u # 0.05% of all LL-cache accesses (15.38%) |
| 40 | + 474,687,354,193 L1-icache-loads:u # 1.498 G/sec (23.08%) |
| 41 | + 646,003 L1-icache-load-misses:u # 0.00% of all L1-icache accesses (30.76%) |
| 42 | + 20,516 dTLB-load-misses:u (23.07%) |
| 43 | + 2,990 iTLB-load-misses:u (30.76%) |
| 44 | + 65,107,925 L1-dcache-prefetches:u # 205.530 K/sec (30.76%) |
| 45 | + |
| 46 | + 317.006601119 seconds time elapsed |
| 47 | + |
| 48 | + 316.704260000 seconds user |
| 49 | + 0.069967000 seconds sys |
| 50 | + |
| 51 | + |
| 52 | +========================================================================================================================= |
| 53 | +Time to run matmul_8x8x8_col_row 100000000 times = 0.768971 seconds (1.331650e+11 flops) |
| 54 | +Time to run matmul_8x8x8_col_col 100000000 times = 0.890030 seconds (1.150523e+11 flops) |
| 55 | +Time to run matmul_8x8x8_col_row_just_loads 100000000 times = 0.399246 seconds (2.564838e+11 flops) |
| 56 | +Time to run matmul_8x8x8_col_row_with_loads 100000000 times = 0.778262 seconds (1.315752e+11 flops) |
| 57 | +Time to run matmul_8x8x8_col_col_with_loads 100000000 times = 0.958104 seconds (1.068777e+11 flops) |
| 58 | +Time to run matmul_8x8x8_col_row_with_loads_and_stores 100000000 times = 1.532526 seconds (6.681778e+10 flops) |
| 59 | +Time to run matmul_8x8x8_col_col_with_loads_and_stores 100000000 times = 1.827643 seconds (5.602845e+10 flops) |
| 60 | +Time to run matmul_8x8x16_col_row_with_loads_and_stores 100000000 times = 2.340351 seconds (8.750825e+10 flops) |
| 61 | +Time to run matmul_8x8x24_col_row_with_loads_and_stores 100000000 times = 3.108394 seconds (9.882917e+10 flops) |
| 62 | +Time to run matmul_8x8x32_col_row_with_loads_and_stores 100000000 times = 3.871464 seconds (1.057998e+11 flops) |
| 63 | +Time to run matmul_8x8x40_col_row_with_loads_and_stores 100000000 times = 4.635839 seconds (1.104439e+11 flops) |
| 64 | +Time to run matmul_8x8x48_col_row_with_loads_and_stores 100000000 times = 5.391231 seconds (1.139628e+11 flops) |
| 65 | +Time to run matmul_8x8x56_col_row_with_loads_and_stores 100000000 times = 6.147851 seconds (1.165936e+11 flops) |
| 66 | +Time to run matmul_8x8x64_col_row_with_loads_and_stores 100000000 times = 6.903446 seconds (1.186654e+11 flops) |
| 67 | +Time to run matmul_8x8x64_col_col_with_loads_and_stores 100000000 times = 8.516832 seconds (9.618600e+10 flops) |
| 68 | +Time to run matmul_8x8x64_col_col_with_loads_and_stores_store_B 100000000 times = 8.817721 seconds (9.290383e+10 flops) |
| 69 | +Time to run matmul_16x8x64_col_col_with_loads_and_stores 100000000 times = 15.757389 seconds (1.039766e+11 flops) |
| 70 | +Time to run matmul_24x8x64_col_col_with_loads_and_stores 100000000 times = 23.322153 seconds (1.053762e+11 flops) |
| 71 | +Time to run matmul_32x8x64_col_col_with_loads_and_stores 100000000 times = 30.221118 seconds (1.084275e+11 flops) |
| 72 | +Time to run matmul_40x8x64_col_col_with_loads_and_stores 100000000 times = 37.130963 seconds (1.103122e+11 flops) |
| 73 | +Time to run matmul_48x8x64_col_col_with_loads_and_stores 100000000 times = 44.022626 seconds (1.116517e+11 flops) |
| 74 | +Time to run matmul_56x8x64_col_col_with_loads_and_stores 100000000 times = 50.917068 seconds (1.126224e+11 flops) |
| 75 | +Time to run matmul_64x8x64_col_col_with_loads_and_stores 100000000 times = 57.819558 seconds (1.133457e+11 flops) |
| 76 | + |
| 77 | + Performance counter stats for 'numactl -C 8 ./matmul': |
| 78 | + |
| 79 | + 316,754.96 msec task-clock:u # 0.999 CPUs utilized |
| 80 | + 0 context-switches:u # 0.000 /sec |
| 81 | + 0 cpu-migrations:u # 0.000 /sec |
| 82 | + 11,881 page-faults:u # 37.508 /sec |
| 83 | + 3,759,143,257,580 instructions:u # 2.77 insn per cycle (38.46%) |
| 84 | + 1,357,809,970,649 cycles:u # 4.287 GHz (46.16%) |
| 85 | + 6,558,107,463 branches:u # 20.704 M/sec (46.16%) |
| 86 | + 5,149,489 branch-misses:u # 0.08% of all branches (46.17%) |
| 87 | + 1,151,758,469,873 L1-dcache-loads:u # 3.636 G/sec (38.48%) |
| 88 | + 6,402,139 L1-dcache-load-misses:u # 0.00% of all L1-dcache accesses (15.37%) |
| 89 | + 6,470,866 LLC-loads:u # 20.429 K/sec (15.37%) |
| 90 | + 6,179 LLC-load-misses:u # 0.10% of all LL-cache accesses (15.39%) |
| 91 | + 474,965,018,980 L1-icache-loads:u # 1.499 G/sec (23.08%) |
| 92 | + 569,979 L1-icache-load-misses:u # 0.00% of all L1-icache accesses (30.77%) |
| 93 | + 19,478 dTLB-load-misses:u (23.08%) |
| 94 | + 3,454 iTLB-load-misses:u (30.77%) |
| 95 | + 65,201,429 L1-dcache-prefetches:u # 205.842 K/sec (30.77%) |
| 96 | + |
| 97 | + 317.017656934 seconds time elapsed |
| 98 | + |
| 99 | + 316.700865000 seconds user |
| 100 | + 0.049959000 seconds sys |
| 101 | + |
| 102 | + |
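The figure in parentheses on each timing line appears to be a rate (floating-point operations per second) rather than an operation count. A minimal sketch of how it can be reproduced, assuming each multiply of dimensions a x b x c costs 2*a*b*c operations (one multiply and one add per term); the helper name `flop_rate` is hypothetical and not taken from the benchmark source. Because the product is symmetric, the sketch does not need to know which of the three numbers in a kernel name is M, N, or K. The last lines recompute the "insn per cycle" figure from the raw `perf stat` counters of the first run.

```python
def flop_rate(a, b, c, iterations, seconds):
    """Floating-point operations per second, assuming 2*a*b*c ops per multiply."""
    return 2.0 * a * b * c * iterations / seconds

ITERATIONS = 100_000_000  # repeat count printed on every timing line

# Timings copied from the first run in the log
rate_8x8x8 = flop_rate(8, 8, 8, ITERATIONS, 0.769876)       # matmul_8x8x8_col_row
rate_64x8x64 = flop_rate(64, 8, 64, ITERATIONS, 57.852084)  # matmul_64x8x64_col_col_with_loads_and_stores

# Instructions per cycle from the first run's perf counters
ipc = 3_756_244_221_231 / 1_357_548_589_797

print(f"{rate_8x8x8:.6e}")
print(f"{rate_64x8x64:.6e}")
print(f"{ipc:.2f}")
```

Comparing the printed values against the corresponding log lines is a quick way to check the 2*a*b*c assumption for any other kernel in the list.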