Commit 9d68e3a

chore: fix typo improved benchmark results
1 parent c6eca17 commit 9d68e3a

1 file changed: +31 -31 lines changed
docs_sphinx/submissions/report_25_05_01.rst

Lines changed: 31 additions & 31 deletions
@@ -183,10 +183,10 @@ These 3 different ``fmla`` blocks gets repeated with ``.rept 2`` to achieve the
 
 **Benchmarks**
 
-We run the benchmark with the following command:
+We run the benchmark with the following command:
 
-.. code-block::
-
+.. code-block::
+
     ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10 --benchmark_report_aggregates_only=true
 
 Therefore we do 10 repetitions of the benchmark which do about ``120 000 000`` iterations each on our matmul kernels.
@@ -197,17 +197,17 @@ Therefore we do 10 repetitions of the benchmark which do about ``120 000 000`` i
 ----------------------------------------------------------------------------------------------------------------------------------
 Benchmark Time CPU Iterations FLOPS
 ----------------------------------------------------------------------------------------------------------------------------------
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_mean 5.89 ns 5.87 ns 10 32.7048G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_median 5.89 ns 5.87 ns 10 32.7228G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_stddev 0.046 ns 0.044 ns 10 244.331M/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_cv 0.77 % 0.75 % 10 0.75%
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_mean 5.74 ns 5.72 ns 10 33.5453G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_median 5.73 ns 5.71 ns 10 33.6103G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_stddev 0.051 ns 0.050 ns 10 291.918M/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_cv 0.90 % 0.88 % 10 0.87%
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_mean 5.84 ns 5.82 ns 10 33.0036G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_median 5.83 ns 5.81 ns 10 33.0317G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_stddev 0.025 ns 0.025 ns 10 143.339M/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_cv 0.43 % 0.44 % 10 0.43%
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_mean 5.71 ns 5.69 ns 10 33.7234G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_median 5.70 ns 5.68 ns 10 33.7732G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_stddev 0.038 ns 0.038 ns 10 224.892M/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_cv 0.67 % 0.67 % 10 0.67
 
-We see that the simple first implementation of our matmul kernel gets about **32.7 GFLOPS**.
-The optimized unrolled version gets about 0.8 GFLOPS more resulting in **33.5 GFLOPS**.
+We see that the simple first implementation of our matmul kernel gets about **33.0 GFLOPS**.
+The optimized unrolled version gets about 0.7 GFLOPS more resulting in **33.7 GFLOPS**.
 
 
 Loops
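
The FLOPS column of the ``16x6x1`` rows can be cross-checked against the reported CPU time. The sketch below assumes that one kernel call performs ``2 * M * N * K`` floating-point operations (one multiply and one add per fused multiply-add); that count is an assumption, not something stated in the report, and the helper name ``gflops`` is hypothetical. With the updated mean CPU times, the result lands close to the reported 33.0 and 33.7 GFLOPS. Small deviations are expected because the times in the table are rounded.

```python
# Sketch: recompute GFLOPS from the mean CPU time of one kernel call.
# Assumption (not from the report): one call performs 2 * M * N * K FLOPs,
# i.e. one multiply and one add per fused multiply-add.

def gflops(m: int, n: int, k: int, time_ns: float) -> float:
    """FLOPs per nanosecond is numerically equal to GFLOPS."""
    return 2 * m * n * k / time_ns

# Mean CPU times taken from the added (+) rows of the table.
simple = gflops(16, 6, 1, 5.82)    # BM_matmul_16_6_1_simple
unrolled = gflops(16, 6, 1, 5.69)  # BM_matmul_16_6_1_unrolled

print(f"simple:   {simple:.1f} GFLOPS")
print(f"unrolled: {unrolled:.1f} GFLOPS")
```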
@@ -395,7 +395,7 @@ Loops
 
 **Optimization**
 
-Usage of already optmiized `matmul_16_6_1` from task 2.
+Usage of already optimized `matmul_16_6_1` from task 2.
 
 **Benchmarks**
 

@@ -412,20 +412,20 @@ We run the benchmark with the following command:
 ----------------------------------------------------------------------------------------------------------------------------------
 Benchmark Time CPU Iterations FLOPS
 ----------------------------------------------------------------------------------------------------------------------------------
-GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_mean 396 ns 396 ns 10 31.0266G/s
-GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_median 396 ns 396 ns 10 31.0281G/s
-GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_stddev 0.069 ns 0.057 ns 10 4.50274M/s
-GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_cv 0.02 % 0.01 % 10 0.01%
-GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_mean 1728 ns 1728 ns 10 28.4438G/s
-GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_median 1728 ns 1728 ns 10 28.4445G/s
-GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_stddev 0.115 ns 0.106 ns 10 1.7484M/s
-GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_cv 0.01 % 0.01 % 10 0.01%
-GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_mean 13078 ns 13077 ns 10 22.5524G/s
-GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_median 13078 ns 13077 ns 10 22.552G/s
-GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_stddev 1.83 ns 1.60 ns 10 2.76464M/s
-GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_cv 0.01 % 0.01 % 10 0.01%
-
-
-- Mean FLOPS for loop over K: **31.0 GFLOPS**.
-- Mean FLOPS for loop over M: **28.4 GFLOPS**.
-- Mean FLOPS for loop over N: **22.6 GFLOPS**.
+GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_mean 368 ns 367 ns 10 33.4632G/s
+GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_median 368 ns 367 ns 10 33.5034G/s
+GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_stddev 1.78 ns 1.75 ns 10 158.697M/s
+GemmMxNxKFixture<16, 6, 64>/BM_matmul_16_6_64/min_warmup_time:1.000_cv 0.48 % 0.48 % 10 0.47%
+GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_mean 1526 ns 1520 ns 10 32.3285G/s
+GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_median 1526 ns 1520 ns 10 32.3321G/s
+GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_stddev 10.2 ns 9.97 ns 10 211.542M/s
+GemmMxNxKFixture<64, 6, 64>/BM_matmul_64_6_64/min_warmup_time:1.000_cv 0.67 % 0.66 % 10 0.65%
+GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_mean 12177 ns 12135 ns 10 24.3028G/s
+GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_median 12167 ns 12126 ns 10 24.3211G/s
+GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_stddev 54.9 ns 54.1 ns 10 107.995M/s
+GemmMxNxKFixture<64, 48, 64>/BM_matmul_64_48_64/min_warmup_time:1.000_cv 0.45 % 0.45 % 10 0.44%
+
+
+- Mean FLOPS for loop over K: **33.5 GFLOPS**.
+- Mean FLOPS for loop over M: **32.3 GFLOPS**.
+- Mean FLOPS for loop over N: **24.3 GFLOPS**.
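
To put the updated loop results in proportion, the relative change of each mean can be computed from the removed and added ``_mean`` rows. The sketch below is illustrative only: the mapping of K, M and N to the ``16x6x64``, ``64x6x64`` and ``64x48x64`` benchmarks follows the order of the bullet list, and the helper name ``relative_gain_percent`` is hypothetical. It works out to roughly +7.9 % for the loop over K, +13.7 % for the loop over M and +7.8 % for the loop over N.

```python
# Mean FLOPS in G/s, copied from the removed (-) and added (+) _mean rows.
# Assumption: K, M, N correspond to the 16x6x64, 64x6x64 and 64x48x64 benchmarks.
old_mean = {"K": 31.0266, "M": 28.4438, "N": 22.5524}
new_mean = {"K": 33.4632, "M": 32.3285, "N": 24.3028}

def relative_gain_percent(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

gains = {
    loop: relative_gain_percent(old_mean[loop], new_mean[loop])
    for loop in old_mean
}

for loop, gain in gains.items():
    print(
        f"loop over {loop}: {old_mean[loop]:.1f} -> "
        f"{new_mean[loop]:.1f} GFLOPS ({gain:+.1f} %)"
    )
```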