Commit 153a047

Merge branch 'main' of github.com:Integer-Ctrl/machine-learning-compilers
2 parents 86a1527 + 9d68e3a commit 153a047

1 file changed: +14 -14 lines changed

docs_sphinx/submissions/report_25_05_01.rst

Lines changed: 14 additions & 14 deletions
@@ -183,10 +183,10 @@ These 3 different ``fmla`` blocks gets repeated with ``.rept 2`` to achieve the
 
 **Benchmarks**
 
-We run the benchmark with the following command:
+We run the benchmark with the following command:
 
-.. code-block::
-
+.. code-block::
+
 ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10 --benchmark_report_aggregates_only=true
 
 Therefore we do 10 repetitions of the benchmark which do about ``120 000 000`` iterations each on our matmul kernels.
@@ -197,17 +197,17 @@ Therefore we do 10 repetitions of the benchmark which do about ``120 000 000`` i
 ----------------------------------------------------------------------------------------------------------------------------------
 Benchmark                                                                      Time       CPU Iterations  FLOPS
 ----------------------------------------------------------------------------------------------------------------------------------
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_mean        5.89 ns   5.87 ns         10  32.7048G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_median      5.89 ns   5.87 ns         10  32.7228G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_stddev     0.046 ns  0.044 ns         10  244.331M/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_cv           0.77 %    0.75 %         10  0.75%
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_mean      5.74 ns   5.72 ns         10  33.5453G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_median    5.73 ns   5.71 ns         10  33.6103G/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_stddev   0.051 ns  0.050 ns         10  291.918M/s
-Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_cv         0.90 %    0.88 %         10  0.87%
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_mean        5.84 ns   5.82 ns         10  33.0036G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_median      5.83 ns   5.81 ns         10  33.0317G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_stddev     0.025 ns  0.025 ns         10  143.339M/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_simple/min_warmup_time:1.000_cv           0.43 %    0.44 %         10  0.43%
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_mean      5.71 ns   5.69 ns         10  33.7234G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_median    5.70 ns   5.68 ns         10  33.7732G/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_stddev   0.038 ns  0.038 ns         10  224.892M/s
+Gemm16x6x1Fixture/BM_matmul_16_6_1_unrolled/min_warmup_time:1.000_cv         0.67 %    0.67 %         10  0.67
 
-We see that the simple first implementation of our matmul kernel gets about **32.7 GFLOPS**.
-The optimized unrolled version gets about 0.8 GFLOPS more resulting in **33.5 GFLOPS**.
+We see that the simple first implementation of our matmul kernel gets about **33.0 GFLOPS**.
+The optimized unrolled version gets about 0.7 GFLOPS more resulting in **33.7 GFLOPS**.
 
 
 Loops
@@ -438,7 +438,7 @@ Loops
 
 **Optimization**
 
-Usage of already optmiized `matmul_16_6_1` from task 2.
+Usage of already optimized `matmul_16_6_1` from task 2.
 
 **Benchmarks**

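As a rough cross-check of the FLOPS column in the updated benchmark table, the sketch below recomputes the throughput from the reported mean CPU times. It assumes the common convention of counting 2 * M * N * K floating-point operations per matmul call (one multiply and one add per multiply-accumulate). The report does not state how its FLOPS counter is computed, and the helper name `gflops` is hypothetical.

```python
def gflops(m: int, n: int, k: int, time_ns: float) -> float:
    """Throughput in GFLOPS for one M x N x K matmul taking time_ns nanoseconds.

    Assumes 2 * m * n * k floating-point operations per call
    (an assumption, not confirmed by the report).
    """
    flops_per_call = 2 * m * n * k
    # FLOP per nanosecond is numerically equal to GFLOP per second.
    return flops_per_call / time_ns


# Mean CPU times (in ns) taken from the updated benchmark table.
simple = gflops(16, 6, 1, 5.82)
unrolled = gflops(16, 6, 1, 5.69)

print(f"simple:   {simple:.1f} GFLOPS")
print(f"unrolled: {unrolled:.1f} GFLOPS")
print(f"gain:     {(unrolled / simple - 1) * 100:.1f} %")
```

Under that assumption the results land close to the reported 33.0 and 33.7 GFLOPS. Small deviations from the table are expected because the printed times are rounded to two decimals.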