Commit 0aa1d6c

ci: reduced verbose benchmark log
1 parent e510e92 commit 0aa1d6c

2 files changed (+12 -1 lines changed)

.github/workflows/benchmarks.yml

Lines changed: 1 addition & 1 deletion
@@ -30,5 +30,5 @@ jobs:

 - name: Benchmark
 working-directory: ${{github.workspace}}/build
-run: ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10
+run: ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10 --benchmark_report_aggregates_only=true

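The added `--benchmark_report_aggregates_only=true` flag keeps the ten repetitions but, as we understand Google Benchmark, reports only the statistics computed across them (mean, median, standard deviation, and in newer releases the coefficient of variation) instead of also printing one row per repetition. The sketch below only illustrates what those aggregate rows summarize; it is not the library's code. The helper names `mean`, `median`, and `stddev` are hypothetical, and the sample (n - 1) standard deviation is an assumption about what the library reports, worth checking against the Google Benchmark version in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the per-repetition measurements (hypothetical helper).
double mean(const std::vector<double>& v) {
  assert(!v.empty());
  return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Median; for an even count, the average of the two middle values.
// Takes the vector by value so that sorting does not modify the caller's data.
double median(std::vector<double> v) {
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return v.size() % 2 == 1 ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
}

// Sample standard deviation (n - 1 in the denominator); 0 for fewer than two values.
double stddev(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;
  const double m = mean(v);
  double sq = 0.0;
  for (double x : v) sq += (x - m) * (x - m);
  return std::sqrt(sq / static_cast<double>(v.size() - 1));
}
```

With `--benchmark_repetitions=10`, each vector would hold the ten per-repetition values of one metric (for example real time) for a single benchmark, and the three results correspond to the `_mean`, `_median`, and `_stddev` rows that remain in the log.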

docs_sphinx/submissions/report_25_05_01.rst

Lines changed: 11 additions & 0 deletions
@@ -63,3 +63,14 @@ This section microbenchmarks the execution throughput and latency of FP32 Neon i
 - Compilation: ``g++ -o neon_1_2.exe neon_1_2_driver.cpp neon_1_2.s``
 - We have :math:`11.7019 \cdot 10^9` instructions per second in a single ALU.
 Resulting in a **latency of** :math:`\approx 3` **cycles** for the known clock speed of 4.4 GHz.
+
+
+Microkernel
+-----------
+
+Implement a Neon microkernel that computes C+=AB for M=16, N=6, and K=1. Wrap your microkernel in the ``matmul_16_6_1`` function.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+Test and optimize your microkernel. Report its performance in GFLOPS.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
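Testing the Neon kernel and reporting GFLOPS both benefit from a scalar reference and an explicit FLOP count. The following C++ is a minimal sketch under stated assumptions: the names `matmul_16_6_1_ref`, `max_abs_diff`, and `gflops` are hypothetical, and A (16x1), B (1x6), and C (16x6) are assumed to be stored column-major with leading dimensions `lda`, `ldb`, `ldc`; the real `matmul_16_6_1` interface may differ. Each call performs 16 · 6 · 1 multiply-adds, i.e. 2 · 16 · 6 · 1 = 192 FLOPs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar reference for C += A * B with M = 16, N = 6, K = 1,
// usable as ground truth when testing the Neon kernel.
// Assumption: column-major A (16x1), B (1x6), C (16x6) with leading
// dimensions lda, ldb, ldc; the actual matmul_16_6_1 layout may differ.
void matmul_16_6_1_ref(const float* a, const float* b, float* c,
                       std::int64_t lda, std::int64_t ldb, std::int64_t ldc) {
  constexpr int kM = 16, kN = 6, kK = 1;
  for (int n = 0; n < kN; ++n)
    for (int k = 0; k < kK; ++k)
      for (int m = 0; m < kM; ++m)
        c[m + n * ldc] += a[m + k * lda] * b[k + n * ldb];
}

// Largest absolute element-wise difference between two 16x6 column-major C blocks.
float max_abs_diff(const float* c0, const float* c1, std::int64_t ldc) {
  float d = 0.0f;
  for (int n = 0; n < 6; ++n)
    for (int m = 0; m < 16; ++m)
      d = std::fmax(d, std::fabs(c0[m + n * ldc] - c1[m + n * ldc]));
  return d;
}

// One multiply and one add per (m, n, k) triple: 2 * 16 * 6 * 1 = 192 FLOPs per call.
constexpr double kFlopsPerCall = 2.0 * 16 * 6 * 1;

// Sustained GFLOPS for `calls` kernel invocations measured over `seconds`.
double gflops(std::int64_t calls, double seconds) {
  assert(seconds > 0.0);
  return kFlopsPerCall * static_cast<double>(calls) / seconds * 1e-9;
}
```

In a benchmark driver, one would run the Neon kernel and the reference on identical inputs and require `max_abs_diff` to stay within a small tolerance, then time many kernel calls between two `std::chrono::steady_clock::now()` readings and pass the call count and elapsed seconds to `gflops`.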
