ci: reduced verbose benchmark log

Integer-Ctrl · Integer-Ctrl · commit 0aa1d6ccdef0 · 2025-04-29T21:07:38.000+02:00
diff --git a/.github/workflows/benchmarks.yml b/.github/workflows/benchmarks.yml
@@ -30,5 +30,5 @@ jobs:
 
     - name: Benchmark
       working-directory: ${{github.workspace}}/build
-      run: ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10
+      run: ./benchmarks --benchmark_counters_tabular=true --benchmark_repetitions=10 --benchmark_report_aggregates_only=true
 
diff --git a/docs_sphinx/submissions/report_25_05_01.rst b/docs_sphinx/submissions/report_25_05_01.rst
@@ -63,3 +63,14 @@ This section microbenchmarks the execution throughput and latency of FP32 Neon i
 - Compilation: ``g++ -o neon_1_2.exe neon_1_2_driver.cpp neon_1_2.s``
 - We have :math:`11.7019 \cdot 10^9` instruction per seconds in a single ALU.
   Resulting in a **latency of** :math:`\approx 3` **cycle** for the known clock speed of 4.4 GHz.
+
+
+Microkernel
+-----------
+
+Implement a Neon microkernel that computes C+=AB for M=16, N=6, and K=1. Wrap your microkernel in the ``matmul_16_6_1`` function.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+Test and optimize your microkernel. Report its performance in GFLOPS.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^