Commit 9fbadc1

Merge pull request #1600 from Arnaud-de-Grandmaison-ARM/sme2-fixes
[SME2] Address review comments.
2 parents 2e59456 + 1f34e6e commit 9fbadc1

2 files changed: +16 -4 lines changed

content/learning-paths/cross-platform/sme2/2-check-your-environment.md

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ llvm-objdump --demangle -d sme2_matmul_intr > sme2_matmul_intr.lst
 - It creates the assembly listings for the four executables: ``hello.lst``, ``sme2_check.lst``, ``sme2_matmul_asm.lst``, and ``sme2_matmul_intr.lst``.
 
 {{% notice Note %}}
-At any point, you can clean the directory of all the files that have been built by invoking the ``make clean`` target:
+At any point, you can clean the directory of all the files that have been built by invoking ``make clean``:
 
 ```BASH
 $ docker run --rm -v "$PWD:/work" -w /work armswdev/sme2-learning-path:sme2-environment-v1 make clean
@@ -170,4 +170,4 @@ Checking in_streaming_mode: 0
 Info: /OSCI/SystemC: Simulation stopped by user.
 ```
 
-You have now checked that the code can be compiled and run with full SME2 support, and are all set to move to the next section.
+You have now checked that the code can be compiled and run with full SME2 support, and are all set to move to the next section.

content/learning-paths/cross-platform/sme2/4-outer-product.md

Lines changed: 14 additions & 2 deletions
@@ -43,8 +43,11 @@ order. This means that loading row-data from memory is efficient as the memory
 system operates efficiently with contiguous data. An example of this is where caches are loaded row by row, and data prefetching is simple - just load the data from ``current address + sizeof(data)``. This is not the case for loading column-data from memory though, as it requires more work from the memory system.
 
 In order to further improve the effectiveness of the matrix multiplication, it
-is therefore desirable to change the layout in memory of the left-hand side matrix, which is called ``matLeft`` in the code examples in this Learning Path, which essentially performs a matrix
-transposition so that instead of loading column-data from memory, one loads row-data.
+is therefore desirable to change the layout in memory of the left-hand side
+matrix, which is called ``matLeft`` in the code examples in this Learning Path.
+The improved layout would ensure that elements from the same column are located
+next to each other in memory. This is essentially a matrix transposition,
+which changes ``matLeft`` from row-major order to column-major order.
 
 {{% notice Important %}}
 It is important to note here that this reorganizes the layout of the matrix in
@@ -98,3 +101,12 @@ void preprocess_l(uint64_t nbr, uint64_t nbc, uint64_t SVL,
 ``preprocess_l`` will be used to check the assembly and intrinsic versions of
 the matrix multiplication perform the preprocessing step correctly. This code is
 located in file ``preprocess_vanilla.c``.
+
+{{% notice Note %}}
+In a real-world application, it may be possible to arrange for ``matLeft`` to
+be stored in column-major order, in which case no further transposition would
+be needed, and the preprocess step would be unnecessary. Matrix processing
+frameworks / libraries often have some attributes with the Matrix object to
+track if it is row- or column-major order, and / or if it has been transposed
+to avoid unnecessary computations.
+{{% /notice %}}
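
The layout change described in the second file's diff can be illustrated with a minimal C sketch. It assumes a plain `float` matrix and uses a hypothetical helper, `to_column_major`, that is not part of the Learning Path sources. The real `preprocess_l` takes an additional `SVL` parameter and its body is not shown in this diff, so it may well arrange the data differently.

```c
#include <stdint.h>

/* Hypothetical helper (not from the Learning Path): copy a row-major
 * nbr x nbc matrix into column-major order, so that elements from the
 * same column end up next to each other in memory. */
void to_column_major(uint64_t nbr, uint64_t nbc,
                     const float *row_major, float *col_major) {
    for (uint64_t r = 0; r < nbr; r++) {
        for (uint64_t c = 0; c < nbc; c++) {
            /* Element (r, c) moves from offset r * nbc + c
             * to offset c * nbr + r. */
            col_major[c * nbr + r] = row_major[r * nbc + c];
        }
    }
}
```

For example, a 2 x 3 matrix stored row by row as `{1, 2, 3, 4, 5, 6}` would be stored as `{1, 4, 2, 5, 3, 6}` after this call, so each column's elements become contiguous.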
