41 changes: 39 additions & 2 deletions media/docs/cpp/xe_rearchitecture.md
@@ -290,6 +290,7 @@ Now that we have the basic thread mapping rule, let's apply it to a simple block
An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16.
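The three widths listed above are consistent with a fixed row size of 256 bits divided by the element size. The sketch below encodes that reading; the 256-bit constant is inferred from the listed values rather than stated in this document, and `dpas_a_width` is a hypothetical helper, not a CUTLASS or Intel API.

```cpp
// Assumption: an A-matrix row spans 256 bits, inferred from the widths
// quoted above (8 for tf32, 16 for f16/bf16, 32 for s8/u8).
constexpr int kAssumedRowBits = 256;

// Hypothetical helper: A-matrix width in elements for a given element size.
// tf32 is treated as a 32-bit storage type here.
constexpr int dpas_a_width(int element_bits) {
    return kAssumedRowBits / element_bits;
}
```

Under this assumption, `dpas_a_width(32)`, `dpas_a_width(16)`, and `dpas_a_width(8)` reproduce the tf32, f16/bf16, and s8/u8 widths respectively.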

As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16:

Could you add the explanation of how to understand this representation which helped you? I think it would also be good to include (/link to) a good explanation of the VNNI format. Such as:
image

@sanchitintel sanchitintel Oct 21, 2025

Thanks for reviewing!

Peter clarified that the numbers in subgroup view in the current documentation represent the order of elements in the Xe register file. I added that information in the latest commit in this PR.

> I think it would also be good to include (/link to) a good explanation of the VNNI format

I added a link to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc#example-2-16-bit-elements that describes VNNI layout.

```math
\begin{array}{c}
\text{Subgroup view}\\
@@ -312,6 +313,43 @@ As a more complicated example, let's consider a 16-bit VNNI load, with height =
\end{array}
```

If we instead assume that the values in the VNNI-transformed matrix below refer to the corresponding indices of the original plain layout matrix,

Why do you think it would be helpful to have more than one representation? I think it only adds to the confusion if the document contains representations in which the numbers have different meanings. If this is needed, it should either use a different syntax or include a clear explanation of the different representation.

@sanchitintel sanchitintel Oct 21, 2025

The representation that uses the subgroup-view values as logical indices into the original plain-layout matrix depicts the VNNI transformation itself: it shows where each element ends up after the transformation, and it makes clear that, in this particular example, the elements in each column of the VNNI-transformed matrix are owned by a different work-item.

Please note that this representation is also similar to that of the plain layout load in the existing documentation - to a reader, the numbers in its corresponding subgroup view may seem to correspond to logical linear indices of data elements in the loaded block.

The current documentation doesn't mention that, for the VNNI subgroup view, the numbers correspond to the order of elements in the Xe register file.
I also added that clarification in the latest commit in this PR.

Thanks!

@sanchitintel sanchitintel Oct 22, 2025

> it only adds to the confusion if the document contains representation with different meaning for the numbers

@rolandschulz, can you please point out where the current documentation mentions what the numbers mean?

then we can view the 16-bit VNNI load from a different perspective.

```math
\begin{array}{c}
\text{Subgroup view of data in global memory}\\
\begin{array}{cccccc}
0 & 1 & 2 & 3 & \cdots & 15\\
16 & 17 & 18 & 19 & \cdots & 31\\
32 & 33 & 34 & 35 & \cdots & 47\\
48 & 49 & 50 & 51 & \cdots & 63
\end{array}
\end{array}
```

```math
\begin{array}{c}
\text{Subgroup view of data in registers after VNNI transformation that happened during the load}\\
\begin{array}{ccccccc}
0 & 16 & 1 & 17 & \cdots & 7 & 23\\
8 & 24 & 9 & 25 & \cdots & 15 & 31\\
32 & 48 & 33 & 49 & \cdots & 39 & 55\\
40 & 56 & 41 & 57 & \cdots & 47 & 63
\end{array}
\end{array}
\rightarrow
\begin{array}{c}
\text{Thread view}\\
\begin{array}{ccccc}
\text{T0V0} & \text{T1V0} & \text{T2V0} & \cdots & \text{T15V0}\\
\text{T0V1} & \text{T1V1} & \text{T2V1} & \cdots & \text{T15V1}\\
\text{T0V2} & \text{T1V2} & \text{T2V2} & \cdots & \text{T15V2}\\
\text{T0V3} & \text{T1V3} & \text{T2V3} & \cdots & \text{T15V3}
\end{array}
\end{array}
```

The DPAS B matrix follows the same pattern.
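The index arithmetic behind the second diagram can be sketched in a few lines of C++. This is only an illustration of the layout as drawn above, not code from this PR: `vnni16_source_index`, `owner_thread`, `value_slot`, and `print_vnni16_view` are hypothetical names, and the thread mapping (position modulo subgroup size) is simply read off the thread view shown in the diagram.

```cpp
#include <cstdio>

// Hypothetical helper, for illustration only (not a CUTLASS or Intel API).
// Maps a position in the VNNI-transformed register order to the linear index
// of the same element in the row-major plain-layout block with `width` columns.
// Each pair of source rows is interleaved column by column, so a row pair
// occupies 2 * width consecutive register positions.
constexpr int vnni16_source_index(int pos, int width) {
    return ((pos / (2 * width)) * 2 + (pos % 2)) * width  // source row * width
           + (pos % (2 * width)) / 2;                     // source column
}

// Thread and value slot that hold register position `pos`, following the
// thread view in the diagram (T0..T15 across a row, V0..V3 down a column).
constexpr int owner_thread(int pos, int subgroup_size) { return pos % subgroup_size; }
constexpr int value_slot(int pos, int subgroup_size) { return pos / subgroup_size; }

// Prints the transformed block as `height` rows of `width` source indices.
inline void print_vnni16_view(int height, int width) {
    for (int pos = 0; pos < height * width; ++pos) {
        std::printf("%2d%c", vnni16_source_index(pos, width),
                    (pos % width == width - 1) ? '\n' : ' ');
    }
}
```

Calling `print_vnni16_view(4, 16)` from a `main` function lists the source indices row by row in the same order as the diagram. For instance, register position 1 holds source element 16 and, under the mapping assumed here, belongs to T1V0.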


@@ -504,7 +542,6 @@ gemm_device(ATensor const& A, // (M,K)
}
```


## New Collective MMAs

... coming later!