[DOCS] Clarify existing VNNI load visualization and add another #571
base: main
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -290,6 +290,7 @@ Now that we have the basic thread mapping rule, let's apply it to a simple block | |
| An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16. | ||
|
|
||
| As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16: | ||
|
|
||
| ```math | ||
| \begin{array}{c} | ||
| \text{Subgroup view}\\ | ||
|
|
@@ -312,6 +313,43 @@ As a more complicated example, let's consider a 16-bit VNNI load, with height = | |
| \end{array} | ||
| ``` | ||
|
|
||
| If we instead label each entry of the VNNI-transformed matrix below with its index in the original plain-layout (row-major) matrix, | ||
|
||
| then we can view the 16-bit VNNI load from a different perspective. | ||
|
|
||
| ```math | ||
| \begin{array}{c} | ||
| \text{Subgroup view of data in global memory}\\ | ||
| \begin{array}{cccccc} | ||
| 0 & 1 & 2 & 3 & \cdots & 15\\ | ||
| 16 & 17 & 18 & 19 & \cdots & 31\\ | ||
| 32 & 33 & 34 & 35 & \cdots & 47\\ | ||
| 48 & 49 & 50 & 51 & \cdots & 63 | ||
| \end{array} | ||
| \end{array} | ||
| ``` | ||
|
|
||
| ```math | ||
| \begin{array}{c} | ||
| \text{Subgroup view of data in registers after the VNNI transformation performed during the load}\\ | ||
| \begin{array}{ccccccc} | ||
| 0 & 16 & 1 & 17 & \cdots & 7 & 23\\ | ||
| 8 & 24 & 9 & 25 & \cdots & 15 & 31\\ | ||
| 32 & 48 & 33 & 49 & \cdots & 39 & 55\\ | ||
| 40 & 56 & 41 & 57 & \cdots & 47 & 63 | ||
| \end{array} | ||
| \end{array} | ||
| \rightarrow | ||
| \begin{array}{c} | ||
| \text{Thread view}\\ | ||
| \begin{array}{ccccc} | ||
| \text{T0V0} & \text{T1V0} & \text{T2V0} & \cdots & \text{T15V0}\\ | ||
| \text{T0V1} & \text{T1V1} & \text{T2V1} & \cdots & \text{T15V1}\\ | ||
| \text{T0V2} & \text{T1V2} & \text{T2V2} & \cdots & \text{T15V2}\\ | ||
| \text{T0V3} & \text{T1V3} & \text{T2V3} & \cdots & \text{T15V3} | ||
| \end{array} | ||
| \end{array} | ||
| ``` | ||
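The index pattern in the two matrices above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not code from the library: the helper names `vnni16_source_index` and `vnni16_thread_value_source` are hypothetical, and the mapping assumes what the diagrams show, namely that a 16-bit VNNI load interleaves each pair of consecutive source rows element by element, and that `TtVv` in the thread view sits at row `v`, column `t` of the register matrix.

```cpp
#include <cstddef>

// Hypothetical helper (illustration only, not a library API).
// For a 16-bit VNNI load of a tile that is `width` elements wide, return the
// row-major index in the original (global memory) tile of the element that
// ends up at (reg_row, reg_col) in the register view.
inline std::size_t vnni16_source_index(std::size_t reg_row,
                                       std::size_t reg_col,
                                       std::size_t width) {
  // Linear position of the element in the register view.
  const std::size_t p = reg_row * width + reg_col;
  // Each pair of source rows is interleaved into 2 * width consecutive slots.
  const std::size_t pair = p / (2 * width);  // which pair of source rows
  const std::size_t q = p % (2 * width);     // slot within the interleaved pair
  const std::size_t src_row = 2 * pair + (q % 2);  // even slots: first row of the pair
  const std::size_t src_col = q / 2;
  return src_row * width + src_col;
}

// Hypothetical helper: source index held by thread `t` as its value `v`,
// assuming TtVv is located at row v, column t of the register view.
inline std::size_t vnni16_thread_value_source(std::size_t t,
                                              std::size_t v,
                                              std::size_t width) {
  return vnni16_source_index(v, t, width);
}
```

With `width = 16`, `vnni16_source_index(0, 1, 16)` gives 16 and `vnni16_source_index(1, 0, 16)` gives 8, matching the first two rows of the register view above; under the stated thread-view assumption, `T0V3` holds source element 40.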
|
|
||
| The DPAS B matrix follows the same pattern. | ||
|
|
||
|
|
||
|
|
@@ -504,7 +542,6 @@ gemm_device(ATensor const& A, // (M,K) | |
| } | ||
| ``` | ||
|
|
||
|
|
||
| ## New Collective MMAs | ||
|
|
||
| ... coming later! | ||
Could you add the explanation that helped you understand this representation? I think it would also be good to include (or link to) a good explanation of the VNNI format. Such as:

Thanks for reviewing!
Peter clarified that the numbers in the subgroup view in the current documentation represent the order of elements in the Xe register file. I added that information in the latest commit in this PR.
I also added a link to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc#example-2-16-bit-elements, which describes the VNNI layout.