[DOCS] Clarify existing VNNI load visualization and add another #571
Conversation
The current subgroup-view visualization of the VNNI load does not correspond to the actual values, so it might be confusing.
@sanchitintel The existing illustration is correct, though I think there might be some confusion about what the numbers mean. The numbers are indicating the order of data elements in registers, and the positions are always consistent with their logical positions in the matrix. I think it may be confusing for readers switching from the logical view to a subgroup "TV view" as in this PR.
Thanks for reviewing, @petercad!
Sorry, I'm confused. Can you please elaborate with an example? Thanks!
The current documentation already has a TV view on the right side of subgroup view.
To clarify, I mean that if the numbers correspond to logical coordinates in the plain-layout matrix, then the following VNNI-transformation is incorrect:
It should instead be:
The PR description needs to be updated.
An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16.

As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16:
Thanks for reviewing!
Peter clarified that the numbers in subgroup view in the current documentation represent the order of elements in the Xe register file. I added that information in the latest commit in this PR.
I think it would also be good to include (/link to) a good explanation of the VNNI format
I added a link to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc#example-2-16-bit-elements that describes VNNI layout.
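For readers following this thread, here is a minimal sketch of the "order in the register file" reading for the 4x16, 16-bit example. It assumes that a 16-bit VNNI load packs each pair of adjacent rows column by column, and that this packed order is the register order; the function name `vnni_register_order` and its `pack` parameter are hypothetical and not part of CUTLASS or SYCL.

```python
def vnni_register_order(height, width, pack=2):
    """Grid indexed by logical (row, col); each entry is that element's
    position in the packed stream (assumed to match register order)."""
    assert height % pack == 0, "height must be a multiple of the packing factor"
    grid = [[0] * width for _ in range(height)]
    for r in range(height):
        group, lane = divmod(r, pack)  # which row group, and which row inside it
        for c in range(width):
            # Elements of one row group are interleaved column by column.
            grid[r][c] = group * pack * width + c * pack + lane
    return grid


order_view = vnni_register_order(4, 16)
for row in order_view:
    print(" ".join(f"{v:2d}" for v in row))
```

Under these assumptions, rows 0 and 1 interleave the even and odd positions 0 to 31, and rows 2 and 3 do the same for positions 32 to 63, while every number stays at its logical position in the matrix.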
media/docs/cpp/xe_rearchitecture.md
Outdated
| \end{array} | ||
| ``` | ||
If we instead assume that the values in the VNNI-transformed matrix below refer to the corresponding indices of the original plain layout matrix,
Why do you think it would be helpful to have more than one different representation? I think it only adds to the confusion if the document contains representation with different meaning for the numbers. If this is needed this should either use a different syntax or clear explanation of the different representation.
The representation which entails using subgroup-view values as the logical indices of the original plain-layout matrices depicts the VNNI transformation (as it helps visualize where each element would end up post-transformation), and helps understand that for that particular example, elements in each column of the VNNI-transformed matrix would be owned by a different work-item.
Please note that this representation is also similar to that of the plain layout load in the existing documentation - to a reader, the numbers in its corresponding subgroup view may seem to correspond to logical linear indices of data elements in the loaded block.
The current documentation doesn't mention that for VNNI subgroup view, the numbers correspond to the order of elements in the Xe register file.
I also added that clarification in the latest commit in this PR.
Thanks!
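The logical-index reading discussed in this comment can be sketched as follows. This is an illustration only, assuming a packing factor of 2 for 16-bit elements and assuming that work-item `c` of a 16-wide subgroup owns the pair packed from column `c`; `vnni_transform`, `plain`, and `per_work_item` are hypothetical names, not the code from this PR.

```python
def vnni_transform(matrix, pack=2):
    """Interleave each group of `pack` adjacent rows column by column."""
    height, width = len(matrix), len(matrix[0])
    assert height % pack == 0, "height must be a multiple of the packing factor"
    packed = []
    for r0 in range(0, height, pack):
        packed_row = []
        for c in range(width):
            for k in range(pack):
                packed_row.append(matrix[r0 + k][c])
        packed.append(packed_row)
    return packed


# 4x16 plain-layout matrix whose values are their own linear indices (0..63).
plain = [[r * 16 + c for c in range(16)] for r in range(4)]
packed = vnni_transform(plain)

# Assumed ownership: work-item c holds the packed pair taken from column c.
per_work_item = [[row[2 * c: 2 * c + 2] for row in packed] for c in range(16)]

for row in packed:
    print(" ".join(f"{v:2d}" for v in row))
```

With these assumptions the transformed matrix is 2x32, and each value shows which element of the original plain-layout matrix ended up at that packed position.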
> it only adds to the confusion if the document contains representation with different meaning for the numbers

@rolandschulz, can you please point out where the current documentation mentions what the numbers mean?
Explain what the numbers mean in context of the original subgroup view for VNNI. Add link explaining VNNI layout




Summary
The current subgroup-view visualization of the VNNI load does not correspond to the logical indices of the plain-layout matrix in global memory, so this PR adds another visualization and clarifies what the existing one means.
Rendering at https://github.com/sanchitintel/cutlass/blob/vnni_load_illustration/media/docs/cpp/xe_rearchitecture.md
Background
In https://github.com/intel/sycl-tla/blob/main/media/docs/cpp/xe_rearchitecture.md#subgroup-scope-and-thread-local-data, the first subgroup view (of a plain data load) has numbers that a reader may interpret as logical indices of the elements in the matrix. The VNNI layout from the subgroup view for a 4x16 block has been described as:
So, even for the VNNI layout, readers may construe the numbers as logical indices of the original plain-layout matrix. However, they represent the order of elements in an Xe general register file.
Changes
Intel Optimization Guides define VNNI layout like this (although this is for AMX, it's also true for XMX):
Using this approach, if we assume that we have a 4x16 array whose values are representative of logical indices in a contiguous order (i.e. `*(array.data_ptr() + x) = x`, for `x` between 0 and 63), then with the following code, we see a different visualization than in the documentation for VNNI layout:
Thanks!