@sanchitintel

@sanchitintel sanchitintel commented Oct 20, 2025

Summary

The subgroup view in the current VNNI load visualization does not show the logical indices of the plain-layout matrix in global memory, so this PR adds a second visualization and clarifies what the existing one means.

Rendering at https://github.com/sanchitintel/cutlass/blob/vnni_load_illustration/media/docs/cpp/xe_rearchitecture.md

Background

In https://github.com/intel/sycl-tla/blob/main/media/docs/cpp/xe_rearchitecture.md#subgroup-scope-and-thread-local-data, the first subgroup view (of a plain data load) has numbers that a reader may interpret as logical indices of the elements in the matrix. The VNNI layout from the subgroup view for a 4x16 block is described as:

image

So, even for VNNI layout, readers may construe the numbers as representing logical indices of the original plain-layout matrix. However, they represent the order of elements in an Xe general register file.

Changes

  1. Added clarification on the existing VNNI layout visualization.
  2. Added a new visualization in which the numbers in the subgroup view are logical indices of data in the plain (pre-transformation) matrix.

Intel Optimization Guides define VNNI layout like this (although this is for AMX, it's also true for XMX):

Image

Using this approach, if we assume that we have a 4x16 array whose values are representative of logical indices in a contiguous order (i.e. *(array.data_ptr() + x) = x, for x between 0 and 63), then with the following code, we see a different visualization than in the documentation for VNNI layout:

import torch

k = 4
n = 16

a = torch.arange(k * n).view(k, n).contiguous()
print("plain format")
print(a)

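# One slot per row pair, column, and position within the pair.
# k // 2 keeps the size an integer; k / 2 is a float, which torch.zeros rejects.
b = torch.zeros(k // 2, n, 2, dtype=torch.long)

# Rows 2p and 2p+1 of each column become adjacent elements.
for i in range(k):
  for j in range(n):
    b[i // 2][j][i % 2] = a[i][j]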
print("VNNI format")
print(b.view(4, 16))
Image

Thanks!

The current visualization of the subgroup view of the VNNI load visualization does not correspond to the actual values, so it might be confusing.
@sanchitintel sanchitintel requested a review from petercad October 20, 2025 23:00
@petercad

@sanchitintel The existing illustration is correct, though I think there might be some confusion about what the numbers mean. The numbers are indicating the order of data elements in registers, and the positions are always consistent with their logical positions in the matrix.

I think it may be confusing for readers switching from the logical view to a subgroup "TV view" as in this PR.
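
One way to read this comment is sketched below. It assumes that 16-bit elements are packed in pairs, so that rows 2p and 2p+1 of a column sit next to each other in a register. Each number is printed at the element's logical (row, column) position and gives that element's order in the register file. `register_order` is a hypothetical helper written for this illustration; it is not part of the documentation or of any library.

```python
K, N = 4, 16  # block height and width from the documentation's example


def register_order(row, col, n=N):
    """Order of logical element (row, col) in the register file.

    Assumption: 16-bit VNNI packing, where rows 2p and 2p+1 of one
    column are stored side by side.
    """
    pair = row // 2  # which pair of rows the element belongs to
    return pair * 2 * n + 2 * col + row % 2


# Each number is placed at its logical (row, col) position in the matrix.
order_at_logical_position = [
    [register_order(r, c) for c in range(N)] for r in range(K)
]

for row in order_at_logical_position:
    print(" ".join(f"{v:2d}" for v in row))
```

Under these assumptions the first two rows begin 0 2 4 6 and 1 3 5 7, which matches the VNNI subgroup view quoted later in this thread.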

@sanchitintel
Author

sanchitintel commented Oct 21, 2025

Thanks for reviewing, @petercad!

The numbers are indicating the order of data elements in registers, and the positions are always consistent with their logical positions in the matrix

Sorry, I'm confused. Can you please elaborate with an example? Thanks!

it may be confusing for readers switching from the logical view to a subgroup "TV view" as in this PR

The current documentation already has a TV view on the right side of subgroup view.

To clarify, I mean that if the numbers correspond to logical coordinates in the plain-layout matrix, then the following VNNI-transformation is incorrect:

$$\begin{array}{c} \text{Plain layout}\\\ \begin{array}{cccccc} 0 & 1 & 2 & 3 & \cdots & 15\\\ 16 & 17 & 18 & 19 & \cdots & 31\\\ 32 & 33 & 34 & 35 & \cdots & 47\\\ 48 & 49 & 50 & 51 & \cdots & 63 \end{array} \end{array} \rightarrow \begin{array}{c} \text{After VNNI transformation (subgroup view in the current documentation)}\\\ \begin{array}{cccccc} 0 & 2 & 4 & 6 & \cdots & 30\\\ 1 & 3 & 5 & 7 & \cdots & 31\\\ 32 & 34 & 36 & 38 & \cdots & 62\\\ 33 & 35 & 37 & 39 & \cdots & 63 \end{array} \end{array}$$

It should instead be:

$$\begin{array}{c} \text{Plain layout}\\\ \begin{array}{cccccc} 0 & 1 & 2 & 3 & \cdots & 15\\\ 16 & 17 & 18 & 19 & \cdots & 31\\\ 32 & 33 & 34 & 35 & \cdots & 47\\\ 48 & 49 & 50 & 51 & \cdots & 63 \end{array} \end{array} \rightarrow \begin{array}{c} \text{Subgroup view of data in registers after VNNI transformation that happened during the load}\\\ \begin{array}{ccccccc} 0 & 16 & 1 & 17 & \cdots & 7 & 23\\\ 8 & 24 & 9 & 25 & \cdots & 15 & 31\\\ 32 & 48 & 33 & 49 & \cdots & 39 & 55\\\ 40 & 56 & 41 & 57 & \cdots & 47 & 63 \end{array} \end{array}$$
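
The right-hand matrix can also be generated from a closed-form index mapping rather than a scatter loop. The sketch below is pure Python and assumes the same pairing of rows as the loop in the PR description; `vnni_source_index` is a hypothetical name introduced only for this illustration.

```python
K, N = 4, 16  # block height and width


def vnni_source_index(d, n=N):
    """Plain-layout linear index of the element stored at linear
    position d of the VNNI buffer (rows 2p and 2p+1 interleaved)."""
    pair, rest = divmod(d, 2 * n)   # row pair, offset within that pair
    col, within = divmod(rest, 2)   # column, position inside the pair
    return (2 * pair + within) * n + col


# View the 64-element VNNI buffer as a 4x16 matrix, row by row.
vnni_view = [[vnni_source_index(r * N + c) for c in range(N)] for r in range(K)]

for row in vnni_view:
    print(" ".join(f"{v:2d}" for v in row))
```

The first row starts 0 16 1 17 and ends 7 23, as in the matrix above.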

@sanchitintel
Author

sanchitintel commented Oct 21, 2025

If the numbers in the matrices are logical linear indices of the original plain-layout tensor, then the VNNI subgroup view in the documentation is the inverse permutation of the subgroup view that the VNNI layout would actually produce, and vice versa:

i.e. if val = vnni_subgroup_view_in_documentation[idx], then actual_subgroup_view_in_vnni_layout[val] == idx is True, and vice-versa.
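
The relation can be checked with a short pure-Python sketch. The list names are illustrative only: `doc_view` is the documentation's matrix flattened row by row, written as a formula that reproduces the numbers quoted above, and `vnni_view` is built with the same scatter rule as the loop in the PR description.

```python
K, N = 4, 16  # block height and width

# Documentation's VNNI subgroup view, flattened row by row:
# rows 0 2 4 ... 30 / 1 3 5 ... 31 / 32 34 ... 62 / 33 35 ... 63.
doc_view = [(r // 2) * 2 * N + 2 * c + r % 2 for r in range(K) for c in range(N)]

# VNNI buffer holding plain-layout linear indices, built by scattering
# element (i, j) to [i // 2][j][i % 2] as in the PR description.
vnni_view = [0] * (K * N)
for i in range(K):
    for j in range(N):
        vnni_view[(i // 2) * 2 * N + 2 * j + i % 2] = i * N + j

# The two flattened views are inverse permutations of each other.
assert all(vnni_view[doc_view[idx]] == idx for idx in range(K * N))
assert all(doc_view[vnni_view[idx]] == idx for idx in range(K * N))
print("inverse permutations:", True)
```

Read this way, the number at flat position `idx` of the documentation's matrix is the position in the VNNI buffer to which the plain-layout element `idx` is moved.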

image image

The latter matrix (not in current documentation), however, is the VNNI layout corresponding to a plain format tensor, with values corresponding to indices of the plain-layout tensor that were shuffled to that specific index.

But the values of the subgroup view in the documentation don't seem to have a direct relation with the linear indices of the plain-format matrix that was transformed.


@rolandschulz rolandschulz left a comment


The PR description needs to be updated.

An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16.

As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16:


Could you add the explanation of how to understand this representation which helped you? I think it would also be good to include (/link to) a good explanation of the VNNI format. Such as:
image

Author

@sanchitintel sanchitintel Oct 21, 2025


Thanks for reviewing!

Peter clarified that the numbers in subgroup view in the current documentation represent the order of elements in the Xe register file. I added that information in the latest commit in this PR.

I think it would also be good to include (/link to) a good explanation of the VNNI format

I added a link to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc#example-2-16-bit-elements that describes VNNI layout.


If we instead assume that the values in the VNNI-transformed matrix below refer to the corresponding indices of the original plain layout matrix,


Why do you think it would be helpful to have more than one representation? I think it only adds to the confusion if the document contains representations in which the numbers have different meanings. If this is needed, it should either use a different syntax or include a clear explanation of the different representation.

Author

@sanchitintel sanchitintel Oct 21, 2025


The representation that uses subgroup-view values as logical indices of the original plain-layout matrix depicts the VNNI transformation itself, since it shows where each element ends up after the transformation. It also helps show that, for this particular example, the elements in each column of the VNNI-transformed matrix are owned by a different work-item.

Please note that this representation is also similar to that of the plain layout load in the existing documentation - to a reader, the numbers in its corresponding subgroup view may seem to correspond to logical linear indices of data elements in the loaded block.

The current documentation doesn't mention that, for the VNNI subgroup view, the numbers correspond to the order of elements in the Xe register file.
I also added that clarification in the latest commit in this PR.

Thanks!

Author

@sanchitintel sanchitintel Oct 22, 2025


it only adds to the confusion if the document contains representation with different meaning for the numbers

@rolandschulz , can you please point out where the current documentation mentions what the numbers mean?

Explain what the numbers mean in context of the original subgroup view for VNNI.

Add link explaining VNNI layout
@sanchitintel sanchitintel changed the title [DOCS] Update VNNI load visualization [DOCS] Clarify existing VNNI load visualization and add another Oct 21, 2025
@sanchitintel
Author

sanchitintel commented Oct 21, 2025

@petercad @rolandschulz,

This is the updated portion -

image

Please advise if it needs further revision. Thanks!

@sanchitintel sanchitintel requested a review from ratnampa October 23, 2025 06:29

3 participants