[DOCS] Clarify existing VNNI load visualization and add another #571

sanchitintel · 2025-10-20T23:00:04Z

Summary

The subgroup view in the current VNNI load visualization does not show the logical indices of the plain-layout matrix in global memory, so I added another visualization and clarified what the existing one means.

Rendering at https://github.com/sanchitintel/cutlass/blob/vnni_load_illustration/media/docs/cpp/xe_rearchitecture.md

Background

In https://github.com/intel/sycl-tla/blob/main/media/docs/cpp/xe_rearchitecture.md#subgroup-scope-and-thread-local-data,

The first subgroup view (of plain data load) has numbers that a reader may interpret as logical indices of the elements in the matrix. VNNI layout from subgroup view for a 4x16 block has been described as:

So, even for VNNI layout, readers may construe the numbers as representing logical indices of the original plain-layout matrix. However, they represent the order of elements in an Xe general register file.

Changes

Added clarification on the existing VNNI layout visualization.
Added a new visualization that construes the numbers in subgroup views as being logical indices of data in the plain (pre-transformation) matrix.

Intel Optimization Guides define VNNI layout like this (although this is for AMX, it's also true for XMX):

Using this approach, if we assume that we have a 4x16 array whose values are representative of logical indices in a contiguous order (i.e. *(array.data_ptr() + x) = x, for x between 0 and 63), then with the following code, we see a different visualization than in the documentation for VNNI layout:

import torch

k = 4
n = 16

a = torch.arange(k * n).view(k, n).contiguous()
print("plain format")
print(a)

# VNNI buffer: each pair of consecutive rows is interleaved along the last dimension
b = torch.zeros(k // 2, n, 2, dtype=torch.long)

for i in range(k):
  for j in range(n):
    b[i // 2][j][i % 2] = a[i][j]
print("VNNI format")
print(b.view(4, 16))
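
The loop above can also be expressed as a closed-form offset. The following is a minimal sketch in plain Python (no torch), assuming row-major storage and a packing factor of 2 for 16-bit elements; the helper names `vnni_offset` and `to_vnni` are hypothetical and are not part of any library or of this PR.

```python
def vnni_offset(i, j, n, pack=2):
    """Linear offset of plain element (i, j) in a VNNI-packed buffer.

    Assumes row-major storage with n columns, and that `pack` consecutive
    rows are interleaved per column (pack=2 for 16-bit elements).
    """
    return (i // pack) * (n * pack) + j * pack + (i % pack)


def to_vnni(plain, k, n, pack=2):
    """Return a VNNI-packed copy of a flat, row-major k x n list."""
    assert k % pack == 0, "k must be a multiple of the packing factor"
    packed = [None] * (k * n)
    for i in range(k):
        for j in range(n):
            packed[vnni_offset(i, j, n, pack)] = plain[i * n + j]
    return packed


k, n = 4, 16
plain = list(range(k * n))      # value == logical linear index
packed = to_vnni(plain, k, n)

# Print the packed buffer as 4 rows of 16, like b.view(4, 16) above
for r in range(k):
    print(packed[r * n:(r + 1) * n])
```

Under these assumptions, the first row starts with 0, 16, 1, 17, which is the same interleaving the torch loop produces.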

Thanks!

The current visualization of the subgroup view of the VNNI load visualization does not correspond to the actual values, so it might be confusing.

petercad · 2025-10-20T23:44:38Z

@sanchitintel The existing illustration is correct, though I think there might be some confusion about what the numbers mean. The numbers are indicating the order of data elements in registers, and the positions are always consistent with their logical positions in the matrix.

I think it may be confusing for readers switching from the logical view to a subgroup "TV view" as in this PR.

sanchitintel · 2025-10-21T00:11:24Z

Thanks for reviewing, @petercad!

> The numbers are indicating the order of data elements in registers, and the positions are always consistent with their logical positions in the matrix

Sorry, I'm confused. Can you please elaborate with an example? Thanks!

> it may be confusing for readers switching from the logical view to a subgroup "TV view" as in this PR

The current documentation already has a TV view on the right side of subgroup view.

To clarify, I mean that if the numbers correspond to logical coordinates in the plain-layout matrix, then the following VNNI-transformation is incorrect:

$$\begin{array}{c} \text{Plain layout}\\\ \begin{array}{cccccc} 0 & 1 & 2 & 3 & \cdots & 15\\\ 16 & 17 & 18 & 19 & \cdots & 31\\\ 32 & 33 & 34 & 35 & \cdots & 47\\\ 48 & 49 & 50 & 51 & \cdots & 63 \end{array} \end{array} \rightarrow \begin{array}{c} \text{After VNNI transformation (subgroup view in the current documentation)}\\\ \begin{array}{cccccc} 0 & 2 & 4 & 6 & \cdots & 30\\\ 1 & 3 & 5 & 7 & \cdots & 31\\\ 32 & 34 & 36 & 38 & \cdots & 62\\\ 33 & 35 & 37 & 39 & \cdots & 63 \end{array} \end{array}$$

It should instead be:

$$\begin{array}{c} \text{Plain layout}\\\ \begin{array}{cccccc} 0 & 1 & 2 & 3 & \cdots & 15\\\ 16 & 17 & 18 & 19 & \cdots & 31\\\ 32 & 33 & 34 & 35 & \cdots & 47\\\ 48 & 49 & 50 & 51 & \cdots & 63 \end{array} \end{array} \rightarrow \begin{array}{c} \text{Subgroup view of data in registers after VNNI transformation that happened during the load}\\\ \begin{array}{ccccccc} 0 & 16 & 1 & 17 & \cdots & 7 & 23\\\ 8 & 24 & 9 & 25 & \cdots & 15 & 31\\\ 32 & 48 & 33 & 49 & \cdots & 39 & 55\\\ 40 & 56 & 41 & 57 & \cdots & 47 & 63 \end{array} \end{array}$$

sanchitintel · 2025-10-21T02:59:28Z

If the numbers in the matrices correspond to logical linear indices of the original plain-layout tensor, then the VNNI subgroup-view matrix in the documentation and the actual subgroup view in VNNI layout are inverse permutations of each other:

i.e. if val = vnni_subgroup_view_in_documentation[idx], then actual_subgroup_view_in_vnni_layout[val] == idx is True, and vice-versa.

The latter matrix (not in the current documentation), however, is the VNNI layout of a plain-format tensor: each value is the plain-layout index of the element that was shuffled to that position.

But the values of the subgroup view in the documentation don't seem to have a direct relation with the linear indices of the plain-format matrix that was transformed.
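
The inverse-permutation relationship stated above can be checked with a short script. This is a sketch under stated assumptions: `doc_view` fills in the elided columns of the documentation's matrix by assuming each row continues in steps of 2 (as the visible entries 0, 2, 4, 6, ..., 30 suggest), and `reg_view` uses the same packing as the torch loop in the PR description. Both variable names are hypothetical.

```python
K, N = 4, 16

# Documentation's subgroup view, flattened row-major.
# Assumption: the elided columns continue with a stride of 2.
doc_view = (
    list(range(0, 32, 2))     # row 0: 0, 2, 4, ..., 30
    + list(range(1, 32, 2))   # row 1: 1, 3, 5, ..., 31
    + list(range(32, 64, 2))  # row 2: 32, 34, ..., 62
    + list(range(33, 64, 2))  # row 3: 33, 35, ..., 63
)

# Register contents after the VNNI load, with values being logical
# linear indices of the plain-layout matrix (same packing as b above).
reg_view = [None] * (K * N)
for i in range(K):
    for j in range(N):
        reg_view[(i // 2) * 2 * N + 2 * j + (i % 2)] = i * N + j

# The two flattened views should undo each other in both directions.
is_inverse = all(
    reg_view[doc_view[idx]] == idx and doc_view[reg_view[idx]] == idx
    for idx in range(K * N)
)
print("mutually inverse:", is_inverse)
```

Read this way, `doc_view[idx]` gives the register-order slot that holds plain element `idx`, and `reg_view[slot]` gives the plain element stored in that slot.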

rolandschulz

The PR description needs to be updated.

rolandschulz · 2025-10-21T17:59:29Z

media/docs/cpp/xe_rearchitecture.md

 An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16.

 As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16:
+


Could you add the explanation of how to understand this representation which helped you? I think it would also be good to include (/link to) a good explanation of the VNNI format. Such as:

Thanks for reviewing!

Peter clarified that the numbers in subgroup view in the current documentation represent the order of elements in the Xe register file. I added that information in the latest commit in this PR.

> I think it would also be good to include (/link to) a good explanation of the VNNI format

I added a link to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc#example-2-16-bit-elements that describes VNNI layout.

rolandschulz · 2025-10-21T18:01:24Z

media/docs/cpp/xe_rearchitecture.md

    \end{array}
 ```

+If we instead assume that the values in the VNNI-transformed matrix below refer to the corresponding indices of the original plain layout matrix,


Why do you think it would be helpful to have more than one representation? I think it only adds to the confusion if the document contains representations in which the numbers have different meanings. If this is needed, it should either use a different syntax or include a clear explanation of the different representation.

The representation that uses subgroup-view values as logical indices of the original plain-layout matrix depicts the VNNI transformation itself (it helps visualize where each element ends up after the transformation), and helps show that, for this particular example, the elements in each column of the VNNI-transformed matrix would be owned by a different work-item.

Please note that this representation is also similar to that of the plain layout load in the existing documentation - to a reader, the numbers in its corresponding subgroup view may seem to correspond to logical linear indices of data elements in the loaded block.

The current documentation doesn't mention that for VNNI subgroup view, the numbers correspond to the order of elements in the Xe register file.
I also added that clarification in the latest commit in this PR.

Thanks!

> it only adds to the confusion if the document contains representation with different meaning for the numbers

@rolandschulz, can you please point out where the current documentation mentions what the numbers mean?

Explain what the numbers mean in context of the original subgroup view for VNNI. Add link explaining VNNI layout

sanchitintel · 2025-10-21T20:16:18Z

@petercad @rolandschulz,

This is the updated portion -

Please advise if it needs further revision. Thanks!

Update VNNI load visualization

56c9ab1

The current visualization of the subgroup view of the VNNI load visualization does not correspond to the actual values, so it might be confusing.

sanchitintel requested a review from petercad October 20, 2025 23:00

Retain original & explain what the numbers mean

552df61

rolandschulz requested changes Oct 21, 2025

Add more details

cc16d0e

Explain what the numbers mean in context of the original subgroup view for VNNI. Add link explaining VNNI layout

sanchitintel requested a review from rolandschulz October 21, 2025 19:34

sanchitintel changed the title ~~[DOCS] Update VNNI load visualization~~ [DOCS] Clarify existing VNNI load visualization and add another Oct 21, 2025

Add more explanation

61757f5

sanchitintel requested a review from ratnampa October 23, 2025 06:29
