intel · sanchitintel · Oct 20, 2025 · Oct 21, 2025 · Oct 21, 2025 · Oct 22, 2025
diff --git a/media/docs/cpp/xe_rearchitecture.md b/media/docs/cpp/xe_rearchitecture.md
@@ -290,6 +290,7 @@ Now that we have the basic thread mapping rule, let's apply it to a simple block
 An individual DPAS atom's A matrix follows the same pattern, with height ranging from 1 to 8, and width equal to 8 (tf32), 16 (f16/bf16), or 32 (s8/u8). The DPAS C matrix is also organized this way, except that its width is always 16.
 
 As a more complicated example, let's consider a 16-bit VNNI load, with height = 4, width = 16:
+
 ```math
     \begin{array}{c}
     \text{Subgroup view}\\
@@ -312,6 +313,43 @@ As a more complicated example, let's consider a 16-bit VNNI load, with height =
     \end{array}
 ```
 
+If we instead label each element by its index in the original plain-layout matrix (as stored in global memory),
+we can view the same 16-bit VNNI load from a different perspective: the VNNI-transformed matrix below shows where each of those elements ends up.
+
+```math
+    \begin{array}{c}
+    \text{Subgroup view of data in global memory}\\
+    \begin{array}{cccccc}
+    0 & 1 & 2 & 3 & \cdots & 15\\
+    16 & 17 & 18 & 19 & \cdots & 31\\
+    32 & 33 & 34 & 35 & \cdots & 47\\
+    48 & 49 & 50 & 51 & \cdots & 63
+    \end{array}
+    \end{array}
+```
+
+```math
+    \begin{array}{c}
+    \text{Subgroup view of data in registers after the VNNI transformation performed during the load}\\
+    \begin{array}{ccccccc}
+    0 & 16 & 1 & 17 & \cdots & 7 & 23\\
+    8 & 24 & 9 & 25 & \cdots & 15 & 31\\
+    32 & 48 & 33 & 49 & \cdots & 39 & 55\\
+    40 & 56 & 41 & 57 & \cdots & 47 & 63
+    \end{array}
+    \end{array}
+    \rightarrow
+    \begin{array}{c}
+    \text{Thread view}\\
+    \begin{array}{ccccc}
+    \text{T0V0} & \text{T1V0} & \text{T2V0} & \cdots & \text{T15V0}\\
+    \text{T0V1} & \text{T1V1} & \text{T2V1} & \cdots & \text{T15V1}\\
+    \text{T0V2} & \text{T1V2} & \text{T2V2} & \cdots & \text{T15V2}\\
+    \text{T0V3} & \text{T1V3} & \text{T2V3} & \cdots & \text{T15V3}
+    \end{array}
+    \end{array}
+```
+
 The DPAS B matrix follows the same pattern.
 
 
@@ -504,7 +542,6 @@ gemm_device(ATensor   const& A,         // (M,K)
 }
 ```
 
-
 ## New Collective MMAs
 
-... coming later!
+... coming later!