third_party/intel/include/Dialect/TritonIntelGPU/IR/TritonIntelGPUAttrDefs.td
Lines changed: 60 additions & 51 deletions
@@ -26,15 +26,16 @@ The encoding is characterized by parameters:
 - `opsPerChannel` 4 for 8-bit scalar types, 2 for 16-bit scalar types, 1 for 32-bit scalar types.
 - `warpsPerCTA` indicates the distribution of the warps in the block. The order is [1, 0] for rank 2.
 - `repCluster` indicates the cluster size of the repetitions of the DPAS tile.
-- `sugGroupSize` Currently only sub group size 16 is supported.
+- `threadsPerWarp_` AKA threadsPerWarp; the name threadsPerWarp_ is used to avoid conflicting
+  with `getThreadsPerWarp` in the DistributedLayout interface. Currently only 16 is supported.
 
 The values of the matrix are distributed across the threads in the subgroup in row-major order.
-- If the column size of the matrix is equal to the number of threads in the subgroup, a single value name represents a single rows of the matrix.
-- If the column size of the matrix is less than the number of threads in the subgroup, a single value name represents multiple rows of the matrix.
-- If the column size of the matrix is larger than the number of the threads in the subgroup, a single row of the matrix requires multiple value name.
+- If the column size of the matrix is equal to the number of threads in the subgroup, one scalar represents one row of the matrix in registers.
+- If the column size of the matrix is less than the number of threads in the subgroup, one scalar represents multiple rows of the matrix in registers.
+- If the column size of the matrix is larger than the number of threads in the subgroup, one scalar represents a partial row of the matrix in registers.
 
 Example 1: the column size of the matrix is 16 and the number of threads in the subgroup is 16.
-The DPAS encoding of repeatCount=8, systolicDepth=8, executionSize=16, opsPerChannel=2 and sugGroupSize=16.
+The DPAS encoding has repeatCount=8, systolicDepth=8, executionSize=16, opsPerChannel=2 and threadsPerWarp=16.
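The three row-major distribution cases described in the hunk above can be checked with a small model. This is a simplified sketch under an assumption not stated in the diff: element (row, col) is dealt out by its row-major linear index, with consecutive elements going to consecutive lanes, so lane = index % threadsPerWarp and the lane-private scalar slot = index // threadsPerWarp. It ignores `repCluster`, `warpsPerCTA` and `opsPerChannel` packing, and the helper names `owner` and `rows_per_scalar` are hypothetical, not part of TritonIntelGPU.

```python
from __future__ import annotations

# Simplified model (an assumption, not the TritonIntelGPU implementation):
# a num_rows x num_cols matrix is dealt out in row-major order, with
# consecutive elements going to consecutive lanes of the subgroup.


def owner(row: int, col: int, num_cols: int, threads_per_warp: int) -> tuple[int, int]:
    """Return (lane, scalar_index) of the lane-private scalar holding element (row, col)."""
    linear = row * num_cols + col  # row-major linear index of the element
    return linear % threads_per_warp, linear // threads_per_warp


def rows_per_scalar(num_rows: int, num_cols: int, threads_per_warp: int) -> dict[int, list[int]]:
    """For each scalar index, list the matrix rows whose elements it holds across the subgroup."""
    coverage: dict[int, set[int]] = {}
    for r in range(num_rows):
        for c in range(num_cols):
            _, s = owner(r, c, num_cols, threads_per_warp)
            coverage.setdefault(s, set()).add(r)
    return {s: sorted(rows) for s, rows in sorted(coverage.items())}


if __name__ == "__main__":
    # Example 1: 16 columns and 16 threads, so scalar i holds exactly row i.
    print(rows_per_scalar(8, 16, 16))
    # 8 columns and 16 threads, so each scalar holds two rows.
    print(rows_per_scalar(8, 8, 16))
    # 32 columns and 16 threads, so each row is split across two scalars.
    print(rows_per_scalar(8, 32, 16))
```

Under this model, Example 1 (16 columns, 16 threads) gives lane = column and scalar slot = row, which matches the "one scalar represents one row" case; the other two calls reproduce the "multiple rows" and "partial row" cases.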
along the row (resp. col) dimension. The repetitions are grouped into clusters of size repCluster to optimize memory access.
+
+Suppose we have a `tt.dot` operation with block size [64, 128] = [64, 32] * [32, 128] on f16/bf16 operands, and its input tensor layouts are defined as follows:
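The tile arithmetic behind the [64, 128] = [64, 32] * [32, 128] example can be sketched as follows. Two assumptions are not shown in this excerpt: that one DPAS instruction computes an (M, N, K) = (repeatCount, executionSize, systolicDepth * opsPerChannel) tile, which gives 8 x 16 x 16 for the Example 1 parameters, and that each warp covers `tile * repCluster * warpsPerCTA` elements per cluster repetition along each output dimension. The actual layout in the diff is truncated, so the `warpsPerCTA = [4, 2]` and `repCluster = [2, 2]` values below are hypothetical, as are all helper names.

```python
from __future__ import annotations


def dpas_tile_shape(repeat_count: int, systolic_depth: int, execution_size: int,
                    ops_per_channel: int) -> tuple[int, int, int]:
    """Assumed (M, N, K) shape of one DPAS instruction."""
    return repeat_count, execution_size, systolic_depth * ops_per_channel


def tile_counts(block: tuple[int, int, int], tile: tuple[int, int, int]) -> tuple[int, ...]:
    """Number of DPAS tiles covering a block [M, N, K]; dimensions must divide evenly."""
    counts = []
    for b, t in zip(block, tile):
        if b % t:
            raise ValueError(f"block dim {b} is not a multiple of tile dim {t}")
        counts.append(b // t)
    return tuple(counts)


def cluster_reps_per_warp(block_dim: int, tile_dim: int, warps: int, rep_cluster: int) -> int:
    """Cluster repetitions one warp performs along an output dimension
    (hypothetical formula: block / (tile * repCluster * warps))."""
    per_cta_tile = tile_dim * rep_cluster * warps
    if block_dim % per_cta_tile:
        raise ValueError("block dim must be a multiple of tile * repCluster * warps")
    return block_dim // per_cta_tile


if __name__ == "__main__":
    tile = dpas_tile_shape(8, 8, 16, 2)  # Example 1 parameters
    # [64, 128] = [64, 32] * [32, 128]: block (M, N, K) = (64, 128, 32)
    print(tile_counts((64, 128, 32), tile))
    # Hypothetical warpsPerCTA = [4, 2] and repCluster = [2, 2]
    print(cluster_reps_per_warp(64, tile[0], 4, 2), cluster_reps_per_warp(128, tile[1], 2, 2))
```

With these assumptions the block needs 8 x 8 output tiles and 2 steps along K. With the hypothetical warp and cluster split, each warp would run 1 cluster repetition along M and 2 along N, where each cluster holds 2 x 2 DPAS tiles.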