Add a new layout to describe the tensor layout with respect to the GPU
compute hierarchy (register, lane, warp, block). This PR introduces the
layout and adds its definition and basic functions to the Triton Intel
GPU Dialect. The conversion-to-linear-layout function has been added and is
unit tested through an Intel-specific `LinearLayoutConversionsTest`. The
layouts are unpacked: each register is assumed to be the size of the
tensor type. However, the layout generation follows the convention
described in the
[SPV_INTEL_2d_block_io](https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_2d_block_io.html) extension.
While there may be some bugs, the goal is for any valid operation
described in the SPIR-V extension to be represented correctly with this
layout.
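For context on what the conversion to a linear layout produces: a linear layout assigns each bit of a hardware index (register, lane, warp, block) a basis vector, and an element's tensor coordinate is the XOR of the bases selected by the set bits. The sketch below (Python, not part of the PR; the basis values are hypothetical, not the ones this conversion generates) shows that evaluation rule:

```python
def apply_linear_layout(bases, indices):
    """Evaluate a linear layout: XOR together the basis vectors selected
    by the set bits of each hardware index (register, lane, warp, ...).

    bases:   dict mapping dimension name -> list of basis vectors, one per bit.
    indices: dict mapping dimension name -> integer index in that dimension.
    Returns the tensor coordinate as a list.
    """
    # Start from the zero coordinate; the rank comes from any basis vector.
    rank = len(next(iter(bases.values()))[0])
    coord = [0] * rank
    for dim, idx in indices.items():
        for bit, basis in enumerate(bases[dim]):
            if (idx >> bit) & 1:
                coord = [c ^ b for c, b in zip(coord, basis)]
    return coord

# Hypothetical bases for a 4x4 tile held by 4 lanes x 4 registers:
# register bits advance along columns, lane bits along rows.
bases = {
    "register": [[0, 1], [0, 2]],
    "lane":     [[1, 0], [2, 0]],
}
apply_linear_layout(bases, {"register": 3, "lane": 2})  # -> [2, 3]
```

The XOR structure is what lets a single table of power-of-two bases describe swizzled and broadcast layouts uniformly.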
Currently the layout is unused except for linear-layout conversion
testing. I plan to leave this PR in draft until I have replaced the
`block_io` attribute on the load ops with this layout, after which I
plan to replace the linear-layout code I added to
`LoadStoreOpToLLVM.cpp`. That second task might prove challenging: I
think the DPAS layouts do sometimes incorporate register-packing
schemes into the layout. But judging by the upstream layouts for NVIDIA
and AMD MMA, specific packing is an implementation detail and is not
represented as part of the high-level layout encoding.
cc #4192
An encoding for tensors produced via Intel Subgroup 2D Block IO operations.
The subgroup 2D block IO operations read or write two-dimensional blocks of data from a two-dimensional region of memory. The Subgroup 2D Block Encoding layout is parameterized by the block width, block height, and block count of the individual load instructions, and by the distribution and replication of loads across warps.

The SPV_INTEL_2d_block_io extension documentation provides more information on the subgroup 2D block IO operations and their parameters: https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_2d_block_io.html

The layout requires the following parameters:

- `instrShape` : the (height, width) block parameters for the block IO operation
- `numBlocks` : the block count parameter, which allows a single load to load multiple blocks in row-major order (useful for increasing cache line utilization)
- `threadsPerWarp` : currently a scalar, this parameter allows us to support different subgroup / warp configurations. Because the 2D block IO operation is a subgroup operation, the size of the subgroup is important in determining the ordering of the loaded tensor.
- `warpsPerCTA` : the number of warps per block / subgroups per workgroup, and their distribution
- `order` : the order within the block, used to determine the dimension along which to broadcast
- `kWidth` : currently unused, but kept because it will likely be needed for layout conversions
- `CTALayout` : describes how blocks are distributed among work-groups / thread blocks

}];

let parameters = (
  ins
  ArrayRefParameter<"unsigned">:$warpsPerCTA,
  "CTALayoutAttr":$CTALayout,
  ArrayRefParameter<"unsigned">:$instrShape,
  "unsigned":$numBlocks,
  ArrayRefParameter<"unsigned">:$order,
  "unsigned":$kWidth,
  "unsigned":$threadsPerWarp
);

let extraClassDeclaration = extraDistributedDeclaration # [{
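As a rough illustration of how `instrShape`, `numBlocks`, and `threadsPerWarp` interact, here is a small Python sketch (not part of the PR). The row-major block ordering matches the block-count semantics described in the extension; the within-block distribution used here (element `i` goes to lane `i % subgroup size`) is a simplifying assumption for illustration, not the exact hardware mapping:

```python
def block_load_layout(instr_shape, num_blocks, threads_per_warp):
    """Map each element loaded by one block IO operation to (lane, register).

    instr_shape: (height, width) of a single block.
    Blocks are laid out side by side along the width in row-major order,
    mirroring the block-count semantics of SPV_INTEL_2d_block_io.
    The within-block lane assignment is a simplifying assumption.
    """
    height, width = instr_shape
    mapping = {}
    flat = 0  # flat element index across all blocks of this load
    for block in range(num_blocks):
        for row in range(height):
            for col in range(width):
                lane = flat % threads_per_warp
                reg = flat // threads_per_warp
                mapping[(row, block * width + col)] = (lane, reg)
                flat += 1
    return mapping

# An 8x16 block, 2 blocks per load, subgroup size 16:
layout = block_load_layout((8, 16), 2, 16)
```

With these (hypothetical) numbers, one load covers an 8x32 region: each row of a block spreads across the 16 lanes of the subgroup, and `numBlocks = 2` doubles the elements held per lane rather than the lanes used, which is why a larger block count improves cache line utilization without widening the subgroup.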