@@ -183,53 +183,54 @@ def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
183183 1-dimensional layout. The first dimension in the order list is the fastest-changing dimension. If it
184184 is not present, the default value is [1, 0].
185185
186- ### Examples:
187- 1. Subgroup level layout:
188- ```mlir
189- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
190- ```
191- In this example, there are 16 work-items per subgroup, and is organized as
192- [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
193-
194- 2. Subgroup level layout with order:
195- ```mlir
196- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
197- ```
198- In this example, there are 16 work-items per subgroup, and is organized as
199- [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
200-
201- 3. Subgroup level layout with inst_data
202- ```mlir
203- #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
204- ```
205- In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
206- which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
207- work-item is assigned four 2x2 blocks in a round-robin manner.
208-
209- 4. Workgroup level layout:
210- ```mlir
211- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
212- ```
213- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
214- arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
215- is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
216-
217- 5. Workgroup level layout with order:
218- ```mlir
219- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
220- ```
221- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
222- arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
223- is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
224-
225- 6. Workgroup level layout with inst_data:
226- ```mlir
227- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
228- ```
229- This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
230- each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
231- Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
232- unit may result in non-contiguous access.
186+ Examples:
187+
188+ 1. Subgroup level layout:
189+ ```mlir
190+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
191+ ```
192+ In this example, a subgroup has 16 work-items, organized as
193+ [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. The distribution unit is 1x1.
194+
195+ 2. Subgroup level layout with order:
196+ ```mlir
197+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
198+ ```
199+ In this example, a subgroup has 16 work-items, organized as
200+ [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
201+
202+ 3. Subgroup level layout with inst_data:
203+ ```mlir
204+ #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
205+ ```
206+ In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
207+ which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
208+ work-item is assigned two 2x2 blocks in a round-robin manner, since one [4, 16] distribution unit covers half of an [8, 16] subproblem.
209+
210+ 4. Workgroup level layout:
211+ ```mlir
212+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
213+ ```
214+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
215+ arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
216+ is further distributed to 16 work-items organized as [[0, 1, 2, ..., 7], [8, 9, ..., 15]].
217+
218+ 5. Workgroup level layout with order:
219+ ```mlir
220+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
221+ ```
222+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
223+ arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
224+ is further distributed to 16 work-items organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
225+
226+ 6. Workgroup level layout with inst_data:
227+ ```mlir
228+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
229+ ```
230+ This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
231+ each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
232+ Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
233+ unit may result in non-contiguous access.
233234 }];
234235
235236 let parameters = (ins
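The ID arrangements quoted in examples 1, 2, and 5 follow from the `order` rule in the description: the first dimension in the order list is the fastest-changing one, and the default is [1, 0]. The following is a minimal Python sketch of that rule, not code from the XeGPU dialect. The helper name `arrange_ids` is hypothetical, and it assumes a 2-dimensional layout whose IDs are linearized with the stride of each dimension derived from `order`.

```python
def arrange_ids(layout, order=None):
    """Return a nested list of linear IDs for a 2-D layout.

    `order` lists dimensions from fastest- to slowest-changing; the
    default [1, 0] makes the last dimension the fastest-changing one.
    (Hypothetical helper for illustration only.)
    """
    if order is None:
        order = [1, 0]
    strides = [0] * len(layout)
    stride = 1
    for dim in order:  # the first entry in `order` gets stride 1
        strides[dim] = stride
        stride *= layout[dim]
    rows, cols = layout
    return [[r * strides[0] + c * strides[1] for c in range(cols)]
            for r in range(rows)]


# lane_layout = [2, 8] with the default order
print(arrange_ids([2, 8]))
# lane_layout = [2, 8], order = [0, 1]
print(arrange_ids([2, 8], order=[0, 1]))
# sg_layout = [2, 4], order = [0, 1]
print(arrange_ids([2, 4], order=[0, 1]))
```

Under these assumptions the three calls reproduce the arrangements given for `lane_layout = [2, 8]` with and without `order = [0, 1]`, and for `sg_layout = [2, 4]` with `order = [0, 1]`.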