The XeGPU dialect supports lowering from the [XeTile dialect](./XeTile.md). A tile-based XeTile operation can be further decomposed into multiple XeGPU ops. For example, the XeTile.load_tile operation is lowered to XeGPU's load_nd or load_gather operations. Compared with the XeTile dialect, the XeGPU dialect works with even smaller matrix sizes, since XeGPU operations map to one hardware instruction in most cases.
@@ -253,7 +253,7 @@ Attributes `L1_hint`, `L2_hint`, and `L3_hint` can be applied to prefetch.
XeGPU.atomic_rmw reuses the arith dialect attribute, ::mlir::arith::AtomicRMWKindAttr.
If a given Xe GPU target does not support an atomic operation for a certain data type, the user needs to convert the matrix to a supported data type to perform the atomic operation.
-alloc_nbarrier allocates a set of named barriers with the specified number. Named barrier is workgroup level resource, shared by all subgroups.
+`alloc_nbarrier` allocates a set of named barriers with the specified number. Named barrier is workgroup level resource, shared by all subgroups.
```mlir
XeGPU.alloc_nbarrier %total_nbarrier_num: i8
```
@@ -271,19 +271,18 @@ alloc_nbarrier allocates a set of named barriers with the specified number. Name
XeGPU.nbarrier_wait %nbarrier
```
-`mfence` synchronizes the memory access between write and following read or write.
+`fence` synchronizes the memory access between write and following read or write.
Attribute `Fence_op` describes the operations associated with the fence; the current value is limited to {"none"}.
-Attribute `Fence_scope` describes the scope of fence. "local" means that the scope would be within each XeCore. "tile" means the scope would be across XeCore with one tile.
-Attribute `Memory_kind` describes the memory kind. "ugm" means the global memory, "slm" means the share local memory.
+Attribute `scope` describes the scope of fence. "workgroup" means that the scope is within each work group. "gpu" means the scope is across work groups within the gpu.
+Attribute `Memory_kind` describes the memory kind. "global" means the global memory, "shared" means the shared local memory.
`compile_hint` passes performance hints to the lower-level compiler. The schedule_barrier hint prevents instructions from being reordered by a lower-level compiler. For example, a prefetch instruction is location-sensitive, but the lower-level compiler may schedule it to an undesired location.
```mlir
XeGPU.compile_hint {hint=schedule_barrier}
```
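As a rough sketch of the prefetch case described above, the hint could be placed directly after a prefetch so that the lower-level compiler does not move later instructions across that point. The `prefetch_nd` spelling, the `%tdesc` value, and the `tensor_desc<8x16xbf16>` type are assumptions used for illustration only; they do not appear in this section.
```mlir
// Assumed prefetch syntax; %tdesc is a hypothetical tensor descriptor.
XeGPU.prefetch_nd %tdesc : tensor_desc<8x16xbf16>
// Keep the prefetch at this location relative to the instructions that follow.
XeGPU.compile_hint {hint=schedule_barrier}
```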
-nbarrrier, mfence, and compile_hint operations lower to uniform instructions, so there is no need to specify the sg_map or VC mode.
+nbarrier, fence, and compile_hint operations lower to uniform instructions, so there is no need to specify the sg_map or VC mode.