Commit e56462c

Jianhui-Li and silee2 authored
Update on Fence OP (#713)
* Update XeGPU.md

---------

Co-authored-by: Sang Ik Lee <[email protected]>
1 parent 3da78dc commit e56462c

1 file changed (+7 -8 lines)

docs/rfcs/XeGPU.md

@@ -26,7 +26,7 @@ Below is a summary.
|init_nbarrier | operation ::= XeGPU.init_nbarrier $nbarrier_id, $participant_thread_num attr-dict : Uint8_t, Uint8_t -> type($nbarrier) | %nbarrier = XeGPU.alloc_nbarrier %nbarrier_id, %participant_thread_num : Uint8_t, Uint8_t -> !XeGPU.nbarrier |
|nbarrier_arrive | operation ::= XeGPU.nbarrier_arrive $nbarrier : type($nbarrier) | XeGPU.nbarrier_arrive %nbarrier : !XeGPU.nbarrier |
|nbarrier_wait | operation ::= XeGPU.nbarrier_wait $nbarrier : type($nbarrier) | XeGPU.nbarrier_wait %nbarrier : !XeGPU.nbarrier |
-|Mfence | operation ::= XeGPU.mfence attr-dict | XeGPU.mfence {fence_scope = global} |
+|fence | operation ::= XeGPU.fence attr-dict | XeGPU.fence {scope = gpu, memory_kind = global} |
|complile-hint | operation ::= XeGPU.compile_hint attr-dict | XeGPU.compile_hint {scheduling_barrier} |

The XeGPU dialect supports lowering from [XeTile dialects]{./XeTile.md}. The tile-based XeTile operation can be further decomposed to multiple XeGPU ops. For example, XeTile.load_tile operation is lowered to XeGPU’s load_nd or load_gather operations. Compared with the XeTile dialect, the XeGPU dialect works with even smaller matrix sizes, since XeGPU operations map to one hardware instruction in most cases.
@@ -253,7 +253,7 @@ Attributes `L1_hint`, `L2_hint`, and `L3_hint` can be applied to prefetch.
XeGPU.atomic_rmw reuses the arith dialect attribute, ::mlir::arith::AtomicRMWKindAttr.
In case that certain Xe GPU target does not support atomic operation for a certain data type, the user needs to convert the matrix to the supported datatype to perform the atomic operation.

-alloc_nbarrier allocates a set of named barriers with the specified number. Named barrier is workgroup level resource, shared by all subgroups.
+`alloc_nbarrier` allocates a set of named barriers with the specified number. Named barrier is workgroup level resource, shared by all subgroups.
```mlir
XeGPU.alloc_nbarrier %total_nbarrier_num: i8
```
@@ -271,19 +271,18 @@ alloc_nbarrier allocates a set of named barriers with the specified number. Name
XeGPU.nbarrier_wait %nbarrier
```

-`mfence` synchronizes the memory access between write and following read or write.
+`fence` synchronizes the memory access between write and following read or write.
```mlir
-XeGPU.mfence {memory_kind = "ugm", fence_op = "none", fence_scope = "local"}
+XeGPU.fence {scope = "gpu", memory_kind = "global", }
```
-Attribute `Fence_op` describes the operations associated with the fence, the current value is limited to {"none"}.
-Attribute `Fence_scope` describes the scope of fence. "local" means that the scope would be within each XeCore. "tile" means the scope would be across XeCore with one tile.
-Attribute `Memory_kind` describes the memory kind. "ugm" means the global memory, "slm" means the share local memory.
+Attribute `scope` describes the scope of fence. "workgroup" means that the scope is within each work group. "gpu" means the scope is across work groups within the gpu.
+Attribute `Memory_kind` describes the memory kind. "global" means the global memory, "shared" means the shared local memory.

`compile_hint` passes performance hints to the lower-level compiler. The schedule_barrier hint prevents instructions from being reordered by a lower-level compiler. For example, a prefetch instruction is location-sensitive, but the lower-level compiler may schedule it to an undesired location.
```mlir
XeGPU.compile_hint {hint=schedule_barrier}
```
-nbarrrier, mfence, and compile_hint operations lower to uniform instructions, so there is no need to specify the sg_map or VC mode.
+nbarrier, fence, and compile_hint operations lower to uniform instructions, so there is no need to specify the sg_map or VC mode.

## Notes

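The attribute change in this commit replaces the `mfence` value sets with two new ones for `XeGPU.fence`: `scope` is "workgroup" or "gpu", and `memory_kind` is "global" or "shared". The sketch below models just those value sets as a reading aid. It is a hypothetical Python model, not part of any XeGPU or MLIR API; the names `FenceAttrs`, `to_mlir`, `FENCE_SCOPES`, and `MEMORY_KINDS` are illustrative, and the lowercase `memory_kind` spelling follows the RFC's code example rather than the `Memory_kind` spelling in its prose.

```python
# Hypothetical model of the XeGPU.fence attributes described in the updated
# RFC text. FenceAttrs and to_mlir are illustrative names only; they are not
# part of any XeGPU or MLIR API.
from dataclasses import dataclass

# Value sets listed in the RFC after this commit.
FENCE_SCOPES = frozenset({"workgroup", "gpu"})   # within one work group / across work groups
MEMORY_KINDS = frozenset({"global", "shared"})   # global memory / shared local memory


@dataclass(frozen=True)
class FenceAttrs:
    scope: str
    memory_kind: str

    def __post_init__(self) -> None:
        # Reject values the RFC no longer lists, such as the old
        # mfence values "local" (fence_scope) and "ugm" (memory_kind).
        if self.scope not in FENCE_SCOPES:
            raise ValueError(f"unsupported fence scope: {self.scope!r}")
        if self.memory_kind not in MEMORY_KINDS:
            raise ValueError(f"unsupported memory_kind: {self.memory_kind!r}")

    def to_mlir(self) -> str:
        # Render in the quoted-attribute style of the RFC's code example.
        return (
            f'XeGPU.fence {{scope = "{self.scope}", '
            f'memory_kind = "{self.memory_kind}"}}'
        )


if __name__ == "__main__":
    # Prints: XeGPU.fence {scope = "gpu", memory_kind = "global"}
    print(FenceAttrs("gpu", "global").to_mlir())
```

Validating in `__post_init__` means an unsupported value, such as a leftover `mfence` setting, fails when the attributes are constructed rather than when the text is rendered.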