Commit 58bf4f8

Simplify the named barrier (#708)
1 parent 8ab3695 commit 58bf4f8

1 file changed (+9, -10 lines)

docs/rfcs/XeGPU.md

Lines changed: 9 additions & 10 deletions
@@ -22,10 +22,10 @@ Below is a summary.
 |store_nd | operation ::= XeGPU.store_nd $value, $tdesc attr-dict : type($value), type($tdesc) | XeGPU.store_nd %value, %tdesc2 {L1_hint = uncached, L3_hint = uncached} : vector<8x16xbf16>, tensor_desc<8x16xbf16> |
 |update_nd_offset | operation ::= XeGPU.update_nd_offset $tdesc, $delta0, $delta1 : type($tdesc), index, index -> type($tdesc) | %tdesc_updated = XeGPU.update_nd_offset %tdesc, %offset_x, %offset_y : tensor_desc<8x16xbf16>, index, index -> tensor_desc<8x16xbf16> |
 |prefetch_nd | operation ::= XeGPU.prefetch_nd $tdesc attr-dict : type($tdesc) | XeGPU.prefetch_nd %tdesc2 : tensor_desc<8x16xbf16> |
-|alloc_nbarrier | operation ::= XeGPU.alloc_nbarrier $barrier_couter : uint8_t | XeGPU.alloc_nbarrier %nbarrier_count: Uint8_t |
-|create_nbarrier | operation ::= XeGPU.create_nbarrier $nbarrier_id, $nbarrier_role attr-dict : uint8_t, type($nbarrier_role) -> type($nbarrier) | %nbarrier = XeGPU.create_nbarrier %nbarrier_id, %nbarrier_role {num_producers = 2, num_consumers = 2} : Uint8_t, nbarrier_role -> !XeGPU.nbarrier |
-|nbarrier_arrive | operation ::= XeGPU.nbarrier_arrive $nbarrier_id : type($nbarrier) | XeGPU.nbarrier_arrive %nbarrier : !XeGPU.nbarrier |
-|nbarrier_wait | operation ::= XeGPU.nbarrier_wait $nbarrier_id : type($nbarrier) | XeGPU.nbarrier_wait %nbarrier : !XeGPU.nbarrier |
+|alloc_nbarrier | operation ::= XeGPU.alloc_nbarrier $total_nbarrier_num attr-dict : index | XeGPU.alloc_nbarrier %total_nbarrier_num : Uint8_t |
+|init_nbarrier | operation ::= XeGPU.init_nbarrier $nbarrier_id, $participant_thread_num attr-dict : Uint8_t, Uint8_t -> type($nbarrier) | %nbarrier = XeGPU.init_nbarrier %nbarrier_id, %participant_thread_num : Uint8_t, Uint8_t -> !XeGPU.nbarrier |
+|nbarrier_arrive | operation ::= XeGPU.nbarrier_arrive $nbarrier : type($nbarrier) | XeGPU.nbarrier_arrive %nbarrier : !XeGPU.nbarrier |
+|nbarrier_wait | operation ::= XeGPU.nbarrier_wait $nbarrier : type($nbarrier) | XeGPU.nbarrier_wait %nbarrier : !XeGPU.nbarrier |
 |mfence | operation ::= XeGPU.mfence attr-dict | XeGPU.mfence {fence_scope = global} |
 |compile_hint | operation ::= XeGPU.compile_hint attr-dict | XeGPU.compile_hint {scheduling_barrier} |
 
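To make the new `alloc_nbarrier` / `init_nbarrier` signatures concrete, here is a minimal host-side Python sketch of how the two ops relate. It is an illustration under stated assumptions, not the XeGPU lowering: `Workgroup` and `NBarrier` are hypothetical stand-ins for the workgroup-level barrier pool and the `!XeGPU.nbarrier` descriptor, and zero-based barrier IDs are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NBarrier:
    """Hypothetical stand-in for the !XeGPU.nbarrier descriptor."""
    nbarrier_id: int
    participant_thread_num: int


class Workgroup:
    """Hypothetical model of the workgroup-level named-barrier pool."""

    def __init__(self) -> None:
        self.total_nbarrier_num = 0  # nothing allocated yet

    def alloc_nbarrier(self, total_nbarrier_num: int) -> None:
        # Like XeGPU.alloc_nbarrier: reserve a set of barriers shared by all subgroups.
        if total_nbarrier_num <= 0:
            raise ValueError("must allocate at least one named barrier")
        self.total_nbarrier_num = total_nbarrier_num

    def init_nbarrier(self, nbarrier_id: int, participant_thread_num: int) -> NBarrier:
        # Like XeGPU.init_nbarrier: return the descriptor for one allocated barrier.
        # Assumption: IDs are zero-based, so valid IDs are 0 .. total - 1.
        if not 0 <= nbarrier_id < self.total_nbarrier_num:
            raise ValueError(f"barrier id {nbarrier_id} was not allocated")
        return NBarrier(nbarrier_id, participant_thread_num)


# Two threads that pass the same ID end up bound to the same named barrier.
wg = Workgroup()
wg.alloc_nbarrier(2)
a = wg.init_nbarrier(1, participant_thread_num=2)
b = wg.init_nbarrier(1, participant_thread_num=2)
```

Because the descriptor is just the barrier ID plus the participant count, any two threads that initialize the same ID with the same count hold equal descriptors, which is what lets them synchronize with each other.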
@@ -253,23 +253,22 @@ Attributes `L1_hint`, `L2_hint`, and `L3_hint` can be applied to prefetch.
 `XeGPU.atomic_rmw` reuses the arith dialect attribute `::mlir::arith::AtomicRMWKindAttr`.
 If a particular Xe GPU target does not support atomic operations for a given data type, the user must convert the matrix to a supported data type before performing the atomic operation.
 
-alloc_nbarrier allocates named barriers. Named barrier is workgroup level resource, shared by all subgroups.
+`alloc_nbarrier` allocates the specified number of named barriers. Named barriers are a workgroup-level resource shared by all subgroups.
 ```mlir
-XeGPU.alloc_nbarrier %nbarrier_count: i8
+XeGPU.alloc_nbarrier %total_nbarrier_num : i8
 ```
-`create_nbarrier` assigns a role for a specific named barrier to be producer and/or consumer. The returned nbarrier object holds a description of the specified barrier, which encodes all the barrier information. It also binds the current thread with the named barrier by holding the returned nbarrier object. Multiple threads may bind to the same nbarrier so that they can sync with each other.
+`init_nbarrier` returns the named barrier with the specified barrier ID to the current thread. Multiple threads may bind to the same named barrier; `%participant_thread_num` specifies the total number of participating threads. The returned nbarrier object holds a description of the specified barrier that encodes all of the barrier information.
 ```mlir
-%nbarrier = XeGPU.create_nbarrier %nbarrier_id, %nbarrier_role {num_producers = 2, num_consumers = 2} : i8, i8, nbarrier_role into nbarrier
+%nbarrier = XeGPU.init_nbarrier %nbarrier_id, %participant_thread_num : i8, i8 -> !XeGPU.nbarrier
 ```
-enum class nbarrier_role : uint8_t {producer_consumer = 0, producer = 1, consumer = 2 };
 
 `nbarrier_arrive` notifies the other threads sharing the same named barrier that the current thread has arrived.
 ```mlir
 XeGPU.nbarrier_arrive %nbarrier
 ```
 `nbarrier_wait` waits until all other threads sharing the same named barrier have signaled their arrival.
 ```mlir
-XeGPU. nbarrier_wait %nbarrier
+XeGPU.nbarrier_wait %nbarrier
 ```
 
 `mfence` synchronizes memory accesses between a write and a subsequent read or write.
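The split between `nbarrier_arrive` and `nbarrier_wait` lets a thread signal early and block later. The sketch below models one named barrier on the host with Python's `threading.Condition`; it illustrates the semantics described above under assumptions and is not how XeGPU lowers these ops. `NamedBarrierModel`, `demo`, and the phase token returned by `arrive()` are hypothetical: the token only lets the model reuse the barrier across phases, whereas the IR ops take just `%nbarrier`.

```python
import threading


class NamedBarrierModel:
    """Hypothetical model of one named barrier with split arrive/wait."""

    def __init__(self, participant_thread_num: int) -> None:
        self._n = participant_thread_num
        self._arrived = 0
        self._phase = 0  # advances each time all participants have arrived
        self._cond = threading.Condition()

    def arrive(self) -> int:
        """Like nbarrier_arrive: signal arrival without blocking; return a phase token."""
        with self._cond:
            phase = self._phase
            self._arrived += 1
            if self._arrived == self._n:
                # Last participant: open this phase and reset for the next one.
                self._arrived = 0
                self._phase += 1
                self._cond.notify_all()
            return phase

    def wait(self, phase: int) -> None:
        """Like nbarrier_wait: block until every participant arrived for `phase`."""
        with self._cond:
            self._cond.wait_for(lambda: self._phase > phase)


def demo(num_threads: int = 4) -> list:
    barrier = NamedBarrierModel(num_threads)
    log = []
    lock = threading.Lock()

    def worker(tid: int) -> None:
        with lock:
            log.append(f"produce {tid}")
        token = barrier.arrive()  # stands in for XeGPU.nbarrier_arrive %nbarrier
        # Independent work could overlap here, between arrive and wait.
        barrier.wait(token)  # stands in for XeGPU.nbarrier_wait %nbarrier
        with lock:
            log.append(f"consume {tid}")

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because each thread logs `produce` before arriving and `consume` only after its wait returns, every `produce` entry precedes every `consume` entry in the result of `demo()`, regardless of how the threads are scheduled.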