@@ -13,7 +13,7 @@ Some parallel execution environments execute threads in groups that allow
 efficient communication within the group using special primitives called
 *convergent* operations. The outcome of a convergent operation is sensitive to
 the set of threads that executes it "together", i.e., convergently. When control
-flow :ref:`diverges <convergence-and-uniformity>`, i.e. threads of the same
+flow :ref:`diverges <convergence-and-uniformity>`, i.e., threads of the same
 group follow different
 paths through the CFG, not all threads of the group may be available to
 participate in this communication. This is the defining characteristic that
@@ -41,7 +41,7 @@ In structured programming languages, there is often an intuitive and
 unambiguous way of determining the threads that are expected to communicate.
 However, this is not always the case even in structured programming languages,
 and the intuition breaks down entirely in unstructured control flow. This
-document describes the formal semantics in LLVM, i.e. how to determine the set
+document describes the formal semantics in LLVM, i.e., how to determine the set
 of communicating threads for convergent operations.
 
 The definitions in this document leave many details open, such as how groups of
@@ -449,15 +449,15 @@ Consider the following example:
 // E
 }
 
-In this program, the call to convergent_op() is lexically "inside" the ``for``
+In this program, the call to ``convergent_op()`` is lexically "inside" the ``for``
 loop. But when translated to LLVM IR, the basic block B is an exiting block
 ending in a divergent branch, and the basic block C is an exit of the loop.
-Thus, the call to convergent_op() is outside the loop. This causes a mismatch
+Thus, the call to ``convergent_op()`` is outside the loop. This causes a mismatch
 between the programmer's expectation and the compiled program. The call should
 be executed convergently on every iteration of the loop, by threads that
 together take the branch to exit the loop. But when compiled, all threads that
 take the divergent exit on different iterations first converge at the beginning
-of basic block C and then together execute the call to convergent_op().
+of basic block C and then together execute the call to ``convergent_op()``.
 
 In this case, :ref:`llvm.experimental.convergence.loop
 <llvm.experimental.convergence.loop>` can be used to express the desired
@@ -588,18 +588,18 @@ indirectly.
 
 token @llvm.experimental.convergence.entry() convergent readnone
 
-This intrinsic is used to tie the dynamic instances inside of a function to
+This intrinsic is used to tie the dynamic instances inside a function to
 those in the caller.
 
 1. If the function is called from outside the scope of LLVM, the convergence of
-dynamic instances of this intrinsic are environment-defined. For example:
+dynamic instances of this intrinsic is environment-defined. For example:
 
 a. In an OpenCL *kernel launch*, the maximal set of threads that
 can communicate outside the memory model is a *workgroup*.
 Hence, a suitable choice is to specify that all the threads from
 a single workgroup in OpenCL execute converged dynamic instances
 of this intrinsic.
-b. In a C/C++ program, threads are launched independently and they can
+b. In a C/C++ program, threads are launched independently and can
 communicate only through the memory model. Hence the dynamic instances of
 this intrinsic in a C/C++ program are never converged.
 2. If the function is called from a call-site in LLVM IR, then two
@@ -701,7 +701,7 @@ convergent operation in the same basic block.
 
 token @llvm.experimental.convergence.anchor() convergent readnone
 
-This intrinsic produces an initial convergence token that is independent from
+This intrinsic produces an initial convergence token that is independent of
 any "outer scope". The set of threads executing converged dynamic instances of
 this intrinsic is implementation-defined.
 
@@ -1483,7 +1483,7 @@ There is no guarantee about the value of ``%id`` in the threads where
 hoisting ``@subgroupShuffle`` might introduce UB.
 
 On the other hand, if ``@subgroupShuffle`` is defined such that it merely
-produces an undefined value or poison as result when ``%id`` is "out of range",
+produces an undefined value or poison as a result when ``%id`` is "out of range",
 then speculating is okay.
 
 Even though
@@ -1502,7 +1502,7 @@ Assuming that ``%tok`` is only used inside the conditional block, the anchor can
 be sunk. The rationale is two-fold. First, the anchor has implementation-defined
 behavior, and the sinking is part of the implementation. Second, already in the
 original program, the set of threads that communicates in the
-``@convergent.operation`` is automatically subset to the threads for which
+``@convergent.operation`` is automatically a subset of the threads for which
 ``condition`` is true.
 
 Anchors can be hoisted in acyclic control flow. For example: