@@ -35,8 +35,7 @@ The scheduling relies on two principles:
- Virtual and Persistent Workgroups
- Atomic Counters as Semaphores
- # Virtual Workgroups
- TODO: Move this Paragraph somewhere else.
+ # Virtual Workgroups TODO: Move this Paragraph somewhere else.
Generally speaking, launching a new workgroup has non-trivial overhead.
Also, most IHVs, especially AMD, have silly limits on the ranges of dispatches (like 64k workgroups), which also apply to 1D dispatches.
@@ -55,6 +54,33 @@ for (uint virtualWorkgroupIndex=gl_GlobalInvocationID.x; virtualWorkgroupIndex<v
// do actual work for a single workgroup
}
```
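+
+ For the loop above to work, `virtualWorkgroupCount` has to come from somewhere. A minimal sketch, assuming it is fed in via a push constant
+ (the block name and layout are purely illustrative, not the actual interface of this class):
+ ```glsl
+ layout(push_constant) uniform PushConstants
+ {
+     uint virtualWorkgroupCount; // how many workgroups the dispatch logically needs
+ };
+ ```
+ The host can then clamp the actual dispatch to something comfortably below the IHV limits and let the loop above cover the remainder.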
+
+ This actually opens up some avenues for abusing the system to achieve customized scheduling.
+
+ The GLSL spec and the underlying API spec give no guarantees, and explicitly warn AGAINST assuming, that a workgroup with a lower ID will begin executing
+ no later than a workgroup with a higher ID. Actually attempting to enforce such an ordering, for example like this
+ ```glsl
+ layout() buffer coherent Sched
+ {
+ uint nextWorkgroup; // initial value is 0 before the dispatch
+ };
+
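+ // wait until every workgroup with a lower ID has passed this point,
+ // then bump the counter so the next workgroup in line can proceed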
+ while (nextWorkgroup!=gl_GlobalInvocationID.x) {}
+ atomicMax(nextWorkgroup,gl_GlobalInvocationID.x+1);
+ ```
+ has the potential to deadlock and TDR your GPU, because spinning workgroups never retire and the lower-ID workgroup they are waiting for may never get scheduled onto the hardware in the first place.
+
+ However, if you use a global counter of dispatched workgroups in an SSBO and `atomicAdd` to assign the `virtualWorkgroupIndex`
+ ```glsl
+ uint virtualWorkgroupIndex;
+ while ((virtualWorkgroupIndex=atomicAdd(nextWorkgroup,1u))<virtualWorkgroupCount) // each iteration claims the next unclaimed virtual workgroup
+ {
+ // do actual work for a single workgroup
+ }
+ ```
+ the ordering of starting work is now enforced (though this still won't guarantee the order of completion).
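+
+ For completeness, here is a minimal self-contained sketch of the whole pattern. Everything about the interface (the push constant, the set/binding
+ numbers, the workgroup size) is an illustrative assumption and NOT the actual interface of this class. Since a virtual workgroup is usually processed
+ cooperatively by all invocations of the physical workgroup, the sketch also uses the common refinement of letting a single invocation claim the index
+ and broadcast it through shared memory:
+ ```glsl
+ #version 450
+ layout(local_size_x=256) in;
+
+ layout(push_constant) uniform PushConstants
+ {
+     uint virtualWorkgroupCount;
+ };
+
+ layout(set=0, binding=0) coherent buffer Sched
+ {
+     uint nextWorkgroup; // must be cleared to 0 before the dispatch
+ };
+
+ shared uint sharedVirtualIndex;
+
+ void main()
+ {
+     for (;;)
+     {
+         // one invocation claims the next virtual workgroup, the rest of the workgroup picks it up through shared memory
+         if (gl_LocalInvocationIndex==0u)
+             sharedVirtualIndex = atomicAdd(nextWorkgroup,1u);
+         barrier();
+         const uint virtualWorkgroupIndex = sharedVirtualIndex;
+         if (virtualWorkgroupIndex>=virtualWorkgroupCount)
+             break;
+
+         // do actual work for a single virtual workgroup
+
+         barrier(); // don't let invocation 0 overwrite the shared index while others are still reading it
+     }
+ }
+ ```
+ Dispatching fewer physical workgroups than `virtualWorkgroupCount` is exactly what makes them persistent: the loop keeps each physical workgroup alive until the virtual ones run out.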
+
+ # Atomic Counters as Semaphores
**/
class NBL_API CScanner final : public core::IReferenceCounted
{