@@ -27,6 +27,11 @@ its result will be passed into the function receiving the argument. If the
 argument is *not* a [`DTask`](@ref) (instead, some other type of Julia object),
 it'll be passed as-is to the function `f` (with some exceptions).
 
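+A minimal sketch of this behavior (the names `t1`, `t2`, `t3`, and `v` are only for illustration):
+
+```julia
+using Dagger
+
+t1 = Dagger.@spawn 1 + 2      # a DTask; its result will be 3
+t2 = Dagger.@spawn t1 + 4     # `t1` is a DTask, so its result (3) is passed to `+`
+
+v = [10, 20, 30]
+t3 = Dagger.@spawn sum(v)     # `v` is not a DTask, so it is passed as-is
+
+fetch(t2)  # 7
+fetch(t3)  # 60
+```
+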
+!!! note "Task/thread occupancy"
+    By default, `Dagger` assumes that a task saturates the thread it runs on, and will not try to schedule other tasks onto that thread.
+    This default can be overridden by specifying [`Sch.ThunkOptions`](@ref) (see [Scheduler and Thunk options](@ref) for details).
+    The section [Changing the thread occupancy](@ref) shows a runnable example.
+
 ## Options
 
 The [`Options`](@ref Dagger.Options) struct in the second argument position is
@@ -182,7 +187,7 @@ Note that, as a legacy API, usage of the lazy API is generally discouraged for m
 - Distinct schedulers don't share runtime metrics or learned parameters, thus causing the scheduler to act less intelligently
 - Distinct schedulers can't share work or data directly
 
-### Scheduler and Thunk options
+## Scheduler and Thunk options
 
 While Dagger generally "just works", sometimes one needs to exert some more
 fine-grained control over how the scheduler allocates work. There are two
@@ -215,3 +220,73 @@ Dagger.spawn(+, Dagger.Options(;single=1), 1, 2)
 
 delayed(+; single=1)(1, 2)
 ```
+
+## Changing the thread occupancy
+
+One of the supported [`Sch.ThunkOptions`](@ref) is the `occupancy` keyword.
+This keyword communicates that a task is not expected to fully saturate a CPU core (e.g. because it is IO-bound).
+Passing `occupancy=Dict(Dagger.ThreadProc=>0)` tells the scheduler that the task occupies (essentially) none of a thread, so other tasks may be scheduled alongside it.
+The basic usage looks like this:
+
+```julia
+Dagger.@spawn occupancy=Dict(Dagger.ThreadProc=>0) fn()
+```
+
+Consider the following function definitions:
+
+```julia
+using Dagger
+
+function inner()
+    sleep(0.1)
+end
+
+function outer_full_occupancy()
+    @sync for _ in 1:2
+        # By default, full occupancy is assumed
+        Dagger.@spawn inner()
+    end
+end
+
+function outer_low_occupancy()
+    @sync for _ in 1:2
+        # Here, we're explicitly telling the scheduler to assume low occupancy
+        Dagger.@spawn occupancy=Dict(Dagger.ThreadProc => 0) inner()
+    end
+end
+```
+
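+The degree of parallelism you can observe depends on how many threads the Julia session was started with (for example, `julia -t 8`); assuming a single-process setup, a quick check is:
+
+```julia
+Threads.nthreads()  # number of threads available to this Julia session
+```
+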
+When running the first outer function (`outer_full_occupancy`) N times in parallel, you should see parallel execution until all threads are blocked:
+
+```julia
+for N in [1, 2, 4, 8, 16]
+    @time fetch.([Dagger.@spawn outer_full_occupancy() for _ in 1:N])
+end
+```
+
+The results from the above code snippet should look similar to this (the timings will be influenced by your specific machine):
+
+```text
+  0.124829 seconds (44.27 k allocations: 3.055 MiB, 12.61% compilation time)
+  0.104652 seconds (14.80 k allocations: 1.081 MiB)
+  0.110588 seconds (28.94 k allocations: 2.138 MiB, 4.91% compilation time)
+  0.208937 seconds (47.53 k allocations: 2.932 MiB)
+  0.527545 seconds (79.35 k allocations: 4.384 MiB, 0.64% compilation time)
+```
+
+With full occupancy assumed, each task reserves a whole thread, so once the number of concurrently running tasks exceeds the available threads, the remaining tasks have to wait.
+Running the variant that communicates a low occupancy (`outer_low_occupancy`) should instead run fully in parallel:
+
+```julia
+for N in [1, 2, 4, 8, 16]
+    @time fetch.([Dagger.@spawn outer_low_occupancy() for _ in 1:N])
+end
+```
+
+In comparison, the `outer_low_occupancy` snippet should show results like this:
+
+```text
+  0.120686 seconds (44.38 k allocations: 3.070 MiB, 13.00% compilation time)
+  0.105665 seconds (15.40 k allocations: 1.072 MiB)
+  0.107495 seconds (28.56 k allocations: 1.940 MiB)
+  0.109904 seconds (55.03 k allocations: 3.631 MiB)
+  0.117239 seconds (87.95 k allocations: 5.372 MiB)
+```