@@ -45,18 +45,25 @@ julia> t()()
 ### When to `@thunk`?
 When writing `rrule`s (and to a lesser extent `frule`s), it is important to `@thunk`
 appropriately.
-Propagation rule's that return multiple derivatives are not able to do all the computing themselves.
-By `@thunk`ing the work required for each, they then compute only what is needed.
+Propagation rules that return multiple derivatives may not have all derivatives used.
+By `@thunk`ing the work required for each derivative, they then compute only what is needed.
+
+#### How do thunks prevent work?
+If we have `res = pullback(...) = @thunk(f(x)), @thunk(g(x))`
+then if we did `dx + res[1]` then only `f(x)` would be evaluated, not `g(x)`.
+Also if we did `Zero() * res[1]` then the result would be `Zero()` and `f(x)` would not be evaluated.
 
 #### So why not thunk everything?
 `@thunk` creates a closure over the expression, which (effectively) creates a `struct`
 with a field for each variable used in the expression, and call overloaded.
 
 Do not use `@thunk` if this would be equal or more work than actually evaluating the expression itself. Examples being:
-- The expression wrapping something in a `struct`, such as `Adjoint(x)` or `Diagonal(x)`
 - The expression being a constant
+- The expression is merely wrapping something in a `struct`, such as `Adjoint(x)` or `Diagonal(x)`
 - The expression being itself a `thunk`
 - The expression being from another `rrule` or `frule` (it would be `@thunk`ed if required by the defining rule already)
+- There is only one derivative being returned, so from the fact that the user called `frule`/`rrule`
+they clearly will want to use that one.
 """
 struct Thunk{F} <: AbstractThunk
     f::F
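
The behaviour described in the added docstring lines can be sketched without the package itself. The snippet below is only an illustration under stated assumptions: `ToyThunk`, `ToyZero`, `toy_pullback`, `f`, `g`, and the `calls` counter are hypothetical stand-ins invented for this example, not the package's own `Thunk` and `Zero` types or its `@thunk` macro.

```julia
# Toy stand-ins (hypothetical names), NOT the real package types.
struct ToyThunk{F}
    f::F                     # the deferred computation, stored as a closure
end
(t::ToyThunk)() = t.f()      # calling the thunk runs the deferred work

struct ToyZero end           # stand-in for a structural zero

# Adding a number forces the thunk; multiplying by zero never calls it.
Base.:+(a::Number, t::ToyThunk) = a + t()
Base.:*(::ToyZero, ::ToyThunk) = ToyZero()

# Count how often each "expensive" derivative is actually computed.
const calls = Dict(:f => 0, :g => 0)
f(x) = (calls[:f] += 1; 2x)
g(x) = (calls[:g] += 1; 3x)

# Hypothetical pullback returning one deferred derivative per input.
toy_pullback(x) = (ToyThunk(() -> f(x)), ToyThunk(() -> g(x)))

res = toy_pullback(1.0)
dx = 10.0
total = dx + res[1]          # forces f(x) only; g(x) is never run
z = ToyZero() * res[1]       # short-circuits; f(x) is not run a second time

# The closure is (effectively) a struct with one field per captured variable.
captured = fieldnames(typeof(res[1].f))
```

After running this, `calls` records a single evaluation of `f` and none of `g`, and `captured` lists the variables the closure holds on to. That captured state is the overhead the "So why not thunk everything?" part of the docstring warns about: for a cheap expression, building the closure can cost as much as evaluating the expression directly.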