julia> l2reg(model) = sum([sum(abs2, p) for p in trainables(model)]);

julia> g = gradient(l2reg, model)[1];
```
Notice that the `BatchNorm` layer has two trainable parameters, `γ` and `β`, which are included in the list, while the `μ` and `σ²` buffers are not.
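
As a quick, self-contained check, here is a minimal sketch using a throwaway `BatchNorm` layer (not the `model` from above; `trainables` itself is defined in Optimisers.jl), showing that only the affine parameters are collected:

```julia
julia> using Flux, Optimisers  # `trainables` lives in Optimisers.jl

julia> bn = BatchNorm(4);  # γ and β are trainable; μ and σ² are running statistics

julia> length(trainables(bn))  # only the two affine parameter arrays are collected
2
```
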
Sometimes one wants to iterate over all trainable parameters in a model and the corresponding parameters of a matched structure, such as a gradient or a moving average of the model.
This can be done using `trainables(model, path=true)`. For instance, here is how to update the parameters
of a moving average model with the parameters of the model:
```julia
for (kp, p_avg) in trainables(model_avg, path=true)
    p = getkeypath(model, kp)
    p_avg .= 0.99 .* p_avg .+ 0.01 .* p
end
```
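
Here each `kp` is a `KeyPath` (from Functors.jl) describing where the trainable array sits inside the model, and `getkeypath(model, kp)` retrieves the array at the same position in `model`. The snippet assumes that `model_avg` is a structurally identical copy of the model, created for example with `deepcopy(model)` before training starts.
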
## Incomplete or nothing gradients
If the gradient is not available for some parameters or branches of the model,
`update` will not take an optimisation step for those parameters.
This is the case when the gradient is `nothing` or a subtype of `ChainRules.AbstractZero`.
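
Here is a minimal sketch of this behaviour, calling Optimisers.jl's `setup` and `update` directly; the `NamedTuple` model and the gradient values are made up for illustration:

```julia
using Optimisers

model = (w = ones(2), b = zeros(2))
opt_state = Optimisers.setup(Adam(0.1), model)

# No gradient is available for the `b` branch of the model:
grad = (w = [0.5, 0.5], b = nothing)

opt_state, model = Optimisers.update(opt_state, model, grad)
# `model.w` has taken an Adam step, while `model.b` and its optimiser
# state are unchanged: no optimisation step was taken for that branch.
```
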
For stateful optimisers, skipping an update is generally not the same as updating with a zero gradient.
For example, in the case of Adam, the momentum and variance are updated even if the gradient is zero:
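
The following is a sketch of that difference, again calling Optimisers.jl directly on a toy array; the names and values are illustrative only:

```julia
using Optimisers

x = ones(3)
opt_state = Optimisers.setup(Adam(0.1), x)

# A first step with a real gradient builds up Adam's momentum and variance.
opt_state, x = Optimisers.update(opt_state, x, [1.0, 1.0, 1.0])

# An explicit zero gradient still decays the momentum and variance,
# and the leftover momentum still moves the parameters a little.
state_zero, x_zero = Optimisers.update(opt_state, x, zeros(3))

# A `nothing` gradient skips this parameter entirely, leaving both the
# parameters and the optimiser state exactly as they were.
state_skip, x_skip = Optimisers.update(opt_state, x, nothing)

x_zero ≈ x   # false: Adam still took a momentum-driven step
x_skip == x  # true: no step was taken
```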