Skip to content

Conversation

@mcabbott
Copy link
Member

As discussed starting here #2150 (comment) this prints an indication when layers like Dropout are set to train/test mode. But not in the default automatic state:

julia> m
Chain(
  Dense(2 => 3),                        # 9 parameters
  Dropout(0.4),
  BatchNorm(3),                         # 6 parameters, plus 6
)         # Total: 4 trainable arrays, 15 parameters,
          # plus 2 non-trainable, 6 parameters, summarysize 512 bytes.

julia> trainmode!(m)
Chain(
  Dense(2 => 3),                        # 9 parameters
  Dropout(0.4, active=true),
  BatchNorm(3, active=true),            # 6 parameters, plus 6
)         # Total: 4 trainable arrays, 15 parameters,
          # plus 2 non-trainable, 6 parameters, summarysize 513 bytes.

Adds the keyword printed to the constructor too.

Needs tests.

Comment on lines 8 to 28
testmode!(m, mode = true)
testmode!(m, inactive = true)
Set a layer or model's test mode (see below).
Using `:auto` mode will treat any gradient computation as training.
Set a layer, or all layers in a model, to test mode.
This disables the effect of [`Dropout`](@ref), and similar layers.
_Note_: if you manually set a model into test mode, you need to manually place
it back into train mode during training phase.
Possible values include:
- `false` for training
Possible values of optional 2nd argument `inactive` are:
- `true` for testing
- `:auto` or `nothing` for Flux to detect the mode automatically
- `false` for training, same as [`trainmode!`](@ref)`(m)`
- `:auto` or `nothing` for Flux to detect training automatically.
# Example
```jldoctest
julia> d = Dropout(0.3)
Dropout(0.3)
julia> testmode!(d) # dropout is now always disabled
Dropout(0.3, active=false)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy with the name active? This existed as a field name, but not previously exposed.

testmode! and trainmode! both had a positional argument called mode with opposite meanings. I made these active + inactive to match.

Copy link
Member

@ToucheSir ToucheSir Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the double negative a little confusing but can't come up with a better word. For a future PR, would it make sense to add a third setactive! function and move the true/false/auto/nothing handling logic to that? Then trainmode! and testmode! lose their second arg and become shortcuts for setactive!(model, <true|false>). Either way, we could even use an enum if we're feeling fancy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's awful that they both do everything. It would be OK if either accepted :auto, but never true/false. Maybe that's a deprecation goal?

There could also be a 3rd function, but two is already a lot. Or the 3rd could replace both, but that's more churn.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But more immediately, your enum suggestion could read Dropout(0.5; mode=:test) and Dropout(0.5; mode=:train). That has the advantage of always being one type. It's a little more indirect -- it tells you what the layer is intended for, not what it does.

Copy link
Member

@ToucheSir ToucheSir Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be OK if either accepted :auto, but never true/false. Maybe that's a deprecation goal?

That's a good idea! Can we keep mode then? In that case, have we considered something like enabled instead of mode or (in)active?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky thing is that trainmode!(m, ::Bool) recurses to itself, and is what's overloaded by layers. Presumably some layers in packages may overload this too.

Deprecating that method and changing the recursion to use something else means that we will break any packages which rely on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Maybe 3 functions isn't that bad after all then if it makes the deprecation path easier. PyTorch has .train(), .eval() and module.training = {true,false}.

@codecov-commenter
Copy link

codecov-commenter commented Apr 9, 2023

Codecov Report

Patch coverage: 73.52% and project coverage change: -2.70 ⚠️

Comparison is base (45dddf6) 82.79% compared to head (e2cef3d) 80.09%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2222      +/-   ##
==========================================
- Coverage   82.79%   80.09%   -2.70%     
==========================================
  Files          24       24              
  Lines        1610     1638      +28     
==========================================
- Hits         1333     1312      -21     
- Misses        277      326      +49     
Impacted Files Coverage Δ
src/deprecations.jl 36.11% <0.00%> (-1.58%) ⬇️
src/functor.jl 41.66% <69.23%> (-32.46%) ⬇️
src/layers/normalise.jl 92.76% <88.88%> (-2.41%) ⬇️

... and 6 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@darsnack
Copy link
Member

We discussed this PR on call. Here is our review:

  • Add a doc section here explaining overloading testmode! for stochastic layers
  • Add forward pointers to doc section from docstring
  • Some discussion of future steps:
    • Have a single setactive!(m, mode) and just trainmode!(m) / testmode!(m)
    • This would be breaking so post-PR

@mcabbott
Copy link
Member Author

OK last commit adds a note that this is the layer API but it may change later. It does not edit the "advanced" docs page, as that's a bit of a mess and this PR has many things already. This is not a new feature here, it's just that revising it brought attention.

@mcabbott mcabbott merged commit 11c3d06 into FluxML:master Apr 25, 2023
@mcabbott mcabbott deleted the dropout2 branch April 25, 2023 01:09
rgobbel pushed a commit to rgobbel/Flux.jl that referenced this pull request Apr 25, 2023
* print the state of Dropout etc.

* add tests

* doc improvements

* simpler scheme for testmode/trainmode

* simplify active keyword a bit

* a bug

* fix tests

* Update test/layers/normalisation.jl

Co-authored-by: Brian Chen <[email protected]>

* Update src/functor.jl

* extend docstrings & warnings

---------

Co-authored-by: Brian Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants