@@ -6,66 +6,79 @@ add the result to the overall loss.

For example, say we have a simple regression.

- ```julia
- using Flux
- using Flux.Losses: logitcrossentropy
- m = Dense(10 => 5)
- loss(x, y) = logitcrossentropy(m(x), y)
+ ```jldoctest regularisation; setup = :(using Random; Random.seed!(0))
+ julia> using Flux
+
+ julia> using Flux.Losses: logitcrossentropy
+
+ julia> m = Dense(10 => 5)
+ Dense(10 => 5) # 55 parameters
+
+ julia> loss(x, y) = logitcrossentropy(m(x), y)
+ loss (generic function with 1 method)
```

We can apply L2 regularisation by taking the squared norm of the parameters, `m.weight` and `m.bias`.

- ```julia
- penalty() = sum(abs2, m.weight) + sum(abs2, m.bias)
- loss(x, y) = logitcrossentropy(m(x), y) + penalty()
+ ```jldoctest regularisation
+ julia> penalty() = sum(abs2, m.weight) + sum(abs2, m.bias)
+ penalty (generic function with 1 method)
+
+ julia> loss(x, y) = logitcrossentropy(m(x), y) + penalty()
+ loss (generic function with 1 method)
```

When working with layers, Flux provides the `params` function to grab all
parameters at once. We can easily penalise everything with `sum`:

- ```julia
+ ```jldoctest regularisation
julia> Flux.params(m)
- 2-element Array{Any,1}:
-  param([0.355408 0.533092; … 0.430459 0.171498])
-  param([0.0, 0.0, 0.0, 0.0, 0.0])
+ Params([Float32[0.34704182 -0.48532376 … -0.06914271 -0.38398427; 0.5201164 -0.033709668 … -0.36169025 -0.5552353; … ; 0.46534058 0.17114447 … -0.4809643 0.04993277; -0.47049698 -0.6206029 … -0.3092334 -0.47857067], Float32[0.0, 0.0, 0.0, 0.0, 0.0]])

julia> sqnorm(x) = sum(abs2, x)
+ sqnorm (generic function with 1 method)

julia> sum(sqnorm, Flux.params(m))
- 26.01749952921026
+ 8.34994f0
```

Here's a larger example with a multi-layer perceptron.

- ```julia
- m = Chain(
-   Dense(28^2 => 128, relu),
-   Dense(128 => 32, relu),
-   Dense(32 => 10))
+ ```jldoctest regularisation
+ julia> m = Chain(Dense(28^2 => 128, relu), Dense(128 => 32, relu), Dense(32 => 10))
+ Chain(
+   Dense(784 => 128, relu), # 100_480 parameters
+   Dense(128 => 32, relu), # 4_128 parameters
+   Dense(32 => 10), # 330 parameters
+ ) # Total: 6 arrays, 104_938 parameters, 410.289 KiB.

- sqnorm(x) = sum(abs2, x)
+ julia> sqnorm(x) = sum(abs2, x)
+ sqnorm (generic function with 1 method)

- loss(x, y) = logitcrossentropy(m(x), y) + sum(sqnorm, Flux.params(m))
+ julia> loss(x, y) = logitcrossentropy(m(x), y) + sum(sqnorm, Flux.params(m))
+ loss (generic function with 1 method)

- loss(rand(28^2), rand(10))
+ julia> loss(rand(28^2), rand(10))
+ 300.76693683244997
```
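+
+ In practice the squared-norm penalty is usually scaled by a small coefficient so that it does not dominate the data term (note the large loss value above). A minimal sketch, assuming an arbitrary illustrative coefficient `λ` that is not part of the example above:
+
+ ```julia
+ # Sketch only: λ is a hand-picked regularisation strength, not a value from the example.
+ λ = 1f-4
+ loss(x, y) = logitcrossentropy(m(x), y) + λ * sum(sqnorm, Flux.params(m))
+ ```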

One can also easily add per-layer regularisation via the `activations` function:

- ```julia
+ ```jldoctest regularisation
julia> using Flux: activations

julia> c = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
- Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
+ Chain(
+   Dense(10 => 5, σ), # 55 parameters
+   Dense(5 => 2), # 12 parameters
+   NNlib.softmax,
+ ) # Total: 4 arrays, 67 parameters, 524 bytes.

julia> activations(c, rand(10))
- 3-element Array{Any,1}:
-  Float32[0.84682214, 0.6704139, 0.42177814, 0.257832, 0.36255655]
-  Float32[0.1501253, 0.073269576]
-  Float32[0.5192045, 0.48079553]
+ ([0.3274892431795043, 0.5360197770386552, 0.3447464835514667, 0.5273025865532305, 0.7513168089280781], [-0.3533774181890544, -0.010937055274926138], [0.4152168057978045, 0.5847831942021956])

julia> sum(sqnorm, ans)
- 2.0710278f0
+ 1.9953131077618562
```
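+
+ The per-layer penalties can then be folded into the loss, for example with a different weight for each layer. A minimal sketch, assuming arbitrary illustrative weights and using `crossentropy` because `c` already ends in `softmax`; neither choice comes from the example above:
+
+ ```julia
+ using Flux.Losses: crossentropy
+
+ # Sketch only: the per-layer weights are arbitrary illustrative values.
+ layer_weights = [0.01, 0.01, 0.0]
+ layer_penalty(x) = sum(w * sqnorm(a) for (w, a) in zip(layer_weights, activations(c, x)))
+ regularised_loss(x, y) = crossentropy(c(x), y) + layer_penalty(x)
+ ```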

```@docs