"""
    Chain(layers...)

Chain multiple layers / functions together, so that they are called in sequence
on a given input.

`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`.
`m[1:3](x)` will calculate the output of the first three layers.

# Examples
```jldoctest
julia> m = Chain(x -> x^2, x -> x+1);

julia> m(5) == 26
true

julia> m = Chain(Dense(10, 5), Dense(5, 2));

julia> x = rand(10);

julia> m(x) == m[2](m[1](x))
true
```
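For the indexing and slicing described above, a short illustrative sketch (not a doctest from the original source):
```julia
julia> m = Chain(x -> x^2, x -> x + 1, x -> 2x);

julia> m[1:2](5) == 26  # only the first two layers run, as in the two-layer chain above
true
```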
@@ -63,30 +70,40 @@ extraChain(::Tuple{}, x) = ()
"""
    Dense(in, out, σ=identity; bias=true, init=glorot_uniform)
    Dense(W::AbstractMatrix, [bias, σ])

Create a traditional `Dense` layer, whose forward pass is given by:

    y = σ.(W * x .+ bias)

The input `x` should be a vector of length `in`, or batch of vectors represented
as an `in × N` matrix, or any array with `size(x,1) == in`.
The output `y` will be a vector of length `out`, or a batch with
`size(y) == (out, size(x)[2:end]...)`.

Keyword `bias=false` will switch off trainable bias for the layer.
The initialisation of the weight matrix is `W = init(out, in)`, calling the function
given to keyword `init`, with default [`glorot_uniform`](@doc Flux.glorot_uniform).
The weight matrix and/or the bias vector (of length `out`) may also be provided explicitly.

# Examples
```jldoctest
julia> d = Dense(5, 2)
Dense(5, 2)

julia> d(rand(Float32, 5, 64)) |> size
(2, 64)

julia> d(rand(Float32, 5, 1, 1, 64)) |> size # treated as three batch dimensions
(2, 1, 1, 64)

julia> d1 = Dense(ones(2, 5), false, tanh) # using provided weight matrix
Dense(5, 2, tanh; bias=false)

julia> d1(ones(5))
2-element Array{Float64,1}:
 0.9999092042625951
 0.9999092042625951

julia> Flux.params(d1) # no trainable bias
Params([[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]])
```
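The explicitly-provided bias mentioned above can be sketched as follows (illustrative values, not from the original doctest):
```julia
julia> d2 = Dense(ones(2, 5), [1.0, 2.0]);  # explicit weight matrix and bias vector

julia> d2(ones(5)) == [6.0, 7.0]  # ones(2,5) * ones(5) .+ [1.0, 2.0], with identity activation
true
```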
@@ -142,10 +159,14 @@ end
"""
    Diagonal(α, β)
    Diagonal(size::Integer...)

Create an element-wise linear layer, which performs

    y = α .* x .+ β

The learnable arrays are initialised `α = ones(Float32, size)` and
`β = zeros(Float32, size)`.

Used by [`LayerNorm`](@ref).
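A minimal sketch of the behaviour at initialisation (illustrative, not part of the original docstring):
```julia
julia> d = Flux.Diagonal(3);

julia> d([1.0, 2.0, 3.0]) == [1.0, 2.0, 3.0]  # α starts as ones and β as zeros, so x passes through
true
```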
"""
struct Diagonal{T}

"""
    Maxout(over)

The [Maxout](https://arxiv.org/abs/1302.4389) layer has a number of
internal layers which all receive the same input. It returns the elementwise
maximum of the internal layers' outputs.

Maxout over linear dense layers satisfies the universal approximation theorem.
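For intuition, a sketch using plain functions in place of trainable layers, passing the tuple `over` directly (illustrative only):
```julia
julia> m = Maxout((x -> x, x -> -x));  # elementwise max of x and -x, i.e. abs.(x)

julia> m([-2, 3]) == [2, 3]
true
```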
"""
struct Maxout{FS<:Tuple}
@@ -190,15 +213,20 @@ end

"""
    Maxout(f, n_alts)

Construct a Maxout layer over `n_alts` instances of the layer given by `f`.
The function takes no arguments and should return some callable layer.
Conventionally, this is a linear dense layer.

# Examples

This constructs a `Maxout` layer over 4 internal dense linear layers, each
identical in structure (784 inputs, 128 outputs):
```jldoctest
julia> insize = 784;

julia> outsize = 128;

julia> Maxout(()->Dense(insize, outsize), 4);
```
"""
@@ -215,19 +243,25 @@ end

"""
    SkipConnection(layer, connection)

Create a skip connection which consists of a layer or `Chain` of consecutive
layers and a shortcut connection linking the block's input to the output
through a user-supplied 2-argument callable. The first argument to the callable
will be propagated through the given `layer` while the second is the unchanged,
"skipped" input.

The simplest "ResNet"-type connection is just `SkipConnection(layer, +)`.
Here is a more complicated example:
```jldoctest
julia> m = Conv((3,3), 4 => 7, pad=(1,1));

julia> x = ones(Float32, 5, 5, 4, 10);

julia> size(m(x)) == (5, 5, 7, 10)
true

julia> sm = SkipConnection(m, (mx, x) -> cat(mx, x, dims=3));

julia> size(sm(x)) == (5, 5, 11, 10)
true
```
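And a sketch of the simpler `+` form mentioned above (illustrative sizes, not part of the original doctest):
```julia
julia> r = SkipConnection(Dense(5, 5), +);  # output is Dense(5, 5)(x) .+ x

julia> size(r(rand(Float32, 5, 10)))
(5, 10)
```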
@@ -250,32 +284,45 @@ end
"""
    Bilinear(in1, in2, out, σ=identity; bias=true, init=glorot_uniform)
    Bilinear(W::AbstractArray, [bias, σ])

Create a Bilinear layer, which operates on two inputs at the same time.
Its output, given vectors `x` & `y`, is another vector `z` with,
for all `i ∈ 1:out`:

    z[i] = σ(x' * W[i,:,:] * y + bias[i])

If `x` and `y` are matrices, then each column of the output `z = B(x, y)` is of this form,
with `B` a Bilinear layer.

If `y` is not given, it is taken to be equal to `x`, i.e. `B(x) == B(x, x)`.

The two inputs may also be provided as a tuple, `B((x, y)) == B(x, y)`,
which is accepted as the input to a `Chain`.

The initialisation works as for a [`Dense`](@ref) layer, with `W = init(out, in1, in2)`.
By default the bias vector is `zeros(Float32, out)`; the option `bias=false` will switch off
trainable bias. Either of these may be provided explicitly.

# Examples
```jldoctest
julia> x, y = randn(Float32, 5, 32), randn(Float32, 5, 32);

julia> B = Flux.Bilinear(5, 5, 7);

julia> B(x) |> size # interactions based on one input
(7, 32)

julia> B(x,y) == B((x,y)) # two inputs, may be given as a tuple
true

julia> sc = SkipConnection(
           Chain(Dense(5, 20, tanh), Dense(20, 9, tanh)),
           Flux.Bilinear(9, 5, 3, bias=false),
           ); # used as the recombinator, with skip as the second input

julia> sc(x) |> size
(3, 32)

julia> Flux.Bilinear(rand(4,8,16), false, tanh) # first dim of weight is the output
Bilinear(8, 16, 4, tanh, bias=false)
```
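As a quick check of the formula above, a sketch built from an explicit weight array (illustrative; it relies only on the constructor and forward pass documented here):
```julia
julia> W, b = randn(3, 4, 5), randn(3);

julia> B2 = Flux.Bilinear(W, b, tanh);  # out = 3, in1 = 4, in2 = 5

julia> x, y = randn(4), randn(5);

julia> B2(x, y) ≈ [tanh(x' * W[i, :, :] * y + b[i]) for i in 1:3]
true
```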
@@ -329,17 +376,22 @@ end

"""
    Parallel(connection, layers...)

Create a `Parallel` layer that passes an input array to each path in
`layers`, reducing the output with `connection`.

Called with one input `x`, this is equivalent to `reduce(connection, [l(x) for l in layers])`.
If called with multiple inputs, they are `zip`ped with the layers, thus `Parallel(+, f, g)(x, y) = f(x) + g(y)`.

# Examples
```jldoctest
julia> model = Chain(Dense(3, 5),
                     Parallel(vcat, Dense(5, 4), Chain(Dense(5, 7), Dense(7, 4))),
                     Dense(8, 17));

julia> size(model(rand(3)))
(17,)

julia> model = Parallel(+, Dense(10, 2), Dense(5, 2))
Parallel(+, Dense(10, 2), Dense(5, 2))

julia> size(model(rand(10), rand(5)))
@@ -366,4 +418,4 @@ function Base.show(io::IO, m::Parallel)
  print(io, "Parallel(", m.connection, ", ")
  join(io, m.layers, ", ")
  print(io, ")")
end