|
1 | | -""" |
| 1 | +@doc raw""" |
2 | 2 | DeepESN(in_dims::Int, |
3 | 3 | res_dims::AbstractVector{<:Int}, |
4 | 4 | out_dims, |
|
12 | 12 | state_modifiers=(), |
13 | 13 | readout_activation=identity) |
14 | 14 |
|
15 | | -Build a deep ESN: a stack of `StatefulLayer(ESNCell)` with optional per-layer |
16 | | -state modifiers, followed by a final linear readout. |
| 15 | +Deep Echo State Network (DeepESN): a stack of stateful [`ESNCell`](@ref) layers |
| 16 | +(optionally with per-layer state modifiers) followed by a linear readout. |
| 17 | +
|
| 18 | +With `L = length(res_dims)`, a `DeepESN` composes (see the construction sketch below): |
| 19 | + 1) a sequence of stateful [`ESNCell`](@ref) with widths `res_dims[ℓ]`, |
| 20 | + 2) zero or more per-layer `state_modifiers[ℓ]` applied to the layer's state, and |
| 21 | + 3) a final [`LinearReadout`](@ref) from the last layer's features to the output. |
| 22 | +
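| | +For concreteness, a minimal construction sketch; the sizes and keyword values are |
| | +hypothetical, and the keywords are the ones documented below: |
| | + |
| | +```julia |
| | +# Hypothetical deep ESN: 2 inputs, reservoirs of 100 and 50 units, 1 output. |
| | +esn = DeepESN(2, [100, 50], 1; |
| | +              leak_coefficient=0.9,        # shared by both layers |
| | +              init_reservoir=rand_sparse,  # the default, shown explicitly |
| | +              use_bias=false) |
| | +``` |
| | + |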
|
| 23 | +## Equations |
| 24 | +
|
| 25 | +For input ``\mathbf{x}(t) \in \mathbb{R}^{in\_dims}``, per-layer reservoir states |
| 26 | +``\mathbf{h}^{(\ell)}(t) \in \mathbb{R}^{res\_dims[\ell]}`` (``\ell = 1, \dots, L``), and output |
| 27 | +``\mathbf{y}(t) \in \mathbb{R}^{out\_dims}``: |
| 28 | +
|
| 29 | +```math |
| 30 | +\begin{aligned} |
| 31 | + \tilde{\mathbf{h}}^{(1)}(t) &= \phi_1\!\left( |
| 32 | + \mathbf{W}^{(1)}_{in}\,\mathbf{x}(t) + \mathbf{W}^{(1)}_{res}\,\mathbf{h}^{(1)}(t-1) |
| 33 | + + \mathbf{b}^{(1)}\right) \\ |
| 34 | + \mathbf{h}^{(1)}(t) &= (1-\alpha_1)\,\mathbf{h}^{(1)}(t-1) + \alpha_1\,\tilde{\mathbf{h}}^{(1)}(t) \\ |
| 35 | + \mathbf{u}^{(1)}(t) &= \mathrm{Mods}_1\!\big(\mathbf{h}^{(1)}(t)\big) \\ |
| 36 | + \tilde{\mathbf{h}}^{(\ell)}(t) &= \phi_\ell\!\left( |
| 37 | + \mathbf{W}^{(\ell)}_{in}\,\mathbf{u}^{(\ell-1)}(t) + |
| 38 | + \mathbf{W}^{(\ell)}_{res}\,\mathbf{h}^{(\ell)}(t-1) + \mathbf{b}^{(\ell)}\right), |
| 39 | + \quad \ell = 2, \dots, L \\ |
| 40 | + \mathbf{h}^{(\ell)}(t) &= (1-\alpha_\ell)\,\mathbf{h}^{(\ell)}(t-1) + \alpha_\ell\,\tilde{\mathbf{h}}^{(\ell)}(t), |
| 41 | + \quad \ell = 2, \dots, L \\ |
| 42 | + \mathbf{u}^{(\ell)}(t) &= \mathrm{Mods}_\ell\!\big(\mathbf{h}^{(\ell)}(t)\big), \quad \ell = 2, \dots, L \\ |
| 43 | + \mathbf{y}(t) &= \rho\!\left(\mathbf{W}_{out}\,\mathbf{u}^{(L)}(t) + \mathbf{b}_{out}\right) |
| 44 | +\end{aligned} |
| 45 | +``` |
| | + |
|
| 46 | +## Where |
| 47 | +
|
| 48 | +- ``\mathbf{x}(t) \in \mathbb{R}^{in\_dims \times batch}`` — input at time ``t``. |
| 49 | +- ``\mathbf{h}^{(\ell)}(t) \in \mathbb{R}^{res\_dims[\ell] \times batch}`` — hidden state of layer ``\ell``. |
| 50 | +- ``\tilde{\mathbf{h}}^{(\ell)}(t)`` — candidate state before leaky mixing. |
| 51 | +- ``\mathbf{u}^{(\ell)}(t)`` — features after the ``\ell``-th entry of `state_modifiers` (identity if none). |
| 52 | +- ``\mathbf{y}(t) \in \mathbb{R}^{out\_dims \times batch}`` — network output. |
| 53 | +
|
| 54 | +- ``\mathbf{W}^{(\ell)}_{in} \in \mathbb{R}^{res\_dims[\ell] \times in\_size[\ell]}`` — input matrix of layer ``\ell`` |
| 55 | +  (``in\_size[1] = in\_dims`` and ``in\_size[\ell] = res\_dims[\ell-1]`` for ``\ell > 1``). |
| 56 | +- ``\mathbf{W}^{(\ell)}_{res} \in \mathbb{R}^{res\_dims[\ell] \times res\_dims[\ell]}`` — reservoir matrix of layer ``\ell``. |
| 57 | +- ``\mathbf{b}^{(\ell)} \in \mathbb{R}^{res\_dims[\ell] \times 1}`` — reservoir bias (broadcast over batch), present iff `use_bias[ℓ] = true`. |
| 58 | +- ``\mathbf{W}_{out} \in \mathbb{R}^{out\_dims \times res\_dims[L]}`` — readout matrix. |
| 59 | +- ``\mathbf{b}_{out} \in \mathbb{R}^{out\_dims \times 1}`` — readout bias (if the readout uses one). |
| 60 | +
|
| 61 | +- ``\phi_\ell`` — activation of layer ``\ell`` (`activation[ℓ]`, default `tanh`). |
| 62 | +- ``\alpha_\ell \in (0, 1]`` — leak coefficient of layer ``\ell`` (`leak_coefficient[ℓ]`). |
| 63 | +- ``\mathrm{Mods}_\ell(\cdot)`` — composition of the modifiers of layer ``\ell`` (may be empty). |
| 64 | +- ``\rho`` — readout activation (`readout_activation`, default `identity`). |
| 65 | +
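| | +The recurrence is easy to state directly in code. A plain-Julia sketch of a single |
| | +time step for one sample (dense arrays, hypothetical names, state modifiers and |
| | +batching omitted): |
| | + |
| | +```julia |
| | +# One step of the deep-ESN recurrence defined above. |
| | +# Win, Wres, b: per-layer matrices/vectors; α: leak rates; ϕ: activations. |
| | +function deep_esn_step(x, h, Win, Wres, b, α, ϕ) |
| | +    u = x |
| | +    for ℓ in eachindex(h) |
| | +        hcand = ϕ[ℓ].(Win[ℓ] * u .+ Wres[ℓ] * h[ℓ] .+ b[ℓ])  # candidate state |
| | +        h[ℓ] = (1 - α[ℓ]) .* h[ℓ] .+ α[ℓ] .* hcand           # leaky integration |
| | +        u = h[ℓ]                                             # input to next layer |
| | +    end |
| | +    return u, h  # u is what the linear readout consumes |
| | +end |
| | +``` |
| | + |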
|
| 66 | +## Arguments |
| 67 | +
|
| 68 | + - `in_dims`: Input dimension. |
| 69 | + - `res_dims`: Vector of reservoir (hidden) dimensions per layer; its length sets the depth `L`. |
| 70 | + - `out_dims`: Output dimension. |
| 71 | + - `activation`: Reservoir activation(s). Either a single function (broadcast to all layers) |
| 72 | + or a vector/tuple of length `L`. Default: `tanh`. |
| 73 | +
|
| 74 | +## Keyword arguments |
| 75 | +
|
| 76 | +Per-layer reservoir options (passed to each [`ESNCell`](@ref)): |
| 77 | +
|
| 78 | + - `leak_coefficient`: Leak rate(s) `α_ℓ ∈ (0,1]`. Scalar or length-`L` collection. Default: `1.0`. |
| 79 | + - `init_reservoir`: Initializer(s) for `W_res^{(ℓ)}`. Scalar or length-`L`. Default: [`rand_sparse`](@ref). |
| 80 | + - `init_input`: Initializer(s) for `W_in^{(ℓ)}`. Scalar or length-`L`. Default: [`scaled_rand`](@ref). |
| 81 | + - `init_bias`: Initializer(s) for reservoir bias (used iff `use_bias[ℓ]=true`). |
| 82 | + Scalar or length-`L`. Default: [`zeros32`](@extref). |
| 83 | + - `init_state`: Initializer(s) used when an external state is not provided. |
| 84 | + Scalar or length-`L`. Default: [`randn32`](@extref). |
| 85 | + - `use_bias`: Whether each reservoir uses a bias term. Boolean scalar or length-`L`. Default: `false`. |
| 86 | +
|
| 87 | +Composition: |
| 88 | +
|
| 89 | + - `state_modifiers`: Per-layer modifier(s) applied to each layer’s state before it |
| 90 | + feeds into the next layer (and the readout for the last layer). Accepts `nothing`, |
| 91 | + a single layer, a vector/tuple of length `L`, or per-layer collections. Defaults to no modifiers. |
| 92 | + - `readout_activation`: Activation for the final linear readout. Default: `identity`. |
| 93 | +
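| | +A sketch of per-layer broadcasting for these keywords (illustrative values only): |
| | + |
| | +```julia |
| | +# Scalars are shared across layers; length-L collections are applied per layer. |
| | +esn = DeepESN(3, [200, 100], 2, [tanh, identity];  # per-layer activations |
| | +              leak_coefficient=[0.8, 0.5],         # one leak rate per layer |
| | +              use_bias=true,                       # broadcast to both layers |
| | +              readout_activation=identity) |
| | +``` |
| | + |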
|
| 94 | +## Inputs |
| 95 | +
|
| 96 | + - `x :: AbstractArray` of size `(in_dims, batch)`. |
| 97 | +
|
| 98 | +## Returns |
| 99 | +
|
| 100 | + - Output `y` of size `(out_dims, batch)`. |
| 101 | + - Updated layer states: a `NamedTuple` carrying the states of all cells, modifiers, and the readout. |
| 102 | +
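| | +A call sketch, assuming the usual Lux-style `setup`/apply convention implied by the |
| | +Inputs/Returns above (shapes are hypothetical): |
| | + |
| | +```julia |
| | +using Lux, Random |
| | + |
| | +rng = Random.default_rng() |
| | +model = DeepESN(3, [64, 32], 2) |
| | +ps, st = Lux.setup(rng, model)  # parameters/states laid out as described below |
| | +x = rand(Float32, 3, 16)        # (in_dims, batch) |
| | +y, st = model(x, ps, st)        # y has size (out_dims, batch) == (2, 16) |
| | +``` |
| | + |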
|
| 103 | +## Parameters |
| 104 | +
|
| 105 | + - `cells :: NTuple{L,NamedTuple}` — parameters for each [`ESNCell`](@ref), including: |
| 106 | + - `input_matrix :: (res_dims[ℓ] × in_size[ℓ])` — `W_in^{(ℓ)}` |
| 107 | + - `reservoir_matrix :: (res_dims[ℓ] × res_dims[ℓ])` — `W_res^{(ℓ)}` |
| 108 | + - `bias :: (res_dims[ℓ],)` — present only if `use_bias[ℓ]=true` |
| 109 | + - `states_modifiers :: NTuple{L,Tuple}` — per-layer tuples of modifier parameters (empty tuples if none). |
| 110 | + - `readout` — parameters of [`LinearReadout`](@ref), typically: |
| 111 | + - `weight :: (out_dims × res_dims[L])` — `W_out` |
| 112 | + - `bias :: (out_dims,)` — `b_out` (if the readout uses bias) |
| 113 | +
|
| 114 | +> Exact field names for modifiers/readout follow their respective layer definitions. |
| 115 | +
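| | +Continuing the hypothetical `ps` from the call sketch above (field names as listed; |
| | +readout fields follow [`LinearReadout`](@ref)), individual blocks can be inspected: |
| | + |
| | +```julia |
| | +W_in1  = ps.cells[1].input_matrix      # (res_dims[1] × in_dims) |
| | +W_res1 = ps.cells[1].reservoir_matrix  # (res_dims[1] × res_dims[1]) |
| | +W_out  = ps.readout.weight             # (out_dims × res_dims[L]) |
| | +``` |
| | + |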
|
| 116 | +## States |
| 117 | +
|
| 118 | + - `cells :: NTuple{L,NamedTuple}` — states for each [`ESNCell`](@ref). |
| 119 | + - `states_modifiers :: NTuple{L,Tuple}` — per-layer tuples of modifier states. |
| 120 | + - `readout` — states for [`LinearReadout`](@ref). |
| 121 | +
|
17 | 122 | """ |
18 | 123 | @concrete struct DeepESN <: AbstractEchoStateNetwork{(:cells, :states_modifiers, :readout)} |
19 | 124 | cells |
|