
Commit f6515e3

Author: Pedro Paulo
Commit message: Final Chapter
1 parent d181a34 · commit f6515e3

1 file changed (+3, -3 lines)

class12/class12.md

Lines changed: 3 additions & 3 deletions
@@ -137,14 +137,14 @@ Let:
 - $\mathcal G^\dagger$ is the composition of non-linear operators: $\mathcal G^\dagger=S_1\circ \cdots \circ S_L$
 - In the linear case, as described before, $S_1 = K_\mathcal X$, $S_L = K_\mathcal Y$, and they are connected through multiple $\varphi_j$.
 The above definition *looks a lot* like the typical definition of NNs, where each of the $S_l$ is a layer of your NN. And, as we're going to see, it is! At least, it is a generalization of the definition of NNs to function space.
-\[cite] *et al.* proposed to create each one of these $S_l$ as follows:
+[9] proposed to create each one of these $S_l$ as follows:
 ```math
 S_l(a)(x) = \sigma_l\bigg( W_l a(x) + b_l + \int_\Omega \mathrm dz \ \kappa_l(x,z) a(z) \bigg), \ \ \ \ x \in \Omega
 ```
 where:
 - $\sigma_l:\mathbb R^k\rightarrow\mathbb R^k$ is the non-linear activation function.
 - $W_l\in\mathbb R^{k\times k}$ is a term related to a "residual network".
-   - This term is not necessary for convergence, but it's credited to help with convergence speed \[cite].
+   - This term is not necessary for convergence, but it's credited to help with convergence speed.
 - $b_l\in\mathbb R^k$ is the bias term.
 - $\kappa_l:\Omega\times\Omega\rightarrow\mathbb R^{k\times k}$ is the kernel function.

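To make the layer formula above concrete, here is a minimal NumPy sketch of a single $S_l$ on a uniform 1-D grid, where the integral over $\Omega$ is approximated by a plain quadrature sum. The function name `kernel_layer`, the Gaussian choice of $\kappa_l$, and the toy sizes are illustrative assumptions, not something prescribed by these notes or by [9].

```python
import numpy as np

def kernel_layer(a, grid, W, b, kappa, sigma=np.tanh):
    """One non-linear operator layer S_l, discretized on a uniform 1-D grid.

    a     : (n, k) array, the input function a(x) sampled at the grid points
    grid  : (n,) array, the points x_i in Omega
    W     : (k, k) "residual network" weight matrix
    b     : (k,) bias
    kappa : callable (x, z) -> (k, k) kernel matrix (illustrative choice below)
    sigma : pointwise non-linear activation
    """
    n, k = a.shape
    dz = grid[1] - grid[0]                      # uniform grid spacing
    out = np.empty_like(a)
    for i, x in enumerate(grid):
        # quadrature approximation of  ∫_Omega kappa(x, z) a(z) dz
        integral = sum(kappa(x, z) @ a[j] for j, z in enumerate(grid)) * dz
        out[i] = sigma(W @ a[i] + b + integral)
    return out

# toy usage: k = 2 channels, 64 grid points, an (arbitrary) Gaussian kernel
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 64)
a = np.stack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)], axis=-1)
W, b = rng.standard_normal((2, 2)), rng.standard_normal(2)
kappa = lambda x, z: np.exp(-(x - z) ** 2 / 0.1) * np.eye(2)
u = kernel_layer(a, grid, W, b, kappa)          # (64, 2)
```

Note that the quadrature sum costs $O(n^2)$ kernel evaluations per layer; this is precisely the cost that the FFT-based parametrization of the kernel, discussed further down, avoids.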
@@ -197,7 +197,7 @@ where $W_\kappa$ are the (trainable) weights for the kernel, and $j$ represents
 We can see this "low-pass filter" behavior of the kernel represented in the "zoom" of the general diagram (b), where the high frequencies vanish, while the remaining low frequencies are multiplied by a certain weight.
 After this "filtering" and weighting, we apply the inverse FFT to get the $\mathcal F^{-1}\{\hat\kappa_l(v) \cdot \hat a(v)\}$ term.
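As a heavily simplified illustration of this "filter, weight, invert" step, the sketch below computes the $\mathcal F^{-1}\{\hat\kappa_l(v)\cdot\hat a(v)\}$ term for a single channel on a uniform 1-D grid, keeping only the lowest frequencies. The names `spectral_conv`, `n_modes`, and `weights` are assumptions made for the example, not notation from the notes.

```python
import numpy as np

def spectral_conv(a, weights, n_modes):
    """F^{-1}{ kappa_hat(v) * a_hat(v) } on a uniform 1-D grid, one channel.

    a       : (n,) real samples of a(x)
    weights : (n_modes,) complex weights, i.e. kappa_hat on the kept modes
    n_modes : number of low-frequency modes kept (the "low-pass filter")
    """
    a_hat = np.fft.rfft(a)                            # forward FFT
    out_hat = np.zeros_like(a_hat)
    out_hat[:n_modes] = weights * a_hat[:n_modes]     # weight the low modes, drop the rest
    return np.fft.irfft(out_hat, n=a.size)            # inverse FFT, back to physical space

# toy usage: 128 grid points, keep 16 modes with random complex weights
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 128, endpoint=False)
a = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(128)
w = rng.standard_normal(16) + 1j * rng.standard_normal(16)
filtered = spectral_conv(a, w, n_modes=16)            # (128,) real array
```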
-Meanwhile, we also have the so-called "1D Convolution", represented by $W_l a(x)$, with trainable $W_l$. It is not strictly necessary, but it helps with convergence speed \[cite]. There is also the (also trainable) bias term $b_l$, omitted from the figure. The sum of all the aforementioned terms is then passed through a non-linear activation function $\sigma$, defined _a priori_.
+Meanwhile, we also have the so-called "residual network" term, represented by $W_l a(x)$, with trainable $W_l$. It is not strictly necessary, but it helps with convergence speed. There is also the (also trainable) bias term $b_l$, omitted from the figure. The sum of all the aforementioned terms is then passed through a non-linear activation function $\sigma$, defined _a priori_.

 And, finally, $T$ (defined _a priori_) of these layers are stacked in sequence, before being projected down by the layer **Q** to produce the output $u(x)$.

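Putting the pieces of the diagram together, here is a minimal NumPy sketch of one full Fourier layer (spectral term + "residual network" term $W_l a(x)$ + bias $b_l$ + activation $\sigma$), with $T$ such layers stacked and a final channel-wise projection **Q**. All names, shapes, and random weights are illustrative assumptions for a 1-D uniform grid; this is a sketch of the idea, not the implementation from [9].

```python
import numpy as np

def fourier_layer(a, W, b, spec_w, n_modes, sigma=np.tanh):
    """One Fourier layer: sigma( W a(x) + b + F^{-1}{ kappa_hat · a_hat } ).

    a      : (n, k) real samples of a(x) with k channels
    W      : (k, k) "residual network" weights, b: (k,) bias
    spec_w : (n_modes, k, k) complex spectral weights (kappa_hat on kept modes)
    """
    a_hat = np.fft.rfft(a, axis=0)                      # (n//2+1, k) complex
    out_hat = np.zeros_like(a_hat)
    # weight the kept low-frequency modes, mixing channels per mode
    out_hat[:n_modes] = np.einsum("mij,mj->mi", spec_w, a_hat[:n_modes])
    spectral = np.fft.irfft(out_hat, n=a.shape[0], axis=0)
    return sigma(a @ W.T + b + spectral)

def fno(a, layers, Q):
    """Stack T Fourier layers, then project channels down with Q to get u(x)."""
    for (W, b, spec_w, n_modes) in layers:
        a = fourier_layer(a, W, b, spec_w, n_modes)
    return a @ Q.T                                      # (n, out_channels)

# toy usage: n = 64 points, k = 4 channels, T = 3 layers, scalar output u(x)
rng = np.random.default_rng(0)
n, k, T, modes = 64, 4, 3, 12
a = rng.standard_normal((n, k))
layers = [(rng.standard_normal((k, k)),
           rng.standard_normal(k),
           rng.standard_normal((modes, k, k)) + 1j * rng.standard_normal((modes, k, k)),
           modes) for _ in range(T)]
Q = rng.standard_normal((1, k))
u = fno(a, layers, Q)                                   # (64, 1)
```

In practice the spectral weights, $W_l$, $b_l$, and **Q** would be trained jointly by backpropagation; here they are random only to show the shapes and the data flow.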