
Commit 4066405

Author: Pedro Paulo (committed)
Class12: 10/20 checkpoint
1 parent a71f777 commit 4066405

File tree

4 files changed, +16 −11 lines changed

.DS_Store

0 Bytes
Binary file not shown.

class12/Figures/diagram.png

88.4 KB

class12/Figures/unetvspca.png

306 KB

class12/class12.md

Lines changed: 16 additions & 11 deletions
@@ -20,7 +20,7 @@ Key differences:
- A vector is *naturally* discrete. Therefore, the input-output pairs for functions are also *naturally* discrete.
- A function is *naturally* continuous. Therefore, the input-output pairs for operators are also *naturally* continuous.

-It is said that Neural Networks (NN) are **universal function approximators** \[cite], in this section we're going to ~~try to~~ create the idea of **universal operator approximators**, that map functions to functions, using something called **Neural Operators**.
+It is said that Neural Networks (NN) are **universal function approximators** \[cite]. In this section we're going to build up the idea of **universal operator approximators**, which map functions to functions, using something called **Neural Operators**.

A NN $\mathcal N$ can be thought of as a general **function** $\mathcal N: X \times \Theta \rightarrow Y$, where $X$ and $Y$ are vector spaces and $\Theta$ is the parameter space. So we take elements $x \in X$ and *learn* how to map them onto $y\in Y$ by changing the parameters $\theta \in \Theta$. That way, we can approximate any function that maps $X \rightarrow Y$ (that's where "universal function approximator" comes from).
In a similar way we can think about a Neural Operator $\mathcal G^\dagger: \mathcal X \times \Theta \rightarrow \mathcal Y$, where $\mathcal X$ and $\mathcal Y$ are function spaces and $\Theta$ is the parameter space. Now, instead of learning how to map *vectors*, we're going to learn a mapping between *functions*. This general idea will be expanded further.
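As a toy illustration of this difference in signature (a sketch, not from the notes; the parameter shapes and the pointwise form of the operator are chosen purely for illustration), compare a NN that only accepts vectors of one fixed dimension with an operator whose input and output are functions:

```python
# Toy sketch of the two signatures (illustrative choices of parameters and shapes).
import numpy as np

def nn(x, theta):
    """N: X x Theta -> Y with X = R^3 and Y = R^2; the dimensions are baked into theta."""
    W, b = theta                              # W is 2x3, b is in R^2
    return np.tanh(W @ x + b)                 # only accepts 3-dimensional vectors

def neural_operator(a, theta):
    """G^dagger: (function a) x Theta -> (function); no grid or dimension is fixed."""
    w, b = theta                              # here: two scalars acting pointwise
    return lambda x: np.tanh(w * a(x) + b)    # the output is itself a function of x

theta_nn = (np.ones((2, 3)), np.zeros(2))
print(nn(np.array([1.0, 2.0, 3.0]), theta_nn))      # fixed input size

u = neural_operator(np.sin, (2.0, 0.1))             # input: the function sin
print(u(np.linspace(0.0, np.pi, 5)))                # evaluate the output function on any grid
```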
@@ -33,6 +33,8 @@ When putting into a computer we are going to need to mesh our function, otherwis
- In the case of regression, the output **has to** have a fixed dimension; needing a different dimension leads to a new NN and a new training.
For the case of image processing, where there's no trivial underlying function behind the image, we cannot take advantage of Neural Operators. But in the case of distributions of physical quantities, e.g., temperature, where there is an underlying function behind the data, we can leverage Neural Operators to understand the distribution function and make predictions/controls based on it, decoupling the parametrization $\Theta$ from the discretization of the data. \[cite] *et al.* compared the errors of two networks, U-Net (NN topology) and PCA-Net (Neural Operator topology), trained on different discretizations of the *same underlying function*; the result is shown below:

+![U-Net vs PCA-Net error across discretizations](Figures/unetvspca.png)
+
This brings a concept (that we'll try to keep with our definition of Neural Operators) called **Discretization Invariance**:
- When we have Discretization Invariance we de-couple the parameters and the cost from the discretization, i.e., when changing the discretization the error doesn't vary.
- If our model is Discretization Invariant, we can use information at different discretizations to train, and we can transfer parameters learned at one discretization to another. This leads to something called "zero-shot super-resolution", which basically consists of training on a coarser discretization and predicting on a finer one, thanks to the Discretization Invariance (a minimal sketch follows this list). This concept, together with its limitations, will be discussed in the "Fourier Neural Operator" section.
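A minimal sketch of the zero-shot super-resolution idea (a toy example, not from the notes, that anticipates the Fourier parametrization of the later section): the layer's parameters are a fixed, small set of spectral weights, so the very same parameters can be applied at any discretization.

```python
# Toy sketch: parameters are n_modes spectral weights, independent of the grid,
# so the SAME parameters are reused on a coarse and on a fine discretization.
import numpy as np

rng = np.random.default_rng(0)
n_modes = 8                                              # resolution-independent parameter count
R = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)

def spectral_layer(a_samples):
    """Apply the same spectral weights to a function sampled on any uniform grid."""
    a_hat = np.fft.rfft(a_samples)
    a_hat[:n_modes] *= R                                 # act on the lowest n_modes coefficients
    a_hat[n_modes:] = 0.0                                # truncate the remaining ones
    return np.fft.irfft(a_hat, n=len(a_samples))

u = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)   # same underlying function
x_coarse = np.linspace(0.0, 1.0, 64, endpoint=False)                # "training" discretization
x_fine = np.linspace(0.0, 1.0, 256, endpoint=False)                 # unseen, finer discretization

out_coarse = spectral_layer(u(x_coarse))
out_fine = spectral_layer(u(x_fine))                     # same parameters R, no retraining

# Restricted to the coarse points, the fine-grid output matches the coarse-grid output.
print(np.max(np.abs(out_fine[::4] - out_coarse)))        # ~1e-15 (round-off only)
```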
@@ -76,6 +78,9 @@ Imagine that I want to approximate this operator $\mathcal G$ by means of an $\m

A general diagram is shown below:

+![General diagram](Figures/diagram.png)
+
+
In this case, we can see that our $\mathcal G^\dagger$ can be given by $\mathcal G^\dagger = K_\mathcal X \circ \varphi\circ L_\mathcal Y$, where $K_\mathcal X$ and $L_\mathcal Y$ are the operators that project $\mathcal X$ and $\mathcal Y$ onto the finite-dimensional spaces $\mathbb R^{n}$ and $\mathbb R^{m}$, respectively, and $\varphi$ is a non-linear function that maps $\mathbb R^{n}$ to $\mathbb R^{m}$. Different selections of the set {$K_\mathcal X$, $L_\mathcal Y$, $\varphi$} generate different classes of Neural Operators.

We can, from this, see the first limitation of this technique: we're limited by how good the approximation $K_\mathcal W L_\mathcal W \approx I$ is on each of the function spaces involved. It turns out that, as described by \[cite], this approximation is fairly general:
@@ -92,9 +97,9 @@ Let:
- $\mathcal X$ be a separable Banach space, and $\mu \in \mathcal P(\mathcal X)$ be a probability measure on $\mathcal X$.
- $\mathcal G \in L_\mu^p(\mathcal X;\mathcal Y)$ for some $1\leq p < \infty$.
If $\mathcal Y$ is a separable Hilbert space and $\epsilon > 0$, *there exist* continuous linear maps $K_\mathcal X:\mathcal X \rightarrow \mathbb R^n$ and $L_\mathcal Y:\mathcal Y \rightarrow \mathbb R^m$, and a continuous map $\varphi: \mathbb R^n \rightarrow \mathbb R^m$, such that:
-$$
+```math
\| \mathcal G-\mathcal G^\dagger\|_{L_\mu^p(\mathcal X;\mathcal Y)} < \epsilon
-$$
+```
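To make the $K_\mathcal X$, $\varphi$, $L_\mathcal Y$ structure of this result concrete, here is a minimal numerical sketch. All choices below are assumptions for illustration only: the projections are PCA bases computed from sampled data (anticipating the PCA-Net below), the map back to function space is the transpose of the output basis plus its mean, $\varphi$ is a random untrained two-layer map, and the data-generating "operator" is just a crude antiderivative.

```python
# Minimal sketch of the encode -> finite-dimensional map -> decode structure.
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_samples, n, m = 128, 200, 10, 10
x = np.linspace(0.0, 1.0, n_grid)

# A dataset of input/output function pairs sampled on the grid
# (random smooth inputs; outputs given by a crude cumulative integral).
freqs = np.arange(1, 6)
coeffs = rng.normal(size=(n_samples, freqs.size))
U = coeffs @ np.sin(np.outer(freqs, np.pi * x))          # inputs  u_i(x_j)
V = np.cumsum(U, axis=1) / n_grid                        # outputs v_i(x_j) ~ antiderivative

# K_X: projection onto the first n input PCA components; decode: back from R^m to samples.
u_mean, v_mean = U.mean(0), V.mean(0)
_, _, Phi_u = np.linalg.svd(U - u_mean, full_matrices=False)   # input PCA basis (rows)
_, _, Phi_v = np.linalg.svd(V - v_mean, full_matrices=False)   # output PCA basis (rows)
K_X = lambda u_s: Phi_u[:n] @ (u_s - u_mean)             # function samples -> R^n
decode = lambda c: v_mean + Phi_v[:m].T @ c              # R^m -> function samples

# phi: R^n -> R^m (untrained here; in PCA-Net this is the part that would be learned).
W1, W2 = rng.normal(size=(32, n)), rng.normal(size=(m, 32))
phi = lambda c: W2 @ np.tanh(W1 @ c)

u_new = np.sin(3 * np.pi * x)                            # a new input function on the grid
v_pred = decode(phi(K_X(u_new)))                         # approximation of G(u_new) on the grid
print(v_pred.shape)                                      # (128,)
```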
Let's start by giving two classes of Neural Operators, the Principal Component Analysis Network (PCA-Net) and the Deep Operator Network (DeepONet).

## PCA
@@ -133,9 +138,9 @@ Let:
- In the linear case, as described before, $S_1 = K_\mathcal X$, $S_L = L_\mathcal Y$, and they're connected through multiple $\varphi_j$.
The above definition *looks a lot* like the typical definition of NNs, where each one of the $S_l$ is a layer of your NN. And, as we're going to see, it is! At least, it is a generalization of the definition of a NN to function spaces.
\[cite] *et al.* proposed to build each of these $S_l$ as follows (a discretized sketch follows the list of terms below):
-$$
+```math
S_l(a)(x) = \sigma_l\bigg( W_la(x) + b_l + \int_\Omega\mathrm dz \ \kappa_l(x,z)a(z) \bigg), \ \ \ \ x \in \Omega
-$$
+```
where:
- $\sigma_l:\mathbb R^k\rightarrow\mathbb R^k$ is the non-linear activation function.
- $W_l\in\mathbb R^{k\times k}$ is a pointwise weight matrix, a term related to a "residual network"-style connection.
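A discretized sketch of one such layer (assumptions, not from the notes: $\Omega = [0, 1]$ with a uniform grid, channel width $k = 1$ so that $W_l$ and $b_l$ are scalars, a fixed Gaussian kernel standing in for a learned $\kappa_l$, and the integral replaced by a Riemann sum):

```python
# Discretized sketch of S_l(a)(x) = sigma( W a(x) + b + ∫ kappa(x, z) a(z) dz ).
import numpy as np

n_grid = 200
x = np.linspace(0.0, 1.0, n_grid)
dz = x[1] - x[0]

W_l, b_l = 0.5, 0.1                                      # pointwise ("residual") term, k = 1
kappa = lambda x_, z_: np.exp(-((x_ - z_) ** 2) / 0.02)  # stand-in for a learned kernel
sigma = np.tanh

def S_l(a_samples):
    """One kernel-integral layer acting on a function sampled at the grid points x."""
    K = kappa(x[:, None], x[None, :])                    # K[i, j] = kappa(x_i, z_j)
    integral = K @ a_samples * dz                        # quadrature of ∫ kappa(x, z) a(z) dz
    return sigma(W_l * a_samples + b_l + integral)

a = np.sin(2 * np.pi * x)                                # input function sampled on the grid
print(S_l(a)[:5])                                        # output function at the first grid points
```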
@@ -148,19 +153,19 @@ Different selections of $\kappa_l$ generate different classes of these non-linea

# Fourier Neural Operator
Let $\kappa_l(x,z)=\kappa_l(x-z)$; the integral then becomes:
-$$
+```math
\int_\Omega \mathrm dz \ \kappa_l(x,z)a(z) = \int_\Omega \mathrm dz \ \kappa_l(x-z)a(z) =\kappa_l(x) * a(x)
-$$
+```
where $*$ represents the convolution operator.
And, as we know from Fourier transform theory,
-$$
+```math
\mathcal F\{\kappa_l(x)*a(x)\} = \mathcal F\{\kappa_l(x)\} \cdot\mathcal F\{a(x)\}
-$$
+```
where $\mathcal F\{\cdot\}$ represents the Fourier transform of a function.
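As a quick numerical sanity check of this identity (assuming periodic/circular convolution on a uniform grid, which is what the discrete FFT computes):

```python
# Verify F{k * a} = F{k} · F{a} for circular convolution on a uniform grid.
import numpy as np

rng = np.random.default_rng(1)
k, a = rng.normal(size=128), rng.normal(size=128)

# Circular convolution computed directly from its definition ...
conv = np.array([np.sum(k * np.roll(a[::-1], i + 1)) for i in range(128)])
# ... and through the Fourier transform (multiply in frequency, transform back).
conv_fft = np.fft.ifft(np.fft.fft(k) * np.fft.fft(a)).real

print(np.allclose(conv, conv_fft))   # True
```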
We can then reduce the single layer $S_l$ presented before to the following:
-$$
+```math
S_l(a)(x) = \sigma_l\bigg( W_la(x) + b_l + \mathcal F^{-1}\{\mathcal F\{\kappa_l(x)\} \cdot\mathcal F\{a(x)\}\} \bigg), \ \ \ \ x \in \Omega
-$$
+```
This is basically what defines the Fourier Neural Operator: the Neural Operator $\mathcal G^\dagger=S_1\circ \cdots \circ S_L$ where each one of these $S_l$ acts on the previous layer's output function by rescaling its Fourier expansion.
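A minimal sketch of one such Fourier layer (assumptions, not from the notes: a uniform periodic grid, channel width 1, the learned object taken to be the Fourier coefficients $\mathcal F\{\kappa_l\}$ kept only up to a truncation `n_modes`, and random untrained values for all parameters):

```python
# Minimal sketch of a Fourier layer:
# S_l(a)(x) = sigma( W a(x) + b + F^{-1}{ F{kappa} . F{a} }(x) ),
# with the learned object being F{kappa} itself, truncated to n_modes coefficients.
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_modes = 256, 12
x = np.linspace(0.0, 1.0, n_grid, endpoint=False)

W_l, b_l = 0.5, 0.1                                      # pointwise ("residual") term, width 1
R_l = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)   # F{kappa_l}, low modes only
sigma = np.tanh

def fourier_layer(a_samples):
    """One FNO-style layer acting on a function sampled on the periodic grid x."""
    a_hat = np.fft.rfft(a_samples)                       # F{a}
    out_hat = np.zeros_like(a_hat)
    out_hat[:n_modes] = R_l * a_hat[:n_modes]            # F{kappa} . F{a}, truncated
    conv = np.fft.irfft(out_hat, n=len(a_samples))       # F^{-1}{ ... } = (kappa * a)(x)
    return sigma(W_l * a_samples + b_l + conv)

a = np.sin(2 * np.pi * x) + 0.3 * np.cos(8 * np.pi * x)  # input function on the grid
out = fourier_layer(a)
print(out.shape)                                         # (256,) — again a function on the grid
```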