Skip to content

Commit 086fa86

Browse files
committed
lecture polishing
1 parent 52b986e commit 086fa86

File tree

3 files changed

+217
-29
lines changed

3 files changed

+217
-29
lines changed

docs/src/lecture_12/lab_2.jl

Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,128 @@
1+
abstract type AbstractODEProblem end
2+
3+
struct ODEProblem{F,T,U,P} <: AbstractODEProblem
4+
f::F
5+
tspan::T
6+
u0::U
7+
θ::P
8+
end
9+
10+
abstract type ODESolver end
11+
struct Euler{T} <: ODESolver
12+
dt::T
13+
end
14+
struct RK2{T} <: ODESolver
15+
dt::T
16+
end
17+
18+
function f(x,θ)
19+
α, β, γ, δ = θ
20+
x₁, x₂ = x
21+
22+
dx₁ = α*x₁ - β*x₁*x₂
23+
dx₂ = δ*x₁*x₂ - γ*x₂
24+
25+
[dx₁, dx₂]
26+
end
27+
28+
function solve(prob::AbstractODEProblem, solver::ODESolver)
29+
t = prob.tspan[1]; u = prob.u0
30+
us = [u]; ts = [t]
31+
while t < prob.tspan[2]
32+
(u,t) = solver(prob, u, t)
33+
push!(us,u)
34+
push!(ts,t)
35+
end
36+
ts, reduce(hcat,us)
37+
end
38+
39+
function (solver::Euler)(prob::ODEProblem, u, t)
40+
f, θ, dt = prob.f, prob.θ, solver.dt
41+
(u + dt*f(u,θ), t+dt)
42+
end
43+
44+
function (solver::RK2)(prob::ODEProblem, u, t)
45+
f, θ, dt = prob.f, prob.θ, solver.dt
46+
uh = u + f(u,θ)*dt
47+
u + dt/2*(f(u,θ) + f(uh,θ)), t+dt
48+
end
49+
50+
51+
θ = [0.1,0.2,0.3,0.2]
52+
u0 = [1.0,1.0]
53+
tspan = (0.,100.)
54+
dt = 0.1
55+
prob = ODEProblem(f,tspan,u0,θ)
56+
57+
t,X=solve(prob, RK2(0.2))
58+
59+
using Plots
60+
p1 = plot(t, X[1,:], label="x", lw=3)
61+
plot!(p1, t, X[2,:], label="y", lw=3)
62+
63+
display(p1)
64+
65+
#
66+
θ = [0.2,0.2,0.3,0.2]
67+
u0 = [1.0,1.0]
68+
tspan = (0.,100.)
69+
dt = 0.1
70+
prob2 = ODEProblem(f,tspan,u0,θ)
71+
72+
t,X2=solve(prob2, RK2(0.2))
73+
74+
using Optim
75+
76+
function loss(θin,prob::ODEProblem,Y)
77+
prob.θ.=θin
78+
t,Xn=solve(prob,RK2(0.2))
79+
sum((Y.-Xn).^2)
80+
end
81+
θopt = copy(θ)
82+
O=Optim.optimize->loss(θ,prob,X),θopt)
83+
O=Optim.optimize->loss(θ,prob,X),θopt,LBFGS())
84+
85+
using DiffEqFlux
86+
nn=FastDense(2,2)
87+
p = initial_params(nn)
88+
nn([1,2],p)
89+
90+
91+
function fy(x,θ)
92+
α, β, γ, δ, ω = θ
93+
x₁, x₂ = x
94+
95+
dx₁ = α*x₁ - β*x₁*x₂ + ω*x₂
96+
dx₂ = δ*x₁*x₂ - γ*x₂
97+
98+
[dx₁, dx₂]
99+
end
100+
101+
#
102+
θy = [0.2,0.2,0.3,0.2,0.1]
103+
u0 = [1.0,1.0]
104+
tspan = (0.,100.)
105+
dt = 0.1
106+
proby = ODEProblem(fy,tspan,u0,θy)
107+
108+
t,Xy=solve(proby, RK2(0.2))
109+
110+
py = plot(t, Xy[1,:], label="x", lw=3)
111+
plot!(py, t, Xy[2,:], label="y", lw=3)
112+
savefig("LV_omega.svg")
113+
114+
function fnn(x,θ)
115+
α, β, γ, δ = θ[1:4]
116+
x₁, x₂ = x
117+
118+
dx₁ = α*x₁ - β*x₁*x₂
119+
dx₂ = δ*x₁*x₂ - γ*x₂
120+
121+
[dx₁, dx₂]+nn(x,@view θ[5:end])
122+
end
123+
124+
θnn = [0.2,0.2,0.3,0.2,0.01*initial_params(nn)...]
125+
probnn = ODEProblem(fnn,tspan,u0,θnn)
126+
127+
θopt = copy(θnn)
128+
O=Optim.optimize->loss(θ,probnn,Xy),θopt)

docs/src/lecture_13/lecture.md

Lines changed: 89 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,18 @@
11
# Data-driven Ordinary Differential Equations
22

3-
We have looked into the uncertainty propagation trough an ODE in previous lecture. The uncertainty may stem from:
4-
- unknown boundary consitions (e.g. initial conditions)
5-
- unknown parameters,
6-
- missing terms (hidden dynamics) of ODE
3+
We have looked into the uncertainty propagation through an ODE in the previous lecture. The uncertainty may stem from:
4+
- unknown boundary conditions (e.g. initial conditions)
5+
- unknown parameters (reproduction rates, etc.)
6+
- missing terms (hidden dynamics) of an ODE
77

88
The uncertainty in the solution can be reduced when data are available. This can be either in incremental or batch form:
9-
- batch form: we have a set of data
10-
- incremental: common e.g. in temporal evolution, when the data are measured on the fly and systems cna change in time
9+
- batch form: we have a set of data and we look for their explanation
10+
- incremental: common e.g. in temporal evolution, when the data are measured on the fly and systems can change in time (stochastic ODE)
11+
- if done right, the incremental solution also solves the batch problem.
1112

1213
## Fitting ODE solution to data
1314

14-
Since ODE solver is a function like any other, it is possible to use general-purpose optimizers to optimize parameters of the ODE to match the output.
15+
Since the ODE solver is a function like any other, it is possible to use general-purpose optimizers to optimize parameters of the ODE to match the output.
1516
```julia
1617
using Optim
1718

@@ -22,14 +23,15 @@ function loss(θin,prob::ODEProblem,Y)
2223
end
2324
θopt = copy(θ)
2425
O=Optim.optimize->loss(θ,prob,X),θopt)
26+
Olb=Optim.optimize->loss(θ,prob,X),θopt,LBFGS())
2527
```
2628

27-
- show various optimizers?
28-
- gradient optimizers?
29+
- using the power of automatic differentiation (of the numerical solver)
30+
- in the case of ODE, the gradients can be modified to use the information about exact derivatives (adjoints)
2931

3032
## Extending the ODE
3133

32-
The previous approach will work only if the data were generated by the exact ODE. If the structure of ODE is different, e.g. soem terms are missing, we can never find an exact fit.
34+
The previous approach will work only if the data were generated by the exact ODE. If the structure of ODE is different, e.g. some terms are missing, we can never find an exact fit.
3335

3436
```math
3537
\begin{align}
@@ -40,13 +42,13 @@ dot{y}&=-\delta y+\gamma xy,
4042

4143
We could "guess" what is the missing term or add a black box (neural network). The whole problem will become finding parameters ``\theta = [\theta_{ODE},\theta_{NN}]``.
4244
```math
43-
\frac{d\mathbf{x}}{dt}=f(\mathbf{x},\theta) + NN(\mathbf{x},\theta)
45+
\frac{d\mathbf{x}}{dt}=f(\mathbf{x},\theta_{ODE}) + NN(\mathbf{x},\theta_{NN})
4446
```
4547
In the limiting case, we may learn only the network:
4648
```math
47-
\frac{d\mathbf{x}}{dt}=f(\mathbf{x},\theta) + NN(\mathbf{x},\theta)
49+
\frac{d\mathbf{x}}{dt}= NN(\mathbf{x},\theta_{NN})
4850
```
49-
known as the "Neural ODE" [citation needed].
51+
known as the "Neural ODE" (Chen et. al. 2018).
5052

5153
# Neural Networks in Julia
5254
Many possible packages implementing Neural Networks (Flux, Knets, MXnets, tensorFlow) etc. By far the most used package is the ```Flux.jl```
@@ -82,7 +84,7 @@ function (a::Dense)(x::AbstractArray)
8284
end
8385
```
8486

85-
While it is straightforward compose a MLP:
87+
Building an MLP is straightforward:
8688
```julia
8789
nx = 2
8890
nn = Chain(Dense(rand(nx,nx),rand(nx)))
@@ -107,7 +109,9 @@ gs[ps[1]]
107109

108110
This approach has benefits and drawbacks:
109111
- it allows to write a very general code easily,
110-
- removing parameter from optimization can be done by removing it from the parameter list
112+
- the list of parameters is accessible for modifications:
113+
- removing parameter from optimization can be done by removing it from the parameter list
114+
- adding a parameter (e.g. from the ODE) allows composition of NN with other code
111115
- it introduces and overhead
112116
- may be negligible for large models (hundrets of hidden neurons) that are dominated by matrix manipulation.
113117
- becomes significant for low dimensional models (ODEs)
@@ -140,7 +144,9 @@ end
140144
```
141145
It does not store its parameters but operates on an external parameter vector:
142146
```julia
143-
(f::FastDense)(x,p) = ((f.bias == true ) ? (f.σ.(reshape(p[1:(f.out*f.in)],f.out,f.in)*x .+ p[(f.out*f.in+1):end])) : (f.σ.(reshape(p[1:(f.out*f.in)],f.out,f.in)*x)))
147+
(f::FastDense)(x,p) = ((f.bias == true )
148+
? (f.σ.(reshape(p[1:(f.out*f.in)],f.out,f.in)*x .+ p[(f.out*f.in+1):end]))
149+
: (f.σ.(reshape(p[1:(f.out*f.in)],f.out,f.in)*x)))
144150
```
145151

146152
The same behavior is replicated in FastChain:
@@ -150,7 +156,7 @@ struct FastChain{T<:Tuple} <: FastLayer
150156
# function FastChain(xs...)...
151157
end
152158
```
153-
Since it is a Layer, it implemnets interfaces:
159+
Since it is a Layer, it implements interfaces:
154160
```julia
155161
paramlength(c::FastChain) = sum(paramlength(x) for x in c.layers)
156162
initial_params(c::FastChain) = vcat(initial_params.(c.layers)...)
@@ -163,13 +169,20 @@ applychain(::Tuple{}, x, p) = x
163169
applychain(fs::Tuple, x, p) = applychain(Base.tail(fs), first(fs)(x,p[1:paramlength(first(fs))]), p[(paramlength(first(fs))+1):end])
164170
```
165171

172+
This allows to implement layers with StaticArrays (allocating on the stack).
173+
166174
The same 2x2 network can be implemented as:
167175
```julia
168176
nn=FastDense(2,2)
169177
p = initial_params(nn)
170178
nn([1,2],p)
171179
```
172180

181+
Effects of code composition in Julia:
182+
- Toolboxes of Neural Networks in Julia are often lightweight
183+
- the tools necessary for their training are not specific to NN (AD: Zygote, Enzyme)
184+
- Combination with ODE is straigthforward
185+
173186
# Neural Networks in ODEs:
174187

175188
Neural networks are universal approximators. They can approaximate:
@@ -179,17 +192,38 @@ Neural networks are universal approximators. They can approaximate:
179192
## Neural Lotka-Volterra
180193
Consider an extension of the LV ODE by a MLP:
181194
```julia
182-
function fnn(x,θ,nn,p)
183-
α,β,γ,δ = θ
184-
x1,x2=x
185-
dx1 = α*x1 - β*x1*x2
186-
dx2 = δ*x1*x2 - γ*x2
187-
[dx1,dx2]+nn(x,p)
195+
196+
function fnn(x,θ)
197+
α, β, γ, δ = θ[1:4]
198+
x₁, x₂ = x
199+
200+
dx₁ = α*x₁ - β*x₁*x₂
201+
dx₂ = δ*x₁*x₂ - γ*x₂
202+
203+
[dx₁, dx₂]+nn(x,@view θ[5:end])
188204
end
189205
```
206+
Can be implemented via a closure (closing on nn).
207+
208+
Optimize using the same approach as before:
209+
```julia
210+
θnn = [0.2,0.2,0.3,0.2,0.01*initial_params(nn)...]
211+
probnn = ODEProblem(fnn,tspan,u0,θnn)
212+
213+
θopt = copy(θnn)
214+
O=Optim.optimize->loss(θ,probnn,Xy),θopt,Optim.Options(iterations=10000))
215+
```
216+
217+
The ``Xy`` data were generated with the ω version of the ODE, with parameters ``\theta=[0.2,0.2,0.3,0.2,0.1]``.
218+
219+
Optimization difficulties:
220+
- the number of iteration in Nelder-Mead had to be increased
221+
- LBGFS() optimizer extremely slow
222+
223+
Why?
224+
225+
190226

191-
- run through solver (RK2)
192-
- optimize, show ambiguity
193227

194228
## Physics-informed Neural Network
195229

@@ -221,6 +255,15 @@ This straightforward approach was proposed relatively recently (2019).
221255
- need for higher-order derivatives
222256
- numerical issues
223257

258+
Very simple extension for known data:
259+
```math
260+
\begin{align}
261+
\mathcal{L}=&||nn(0)-x0)|| + \frac{1}{N}\sum_{i=1}^N||f(x_i)-\nabla_x nn(x_i) ||\\
262+
& + \frac{1}{M}\sum_{i=1}^M||y_i - h(nn(x_i))||
263+
\end{align}
264+
```
265+
where ``h()`` is a function transforming ODE solution to observations (e.g. identity, or selection of the relevant observations).
266+
224267
Worked out in the lab.
225268

226269
Can be combined with Neural ODE.
@@ -231,7 +274,7 @@ So far, we have seen optimizations of the ODEs in the form of point estimate. We
231274
- the measurement are uncertain with large possible error
232275
- the number of measurements is insufficient to fit the model.
233276

234-
Consider the Monte Carlo simulation from the previous lecture:
277+
Consider the Monte Carlo simulation from the previous lecture extended for unknown parameter:
235278
```julia
236279
K=100
237280
X0 = [x0 .+ 0.1*randn(2) for k=1:K]
@@ -248,7 +291,7 @@ Point estimate is the trajectory with the thick color.
248291
- it is the one with minimum error
249292
- is it really the solution?
250293

251-
Lets, select all trajectories withing a selected tolerance:
294+
Lets select all trajectories within a selected tolerance:
252295

253296
![](LV_MC_param_assim.svg)
254297

@@ -258,7 +301,17 @@ When the data are collected sequentially, the process of reduction of the uncert
258301
1. prediction - use ODE with uncertainty propagation to the next step,
259302
2. correction - use the acquired measurement to reduce the uncertainty
260303

261-
How exactly are these steps implemented depends on the assumptions made on the type of model uncertainty (initial conditions, parameters, noise) and the measurment uncertainty (noise).
304+
In mathematics, it is direct application of the Bayes rule:
305+
```math
306+
\begin{align}
307+
p(\mathbf{x},\mathbf{y}) =p(\mathbf{y}|\mathbf{x})p(\mathbf{x})=p(\mathbf{x}|\mathbf{y})p(\mathbf{y})
308+
p(\mathbf{x}|\mathbf{y}) =\frac{p(\mathbf{y}|\mathbf{x})p(\mathbf{x})}{p(\mathbf{y})}=\frac{p(\mathbf{y}|\mathbf{x})p(\mathbf{x})}{\int p(\mathbf{y}|\mathbf{x})p(\mathbf{x})d\mathbf{x}}
309+
\end{align}
310+
```
311+
312+
Tradeoff between generality and speed
313+
- A implementation of the whole procedure can be implemented on general level using types for probability distributions and operations on them.
314+
- How exactly are these steps implemented depends on the assumptions made on the type of model uncertainty (initial conditions, parameters, noise) and the measurment uncertainty (noise).
262315

263316
We have done propagation of the Gaussian uncertainty through an ODE (GaussNum, Cubature rules). We will complement it by the correcton step here.
264317

@@ -273,6 +326,11 @@ p(\mathbf{x},\mathbf{y})&=\mathcal{N}\left(\begin{bmatrix}\mu_{x}\\
273326
\end{align}
274327
```
275328

329+
![](mvgaussian.png)
330+
331+
- marginal distributions are unaffected by the correlation
332+
- the correlation determines the reduction of uncertainty in the conditional case
333+
276334
We have uncertainty in all our unknowns ``p(\mathbf{x})`` in the form of quadrature points. We assume that the probability of observation of ``p(\mathbf{y}|\mathbf{x})`` has mean given by ``x`` and variance ``\sigma_y``.
277335
Hence, the means can be obtained by empirical samples of the cubature points ``X_p`` and measurements corresponding to cubature points.
278336
```math
@@ -290,7 +348,9 @@ The covariance matrices can be obtained by empirical samples:
290348
The uncertainty reduction is then application of the conditional distribution using the obtained means and variances. A common trick is to define the Kalman gain:
291349
```math
292350
\begin{align}
293-
\mu_{x|y}&=\mu_{x}+K(\mathbf{y}-\mu_{y}),\,\,\,\,K=\Sigma_{xy}\Sigma_{yy}^{-1},\\\Sigma_{x|y}&=\Sigma_{xx}-K\Sigma_{yx},
351+
K&=\Sigma_{xy}\Sigma_{yy}^{-1},\\
352+
\mu_{x|y}&=\mu_{x}+K(\mathbf{y}-\mu_{y}),\\
353+
\Sigma_{x|y}&=\Sigma_{xx}-K\Sigma_{yx},
294354
\end{align}
295355
```
296356

docs/src/lecture_13/mvgaussian.png

105 KB
Loading

0 commit comments

Comments
 (0)