When training a neural network, we need to compute the gradient of the loss over our data set. There are three main ways to partition the data when using a training algorithm like gradient descent: stochastic, batch, and mini-batch. Stochastic gradient descent trains on a single random data point each epoch; this helps the network converge toward the global minimum even on noisy data, but it is computationally inefficient. Batch gradient descent trains on the whole data set each epoch and, while computationally efficient, is prone to converging to local minima. Mini-batching combines both advantages: by training on a small random "mini-batch" of the data each epoch, it can converge toward the global minimum while remaining more computationally efficient than stochastic descent. Typically we do this by randomly selecting a subset of the data each epoch and training on that subset. We can also pre-batch the data by creating an iterator holding these randomly selected batches before beginning to train. The proper batch size can be determined experimentally. Let us see how to do this with Julia.
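As a rough sketch of the pre-batching idea (the toy arrays `xs` and `ys` and the batch size `k` below are purely illustrative and not part of the tutorial's model), one can shuffle the observation indices once and split them into an iterator of mini-batches:

```julia
using Random

# Illustrative toy data and batch size (not part of the tutorial's model)
xs = rand(100)
ys = 2 .* xs
k = 10

# Shuffle the indices, then partition them into mini-batches of size k
batches = [(xs[idx], ys[idx]) for idx in Iterators.partition(randperm(length(xs)), k)]

for (x_batch, y_batch) in batches
    # each (x_batch, y_batch) holds k randomly selected observations
end
```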
For this example we will use a very simple ordinary differential equation: Newton's law of cooling, which says that an object's temperature changes at a rate proportional to the difference between its temperature and the ambient temperature. We can represent this in Julia like so.
```julia
using DifferentialEquations, Flux, Optim, DiffEqFlux, Plots
using IterTools: ncycle
function newtons_cooling(du, u, p, t)
    temp = u[1]
    k, temp_m = p                 # cooling rate and ambient temperature
    du[1] = -k * (temp - temp_m)  # Newton's law of cooling
end
```
Now we define a neural network with one hidden layer of 8 neurons and `tanh` activations.
```julia
# The network maps the 1-dimensional state through 8 tanh units and back to 1 output
ann = FastChain(FastDense(1, 8, tanh), FastDense(8, 1, tanh))
θ = initial_params(ann)

# Neural ODE right-hand side: the network output scales the current state
function dudt_(u, p, t)
    ann(u, p) .* u
end

prob = ODEProblem{false}(dudt_, u0, tspan, θ)
```
From here we build a loss function around it.
```julia
# Solve the neural ODE, saving at the time points of the current batch
function predict_adjoint(time_batch)
    Array(concrete_solve(prob, Tsit5(), u0, θ, saveat = time_batch))
end

# Sum-of-squares error between the batch data and the prediction
function loss_adjoint(batch, time_batch)
    pred = predict_adjoint(time_batch)
    sum(abs2, batch - pred)
end
```
To add support for batches of size `k`, we use `Flux.Data.DataLoader`. To use this, we pass in `ode_data` and `t` as the `x` and `y` data to batch, respectively. The keyword argument `batchsize` controls the size of our batches. We check our implementation by iterating over the batched data.
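A minimal sketch of what that looks like, assuming `ode_data` and `t` were produced when generating the training data above and picking `k = 10` purely for illustration (older Flux versions also accepted the positional form `DataLoader(ode_data, t, batchsize = k)`):

```julia
k = 10  # assumed batch size for illustration
train_loader = Flux.Data.DataLoader((ode_data, t), batchsize = k)

# Sanity check: each iteration yields one mini-batch of data and its time points
for (batch, time_batch) in train_loader
    @show size(batch), size(time_batch)
end

# ncycle (imported above from IterTools) can then repeat the loader for however
# many epochs you choose, e.g. ncycle(train_loader, numEpochs), during training.
```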
```julia
# Label the plot of the trained model against the test data
title!("Neural ODE for Newton's Law of Cooling: Test Data")
xlabel!("Time")
ylabel!("Temp")
```
We can also minibatch using tools from `MLDataUtils`. To do this we need to change our implementation slightly, as shown below, again with a batch size of `k` and the same number of epochs.
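The original code for this variant is not shown here; the following is only a rough sketch of one way such a loop could look, assuming `ode_data`, `t`, and `loss_adjoint` from above and picking `k = 10` and `numEpochs = 300` purely for illustration:

```julia
using MLDataUtils  # provides shuffleobs and eachbatch

k = 10           # assumed batch size
numEpochs = 300  # assumed number of epochs

# Shuffle the observations once, then view them as mini-batches of size k.
# Each batch pairs columns of ode_data with the matching entries of t.
train_batches = eachbatch(shuffleobs((ode_data, t)), size = k)

for epoch in 1:numEpochs
    for (batch, time_batch) in train_batches
        # Evaluate the mini-batch loss; a full training loop would also take a
        # gradient step on θ here (e.g. via sciml_train or Flux's optimisers).
        loss_adjoint(batch, time_batch)
    end
end
```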