Commit 5c869ef

don't do beeler
1 parent 7cfe183 commit 5c869ef

2 files changed: 26 additions, 23 deletions

docs/make.jl

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ using Sundials, DASKR
55
cp("./docs/Manifest.toml", "./docs/src/assets/Manifest.toml", force = true)
66
cp("./docs/Project.toml", "./docs/src/assets/Project.toml", force = true)
77

8+
ENV["PLOTS_TEST"] = "true"
9+
ENV["GKSwstype"] = "100"
10+
811
include("pages.jl")
912

1013
makedocs(modules = [

docs/src/examples/beeler_reuter.md

Lines changed: 23 additions & 23 deletions
@@ -28,7 +28,7 @@ Let's start by developing a CPU only IMEX solver. The main idea is to use the *D
2828

2929
First, we define the model constants:
3030

31-
```@example beeler
31+
```julia
3232
const v0 = -84.624
3333
const v1 = 10.0
3434
const C_K1 = 1.0f0
@@ -51,8 +51,8 @@ Note that the constants are defined as `Float32` and not `Float64`. The reason i
5151

5252
Next, we define a struct to contain our state. `BeelerReuterCpu` is a functor and we will define a deriv function as its associated function.
5353

54-
```@example beeler
55-
mutable struct BeelerReuterCpu <: Function
54+
```julia
55+
mutable struct BeelerReuterCpu
5656
t::Float64 # the last timestep time to calculate Δt
5757
diff_coef::Float64 # the diffusion-coefficient (coupling strength)
5858

@@ -92,7 +92,7 @@ end
9292

9393
The finite-difference Laplacian is calculated in-place by a 5-point stencil. The Neumann boundary condition is enforced. Note that we could have also used [DiffEqOperators.jl](https://github.com/JuliaDiffEq/DiffEqOperators.jl) to automate this step.
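
A minimal sketch of such a stencil, since this hunk cuts off the body of the tutorial's own `laplacian`. It assumes the Neumann (zero-flux) condition is imposed by mirroring the neighbour across each edge and that the grid spacing is folded into the diffusion coefficient; `laplacian_sketch!` and `mirror` are hypothetical names, not part of the tutorial.

```julia
# Hypothetical helper: reflect an out-of-range index back into 1:n,
# which mimics a ghost node equal to the first interior neighbour.
mirror(i, n) = i < 1 ? 2 : (i > n ? n - 1 : i)

# 5-point Laplacian written in-place into Δu.
# Assumes both dimensions of u are at least 2.
function laplacian_sketch!(Δu, u)
    n1, n2 = size(u)
    for j in 1:n2, i in 1:n1
        Δu[i, j] = u[mirror(i + 1, n1), j] + u[mirror(i - 1, n1), j] +
                   u[i, mirror(j + 1, n2)] + u[i, mirror(j - 1, n2)] -
                   4 * u[i, j]
    end
    return Δu
end

u = ones(4, 4)              # a uniform field has zero Laplacian everywhere
Δu = similar(u)
laplacian_sketch!(Δu, u)
```

Mirroring doubles the weight of the interior neighbour on an edge, which is what a zero normal derivative implies for a centered difference.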
9494

95-
```@example beeler
95+
```julia
9696
# 5-point stencil
9797
function laplacian(Δu, u)
9898
n1, n2 = size(u)
@@ -154,7 +154,7 @@ $$g(t + \Delta{t}) = g(t) + \Delta{t}\frac{dg}{dt}.$$
154154

155155
`rush_larsen` is a helper function that uses the Rush-Larsen method to integrate the gating variables.
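
For comparison with the forward-Euler formula above, here is a minimal sketch of a Rush-Larsen step. It assumes gating dynamics of the form dg/dt = α(1 - g) - βg with α and β held constant over the step. `rush_larsen_sketch` is a hypothetical name and is not a copy of the helper in this hunk.

```julia
# Exponential (Rush-Larsen) step: the exact solution of
# dg/dt = (g_inf - g) / τ over a step of length Δt.
function rush_larsen_sketch(g, α, β, Δt)
    g_inf = α / (α + β)     # steady-state value
    τ = 1 / (α + β)         # time constant
    return g_inf + (g - g_inf) * exp(-Δt / τ)
end

rush_larsen_sketch(0.5, 1.0, 3.0, 0.0)     # no time elapsed: g is unchanged
rush_larsen_sketch(0.5, 1.0, 3.0, 100.0)   # long step: g relaxes to g_inf = 0.25
```

Unlike forward Euler, this update stays between the current value and the steady state for any positive step size, which is why it suits stiff gating kinetics.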
156156

157-
```@example beeler
157+
```julia
158158
@inline function rush_larsen(g, α, β, Δt)
159159
inf = α/(α+β)
160160
τ = 1f0 / (α+β)
@@ -164,7 +164,7 @@ end
164164

165165
The gating variables are updated as below. The details of how to calculate $\alpha$ and $\beta$ are based on the Beeler-Reuter model and not of direct interest to this tutorial.
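
One detail of the rate expressions does matter numerically. With x = v + 47, the opening rate of the m gate is -x / (exp(-0.1x) - 1), which is 0/0 at v = -47 and tends to 10 there, hence the guard in `update_M_cpu`. A minimal sketch of that limit, using a hypothetical `alpha_m_sketch` that is not part of the tutorial:

```julia
# Hypothetical stand-alone version of the m-gate opening rate.
# The x == 0 branch returns the limiting value instead of 0/0 = NaN.
function alpha_m_sketch(v::Float32)
    x = v + 47.0f0
    return iszero(x) ? 10.0f0 : -x / expm1(-0.1f0 * x)
end

alpha_m_sketch(-47.0f0)     # guarded branch, returns the limit
alpha_m_sketch(-46.99f0)    # close to 10; expm1 avoids cancellation near x = 0
```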
166166

167-
```@example beeler
167+
```julia
168168
function update_M_cpu(g, v, Δt)
169169
# the condition is needed here to prevent NaN when v == -47.0
170170
α = isapprox(v, -47.0f0) ? 10.0f0 : -(v+47.0f0) / (exp(-0.1f0*(v+47.0f0)) - 1.0f0)
@@ -205,7 +205,7 @@ end
205205

206206
The intracellular calcium is not technically a gating variable, but we can use a similar explicit exponential integrator for it.
207207

208-
```@example beeler
208+
```julia
209209
function update_C_cpu(g, d, f, v, Δt)
210210
ECa = D_Ca - 82.3f0 - 13.0278f0 * log(g)
211211
kCa = C_s * g_s * d * f
@@ -236,7 +236,7 @@ Now, it is time to define the derivative function as an associated function of *
236236

237237
Here, the deriv function is called three times per time step. We distinguish between two types of calls to the deriv function. When $t$ changes, the gating variables are updated by calling `update_gates_cpu`:
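
The pattern can be sketched with a toy functor, assuming the solver may evaluate the right-hand side several times at the same $t$. `StepTracker` and its fields are hypothetical; only the `t - f.t` bookkeeping mirrors the `BeelerReuterCpu` call that appears later in this diff.

```julia
# Hypothetical toy functor: remembers the time of the last state update.
mutable struct StepTracker
    t::Float64          # time of the last update
    n_updates::Int      # how many times the time-dependent state was advanced
end

function (f::StepTracker)(du, u, p, t)
    Δt = t - f.t
    if Δt != 0          # t moved: advance the time-dependent state once
        f.n_updates += 1
        f.t = t
    end
    du .= -u            # du is always recomputed (placeholder dynamics)
    return nothing
end

f = StepTracker(0.0, 0)
u = [1.0, 2.0]; du = similar(u)
f(du, u, nothing, 0.1)   # t changed: state advanced
f(du, u, nothing, 0.1)   # same t again: only du is recomputed
f(du, u, nothing, 0.2)   # t changed again
f.n_updates              # 2
```

Keeping the last time inside the callable is what lets repeated evaluations at the same $t$ skip the state update.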
238238

239-
```@example beeler
239+
```julia
240240
function update_gates_cpu(u, XI, M, H, J, D, F, C, Δt)
241241
let Δt = Float32(Δt)
242242
n1, n2 = size(u)
@@ -260,7 +260,7 @@ end
260260

261261
On the other hand, du is updated at each time step, since it is independent of $\Delta{t}$.
262262

263-
```@example beeler
263+
```julia
264264
# iK1 is the inward-rectifying potassium current
265265
function calc_iK1(v)
266266
ea = exp(0.04f0*(v+85f0))
@@ -314,7 +314,7 @@ end
314314

315315
Finally, we put everything together in our deriv function, which is a call on `BeelerReuterCpu`.
316316

317-
```@example beeler
317+
```julia
318318
function (f::BeelerReuterCpu)(du, u, p, t)
319319
Δt = t - f.t
320320

@@ -337,22 +337,22 @@ end
337337

338338
Time to test! We need to define the starting transmembrane potential with the help of global constants **v0** and **v1**, which represent the resting and activated potentials.
339339

340-
```@example beeler
340+
```julia
341341
const N = 192;
342342
u0 = fill(v0, (N, N));
343343
u0[90:102,90:102] .= v1; # a small square in the middle of the domain
344344
```
345345

346346
The initial condition is a small square in the middle of the domain.
347347

348-
```@example beeler
348+
```julia
349349
using Plots
350350
heatmap(u0)
351351
```
352352

353353
Next, the problem is defined:
354354

355-
```@example beeler
355+
```julia
356356
using DifferentialEquations, Sundials
357357

358358
deriv_cpu = BeelerReuterCpu(u0, 1.0);
@@ -361,11 +361,11 @@ prob = ODEProblem(deriv_cpu, u0, (0.0, 50.0));
361361

362362
For stiff reaction-diffusion equations, `CVODE_BDF` from the Sundials library is an excellent solver.
363363

364-
```@example beeler
364+
```julia
365365
@time sol = solve(prob, CVODE_BDF(linear_solver=:GMRES), saveat=100.0);
366366
```
367367

368-
```@example beeler
368+
```julia
369369
heatmap(sol.u[end])
370370
```
371371

@@ -399,7 +399,7 @@ The key to fast CUDA programs is to minimize CPU/GPU memory transfers and global
399399

400400
We modify ``BeelerReuterCpu`` into ``BeelerReuterGpu`` by defining the state variables as *CuArray*s instead of standard Julia *Array*s. The name of each variable defined on the GPU is prefixed by *d_* for clarity. Note that $\Delta{v}$ is temporary storage for the Laplacian and stays on the CPU side.
401401

402-
```@example beeler
402+
```julia
403403
using CUDA
404404

405405
mutable struct BeelerReuterGpu <: Function
@@ -447,7 +447,7 @@ end
447447

448448
The Laplacian function remains unchanged. The main change to the explicit gating solvers is that the *exp* and *expm1* functions are prefixed by *CUDAnative.*. This is a technical nuisance that will hopefully be resolved in the future.
449449

450-
```@example beeler
450+
```julia
451451
function rush_larsen_gpu(g, α, β, Δt)
452452
inf = α/(α+β)
453453
τ = 1.0/(α+β)
@@ -503,7 +503,7 @@ end
503503

504504
Similarly, we modify the functions to calculate the individual currents by adding the CUDAnative prefix.
505505

506-
```@example beeler
506+
```julia
507507
# iK1 is the inward-rectifying potassium current
508508
function calc_iK1(v)
509509
ea = CUDAnative.exp(0.04f0*(v+85f0))
@@ -563,7 +563,7 @@ A CUDA programmer is free to interpret the calculated index however it fits the
563563
In the GPU version of the solver, each thread works on a single element of the medium, indexed by an (x,y) pair.
564564
`update_gates_gpu` and `update_du_gpu` are very similar to their CPU counterparts but are in fact CUDA kernels where the *for* loops are replaced with CUDA-specific indexing. Note that CUDA kernels cannot return a value; hence, *nothing* at the end.
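
The index arithmetic can be checked on the CPU. A minimal sketch with plain integers standing in for `blockIdx()`, `blockDim()` and `threadIdx()`; `global_index` is a hypothetical helper, and the grid size 192 and block size 16 are the values used elsewhere in this file.

```julia
# Hypothetical CPU stand-in for (blockIdx().x - 1) * blockDim().x + threadIdx().x
global_index(block, dim, thread) = (block - 1) * dim + thread

N, L = 192, 16
nblocks = cld(N, L)                 # blocks needed per dimension

# Collect the index produced by every (block, thread) pair along one dimension
idx = [global_index(b, L, t) for b in 1:nblocks for t in 1:L]
sort(idx) == collect(1:N)           # true: the launch covers the grid exactly once
```

When the grid size is not a multiple of the block size, `cld` rounds the block count up, so a kernel then needs a bounds check on the computed index before touching memory.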
565565

566-
```@example beeler
566+
```julia
567567
function update_gates_gpu(u, XI, M, H, J, D, F, C, Δt)
568568
i = (blockIdx().x-UInt32(1)) * blockDim().x + threadIdx().x
569569
j = (blockIdx().y-UInt32(1)) * blockDim().y + threadIdx().y
@@ -608,7 +608,7 @@ end
608608

609609
Finally, the deriv function is modified to copy *u* to the GPU, invoke the CUDA kernels, and copy *du* back.
610610

611-
```@example beeler
611+
```julia
612612
function (f::BeelerReuterGpu)(du, u, p, t)
613613
L = 16 # block size
614614
Δt = t - f.t
@@ -636,23 +636,23 @@ end
636636

637637
Ready to test!
638638

639-
```@example beeler
639+
```julia
640640
using DifferentialEquations, Sundials
641641

642642
deriv_gpu = BeelerReuterGpu(u0, 1.0);
643643
prob = ODEProblem(deriv_gpu, u0, (0.0, 50.0));
644644
@time sol = solve(prob, CVODE_BDF(linear_solver=:GMRES), saveat=100.0);
645645
```
646646

647-
```@example beeler
647+
```julia
648648
heatmap(sol.u[end])
649649
```
650650

651651
## Summary
652652

653653
We achieve around a 6x speedup by running the explicit portion of our IMEX solver on a GPU. The major bottleneck of this technique is the communication between CPU and GPU. In its current form, not all of the internals of the method utilize GPU acceleration. In particular, the implicit equations solved by GMRES are performed on the CPU. This partial CPU nature also increases the amount of data transfer that is required between the GPU and CPU (performed on every call to `f`). Compiling the full ODE solver to the GPU would solve both of these issues and potentially give a much larger speedup. [JuliaDiffEq developers are currently working on solutions to alleviate these issues](http://www.stochasticlifestyle.com/solving-systems-stochastic-pdes-using-gpus-julia/), but these will only be compatible with native Julia solvers (and not Sundials).
654654

655-
```@example beeler, echo = false, skip="notebook"
655+
```julia, echo = false, skip="notebook"
656656
using SciMLTutorials
657657
SciMLTutorials.tutorial_footer(WEAVE_ARGS[:folder],WEAVE_ARGS[:file])
658658
```
