Fix big memory allocations in L1GS#238
Conversation
@termi-official these are the outputs from the two examples in the Also one thing that might be optimized is |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #238 +/- ##
==========================================
+ Coverage 71.71% 73.03% +1.32%
==========================================
Files 78 78
Lines 6515 6865 +350
==========================================
+ Hits 4672 5014 +342
- Misses 1843 1851 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
termi-official
left a comment
There was a problem hiding this comment.
Thanks for the changes Abdelrahman. There are some minor things left tho.
@AbdAlazezAhmed can you confirm that this works now on the ECG?
There was a problem hiding this comment.
There are still some issues.
- ThreadedSparseMatrixCSC support is missing
- The preconditioner tanks the symmetry
- There is too much of memory usage
Lets try to make the following work
using Thunderbolt
using SparseArrays,OrderedCollections,TimerOutputs,LinearAlgebra
# TimerOutputs.enable_debug_timings(Thunderbolt)
# TimerOutputs.enable_debug_timings(Thunderbolt.Preconditioners)
geo = Hexahedron
nel = 2
size = 2.
signal_strength = 0.04
nel_heart = (6, 6, 6) .* 1 .* nel
nel_torso = (8, 8, 8) .* Int(size) .* nel
ground_vertex = Vec(0.0, 0.0, 0.0)
electrodes = [
ground_vertex,
Vec(-size, 0., 0.),
Vec( size, 0., 0.),
Vec( 0., -size, 0.),
Vec( 0., size, 0.),
Vec( 0., 0., -size),
Vec( 0., 0., size),
]
electrode_pairs = [[i,1] for i in 2:length(electrodes)]
heart_grid = generate_mesh(geo, nel_heart)
Ferrite.transform_coordinates!(heart_grid, x->Vec{3}(sign.(x) .* x.^2))
κ = ConstantCoefficient(SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)))
κᵢ = AnalyticalCoefficient(
(x,t) -> norm(x,Inf) ≤ 1.0 ? SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)) : SymmetricTensor{2,3,Float64}((0.0, 0, 0, 0.0, 0, 0.0)),
CartesianCoordinateSystem{3}()
)
heart_model = TransientDiffusionModel(
κᵢ,
NoStimulationProtocol(), # Poisoning to detecte if we accidentally touch these
:φₘ
)
heart_fun = semidiscretize(
heart_model,
FiniteElementDiscretization(
Dict(:φₘ => LagrangeCollection{1}()),
Dirichlet[]
),
heart_grid
)
op = Thunderbolt.setup_assembled_operator(
Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
Thunderbolt.BilinearDiffusionIntegrator(
κ,
QuadratureRuleCollection(2),
:φₘ,
),
Thunderbolt.SparseMatrixCSC,
heart_fun.dh,
)
Thunderbolt.update_operator!(op, 0.0) # trigger assembly
torso_grid_ = generate_grid(geo, nel_torso, Vec((-size,-size,-size)), Vec((size,size,size)))
addcellset!(torso_grid_, "heart", x->norm(x,Inf) ≤ 1.0)
# addcellset!(torso_grid_, "surrounding-tissue", x->norm(x,Inf) ≥ 1.0)
torso_grid_.cellsets["surrounding-tissue"] = OrderedSet([i for i in 1:getncells(torso_grid_) if i ∉ torso_grid_.cellsets["heart"]])
torso_grid = Thunderbolt.to_mesh(torso_grid_)
u = zeros(Thunderbolt.solution_size(heart_fun))
geselowitz_electrodes = [[electrodes[1], electrodes[i]] for i in 2:length(electrodes)]
TimerOutputs.reset_timer!()
@info prod(nel_torso)
geselowitz_ecg = Thunderbolt.Geselowitz1989ECGLeadCache(
heart_fun,
torso_grid,
κᵢ, κ,
geselowitz_electrodes;
ground = OrderedSet([Thunderbolt.get_closest_vertex(ground_vertex, torso_grid)]),
torso_heart_domain=["heart"],
ipc = LagrangeCollection{1}(),
qrc = QuadratureRuleCollection(3),
linear_solver = Thunderbolt.LinearSolve.KrylovJL_CG(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, SparseArrays.size(A,1)/8); isSymA=true), LinearAlgebra.I)),
# linear_solver = Thunderbolt.LinearSolve.KrylovJL_GMRES(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, 24)), LinearAlgebra.I)),
system_matrix_type = Thunderbolt.SparseMatrixCSR{Float64,Int64}, #Thunderbolt.ThreadedSparseMatrixCSR{Float64,Int64},
strategy = Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
)
TimerOutputs.print_timer()You need to apply these patches.
diff --git a/src/solver/interface.jl b/src/solver/interface.jl
index 8234949c..4279e657 100644
--- a/src/solver/interface.jl
+++ b/src/solver/interface.jl
@@ -148,13 +148,17 @@ update_constraints_block!(f::NullFunction, i::Block, solver_cache::AbstractTimeS
create_system_matrix(T::Type{<:AbstractMatrix}, f::AbstractSemidiscreteFunction) = create_system_matrix(T, f.dh)
-function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR{Tv,Ti}}, dh::AbstractDofHandler) where {Tv,Ti}
- Acsct = transpose(convert(SparseMatrixCSC{Tv,Ti}, allocate_matrix(dh)))
+function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR}, dh::AbstractDofHandler)
+ Acsct = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
return ThreadedSparseMatrixCSR(Acsct)
end
function create_system_matrix(SpMatType::Type{<:SparseMatrixCSC}, dh::AbstractDofHandler)
- A = convert(SpMatType, allocate_matrix(dh))
+ A = allocate_matrix(SparseMatrixCSC{Float64,Int64}, dh)
+ return A
+end
+function create_system_matrix(SpMatType::Type{<:SparseMatrixCSR}, dh::AbstractDofHandler)
+ A = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
return A
enddiff --git a/src/modeling/electrophysiology/ecg.jl b/src/modeling/electrophysiology/ecg.jl
index b0f47170..9db426db 100644
--- a/src/modeling/electrophysiology/ecg.jl
+++ b/src/modeling/electrophysiology/ecg.jl
@@ -402,6 +402,7 @@ function Geselowitz1989ECGLeadCache(
linear_solver = LinearSolve.KrylovJL_CG(),
solution_vector_type = Vector{Float64},
system_matrix_type = ThreadedSparseMatrixCSR{Float64,Int64},
+ strategy = SequentialAssemblyStrategy(SequentialCPUDevice()),
)
return Geselowitz1989ECGLeadCache(
heart_fun,
@@ -416,6 +417,7 @@ function Geselowitz1989ECGLeadCache(
linear_solver ,
solution_vector_type,
system_matrix_type ,
+ strategy ,
)
end
| η = convert(Tf, η) | ||
| D_Dl1 = adapt(backend, zeros(Tf, N)) # D + Dˡ | ||
| last_partsize = N - (nparts - 1) * partsize # size of the last partition | ||
| SLbuffer_size = (partsize * (partsize - 1) * (nparts-1)) ÷ 2 + last_partsize * (last_partsize - 1) ÷ 2 |
There was a problem hiding this comment.
We should store the lower/upper triangular parts in some sparse format. I do not have opinions about which one exactly.
| (; backend) = partitioning | ||
| # The following code is required because there is no assumption on the compatibality of x with the backend. | ||
| @timeit_debug "adapt(backend, y)" _y = adapt(backend, y) | ||
| @timeit_debug "_forward_sweep!" _forward_sweep!(_y, P) |
There was a problem hiding this comment.
The type of sweep should be ideally a user option -- and must be symmetric if the problem is symmetric, as we tank the symmetry needed for the solver otherwise.
|
Thunderbolt.jl/src/solver/linear/preconditioners/l1_gauss_seidel.jl Lines 144 to 146 in b34c57a Also, I don't know whether this is done intentionally or simply overlooked, but the reference to |
Yea, I am not sure how to fix this. Nothing worth to spend time on right now tho. |
Doesn't seem to be referenced in an md file |
| G_{SGS} = (D + L^T)^{-1} L (D + L)^{-1} L^T \rightarrow G_{SGS} = G_b G_f\\ | ||
| x^{(k+1)} = G_{SGS} x^{(k)} + (I - G_{SGS}) A^{-1} b |
There was a problem hiding this comment.
This does not look correct. There shouldn't be an inv(A).
There was a problem hiding this comment.
I know it's wierd at first glance and not standard but this is required (from my pov after deep thinking) to explain the difference in an easy way (so it is for demo only, I need to write every thing in G)
proof that this works is easy, we know for fact that the update eq is
=>
=>
I can write note to clarify this is for demo and show the standard eq
also I can include this proof in the note
There was a problem hiding this comment.
Indeed correct. However, my point here was rather that this does not reflect what is happening in the implementation. We never invert A at some point directly, because we would be done at that point with solving the system. Does that clarify the problem with the explanation given above from the perspective of a new user?
There was a problem hiding this comment.
Got your point. I 'll change it to sth more convenient.
|
@AbdAlazezAhmed can you test the ECG again using this branch? |
4b60be9 to
a1bbc86
Compare
closes #230