-
Notifications
You must be signed in to change notification settings - Fork 6
Simplified and cleaner code #183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Pull Request Test Coverage Report for Build 21143246237Details
💛 - Coveralls |
basic_patankar_step.basic_patankar_step.
|
Introduced a function to compute linear combinations for out-of-place computations. It is implemented recursively, but internally unrolled. using BenchmarkTools, StaticArrays
# --- Base cases (End of recursion) ---
# For PDSFunctions (with destruction vectors)
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)
# For ConservativePDSFunctions (without destruction vectors)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)
# --- Recursive steps ---
# For PDSFunctions: Processes triplets (coeff, P, d)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
P_tail, d_tail = lincomb(c2, P2, d2, args...)
return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end
# For ConservativePDSFunctions: Processes triplets (coeff, P, nothing)
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
P_tail, _ = lincomb(c2, P2, d2, args...)
return (c1 .* P1 .+ P_tail, nothing)
end
function run_comprehensive_benchmark(N)
println("\n" * "█"^70)
println(" BENCHMARKING SYSTEM SIZE N = $N")
println("█"^70)
# Setup test data (3 stages)
P1, P2, P3 = rand(SMatrix{N,N}), rand(SMatrix{N,N}), rand(SMatrix{N,N})
d1, d2, d3 = rand(SVector{N}), rand(SVector{N}), rand(SVector{N})
c1, c2, c3 = 0.5, 0.3, 0.2
# --- CASE A: Standard PDS (with destruction vectors) ---
println("\n>>> CASE A: Standard PDS (d is AbstractArray)")
println("\n[A.1] Direct Input (Manual):")
b_direct_a = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3,
$c1 .* $d1 .+ $c2 .* $d2 .+ $c3 .* $d3)
display(b_direct_a)
println("\n[A.2] lincomb Function:")
b_func_a = @benchmark lincomb($c1, $P1, $d1, $c2, $P2, $d2, $c3, $P3, $d3)
display(b_func_a)
# --- CASE B: Conservative PDS (d is nothing) ---
println("\n" * "-"^70)
println(">>> CASE B: Conservative PDS (d is nothing)")
println("\n[B.1] Direct Input (Manual):")
# In manual case, we only calculate P
b_direct_b = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, nothing)
display(b_direct_b)
println("\n[B.2] lincomb Function:")
b_func_b = @benchmark lincomb($c1, $P1, nothing, $c2, $P2, nothing, $c3, $P3, nothing)
display(b_func_b)
return nothing
endResults: julia> run_comprehensive_benchmark(2)
██████████████████████████████████████████████████████████████████████
BENCHMARKING SYSTEM SIZE N = 2
██████████████████████████████████████████████████████████████████████
>>> CASE A: Standard PDS (d is AbstractArray)
[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.135 ns … 183.600 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 7.063 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.266 ns ± 3.170 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ █ ▇ ▄ ▁
▇██▃█▃█▃█▆▃█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
6.14 ns Histogram: frequency by time 14.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 5.700 ns … 152.339 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.569 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.755 ns ± 2.887 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █ ▁ ▂ ▅ ▆
▇▅▅█▄█▄█▆▇▅█▅█▃█▄█▄▃▇▃▇▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▂▂▂▂▂▁▁▂▂ ▃
5.7 ns Histogram: frequency by time 10.4 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)
[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.823 ns … 174.098 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.560 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.677 ns ± 2.636 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ ▆ ▅ ▂ █
█▃▄█▃▃█▄▃█▆▃▃█▇▃▃▅▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▂▂▂▂▂▂▂▁▁▂▂▁▂▂▁▂ ▃
4.82 ns Histogram: frequency by time 9.31 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.823 ns … 70.316 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.562 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.758 ns ± 1.852 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂█▄▇▄▃▇▃▆█▄▃▆▃▂▁▁▁▁▁ ▃
█████████████████████▇▇█▇▇██▇██████▇▇▇▇▇▇▆▇▆▅▆▆▁▅▅▅▅▄▄▄▃▃▄ █
4.82 ns Histogram: log(frequency) by time 10.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> run_comprehensive_benchmark(3)
██████████████████████████████████████████████████████████████████████
BENCHMARKING SYSTEM SIZE N = 3
██████████████████████████████████████████████████████████████████████
>>> CASE A: Standard PDS (d is AbstractArray)
[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 9.181 ns … 153.478 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.074 ns ┊ GC (median): 0.00%
Time (mean ± σ): 10.712 ns ± 3.368 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▃▆▃▆▂▆▄▁▅▁ ▂
█████████████▇▇▆▅▆▅▄▆▆▆▇▇▇▅▆▇▇▇▇▆▇▆▆▅▅▆▂▆▅▆▅▅▆▄▅▅▅▅▄▃▃▄▂▄▃▂▃ █
9.18 ns Histogram: log(frequency) by time 24.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 8.308 ns … 175.174 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.814 ns ┊ GC (median): 0.00%
Time (mean ± σ): 9.438 ns ± 3.640 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▇▃▂▆▂▁▅▁ ▇▂▁▄▅ ▂
███████████▆██████▇▇▆▆▆▄▄▅▄▅▅▃▅▅▃▄▄▄▂▅▅▄▄▅▄▅▃▃▄▄▄▄▃▄▃▄▂▄▂▄▄ █
8.31 ns Histogram: log(frequency) by time 16.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)
[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.126 ns … 166.082 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.728 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.217 ns ± 3.308 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▇▁▆▁▅▅▁▇▁▁▅▁▁ ▂
███████████████▇▇▇▆▅▆▅▅▆▇▇▅▆▅▆▆▆▆▇▇▆▇▇▆▇▆▆▇▅▅▅▅▆▆▅▅▆▇▅▃▄▅▅▄ █
6.13 ns Histogram: log(frequency) by time 14.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.127 ns … 149.530 ns ┊ GC (min … max): 0.00% … 0.00% |
basic_patankar_step.|
The following shows the efficiency of the new using BenchmarkTools
using StaticArrays
using MuladdMacro
using Test
using LinearAlgebra
# --- IMPLEMENTATIONS ---
### OLD #################
function build_mprk_matrix_old(P, sigma, dt, d = nothing)
# re-use the in-place version implemented below
M = similar(P)
build_mprk_matrix!(M, P, sigma, dt, d)
if P isa StaticArray
return SMatrix(M)
else
return M
end
end
@muladd function build_mprk_matrix!(M, P, sigma, dt, d = nothing)
# M[i,i] = (sigma[i] + dt * d[i] + dt * sum_j≠i P[j,i]) / sigma[i]
# M[i,j] = -dt * P[i,j] / sigma[j]
# TODO: the performance of this can likely be improved
Base.require_one_based_indexing(M, P, sigma)
@assert size(M, 1) == size(M, 2) == size(P, 1) == size(P, 2) == length(sigma)
if !isnothing(d)
Base.require_one_based_indexing(d)
@assert length(sigma) == length(d)
end
zeroM = zero(eltype(P))
# Set sigma on diagonal
@inbounds for i in eachindex(sigma)
M[i, i] = sigma[i]
end
# Add nonconservative destruction terms to diagonal (PDSFunctions only!)
if !isnothing(d)
@inbounds for i in eachindex(d)
M[i, i] = M[i, i] + dt * d[i]
end
end
# Run through P and fill M accordingly.
# If P[i,j] ≠ 0 set M[i,j] = -dt*P[i,j] and add dt*P[i,j] to M[j,j].
@fastmath @inbounds @simd for I in CartesianIndices(P)
if I[1] != I[2]
if !iszero(P[I])
dtP = dt * P[I]
M[I] = -dtP / sigma[I[2]]
M[I[2], I[2]] += dtP
else
M[I] = zeroM
end
end
end
# Divide diagonal elements by Patankar weights denominators
@fastmath @inbounds @simd for i in eachindex(sigma)
M[i, i] /= sigma[i]
end
return nothing
end
### NEW #################
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
res = zero(T)
for i in 1:N
if i != j
res += P[i, j]
end
end
return res
end
@muladd @inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
d = nothing) where {N, T}
return SMatrix{N, N, T}((i == j) ?
# diagonal: sum over P[k, i] where k != i
(sigma[i] +
dt *
(sum_offdiagonal_col(P, i) + (d === nothing ? zero(T) : d[i]))) /
sigma[i] :
# off-diagonal
-(dt * P[i, j]) / sigma[j]
for i in 1:N, j in 1:N)
end
# --- TEST & BENCHMARK SUITE ---
function run_benchmark(N)
println("\n" * "█"^70)
println(" COMPREHENSIVE BENCHMARK N = $N")
println("█"^70)
# Setup
P = rand(SMatrix{N, N})
sigma = rand(SVector{N})
d_vec = rand(SVector{N})
dt = 0.1
# Validation
println("Checking Correctness...")
M_old = build_mprk_matrix_old(P, sigma, dt, d_vec)
M_new = build_mprk_matrix_new(P, sigma, dt, d_vec)
M_old_cons = build_mprk_matrix_old(P, sigma, dt, nothing)
M_new_cons = build_mprk_matrix_new(P, sigma, dt, nothing)
@test M_old ≈ M_new
@test M_old_cons ≈ M_new_cons
println("✓ Results are identical.\n")
# Benchmarks
println(">>> CASE A: With destruction vector d")
println("\n[OLD VERSION]")
display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, $d_vec))
println("\n[NEW VERSION]")
display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, $d_vec))
println("\n" * "-"^70)
println(">>> CASE B: Conservative (d = nothing)")
println("\n[OLD VERSION]")
display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, nothing))
println("\n[NEW VERSION]")
display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, nothing))
endjulia> run_benchmark(5)
██████████████████████████████████████████████████████████████████████
COMPREHENSIVE BENCHMARK N = 5
██████████████████████████████████████████████████████████████████████
Checking Correctness...
✓ Results are identical.
>>> CASE A: With destruction vector d
[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 988 evaluations per sample.
Range (min … max): 39.109 ns … 11.639 μs ┊ GC (min … max): 0.00% … 99.12%
Time (median): 48.098 ns ┊ GC (median): 0.00%
Time (mean ± σ): 72.646 ns ± 214.892 ns ┊ GC (mean ± σ): 12.98% ± 6.77%
▇▇█▅▄▂▂▃▃▂▂ ▆▆▄▃▂▂▁▁▁▁▁▁ ▁ ▂
█████████████████████████████▇▇▇▆▆▅▆▆▆▅▄▆▆▆▆▅▅▅▅▄▅▄▅▃▄▄▄▄▄▄▄ █
39.1 ns Histogram: log(frequency) by time 232 ns <
Memory estimate: 208 bytes, allocs estimate: 1.
[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
Range (min … max): 13.862 ns … 183.346 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.916 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.972 ns ± 5.200 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄▂▂▂▃▃▃ ▁▂▂ ▁▁ ▂ ▁ ▂ ▂ ▁
████████▇███▆████▆█▆█▆███▆▇▅▅▄▆▆▆▆▆▆▇▇▇▆▇▆▅▄▆▇▆▆▆▅▄▄▃▃▄▃▁▅▅▅ █
13.9 ns Histogram: log(frequency) by time 39.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative (d = nothing)
[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
Range (min … max): 36.367 ns … 11.648 μs ┊ GC (min … max): 0.00% … 98.85%
Time (median): 42.436 ns ┊ GC (median): 0.00%
Time (mean ± σ): 66.981 ns ± 206.688 ns ┊ GC (mean ± σ): 12.98% ± 6.91%
▅█▅▂▂▁▁▂▁ ▄▅▃▂▁▁▁▁▁ ▁
█████████▇▆▇███████████▇█▇▇▆▇▆▆▆▅▆▆▆▅▅▅▅▅▄▄▄▅▅▄▄▄▄▅▃▃▂▄▃▄▃▄▄ █
36.4 ns Histogram: log(frequency) by time 232 ns <
Memory estimate: 208 bytes, allocs estimate: 1.
[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
Range (min … max): 13.692 ns … 159.564 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.731 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.496 ns ± 4.104 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄▃▃▂ ▂▂▁▃▃▁▂ ▁ ▁ ▂ ▁▁▂ ▁ ▂ ▁ ▁
█████▇███████▇█▅█▅██████▇█▆▆█▅▅█▆▆▆▅▅▅▆▅▄▄▅▁▅▅▅▅▆▆▇▆▇▆▆▆▅▆▅▅ █
13.7 ns Histogram: log(frequency) by time 32.3 ns <
Memory estimate: 0 bytes, allocs estimate: 0. |
|
The following shows the efficiency of the new lincomb! methods for in-place computations. One is for using BenchmarkTools
using StaticArrays
using SparseArrays
using FastBroadcast
using MuladdMacro
using Test
# --- GENERATED IMPLEMENTATION ---
@muladd @generated function lincomb!(dest::AbstractArray, pairs...)
n = length(pairs) ÷ 2
expr = :(pairs[1] * pairs[2])
for i in 2:n
expr = :($expr + pairs[$(2 * i - 1)] * pairs[$(2 * i)])
end
return quote
@.. broadcast=false dest=$expr
return dest
end
end
@muladd @generated function lincomb!(dest::SparseMatrixCSC, pairs...)
n = length(pairs) ÷ 2
nz_names = [Symbol("nz_", i) for i in 1:n]
setup_block = Expr(:block)
# We need to keep the structural nonzeros of the production terms.
# However, this is not guaranteed by broadcasting, see
# https://github.com/JuliaSparse/SparseArrays.jl/issues/190
for i in 1:n
push!(setup_block.args, :($(nz_names[i]) = nonzeros(pairs[$(2 * i)])))
end
expr = :(pairs[1] * $(nz_names[1]))
for i in 2:n
expr = :($expr + pairs[$(2 * i - 1)] * $(nz_names[i]))
end
return quote
$setup_block
nz_dest = nonzeros(dest)
@.. broadcast=false nz_dest=$expr
return dest
end
end
# --- BENCHMARK FUNCTION ---
function run_comprehensive_benchmark(N)
println("\n" * "█"^75)
println(" LINCOMB! FULL PERFORMANCE REPORT (N = $N, 3 Components)")
println("█"^75)
c1, c2, c3 = 0.5, 0.8, -0.2
# --- 1. DENSE CASE ---
println("\n>>> CASE 1: Standard Dense Arrays (Matrix{Float64})")
A1, A2 = rand(N, N), rand(N, N)
A3_old = rand(N, N)
A3_new = copy(A3_old)
# Validation
A3_old .= c1 .* A1 .+ c2 .* A2 .+ c3 .* A3_old
lincomb!(A3_new, c1, A1, c2, A2, c3, A3_new)
@test A3_old ≈ A3_new
println("✓ Correctness: OK")
println("\n[OLD] Manual Dense @..")
display(@benchmark $A3_old .= $c1 .* $A1 .+ $c2 .* $A2 .+ $c3 .* $A3_old)
println("\n[NEW] lincomb! Dense")
display(@benchmark lincomb!($A3_new, $c1, $A1, $c2, $A2, $c3, $A3_new))
# --- 2. SPARSE CASE ---
println("\n" * "-"^75)
println("\n>>> CASE 2: Sparse Matrices (SparseMatrixCSC)")
S_ref = sprand(N, N, 0.01)
S1 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
S2 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
S3_old = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
S3_new = copy(S3_old)
# Validation
nz1, nz2, nz3 = nonzeros(S1), nonzeros(S2), nonzeros(S3_old)
@.. broadcast=false nz3 = c1 * nz1 + c2 * nz2 + c3 * nz3
lincomb!(S3_new, c1, S1, c2, S2, c3, S3_new)
@test S3_old ≈ S3_new
println("✓ Correctness: OK")
println("\n[OLD] Manual Sparse nz-broadcast")
display(@benchmark begin
nz_1 = nonzeros($S1)
nz_2 = nonzeros($S2)
nz_3 = nonzeros($S3_old)
@.. broadcast=false nz_3 = $c1 * nz_1 + $c2 * nz_2 + $c3 * nz_3
end)
println("\n[NEW] lincomb! Sparse")
display(@benchmark lincomb!($S3_new, $c1, $S1, $c2, $S2, $c3, $S3_new))
endjulia> run_comprehensive_benchmark(1000)
███████████████████████████████████████████████████████████████████████████
LINCOMB! FULL PERFORMANCE REPORT (N = 1000, 3 Components)
███████████████████████████████████████████████████████████████████████████
>>> CASE 1: Standard Dense Arrays (Matrix{Float64})
✓ Correctness: OK
[OLD] Manual Dense @..
BenchmarkTools.Trial: 2111 samples with 1 evaluation per sample.
Range (min … max): 1.824 ms … 4.594 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.084 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.351 ms ± 596.946 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃
███▇▇▇▅▄▅▅▄▄▃▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁ ▂
1.82 ms Histogram: frequency by time 4.08 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
[NEW] lincomb! Dense
BenchmarkTools.Trial: 2434 samples with 1 evaluation per sample.
Range (min … max): 1.437 ms … 3.640 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.751 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.035 ms ± 544.079 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▁▆██▄
▂▁▂▃████████▇▇▅▄▄▃▃▂▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▃▂▂▂▂▃▃▄▄▅▄▃▃▃▃▃▃▃▃▄▄▄▃ ▃
1.44 ms Histogram: frequency by time 3.26 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------------------
>>> CASE 2: Sparse Matrices (SparseMatrixCSC)
✓ Correctness: OK
[OLD] Manual Sparse nz-broadcast
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
Range (min … max): 2.446 μs … 26.794 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.944 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.266 μs ± 1.174 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▅▅▇█▇▆▄▄▃▃▃▂▃▂▂▂▂▂▃▂▂▁▁ ▂
███████████████████████████▆█▇▆▇▆▆▅▆▅▆▅▅▅▆▆▄▅▅▄▅▄▄▄▃▃▂▂▄▄▄ █
2.45 μs Histogram: log(frequency) by time 8.1 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
[NEW] lincomb! Sparse
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
Range (min … max): 3.267 μs … 20.178 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.845 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.050 μs ± 1.044 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▄▁ ▆▅ █
████▇██▆█▆█▃▃▂▂▃▂▂▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
3.27 μs Histogram: frequency by time 8.24 μs <
Memory estimate: 0 bytes, allocs estimate: 0. |
JoshuaLampert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go through everything in detail (especially the definition of lincomb), but I like the idea of bundling common steps in functions and I trust CI that the replacements are correct. I only have one question about type stability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like the idea of defining functions for commonly-used functionality. However, this PR is quite long, so it will take some time to review everything in detail.
Did you perform some performance benchmarks for full time integrations (with save_everystep = false)?
| end | ||
|
|
||
| ##################################################################### | ||
| # out-of-place for dense and static arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we re-add some comments like this one to inform readers of the code whether we are treating in-place or out-of-place parts? This helps when looking for possible performance issues caused by allocations (for in-place methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments for all auxiliary functions.
Take your time. Once this is pushed, implementing additional schemes will be significantly simplified. |
I'll do this. |
Co-authored-by: Joshua Lampert <[email protected]>

All MPRK and SSPMPRK schemes are build upon a basic Patankar step with varying Patankar matrices, destruction vectors and denominators. This PR introduces serveral functions (in-place and out-of-place) to make the current implementations more readable and simplify the implementation of new schemes in #107.