WIP: Simplified and cleaner code. #183
Introduced a function to compute linear combinations for out-of-place computations. It is implemented recursively, but the recursion is fully unrolled by the compiler.

using BenchmarkTools, StaticArrays
# --- Base cases (End of recursion) ---
# For PDSFunctions (with destruction vectors)
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)
# For ConservativePDSFunctions (without destruction vectors)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)
# --- Recursive steps ---
# For PDSFunctions: Processes triplets (coeff, P, d)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
P_tail, d_tail = lincomb(c2, P2, d2, args...)
return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end
# For ConservativePDSFunctions: Processes triplets (coeff, P, nothing)
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
P_tail, _ = lincomb(c2, P2, d2, args...)
return (c1 .* P1 .+ P_tail, nothing)
end
function run_comprehensive_benchmark(N)
println("\n" * "█"^70)
println(" BENCHMARKING SYSTEM SIZE N = $N")
println("█"^70)
# Setup test data (3 stages)
P1, P2, P3 = rand(SMatrix{N,N}), rand(SMatrix{N,N}), rand(SMatrix{N,N})
d1, d2, d3 = rand(SVector{N}), rand(SVector{N}), rand(SVector{N})
c1, c2, c3 = 0.5, 0.3, 0.2
# --- CASE A: Standard PDS (with destruction vectors) ---
println("\n>>> CASE A: Standard PDS (d is AbstractArray)")
println("\n[A.1] Direct Input (Manual):")
b_direct_a = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3,
$c1 .* $d1 .+ $c2 .* $d2 .+ $c3 .* $d3)
display(b_direct_a)
println("\n[A.2] lincomb Function:")
b_func_a = @benchmark lincomb($c1, $P1, $d1, $c2, $P2, $d2, $c3, $P3, $d3)
display(b_func_a)
# --- CASE B: Conservative PDS (d is nothing) ---
println("\n" * "-"^70)
println(">>> CASE B: Conservative PDS (d is nothing)")
println("\n[B.1] Direct Input (Manual):")
# In manual case, we only calculate P
b_direct_b = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, nothing)
display(b_direct_b)
println("\n[B.2] lincomb Function:")
b_func_b = @benchmark lincomb($c1, $P1, nothing, $c2, $P2, nothing, $c3, $P3, nothing)
display(b_func_b)
return nothing
end

Results:

julia> run_comprehensive_benchmark(2)
██████████████████████████████████████████████████████████████████████
BENCHMARKING SYSTEM SIZE N = 2
██████████████████████████████████████████████████████████████████████
>>> CASE A: Standard PDS (d is AbstractArray)
[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.135 ns … 183.600 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 7.063 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.266 ns ± 3.170 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ █ ▇ ▄ ▁
▇██▃█▃█▃█▆▃█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
6.14 ns Histogram: frequency by time 14.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 5.700 ns … 152.339 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.569 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.755 ns ± 2.887 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █ ▁ ▂ ▅ ▆
▇▅▅█▄█▄█▆▇▅█▅█▃█▄█▄▃▇▃▇▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▂▂▂▂▂▁▁▂▂ ▃
5.7 ns Histogram: frequency by time 10.4 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)
[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.823 ns … 174.098 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.560 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.677 ns ± 2.636 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ ▆ ▅ ▂ █
█▃▄█▃▃█▄▃█▆▃▃█▇▃▃▅▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▂▂▂▂▂▂▂▁▁▂▂▁▂▂▁▂ ▃
4.82 ns Histogram: frequency by time 9.31 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.823 ns … 70.316 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.562 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.758 ns ± 1.852 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂█▄▇▄▃▇▃▆█▄▃▆▃▂▁▁▁▁▁ ▃
█████████████████████▇▇█▇▇██▇██████▇▇▇▇▇▇▆▇▆▅▆▆▁▅▅▅▅▄▄▄▃▃▄ █
4.82 ns Histogram: log(frequency) by time 10.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> run_comprehensive_benchmark(3)
██████████████████████████████████████████████████████████████████████
BENCHMARKING SYSTEM SIZE N = 3
██████████████████████████████████████████████████████████████████████
>>> CASE A: Standard PDS (d is AbstractArray)
[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 9.181 ns … 153.478 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.074 ns ┊ GC (median): 0.00%
Time (mean ± σ): 10.712 ns ± 3.368 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▃▆▃▆▂▆▄▁▅▁ ▂
█████████████▇▇▆▅▆▅▄▆▆▆▇▇▇▅▆▇▇▇▇▆▇▆▆▅▅▆▂▆▅▆▅▅▆▄▅▅▅▅▄▃▃▄▂▄▃▂▃ █
9.18 ns Histogram: log(frequency) by time 24.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 8.308 ns … 175.174 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.814 ns ┊ GC (median): 0.00%
Time (mean ± σ): 9.438 ns ± 3.640 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▇▃▂▆▂▁▅▁ ▇▂▁▄▅ ▂
███████████▆██████▇▇▆▆▆▄▄▅▄▅▅▃▅▅▃▄▄▄▂▅▅▄▄▅▄▅▃▃▄▄▄▄▃▄▃▄▂▄▂▄▄ █
8.31 ns Histogram: log(frequency) by time 16.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)
[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.126 ns … 166.082 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.728 ns ┊ GC (median): 0.00%
Time (mean ± σ): 7.217 ns ± 3.308 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▇▁▆▁▅▅▁▇▁▁▅▁▁ ▂
███████████████▇▇▇▆▅▆▅▅▆▇▇▅▆▅▆▆▆▆▇▇▆▇▇▆▇▆▆▇▅▅▅▅▆▆▅▅▆▇▅▃▄▅▅▄ █
6.13 ns Histogram: log(frequency) by time 14.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 6.127 ns … 149.530 ns ┊ GC (min … max): 0.00% … 0.00%
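To make the recursive definition concrete, here is a small standalone usage sketch. The `lincomb` methods are repeated verbatim from the benchmark code above; the calls below simply check them against the manually written linear combination.

```julia
using StaticArrays

# lincomb methods repeated from the benchmark code above
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
    P_tail, d_tail = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
    P_tail, _ = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, nothing)
end

P1, P2 = rand(SMatrix{2, 2}), rand(SMatrix{2, 2})
d1, d2 = rand(SVector{2}), rand(SVector{2})

# Standard PDS: production matrices and destruction vectors are both combined
P, d = lincomb(0.5, P1, d1, 0.3, P2, d2)
P ≈ 0.5 * P1 + 0.3 * P2  # true
d ≈ 0.5 * d1 + 0.3 * d2  # true

# Conservative PDS: the destruction slot stays `nothing`
P_cons, d_cons = lincomb(0.5, P1, nothing, 0.3, P2, nothing)
d_cons === nothing  # true
```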
The following shows the efficiency of the new `build_mprk_matrix` implementation for `StaticMatrix`.

using BenchmarkTools
using StaticArrays
using MuladdMacro
using Test
using LinearAlgebra
# --- IMPLEMENTATIONS ---
### OLD #################
function build_mprk_matrix_old(P, sigma, dt, d = nothing)
# re-use the in-place version implemented below
M = similar(P)
build_mprk_matrix!(M, P, sigma, dt, d)
if P isa StaticArray
return SMatrix(M)
else
return M
end
end
@muladd function build_mprk_matrix!(M, P, sigma, dt, d = nothing)
# M[i,i] = (sigma[i] + dt * d[i] + dt * sum_j≠i P[j,i]) / sigma[i]
# M[i,j] = -dt * P[i,j] / sigma[j]
# TODO: the performance of this can likely be improved
Base.require_one_based_indexing(M, P, sigma)
@assert size(M, 1) == size(M, 2) == size(P, 1) == size(P, 2) == length(sigma)
if !isnothing(d)
Base.require_one_based_indexing(d)
@assert length(sigma) == length(d)
end
zeroM = zero(eltype(P))
# Set sigma on diagonal
@inbounds for i in eachindex(sigma)
M[i, i] = sigma[i]
end
# Add nonconservative destruction terms to diagonal (PDSFunctions only!)
if !isnothing(d)
@inbounds for i in eachindex(d)
M[i, i] = M[i, i] + dt * d[i]
end
end
# Run through P and fill M accordingly.
# If P[i,j] ≠ 0 set M[i,j] = -dt*P[i,j] and add dt*P[i,j] to M[j,j].
@fastmath @inbounds @simd for I in CartesianIndices(P)
if I[1] != I[2]
if !iszero(P[I])
dtP = dt * P[I]
M[I] = -dtP / sigma[I[2]]
M[I[2], I[2]] += dtP
else
M[I] = zeroM
end
end
end
# Divide diagonal elements by Patankar weights denominators
@fastmath @inbounds @simd for i in eachindex(sigma)
M[i, i] /= sigma[i]
end
return nothing
end
### NEW #################
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
res = zero(T)
for i in 1:N
if i != j
res += P[i, j]
end
end
return res
end
@muladd @inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
d = nothing) where {N, T}
return SMatrix{N, N, T}((i == j) ?
# diagonal: sum over P[k, i] where k != i
one(T) + (dt / sigma[i]) * (sum_offdiagonal_col(P, i) + (d === nothing ? zero(T) : d[i])) :
# off-diagonal
-(dt / sigma[j]) * P[i, j]
for i in 1:N, j in 1:N)
end
# --- TEST & BENCHMARK SUITE ---
function run_benchmark(N)
println("\n" * "█"^70)
println(" COMPREHENSIVE BENCHMARK N = $N")
println("█"^70)
# Setup
P = rand(SMatrix{N, N})
sigma = rand(SVector{N})
d_vec = rand(SVector{N})
dt = 0.1
# Validation
println("Checking Correctness...")
M_old = build_mprk_matrix_old(P, sigma, dt, d_vec)
M_new = build_mprk_matrix_new(P, sigma, dt, d_vec)
M_old_cons = build_mprk_matrix_old(P, sigma, dt, nothing)
M_new_cons = build_mprk_matrix_new(P, sigma, dt, nothing)
@test M_old ≈ M_new
@test M_old_cons ≈ M_new_cons
println("✓ Results are identical.\n")
# Benchmarks
println(">>> CASE A: With destruction vector d")
println("\n[OLD VERSION]")
display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, $d_vec))
println("\n[NEW VERSION]")
display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, $d_vec))
println("\n" * "-"^70)
println(">>> CASE B: Conservative (d = nothing)")
println("\n[OLD VERSION]")
display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, nothing))
println("\n[NEW VERSION]")
display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, nothing))
end

Results:

julia> run_benchmark(3)
██████████████████████████████████████████████████████████████████████
COMPREHENSIVE BENCHMARK N = 3
██████████████████████████████████████████████████████████████████████
Checking Correctness...
✓ Results are identical.
>>> CASE A: With destruction vector d
[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 993 evaluations per sample.
Range (min … max): 21.847 ns … 7.403 μs ┊ GC (min … max): 0.00% … 98.87%
Time (median): 36.851 ns ┊ GC (median): 0.00%
Time (mean ± σ): 41.136 ns ± 166.041 ns ┊ GC (mean ± σ): 11.62% ± 3.11%
▂█▂
███▅▃▂▂▂▂▂▂▁▂▄▅▇█▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
21.8 ns Histogram: frequency by time 87.7 ns <
Memory estimate: 80 bytes, allocs estimate: 1.
[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 4.907 ns … 159.538 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.988 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.444 ns ± 2.391 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▇▄▃▅▃▂▆▅▄▂▆▃▂▂▁▄▁▃▁▃▁▂▂▁▃▁▄▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
4.91 ns Histogram: frequency by time 12.5 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
----------------------------------------------------------------------
>>> CASE B: Conservative (d = nothing)
[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 995 evaluations per sample.
Range (min … max): 19.521 ns … 7.131 μs ┊ GC (min … max): 0.00% … 99.04%
Time (median): 31.826 ns ┊ GC (median): 0.00%
Time (mean ± σ): 36.613 ns ± 168.535 ns ┊ GC (mean ± σ): 13.28% ± 3.11%
▇█▇▅▄▃▂▁▁▁▁▁▁ ▄▅▆▆▆▅▄▃▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁ ▁ ▂
███████████████▆██████████████████████████████████▇█▇▇▅▆▆▆▄▄ █
19.5 ns Histogram: log(frequency) by time 69.5 ns <
Memory estimate: 80 bytes, allocs estimate: 1.
[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 4.936 ns … 35.389 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.805 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.235 ns ± 1.712 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
██▃▄▃▄▆▃▄▂▅▃▆▃▂▅▂▄▂▂▃▂▄▃▂▃▂▂▄▂▂▂▄▂▂▃▂▂▂▄▂▂▂▄▃▂▂▂▄▂▂▂▂▁▂▂▂▂ ▃
4.94 ns Histogram: frequency by time 9.58 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
All MPRK and SSPMPRK schemes are built upon a basic Patankar step with varying Patankar matrices, destruction vectors, and denominators. This PR introduces the function `basic_patankar_step` (better name suggestions are welcome), which makes the implementations cleaner and can also be used to implement the multistep Patankar schemes in #107. The PR also introduces `evaluate_pds` to evaluate the production matrix and destruction vector, as well as `lincomb` to compute linear combinations of production matrices and destruction vectors. In addition, `build_mprk_matrix` for `StaticMatrix` is added.
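To illustrate how the new matrix builder feeds into a Patankar step, here is a hypothetical sketch (the name `patankar_step_sketch`, its signature, and the choice `sigma = u` are illustrative assumptions, not the PR's actual `basic_patankar_step`). It reuses `build_mprk_matrix_new` from the benchmark above and checks the two properties that motivate MPRK schemes: in the conservative case the columns of `M` sum to one, so solving `M * u_new = u_old` conserves the total mass, and positivity of `u` is preserved.

```julia
using StaticArrays, LinearAlgebra

# Helpers repeated from the benchmark code above
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
    res = zero(T)
    for i in 1:N
        if i != j
            res += P[i, j]
        end
    end
    return res
end

@inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
                                       d = nothing) where {N, T}
    return SMatrix{N, N, T}((i == j) ?
                            one(T) + (dt / sigma[i]) *
                            (sum_offdiagonal_col(P, i) +
                             (d === nothing ? zero(T) : d[i])) :
                            -(dt / sigma[j]) * P[i, j]
                            for i in 1:N, j in 1:N)
end

# Hypothetical sketch: one Patankar step solves M * u_new = u_old
function patankar_step_sketch(u, P, sigma, dt, d = nothing)
    M = build_mprk_matrix_new(P, sigma, dt, d)
    return M \ u
end

u = @SVector [1.0, 2.0, 3.0]
P = rand(SMatrix{3, 3})
dt = 0.1

# First-order Patankar-Euler choice sigma = u, conservative case (d = nothing)
u_new = patankar_step_sketch(u, P, u, dt)
sum(u_new) ≈ sum(u)  # conservation: each column of M sums to one
all(u_new .> 0)      # positivity: M is an M-matrix, so M \ u stays positive
```

The conservation check works because the diagonal entry of column `j` adds `dt/sigma[j]` times the off-diagonal column sum of `P`, while the off-diagonal entries subtract exactly the same amount.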