
Conversation

@SKopecz SKopecz (Collaborator) commented Dec 29, 2025

All MPRK and SSPMPRK schemes are built upon a basic Patankar step with varying Patankar matrices, destruction vectors, and denominators. This PR introduces the function basic_patankar_step (better name suggestions are welcome), which makes the implementations cleaner and can also be used to implement the multistep Patankar schemes in #107.

The PR also introduces evaluate_pds to evaluate the production matrix and destruction vector, as well as lincomb to compute linear combinations of production matrices and destruction vectors. In addition, a build_mprk_matrix method for StaticMatrix is added.

  1. Out-of-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
  2. In-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
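
For context, the basic Patankar step underlying all of these schemes can be sketched as follows. This is an illustrative stand-alone version, not the code in this PR: the name basic_patankar_step_sketch, the signature, and the dense-Array implementation are placeholders.

```julia
# Illustrative sketch of a basic Patankar step (name, signature, and
# dense-Array implementation are placeholders, not this PR's API).
# Given a production matrix P >= 0, Patankar-weight denominators sigma > 0,
# a time step dt, and an optional destruction vector d >= 0, it assembles
#     M[i,i] = (sigma[i] + dt * d[i] + dt * sum_{j != i} P[j,i]) / sigma[i]
#     M[i,j] = -dt * P[i,j] / sigma[j]    for i != j
# and solves the linear system M * u_new = rhs.
function basic_patankar_step_sketch(rhs, P, sigma, dt, d = nothing)
    n = length(sigma)
    M = zeros(n, n)
    for j in 1:n, i in 1:n
        if i != j
            M[i, j] = -dt * P[i, j] / sigma[j]
            M[j, j] += dt * P[i, j]  # off-diagonal column sums end up on the diagonal
        end
    end
    for i in 1:n
        M[i, i] += sigma[i] + (d === nothing ? zero(dt) : dt * d[i])
        M[i, i] /= sigma[i]
    end
    return M \ rhs
end

# MPE-like step for a small conservative PDS (sigma = u, d = nothing):
P = [0.0 1.0; 2.0 0.0]
u = [1.0, 2.0]
u_new = basic_patankar_step_sketch(u, P, u, 0.1)
```

For a conservative PDS (d = nothing) every column of M sums to one, so the step conserves the total mass, sum(u_new) == sum(rhs) up to roundoff; positivity of u_new for positive input follows from the usual M-matrix argument for Patankar-type schemes.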

@SKopecz SKopecz marked this pull request as draft December 29, 2025 13:00
codecov bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


coveralls commented Dec 29, 2025

Pull Request Test Coverage Report for Build 20880257443

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 75 of 75 (100.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 97.67%

File with coverage reduction: src/mprk.jl (1 new missed line, 98.21% file coverage)

Totals:
  • Change from base Build 20199034964: -0.1%
  • Covered Lines: 1970
  • Relevant Lines: 2017


@SKopecz SKopecz changed the title WIP: Simplified and clearer code using the function basic_patankar_step. WIP: Simplified and cleaner code using the function basic_patankar_step. Dec 29, 2025
@SKopecz SKopecz (Collaborator, Author) commented Jan 4, 2026

Introduced a function to compute linear combinations of production matrices and destruction vectors for out-of-place computations. It is implemented recursively, but the recursion is fully unrolled at compile time.

using BenchmarkTools, StaticArrays 

# --- Base cases (End of recursion) ---

# For PDSFunctions (with destruction vectors)
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)

# For ConservativePDSFunctions (without destruction vectors)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)


# --- Recursive steps ---

# For PDSFunctions: Processes triplets (coeff, P, d)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
    P_tail, d_tail = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end

# For ConservativePDSFunctions: Processes triplets (coeff, P, nothing)
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
    P_tail, _ = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, nothing)
end
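
As a quick self-contained check that the recursion reduces to the manual linear combination (repeating two of the methods above with plain Arrays, so the snippet runs without any packages):

```julia
# Repetition of the base case and the recursive step for standard PDS,
# so this snippet runs on its own with plain Arrays.
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
    P_tail, d_tail = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end

P1 = [1.0 2.0; 3.0 4.0]
P2 = [5.0 6.0; 7.0 8.0]
d1 = [1.0, 2.0]
d2 = [3.0, 4.0]

P, d = lincomb(0.5, P1, d1, 0.25, P2, d2)
# Both comparisons hold exactly here, since all coefficients and entries
# are binary-representable:
# P == 0.5 .* P1 .+ 0.25 .* P2
# d == 0.5 .* d1 .+ 0.25 .* d2
```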

function run_comprehensive_benchmark(N)
    println("\n" * "█"^70)
    println("  BENCHMARKING SYSTEM SIZE N = $N")
    println("█"^70)

    # Setup test data (3 stages)
    P1, P2, P3 = rand(SMatrix{N,N}), rand(SMatrix{N,N}), rand(SMatrix{N,N})
    d1, d2, d3 = rand(SVector{N}), rand(SVector{N}), rand(SVector{N})
    c1, c2, c3 = 0.5, 0.3, 0.2

    # --- CASE A: Standard PDS (with destruction vectors) ---
    println("\n>>> CASE A: Standard PDS (d is AbstractArray)")
    
    println("\n[A.1] Direct Input (Manual):")
    b_direct_a = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, 
                             $c1 .* $d1 .+ $c2 .* $d2 .+ $c3 .* $d3)
    display(b_direct_a)

    println("\n[A.2] lincomb Function:")
    b_func_a = @benchmark lincomb($c1, $P1, $d1, $c2, $P2, $d2, $c3, $P3, $d3)
    display(b_func_a)


    # --- CASE B: Conservative PDS (d is nothing) ---
    println("\n" * "-"^70)
    println(">>> CASE B: Conservative PDS (d is nothing)")
    
    println("\n[B.1] Direct Input (Manual):")
    # In manual case, we only calculate P
    b_direct_b = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, nothing)
    display(b_direct_b)

    println("\n[B.2] lincomb Function:")
    b_func_b = @benchmark lincomb($c1, $P1, nothing, $c2, $P2, nothing, $c3, $P3, nothing)
    display(b_func_b)
    
    return nothing
end

Results:

julia> run_comprehensive_benchmark(2)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 2
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.135 ns … 183.600 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.063 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.266 ns ±   3.170 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅ █ ▇ ▄  ▁                                                 
  ▇██▃█▃█▃█▆▃█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  6.14 ns         Histogram: frequency by time        14.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  5.700 ns … 152.339 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.569 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.755 ns ±   2.887 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █ █     ▁ ▂ ▅ ▆                                           
  ▇▅▅█▄█▄█▆▇▅█▅█▃█▄█▄▃▇▃▇▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▂▂▂▂▂▁▁▂▂ ▃
  5.7 ns          Histogram: frequency by time        10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 174.098 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.677 ns ±   2.636 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅  ▆  ▅  ▂   █                                               
  █▃▄█▃▃█▄▃█▆▃▃█▇▃▃▅▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▂▂▂▂▂▂▂▁▁▂▂▁▂▂▁▂ ▃
  4.82 ns         Histogram: frequency by time        9.31 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 70.316 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.562 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.758 ns ±  1.852 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂█▄▇▄▃▇▃▆█▄▃▆▃▂▁▁▁▁▁                                      ▃
  █████████████████████▇▇█▇▇██▇██████▇▇▇▇▇▇▆▇▆▅▆▆▁▅▅▅▅▄▄▄▃▃▄ █
  4.82 ns      Histogram: log(frequency) by time     10.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> run_comprehensive_benchmark(3)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 3
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.181 ns … 153.478 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.074 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.712 ns ±   3.368 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▃▆▃▆▂▆▄▁▅▁                                                 ▂
  █████████████▇▇▆▅▆▅▄▆▆▆▇▇▇▅▆▇▇▇▇▆▇▆▆▅▅▆▂▆▅▆▅▅▆▄▅▅▅▅▄▃▃▄▂▄▃▂▃ █
  9.18 ns       Histogram: log(frequency) by time      24.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.308 ns … 175.174 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.814 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.438 ns ±   3.640 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▃▂▆▂▁▅▁  ▇▂▁▄▅                                           ▂
  ███████████▆██████▇▇▆▆▆▄▄▅▄▅▅▃▅▅▃▄▄▄▂▅▅▄▄▅▄▅▃▃▄▄▄▄▃▄▃▄▂▄▂▄▄ █
  8.31 ns      Histogram: log(frequency) by time      16.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.126 ns … 166.082 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.728 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.217 ns ±   3.308 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▁▆▁▅▅▁▇▁▁▅▁▁                                             ▂
  ███████████████▇▇▇▆▅▆▅▅▆▇▇▅▆▅▆▆▆▆▇▇▆▇▇▆▇▆▆▇▅▅▅▅▆▆▅▅▆▇▅▃▄▅▅▄ █
  6.13 ns      Histogram: log(frequency) by time      14.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.127 ns … 149.530 ns  ┊ GC (min … max): 0.00% … 0.00%

@SKopecz SKopecz changed the title WIP: Simplified and cleaner code using the function basic_patankar_step. WIP: Simplified and cleaner code. Jan 9, 2026
@SKopecz SKopecz (Collaborator, Author) commented Jan 9, 2026

The following shows the efficiency of the new build_mprk_matrix method for StaticMatrix input.

using BenchmarkTools
using StaticArrays
using MuladdMacro
using Test
using LinearAlgebra

# --- IMPLEMENTATIONS ---
### OLD #################
function build_mprk_matrix_old(P, sigma, dt, d = nothing)
    # re-use the in-place version implemented below
    M = similar(P)
    build_mprk_matrix!(M, P, sigma, dt, d)

    if P isa StaticArray
        return SMatrix(M)
    else
        return M
    end
end
@muladd function build_mprk_matrix!(M, P, sigma, dt, d = nothing)
    # M[i,i] = (sigma[i] + dt * d[i] + dt * sum_j≠i P[j,i]) / sigma[i]
    # M[i,j] = -dt * P[i,j] / sigma[j]
    # TODO: the performance of this can likely be improved
    Base.require_one_based_indexing(M, P, sigma)
    @assert size(M, 1) == size(M, 2) == size(P, 1) == size(P, 2) == length(sigma)
    if !isnothing(d)
        Base.require_one_based_indexing(d)
        @assert length(sigma) == length(d)
    end

    zeroM = zero(eltype(P))

    # Set sigma on diagonal
    @inbounds for i in eachindex(sigma)
        M[i, i] = sigma[i]
    end
    # Add nonconservative destruction terms to diagonal (PDSFunctions only!)
    if !isnothing(d)
        @inbounds for i in eachindex(d)
            M[i, i] = M[i, i] + dt * d[i]
        end
    end

    # Run through P and fill M accordingly.
    # If P[i,j] ≠ 0 set M[i,j] = -dt*P[i,j] and add dt*P[i,j] to M[j,j].
    @fastmath @inbounds @simd for I in CartesianIndices(P)
        if I[1] != I[2]
            if !iszero(P[I])
                dtP = dt * P[I]
                M[I] = -dtP / sigma[I[2]]
                M[I[2], I[2]] += dtP
            else
                M[I] = zeroM
            end
        end
    end

    # Divide diagonal elements by Patankar weights denominators
    @fastmath @inbounds @simd for i in eachindex(sigma)
        M[i, i] /= sigma[i]
    end

    return nothing
end

### NEW #################
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
    res = zero(T)
    for i in 1:N
        if i != j
            res += P[i, j]
        end
    end
    return res
end
@muladd @inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
                                           d = nothing) where {N, T}
    return SMatrix{N, N, T}((i == j) ?
                            # diagonal: sum over P[k, i] where k != i
                            one(T) + (dt / sigma[i]) * (sum_offdiagonal_col(P, i) + (d === nothing ? zero(T) : d[i])) :
                            # off-diagonal
                            -(dt / sigma[j]) * P[i, j]
                            for i in 1:N, j in 1:N)
end

# --- TEST & BENCHMARK SUITE ---

function run_benchmark(N)
    println("\n" * "█"^70)
    println("  COMPREHENSIVE BENCHMARK N = $N")
    println("█"^70)

    # Setup
    P = rand(SMatrix{N, N})
    sigma = rand(SVector{N})
    d_vec = rand(SVector{N})
    dt = 0.1

    # Validation
    println("Checking Correctness...")
    M_old = build_mprk_matrix_old(P, sigma, dt, d_vec)
    M_new = build_mprk_matrix_new(P, sigma, dt, d_vec)
    M_old_cons = build_mprk_matrix_old(P, sigma, dt, nothing)
    M_new_cons = build_mprk_matrix_new(P, sigma, dt, nothing)

    @test M_old ≈ M_new
    @test M_old_cons ≈ M_new_cons
    println("✓ Results are identical.\n")

    # Benchmarks
    println(">>> CASE A: With destruction vector d")
    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, $d_vec))
    
    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, $d_vec))

    println("\n" * "-"^70)
    println(">>> CASE B: Conservative (d = nothing)")
    
    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, nothing))
    
    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, nothing))
end

Results:

julia> run_benchmark(3)

██████████████████████████████████████████████████████████████████████
  COMPREHENSIVE BENCHMARK N = 3
██████████████████████████████████████████████████████████████████████
Checking Correctness...
✓ Results are identical.

>>> CASE A: With destruction vector d

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 993 evaluations per sample.
 Range (min … max):  21.847 ns …  7.403 μs  ┊ GC (min … max):  0.00% … 98.87%
 Time  (median):     36.851 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   41.136 ns ± 166.041 ns  ┊ GC (mean ± σ):  11.62% ±  3.11%

  ▂█▂                                                           
  ███▅▃▂▂▂▂▂▂▁▂▄▅▇█▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  21.8 ns         Histogram: frequency by time         87.7 ns <

 Memory estimate: 80 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  4.907 ns … 159.538 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.988 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.444 ns ±   2.391 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                            
  █▇▄▃▅▃▂▆▅▄▂▆▃▂▂▁▄▁▃▁▃▁▂▂▁▃▁▄▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  4.91 ns         Histogram: frequency by time        12.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative (d = nothing)

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 995 evaluations per sample.
 Range (min … max):  19.521 ns …  7.131 μs  ┊ GC (min … max):  0.00% … 99.04%
 Time  (median):     31.826 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   36.613 ns ± 168.535 ns  ┊ GC (mean ± σ):  13.28% ±  3.11%

  ▇█▇▅▄▃▂▁▁▁▁▁▁    ▄▅▆▆▆▅▄▃▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁    ▁             ▂
  ███████████████▆██████████████████████████████████▇█▇▇▅▆▆▆▄▄ █
  19.5 ns       Histogram: log(frequency) by time      69.5 ns <

 Memory estimate: 80 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  4.936 ns … 35.389 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.235 ns ±  1.712 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                           
  ██▃▄▃▄▆▃▄▂▅▃▆▃▂▅▂▄▂▂▃▂▄▃▂▃▂▂▄▂▂▂▄▂▂▃▂▂▂▄▂▂▂▄▃▂▂▂▄▂▂▂▂▁▂▂▂▂ ▃
  4.94 ns        Histogram: frequency by time        9.58 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
