Implement common parallel functionality for trixi-framework#66

Draft

vchuravy wants to merge 7 commits into main from vc/parallelism
Conversation

@vchuravy (Member)

Heavily based on @efaulhaber's infrastructure in PointNeighbors.jl.

During Trudi, @efaulhaber, @benegee and I discussed how to reduce the code duplication currently present in trixi-framework/Trixi.jl#2590.

Currently we have a lot of fairly simple kernels like:

```julia
function calc_volume_integral!(backend::Backend, du, u, mesh,
                               have_nonconservative_terms, equations,
                               volume_integral, dg::DGSEM, cache)
    nelements(dg, cache) == 0 && return nothing
    kernel! = volume_integral_KAkernel!(backend)
    kernel_cache = kernel_filter_cache(cache)
    kernel!(du, u, typeof(mesh), have_nonconservative_terms, equations,
            volume_integral, dg, kernel_cache,
            ndrange = nelements(dg, cache))
    return nothing
end

@kernel function volume_integral_KAkernel!(du, u, meshT,
                                           have_nonconservative_terms, equations,
                                           volume_integral, dg::DGSEM, cache)
    element = @index(Global)
    volume_integral_kernel!(du, u, element, meshT, have_nonconservative_terms,
                            equations, volume_integral, dg, cache)
end
```

With the old code looking more like:


```julia
function calc_volume_integral!(backend::Nothing, du, u, mesh,
                               have_nonconservative_terms, equations,
                               volume_integral, dg::DGSEM, cache)
    @threaded for element in eachelement(dg, cache)
        volume_integral_kernel!(du, u, element, typeof(mesh),
                                have_nonconservative_terms, equations,
                                volume_integral, dg, cache)
    end

    return nothing
end
```

@efaulhaber has been using a macro of this style (a clear evolution of Trixi's own `@threaded`) to great effect, so it would make sense to have one shared definition.
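As a rough illustration of what such a shared definition could look like (all names here are hypothetical and do not reflect the actual PointNeighbors.jl API), one could dispatch a single `parallel_foreach` on the backend type, with a plain-threads path for CPUs and a one-index KernelAbstractions kernel for everything else:

```julia
using KernelAbstractions

# Hypothetical sketch, not the actual infrastructure.
abstract type AbstractThreadingBackend end
struct ThreadsBackend <: AbstractThreadingBackend end

# CPU path: plain Julia threads (a Polyester-based backend could be
# another subtype, defined in a package extension).
function parallel_foreach(f, iterator, ::ThreadsBackend)
    Threads.@threads for i in iterator
        f(i)
    end
    return nothing
end

# KernelAbstractions path: wrap f in a kernel indexed over the iterator.
@kernel function _foreach_kernel(f, iterator)
    i = @index(Global)
    @inbounds f(iterator[i])
end

function parallel_foreach(f, iterator, backend::KernelAbstractions.Backend)
    kernel! = _foreach_kernel(backend)
    kernel!(f, iterator; ndrange = length(iterator))
    KernelAbstractions.synchronize(backend)
    return nothing
end
```

A `@par backend for i in iterator ... end` macro would then be a thin rewrite into `parallel_foreach(i -> ..., iterator, backend)`. This is essentially the shape of AcceleratedKernels.jl's `foreachindex` as well.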

Considerations:

  1. Is this the right place for the code to live? Or is TrixiBase too "low" in the stack and does not want the additional dependencies?
  2. I would like the Polyester code to be an extension, but the `default_backend` definition makes that nonsensical.

AcceleratedKernels.jl has a very similar definition in `foreachindex`, and it also has the parallel `mapreduce` we would need in Trixi:

https://github.com/JuliaGPU/AcceleratedKernels.jl/blob/e641c281f9373825f9852fb84457c608b6074586/src/foreachindex.jl#L122

I could see this definition living in KernelAbstractions (but it would then kinda conflict with AcceleratedKernels, and the Polyester dependency is a no-go).

Putting this up as a draft so that we can mull about it.

@codecov (bot) commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 79.16667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.52%. Comparing base (9fb93f7) to head (29b11f5).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/parallel.jl | 86.84% | 5 Missing ⚠️ |
| test/test_parallel.jl | 50.00% | 5 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   94.56%   92.52%   -2.05%     
==========================================
  Files           5        7       +2     
  Lines         313      361      +48     
==========================================
+ Hits          296      334      +38     
- Misses         17       27      +10     
```
☔ View full report in Codecov by Sentry.

@efaulhaber (Member)

I don't think it makes sense to include `default_backend`. In TrixiParticles.jl, we basically only use that for preprocessing (sampling STL files), when we don't yet have a semidiscretization. Everywhere else, we use `@threaded semi` and defined

```julia
@inline function PointNeighbors.parallel_foreach(f, iterator, semi::Semidiscretization)
    PointNeighbors.parallel_foreach(f, iterator, semi.parallelization_backend)
end
```

so that the semidiscretization defines the backend.

And then you can make the PolyesterBackend a package extension.

@vchuravy (Member, Author)

To keep dependencies small, there could also be an argument to make KA an extension.

@sloede (Member) commented Mar 20, 2026

Thanks a lot for this push! I haven't looked at the implementation yet, but I very much like the idea of having a sensible default approach for multithreading and offloading to accelerators for all our packages, such that improvements/extensions directly benefit all downstream packages.

Regarding dependencies and where to put them: I'd like to make using GPUs as easy as using standard Trixi{P,SW,A}.jl. Having to load additional packages is somewhat counter to that goal. On the other hand, we could define a package "TrixiBoost.jl" (or "TrixiWarp.jl" or "TrixiNitro.jl") that captures all the necessary additional dependencies and the common code. These are just some thoughts I had; I do not have a definite answer.

On a more procedural note: Would it make sense to keep this PR as a reference for the parallel implementations for now, and create a new PR to Trixi.jl that converts the GPU implementation in trixi-framework/Trixi.jl#2590 (hopefully merged soon) to this form of acceleration? My thoughts are that this would help us better understand if the same form works well for both TrixiParticles.jl and Trixi.jl, and if yes, then go ahead and tackle the detailed question of whether this should live here or in another - potentially optional - upstream package.

@efaulhaber (Member)

> Having to load additional packages is somewhat counter to that goal.

If Polyester and KA are not added as dependencies but only come in via package extensions, then you only have to write `using TrixiBase, Polyester, KernelAbstractions` in src/Trixi.jl. That doesn't seem like a disadvantage to me. Especially since this means that any Trixi*.jl package can decide at any point to drop the Polyester dependency and make Julia Threads the default. Even further, if a package doesn't support GPUs (yet), it can avoid the KA dependency.
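For concreteness, the package-extension setup being discussed would look roughly like this in TrixiBase's Project.toml (extension module names are illustrative, UUIDs omitted):

```toml
[weakdeps]
Polyester = "..."           # UUID omitted
KernelAbstractions = "..."  # UUID omitted

[extensions]
TrixiBasePolyesterExt = "Polyester"
TrixiBaseKernelAbstractionsExt = "KernelAbstractions"
```

Julia then loads `TrixiBasePolyesterExt` automatically only when a downstream package (or the user) loads both TrixiBase and Polyester, and likewise for the KA extension, so neither is a hard dependency.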

> if the same form works well for both TrixiParticles.jl and Trixi.jl

Why would it not? This is just a convenient way to write kernels. Even if you want to thread over the nodes and not the elements, you can simply do something like this:

```julia
iterator = CartesianIndices((1:n_elements, 1:n_nodes_per_element))
@par backend for i in iterator
    element, node = Tuple(i)
    ...
end
```

@JoshuaLampert (Member)

> Even further, if a package doesn't support GPUs (yet), they can avoid the KA dependency.

I agree. I have some projects where I depend on TrixiBase.jl and which do not support GPUs. I would like to avoid (implicitly) depending on KA.jl in these projects. Therefore, I would prefer an extension, too.

@sloede (Member) commented Mar 20, 2026

Thanks for the clarification. I misunderstood the extension part and thought it would require `using Trixi, KernelAbstractions` on the user side. But Erik and Valentin enlightened me, so it's all good :-)

@sloede (Member) commented Mar 20, 2026

IMHO we also need to come up with a better name than the original, historically chosen `@threaded` for the macro. `@par` is a little too generic for my taste, while a more descriptive name such as `@intra_node_parallelization` is unacceptably long. Maybe we can find a good middle ground :-)

@vchuravy (Member, Author)

> I have some projects where I depend on TrixiBase.jl and which do not support GPUs. I would like to avoid (implicitly) depending on KA.jl for these projects. Therefore, I would prefer an extension, too.

KernelAbstractions is intentionally a fairly light dependency, but I understand the sentiment.

The only problem is the definition of `const ParallelizationBackend = Union{AbstractThreadingBackend, KernelAbstractions.Backend}`, which this provides for easier dispatch.

@efaulhaber (Member)

> The only problem is the definition of `const ParallelizationBackend = Union{AbstractThreadingBackend, KernelAbstractions.Backend}`, which this provides for easier dispatch.

We don't actually use this in TrixiP and PointNeighbors, other than to restrict functions from accepting other types where a parallelization backend is required. But `parallel_foreach` would fail anyway in that case, so we don't need this at all.
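In other words (a hypothetical sketch, not the TrixiP/PointNeighbors code), the `backend` argument can simply be left unconstrained, and an unsupported type surfaces as a `MethodError` at the `parallel_foreach` call, so the `Union` alias is not strictly needed for correctness:

```julia
# Sketch: no type restriction on `backend`; dispatch fails naturally
# inside parallel_foreach if the backend type is unsupported.
function scale!(u, backend)
    parallel_foreach(eachindex(u), backend) do i
        u[i] *= 2
    end
    return u
end
```

The alias then only buys slightly earlier and clearer error messages, which is exactly the trade-off against keeping KA as a hard dependency.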
