Replace Kokkos::single with parallel_scan in wet_dep compute_q_tendencies #515

Draft
Copilot wants to merge 1 commit into main from copilot/remove-kokkos-single-loop-again

Copilot AI commented Mar 2, 2026

compute_q_tendencies in wet_dep.hpp contained several Kokkos::single(PerTeam) regions that serialized GPU execution across the entire team for each atmospheric column.

Key insight: exact parallel formula for precabs recurrence

The core precabs/precabc recurrence prec[k+1] = max(rain[k], prec[k] + net[k]) (where net = rain - evap) admits a closed-form prefix-scan solution:

prec[k] = cumnet[k] + max_{j≤k} D[j]
prec_base[k] = cumrain[k] - E_at_argmax_D[k]

where D[j] = rain[j-1] - cumnet[j] and E[j] = cumrain[j-1].
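
The first identity can be checked with a short derivation. This is a sketch under two conventions that the description does not state explicitly and that are assumed here: cumnet is an exclusive prefix sum (written C_k below, with C_0 = 0), and the initial value prec[0] (written p_0) enters the maximum as D_0.

```latex
\begin{align*}
C_k &= \sum_{i=0}^{k-1} \mathrm{net}_i, \qquad C_{k+1} = C_k + \mathrm{net}_k \\
p_{k+1} - C_{k+1} &= \max\bigl(\mathrm{rain}_k - C_{k+1},\; p_k + \mathrm{net}_k - C_{k+1}\bigr) \\
                  &= \max\bigl(D_{k+1},\; p_k - C_k\bigr), \qquad D_{k+1} = \mathrm{rain}_k - C_{k+1} \\
p_k - C_k &= \max\bigl(p_0,\, D_1,\, \dots,\, D_k\bigr) \\
p_k &= C_k + \max_{j \le k} D_j, \qquad D_0 := p_0
\end{align*}
```

Subtracting the running sum turns the recurrence into a plain running maximum, which is why a sum and a max can be carried together in one scan state.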

This yields a 4-component scan state (cumnet, cumrain, maxD, E_at_maxD) with an associative join:

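// dst summarizes the left (lower-index) segment, src the segment to its right.
void join(PrecScanState& dst, const PrecScanState& src) const {
  // src.maxD is measured from the start of src; shift it into dst's frame.
  Real src_maxD_corrected = src.maxD - dst.cumnet;
  Real combined_cumnet    = dst.cumnet  + src.cumnet;
  Real combined_cumrain   = dst.cumrain + src.cumrain;
  if (dst.maxD >= src_maxD_corrected) {
    // The maximum stays in the left segment (ties keep the left argmax).
    dst = {combined_cumnet, combined_cumrain, dst.maxD, dst.E_at_maxD};
  } else {
    // The maximum moves to the right segment; offset its E by dst.cumrain.
    dst = {combined_cumnet, combined_cumrain, src_maxD_corrected, dst.cumrain + src.E_at_maxD};
  }
}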
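
To make the join concrete, here is a minimal host-only sketch (plain C++, no Kokkos) that folds per-level states with the same join logic and compares the result with the serial recurrence. Everything outside the join body is an assumption made for illustration: the helper names level_state, prec_serial and prec_scan are hypothetical, prec[0] is taken to be 0, and the single-level state {net, rain, rain - net, 0} is inferred from the definitions of D and E above rather than taken from the PR. Only prec is checked; E_at_maxD is carried along but not validated, because the serial prec_base update is not shown in this description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Real = double;

// Same four fields as the scan state described above.
struct PrecScanState {
  Real cumnet, cumrain, maxD, E_at_maxD;
};

// Same logic as the join above, written as a free function.
inline void join(PrecScanState& dst, const PrecScanState& src) {
  const Real src_maxD_corrected = src.maxD - dst.cumnet;
  const Real combined_cumnet = dst.cumnet + src.cumnet;
  const Real combined_cumrain = dst.cumrain + src.cumrain;
  if (dst.maxD >= src_maxD_corrected) {
    dst = {combined_cumnet, combined_cumrain, dst.maxD, dst.E_at_maxD};
  } else {
    dst = {combined_cumnet, combined_cumrain, src_maxD_corrected,
           dst.cumrain + src.E_at_maxD};
  }
}

// Assumption: state contributed by one level on its own. Its local D is
// rain - net, and its local E is zero.
inline PrecScanState level_state(Real rain, Real evap) {
  const Real net = rain - evap;
  return {net, rain, rain - net, Real(0)};
}

// Serial reference: prec[k+1] = max(rain[k], prec[k] + net[k]), prec[0] = 0.
std::vector<Real> prec_serial(const std::vector<Real>& rain,
                              const std::vector<Real>& evap) {
  assert(rain.size() == evap.size());
  std::vector<Real> prec(rain.size() + 1, Real(0));
  for (std::size_t k = 0; k < rain.size(); ++k) {
    prec[k + 1] = std::max(rain[k], prec[k] + (rain[k] - evap[k]));
  }
  return prec;
}

// Scan form: fold the level states left to right with join and read off
// prec[k] = cumnet[k] + max_{j<=k} D[j] after each step.
std::vector<Real> prec_scan(const std::vector<Real>& rain,
                            const std::vector<Real>& evap) {
  assert(rain.size() == evap.size());
  std::vector<Real> prec(rain.size() + 1, Real(0));
  PrecScanState acc{Real(0), Real(0), Real(0), Real(0)};  // maxD = D[0] = prec[0]
  for (std::size_t k = 0; k < rain.size(); ++k) {
    join(acc, level_state(rain[k], evap[k]));
    prec[k + 1] = acc.cumnet + acc.maxD;
  }
  return prec;
}
```

The fold here is sequential. A parallel scan is free to regroup the same joins in any order, and associativity of join is what makes that regrouping legal.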

Changes

  • Kokkos::single at lines 1191, 1293 (precabs/precabc recurrence): replaced with Kokkos::parallel_scan using the PrecScanState functor above. Eliminates the bndd and precabs_base_tmp intermediate workspace arrays and the two subsequent redundant parallel_fors.

  • Kokkos::single at lines 1535, 1631 (scavabs/scavabc linear recurrence s[k+1] = a[k]*s[k] + b[k]): replaced with Kokkos::parallel_scan using affine-map composition (a₂,b₂)∘(a₁,b₁) = (a₂a₁, a₂b₁+b₂), which is associative.

  • Kokkos::single at lines 1265, 1360, 1569, 1664 (copy_from_prev propagation): replaced with Kokkos::parallel_scan using a PropState {Real value; int has_value} struct where join = "take right if has_value, else take left" — a standard segmented-fill scan.

  • New unit test in mam4_wet_deposition_unit_tests.cpp: runs both the original serial reference and the new parallel implementation over a representative atmospheric column, asserting element-wise relative error < 1e-8.
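
The two simpler scans in the list above can be sketched on the host in the same spirit. This is an illustrative sketch rather than the code in the PR: Affine, compose, affine_serial, affine_scan, fill_scan and max_rel_err are hypothetical names, the scans are folded sequentially instead of through Kokkos::parallel_scan, and the meaning of has_value (1 where a level supplies its own value, 0 where it copies from the previous level) is an assumption about copy_from_prev.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Real = double;

// Affine map x -> a*x + b.
struct Affine {
  Real a, b;
};

// Composition (later o earlier): apply earlier first, then later.
inline Affine compose(const Affine& later, const Affine& earlier) {
  return {later.a * earlier.a, later.a * earlier.b + later.b};
}

// Serial reference for s[k+1] = a[k]*s[k] + b[k] with s[0] = s0.
std::vector<Real> affine_serial(const std::vector<Real>& a,
                                const std::vector<Real>& b, Real s0) {
  assert(a.size() == b.size());
  std::vector<Real> s(a.size() + 1, s0);
  for (std::size_t k = 0; k < a.size(); ++k) s[k + 1] = a[k] * s[k] + b[k];
  return s;
}

// Scan form: accumulate the composed map, then apply it to s0 at each level.
std::vector<Real> affine_scan(const std::vector<Real>& a,
                              const std::vector<Real>& b, Real s0) {
  assert(a.size() == b.size());
  std::vector<Real> s(a.size() + 1, s0);
  Affine acc{Real(1), Real(0)};  // identity map
  for (std::size_t k = 0; k < a.size(); ++k) {
    acc = compose(Affine{a[k], b[k]}, acc);
    s[k + 1] = acc.a * s0 + acc.b;
  }
  return s;
}

// Fill state: take the right operand if it carries a value, else keep the left.
struct PropState {
  Real value;
  int has_value;
};

inline void join(PropState& dst, const PropState& src) {
  if (src.has_value) dst = src;
}

// Each output entry is the most recent flagged value at or before that index;
// fallback is used until the first flagged entry appears.
std::vector<Real> fill_scan(const std::vector<Real>& v,
                            const std::vector<int>& has, Real fallback) {
  assert(v.size() == has.size());
  std::vector<Real> out(v.size());
  PropState acc{fallback, 0};
  for (std::size_t k = 0; k < v.size(); ++k) {
    join(acc, PropState{v[k], has[k]});
    out[k] = acc.value;
  }
  return out;
}

// Largest element-wise relative error of x against ref.
Real max_rel_err(const std::vector<Real>& ref, const std::vector<Real>& x) {
  assert(ref.size() == x.size());
  Real worst = 0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const Real denom = std::max(std::fabs(ref[i]), Real(1e-300));
    worst = std::max(worst, std::fabs(x[i] - ref[i]) / denom);
  }
  return worst;
}
```

A comparison in the style of the unit test would then check max_rel_err(affine_serial(a, b, s0), affine_scan(a, b, s0)) < 1e-8. The scan form regroups floating-point operations, so agreement is expected to tolerance rather than bit for bit.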

Original prompt

This section details the original issue you should resolve.

<issue_title>Remove Kokkos::single loop in wet_dep.hpp</issue_title>
<issue_description>## Description

In wet_dep.hpp, there is a Kokkos::single region that appears to serialize part of the WetDep kernel and may be contributing to GPU slowdown.

Location (single loop):

Kokkos::single(Kokkos::PerTeam(team), [=]() {

Loop to parallelize:
https://github.com/eagles-project/mam4xx/blob/79ebf961d2b03ad7cf6874d886865265fb72c5f7/src/mam4xx/wet_dep.hpp#L1195C7-L1195C41

Proposed approach

  • Investigate options to remove the Kokkos::single section.
  • Try rewriting the computation using Kokkos::parallel_scan (or another Kokkos pattern) to enable parallel execution on GPUs.
  • Create a unit test where both the serial block and the parallel block are executed, then compute the differences. The test passes if the differences are below a relative error threshold of 1e-8.

Acceptance criteria

No Kokkos::single in this kernel region.

</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Remove Kokkos::single loop in wet_dep.hpp" to "Replace Kokkos::single with parallel_scan in wet_dep compute_q_tendencies" on Mar 2, 2026

Development

Successfully merging this pull request may close these issues.

Remove Kokkos::single loop in wet_dep.hpp

2 participants