@daverumph
Contributor

Sedimentation and the microphysics tracers loop were affected. Three new scratch variables were introduced.

To-do

Profile the advection kernels from manual_sparse_jacobian.jl:1041 and 1051 to see
what can be done to speed those up.

Content

nsys profiling revealed that four kernels were taking up over 25% of the total time running
prognostic 1M. With these changes, they account for a few percent of the total time, and overall
step times dropped about 33%.
Because this is just a performance rewrite, there is no new functionality, and no new tests
are needed.


  • I have read and checked the items on the review checklist.

…xpressions

@imreddyTeja
Member

The JET failures are expected here. This is something @haakon-e ran into as well with #4141. On my to-do list is making an MWE of this and opening an issue in ClimaCore. I don't think this is an issue on GPU (inference behaves differently there), but it might be a good idea to check whether this PR degrades CPU performance.

@imreddyTeja
Member

Tested on clima with amip_progedmf_1m_land_he16.yml
Before this PR, reported final SYPD: 0.0745
After this PR, reported final SYPD: 0.100

Before this PR walltime per step: "1 second, 159 milliseconds"
After this PR walltime per step: "854 milliseconds"
New step takes ~73.7% of the step time before the changes
Expected SYPD ratio: 1/(73.7%) = 135.7%, i.e. a 35.7% increase
Expected new SYPD: 0.0745 * 135.7% = 0.101
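
For anyone who wants to redo the arithmetic above, here is a minimal sketch in plain Julia. It assumes SYPD scales inversely with walltime per step (fixed timestep, no other overhead changing); the variable names are only illustrative.

```julia
# Walltime per step reported above, in milliseconds.
step_before_ms = 1159.0
step_after_ms = 854.0
sypd_before = 0.0745

# Fraction of the old step time that the new step takes.
step_ratio = step_after_ms / step_before_ms

# Assumption: SYPD is inversely proportional to walltime per step.
speedup = 1 / step_ratio
sypd_expected = sypd_before * speedup

println("step ratio:    ", round(100 * step_ratio; digits = 1), "%")
println("SYPD increase: ", round(100 * (speedup - 1); digits = 1), "%")
println("expected SYPD: ", round(sypd_expected; digits = 3))
```

The expected value rounds to 0.101, which is consistent with the reported final SYPD of 0.100.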

Member

@imreddyTeja left a comment

I don't think these changes are covered by the reproducibility tests. We should probably add a reproducibility test that covers these changes, or manually verify that the results do not change.
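
A manual check along these lines could be as simple as comparing the same output field from a run before and after the change. The sketch below uses plain arrays as stand-ins; `max_relative_difference` is a hypothetical helper, not part of the existing reproducibility tests, and the tolerance is an arbitrary placeholder.

```julia
# Hypothetical helper: largest elementwise difference between two outputs,
# scaled by the largest magnitude in the reference.
function max_relative_difference(reference, candidate)
    size(reference) == size(candidate) || error("shape mismatch")
    scale = maximum(abs, reference)
    scale == 0 && return maximum(abs, candidate)
    return maximum(abs.(candidate .- reference)) / scale
end

reference = [1.0, 2.0, 4.0]        # stand-in for output before the change
candidate = reference .+ 1e-14     # stand-in for output after the change

rel_diff = max_relative_difference(reference, candidate)
println("max relative difference: ", rel_diff)
println("within tolerance: ", rel_diff <= 1e-12)
```

Since the PR only reorders and hoists expressions, differences at roundoff level would be plausible; anything larger would be worth a closer look.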

Comment on lines +125 to +128
ᶜadvection_matrix_2 = similar(
Y.c,
BidiagonalMatrixRow{Adjoint{FT, C3{FT}}},
),
Member

Could we use ᶜbidiagonal_adjoint_matrix_c3 instead of making this new scratch var?

},
),
ᶠtracer_advection = similar(Y.f, BidiagonalMatrixRow{Adjoint{FT, C3{FT}}}),
ᶠtracer_advection_upwind = similar(Y.f, TridiagonalMatrixRow{FT}),
Member

Could we use ᶠtridiagonal_matrix_c3 instead?

Comment on lines +1094 to +1095
ᶠsed_tracer_advection =
@. DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ)
Member

Suggested change
- ᶠsed_tracer_advection =
- @. DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ)
+ @. ᶠsed_tracer_advection =
+ DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ)
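
The suggestions in this review all move `@.` in front of the assignment. A minimal sketch of why that matters, using plain arrays in place of the scratch fields (all names here are hypothetical stand-ins):

```julia
a = rand(1000)
b = rand(1000)
scratch = similar(a)      # preallocated once, like a cache entry
cache = (; scratch)       # NamedTuple standing in for the cache

# Rebinding: `@.` on the right-hand side allocates a new array and
# points the local name at it; the cached buffer is never written.
function rebind(cache, a, b)
    scratch = cache.scratch
    scratch = @. a * b + 1
    return scratch
end

# In place: with `@.` in front of the assignment, `=` becomes `.=`
# and the result is written into the existing buffer.
function inplace(cache, a, b)
    scratch = cache.scratch
    @. scratch = a * b + 1
    return scratch
end

out_rebind = rebind(cache, a, b)
out_inplace = inplace(cache, a, b)

println(out_rebind === cache.scratch)    # false: a fresh array was allocated
println(out_inplace === cache.scratch)   # true: the cached buffer was reused
```

With the rebinding form, the new scratch variables would be allocated in the cache but never actually used.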


# pull common subexpressions that don't depend on which
# tracer out of the tracer loop for performance
ᶠtracer_advection = @. -(ᶜadvdivᵥ_matrix())
Member

Suggested change
- ᶠtracer_advection = @. -(ᶜadvdivᵥ_matrix())
+ @. ᶠtracer_advection = -(ᶜadvdivᵥ_matrix())
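
The code comment above describes pulling tracer-independent subexpressions out of the tracer loop. A rough sketch of that pattern with plain arrays, where `weights`, `tracers`, and `common` are hypothetical stand-ins for the real fields and scratch variable:

```julia
weights = rand(100)                            # tracer-independent input
tracers = [rand(100) for _ in 1:4]             # one array per tracer
outputs = [similar(weights) for _ in tracers]
common = similar(weights)                      # scratch buffer, allocated once

# Naive version: the tracer-independent term is recomputed for every tracer.
function update_naive!(outputs, tracers, weights)
    for (out, q) in zip(outputs, tracers)
        @. out = (1 - 2 * weights) * q
    end
    return outputs
end

# Hoisted version: compute the common term once, in place, then reuse it.
function update_hoisted!(outputs, tracers, weights, common)
    @. common = 1 - 2 * weights
    for (out, q) in zip(outputs, tracers)
        @. out = common * q
    end
    return outputs
end

naive = deepcopy(update_naive!(outputs, tracers, weights))
hoisted = update_hoisted!(outputs, tracers, weights, common)
println(naive == hoisted)
```

The trade-off is one extra scratch buffer in exchange for less repeated work per tracer, which only pays off if the common term is written in place rather than reallocated.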

Comment on lines +1370 to +1371
ᶠtracer_advection_upwind =
@. ᶠtracer_advection ᶠset_tracer_upwind_matrix_bcs(
Member

Suggested change
- ᶠtracer_advection_upwind =
- @. ᶠtracer_advection ᶠset_tracer_upwind_matrix_bcs(
+ @. ᶠtracer_advection_upwind =
+ ᶠtracer_advection ᶠset_tracer_upwind_matrix_bcs(

@daverumph force-pushed the dr/gpu_perf/jacobian_advection_split branch from 860b40f to 0ea1d03 (December 17, 2025 00:52)
@imreddyTeja
Member

I just checked the plots produced by the coupler 1M prognostic edmf benchmark config, and they are visually identical to before the change.

@daverumph closed this on Dec 17, 2025

3 participants