-
Notifications
You must be signed in to change notification settings - Fork 27
Improve GPU performance of update_jacobian by pulling out subexpressions #4170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…xpressions Sedimentation and the microphysics tracers loop were affected Three new scratch variables were introduced.
|
The JET failures are expected here. This is something @haakon-e ran into as well with #4141. On my list of todos is to make a mwe of this and open an issue in ClimaCore. I don't think this is an issue on gpu (inference behaves differently), but it might be a good idea to check this PR degrades cpu performance. |
|
Tested on clima with Before this PR walltime per step: "1 second, 159 milliseconds" |
imreddyTeja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these changes are covered by the reproducibility tests. We should probably add a reproducibility test that covers these changes, or manually verify that results not change.
| ᶜadvection_matrix_2 = similar( | ||
| Y.c, | ||
| BidiagonalMatrixRow{Adjoint{FT, C3{FT}}}, | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use ᶜbidiagonal_adjoint_matrix_c3 instead of making this new scratch var?
| }, | ||
| ), | ||
| ᶠtracer_advection = similar(Y.f, BidiagonalMatrixRow{Adjoint{FT, C3{FT}}}), | ||
| ᶠtracer_advection_upwind = similar(Y.f, TridiagonalMatrixRow{FT}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use ᶠtridiagonal_matrix_c3 instead?
| ᶠsed_tracer_advection = | ||
| @. DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ) ⋅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ᶠsed_tracer_advection = | |
| @. DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ) ⋅ | |
| @. ᶠsed_tracer_advection = | |
| DiagonalMatrixRow(ᶠinterp(ᶜρʲs.:(1) * ᶜJ) / ᶠJ) ⋅ |
|
|
||
| # pull common subexpressions that don't depend on which | ||
| # tracer out of the tracer loop for performance | ||
| ᶠtracer_advection = @. -(ᶜadvdivᵥ_matrix()) ⋅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ᶠtracer_advection = @. -(ᶜadvdivᵥ_matrix()) ⋅ | |
| @. ᶠtracer_advection = -(ᶜadvdivᵥ_matrix()) ⋅ |
| ᶠtracer_advection_upwind = | ||
| @. ᶠtracer_advection ⋅ ᶠset_tracer_upwind_matrix_bcs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ᶠtracer_advection_upwind = | |
| @. ᶠtracer_advection ⋅ ᶠset_tracer_upwind_matrix_bcs( | |
| @. ᶠtracer_advection_upwind = | |
| ᶠtracer_advection ⋅ ᶠset_tracer_upwind_matrix_bcs( |
860b40f to
0ea1d03
Compare
|
I just checked the plots produced by the coupler 1M prognostic edmf benchmark config, and they are visually identical to before the change. |
Sedimentation and the microphysics tracers loop were affected. Three new scratch variables were introduced.
To-do
Profile the advection kernels from manual_sparse_jacobian.jl:1041 and 1051 to see
what can be done to speed those up.
Content
nsys profiling revealed that four kernels were taking up over 25% of the total time running
prognostic 1M. With these changes, they account for a few percent of the changes, and overall
step times dropped about 33%.
Because this is just a performance rewrite, there is no new functionality, and no new tests
are needed.