Fickle KGO-fail in lfric_apps #567
MichaelWhitall started this conversation in LFRic
Hi all,
Can anyone suggest a way to debug / avoid an unexpected change in KGO that seems to stem from the Cray compiler settings used in the lfric_atm build?
This is really a continuation of an earlier discussion:
How to get more detailed output from the LFRic compile using CCE-fast?
But the question has now changed...
To recap, I have a big branch that really needs to preserve KGO so that I can straightforwardly prove it hasn't broken anything. But the lfric_apps comorph rose stem tests that use the "cce_fast-debug" compile change answers with this branch. I've traced the change in answers to timestep 15, in a single block of code inside a do-loop which my branch has moved from one subroutine to another. All variables going into that calculation are bit-identical on that timestep, yet the result of the calculation has changed slightly.
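For anyone wanting to repeat the bit-level check on dumped values, here is a minimal Python sketch of the kind of comparison I mean. It is not the tooling I actually used; the helper names `bits` and `first_bit_difference` and the sample values are made up for illustration. It compares raw 64-bit patterns rather than using `==`, so that differences such as `0.0` versus `-0.0` are also caught.

```python
import struct


def bits(x: float) -> int:
    """Return the raw 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def first_bit_difference(a, b):
    """Return the index of the first element whose bit pattern differs, or None."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        if bits(x) != bits(y):
            return i
    return None


# Hypothetical values standing in for one array dumped from trunk and branch.
trunk = [1.0, 0.1 + 0.2, 2.5]
branch = [1.0, 0.3, 2.5]

idx = first_bit_difference(trunk, branch)
if idx is None:
    print("arrays are bit-identical")
else:
    print(f"first difference at index {idx}: "
          f"{bits(trunk[idx]):016x} vs {bits(branch[idx]):016x}")
```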
Thanks to excellent suggestions from Sam Clarke-Green and James Bruten, I've managed to get detailed compiler listings from Cray Fortran for the files containing this calculation :) But bafflingly, the compiler listings don't seem to show any change in the optimisations applied to this calculation in my branch versus the trunk. Both listings make exactly the same two comments about the enclosing do-loop:
A loop starting at line 600 would benefit from "!dir$ safe_address".
A loop starting at line 600 was vectorized.
I know that this KGO-change is just a compiler-related blip, because:
a) If I add a print statement inside the affected do-loop (in both a copy of the trunk and in my branch), the difference in answers between the two mysteriously goes away.
b) If I test the exact same code changes in the UM copy of comorph instead of the lfric_apps copy, I don't see this KGO-fail, even for the "CCE-high" compile. So the problem must be related to the different compiler settings used in lfric_apps versus the UM.
Can anyone suggest next steps for debugging this KGO-fail? What else might cause a calculation to change answers? For example, could it matter if some of the arrays involved change from being local (stack) variables to being input arguments in the new subroutine containing the calculation?
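To illustrate the general kind of effect I have in mind, here is a small Python sketch showing that identical inputs can give different answers purely because the order of floating-point operations changes. This is not the comorph calculation. It assumes a sum whose ordering could be rearranged by vectorisation, which I have not confirmed is what happens in the affected loop. The function names and values are hypothetical.

```python
def sum_sequential(values):
    """Add the values strictly left to right, as a scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_lanes(values, lanes=2):
    """Add the values as a vectorised loop might: one partial sum per lane,
    then combine the partial sums at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Hypothetical inputs chosen so that rounding depends on the order of addition.
values = [1.0e16, 1.0, -1.0e16, 1.0]

print("sequential:", sum_sequential(values))
print("two lanes: ", sum_lanes(values, lanes=2))
```

The inputs are the same in both calls and only the grouping of the additions differs. The two results still disagree, because floating-point addition is not associative.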
Cheers!
Mike