Update env, cmake, GEOS_Util and MAPL releases in components.yaml & update README.md after decommissioning of SLES12 at NCCS #796
|
Testing summary:
| Runtype | Clone | Build | Build Time | Model Run/Compare | Assim Run/Compare |
|---|---|---|---|---|---|
| conus | pass | pass | 13 min | pass/FAIL | -- / -- |

Note: Helfand is not used as an option during testing (we use Louis as the default), so for this PR the change to use the branch is trivially zero-diff.
|
Hmm. You are getting failures from the new Intel? Are these build-time or run-time? That is, did things crash or just get different answers? |
|
I'm a bit confused by this PR and think we need to separate the update of the environment and the Helfand update. Specifically:
I think it would be best to remove the Helfand branch from the PR and examine the impact of the environment update in isolation. |
|
|
|
Thanks, @biljanaorescanin. Here are my 2c:
This is great, but I still think we want this to be in a separate PR for clarity. When releases are made, the release doc is basically a collection of PR titles. Having a separate PR for the zero-diff helfsurface() optimization change makes it much easier to understand what was done when a few months have passed and nobody can remember off the top of their head. I edited the present PR accordingly. Once the present PR has been merged, we can test and merge the GMAO_Shared helfsurface() optimization PR GEOS-ESM/GMAO_Shared#348
What does it take to include the GCC-14 change into this PR? It doesn't make sense to me to merge this PR when it doesn't work with the current GNU version. Maybe I'm missing something. Also, I'm still surprised that the comparison passes for the GLOBAL/assim test but fails for the GLOBAL/model test (and similarly for other tests). This could be a difference in 1d (tile) vs. 2d output and MAPL HISTORY regridding. Before we can merge the PR, we need to understand better what exactly is not zero-diff here. |
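One way to make "what exactly is not zero-diff" concrete is to measure differences in units in the last place (ULPs) rather than as pass/fail. The following is a minimal sketch, assuming single-precision output and finite values; the function names and the `max_ulps` threshold are hypothetical and are not part of the regression scripts discussed here.

```python
import struct


def float32_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance(a, b):
    """Count float32 steps between a and b (0 means equal; +0.0 and -0.0 count as equal)."""
    def ordered(x):
        bits = float32_bits(x)
        # Map the sign-magnitude bit pattern onto a monotonically ordered integer line.
        return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits
    return abs(ordered(a) - ordered(b))


def classify(ref, new, max_ulps=4):
    """Label a pair of value sequences as zero-diff, roundoff, or significant.

    max_ulps is an illustrative threshold (an assumption), not an agreed standard.
    """
    worst = max(ulp_distance(r, n) for r, n in zip(ref, new))
    if worst == 0:
        return "zero-diff", worst
    if worst <= max_ulps:
        return "roundoff", worst
    return "significant", worst


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.5]
    candidate = [1.0, 2.0 + 2**-22, 3.5]  # one value differs by a single float32 step
    print(classify(reference, candidate))
```

Reporting the worst-case ULP distance per variable would show whether a failed comparison is a last-bit difference or something larger.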
|
If I focus only on Intel, you will see that only the NC4 files fail, and the differences are at roundoff level:
This is actually an issue with the scripting. In the regression scripts, for GNU runs I have to replace the I can change that in the scripting, and then the GNU tests would go NZD the next time things run. |
tick up minor release, should be 0-diff per respective release notes
gmao-rreichle
left a comment
See inline comments below.
      local: ./@env
      remote: ../ESMA_env.git
    - tag: v4.29.1
    + tag: v4.36.0
@biljanaorescanin, @mathomp4 : I ticked up the versions of env, cmake, and MAPL (c1007c8). Based on the documentation of the respective releases, this should be zero-diff w.r.t. what was on the PR before my latest edits (but definitely non-0-diff w.r.t. current develop). @mathomp4, please let me know if you have any objections or suggestions. @biljanaorescanin, when you get a chance, please re-test the PR. If all is as expected, the new test is 0-diff w.r.t. the most recent test (if you still have a copy).
|
Tests are zero diff to previous iteration of testing. |
|
@mathomp4, @biljanaorescanin, @weiyuan-jiang: I am still trying to understand the very unusual non-0-diff character of this PR. Specifically, the LDAS_GLOBAL/model test fails the comparison for the nc4 files (in just a small subset of variables, and within what seems to be roundoff). The curious thing is that the LDAS_GLOBAL/assim test passes! If there was any change in the science code (or a roundoff change in the science calcs triggered by the newer env/baselibs), then the assim test should also fail the nc4 comparison. The fact that the assim test passes suggests that it's something in MAPL and/or the LDAS tile_bin2nc4 utility. That is, the variables would need to be 0-diff when they are in memory during the simulation, but then something changes when the data are written out. I noticed that the Intel tests with standard optimization that do pass have no bit shaving in HISTORY.rc, whereas the tests that fail the nc4 comparison have bit shaving enabled. I went through the documentation of the MAPL releases between 2.50.1 and 2.54.2 and didn't notice anything that might have impacted the bit shaving, and the documentation suggests that for the most part the MAPL releases in question should all be 0-diff among themselves (for the GCM, which is usually a bigger hurdle than LDAS when it comes to 0-diff). So I can't see how exactly the bit shaving might cause the non-0-diffs seen here, but I also can't quite rule it out. Thoughts?
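To reason about the bit-shaving hypothesis, it may help to look at what mantissa truncation does to nearly equal values. The sketch below is a generic float32 mantissa mask written for illustration; it is an assumption, not MAPL's actual shaving code, and the function name `shave` and the choice of 10 retained bits are hypothetical.

```python
import struct


def shave(x, keep_bits):
    """Zero all but the leading keep_bits of the 23-bit float32 mantissa of x.

    Generic truncation for illustration only; MAPL's algorithm may differ.
    """
    if not 1 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 1 and 23")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    ulp = 2**-23  # float32 spacing for values in [1, 2)

    # Two values one ULP apart that fall in the same shaving bin: the difference vanishes.
    print(shave(1.0 + ulp, 10) == shave(1.0 + 2 * ulp, 10))

    # Two values one ULP apart that straddle a bin boundary: the difference grows.
    on_boundary = 1.0 + 2**-10
    just_below = on_boundary - ulp
    print(shave(on_boundary, 10) - shave(just_below, 10))
```

Under this simplified model, shaving is a pure function of its input, so bit-identical in-memory fields produce bit-identical shaved output. A one-ULP in-memory difference, however, is either hidden entirely or enlarged to a full shaving step, depending on where the values sit relative to a bin boundary.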
|
The failed comparison in model run is on files that are not in the assim run. So we probably don't need to worry about this. What happened to GCM's history output with bit shaving? @mathomp4 |
@weiyuan-jiang, I'm not sure I understand the reasoning. Of course the "model" test case has different HISTORY output. What I'm after is understanding which exact changes in the PR caused the non-0-diff result for the model test case. Normally, anything that causes non-0-diff in the model test case would also cause non-0-diff in the assim test case. The fact that the model case is non-0-diff but the assim case is 0-diff is very unusual, and I'd really like to be able to explain this so we can make more informed decisions about how to interpret the non-0-diff changes in science applications going forward |
|
For lack of a better idea I just tested a variant of the PR's branch that reverts MAPL back to 2.50.1. I only ran the Intel tests w/ standard optimization. The result is 0-diff w.r.t. using MAPL 2.54.2, so MAPL is not the cause of the non-0-diff result vs. develop. Still a mystery to me why we get non-0-diff for output from the "model" test but 0-diff for the "assim" test. |
|
In our regression testing we got test fail: If I run just GLOBAL/model test and comment out in |

This non-0-diff PR updates the `components.yaml` to approximately match that of GEOSgcm `main` as of 2025-Mar-19.

The non-0-diff changes are within "roundoff," and are caused by the newer compiler/baselibs version. Intel tests with standard optimization are 0-diff when bit shaving is not used.

Note that the ESMA_env, ESMA_cmake, and MAPL versions are slightly newer than those of GEOSgcm `main`, but per the respective release notes this should be 0-diff w.r.t. what is in GEOSgcm `main` (but not 0-diff w.r.t. what is on GEOSldas `develop` before this PR!).

The PR also updates README.md to reflect that SLES15 is now the only O/S on the NCCS Discover platform.
Earlier versions of this PR also included the helfsurface() optimization of GEOS-ESM/GMAO_Shared#348, which requires a newer Intel compiler but should be zero-diff (which is why it will be done in a separate PR).