
Change default format of tile-space HISTORY output to nc4 #144

Merged: @gmao-rreichle merged 6 commits into develop from feature/wjiang/Tile_nc4 on Nov 25, 2025


@weiyuan-jiang (Contributor) commented on Oct 3, 2025

Use the new MAPL to write tile-space HISTORY output directly in nc4 format.

Remove the tile_bin2nc4.F90 post-processing.

Related PRs:

Testing:

  • Successfully 0-diff tested by @gmao-rreichle on 24-Nov-2025.
  • Requires revised versions of the test-specific HISTORY.rc files for the GLOBAL/model, GLOBAL/assim, and GLOBALCS/assim tests. These tests fail to run with the existing test-specific HISTORY.rc files.
  • The nc4 output was compared manually. The result is zero-diff for all fields except lat/lon, which have known roundoff differences resulting from writing the output directly from MAPL (i.e., via the new EASE grid factory). Also, a (dummy) "time" variable is present in the direct nc4 output from MAPL but not in the old nc4 output generated with tile_bin2nc4.
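The manual comparison described above can be sketched as follows. This is a hypothetical helper, not code from the PR: it assumes the variables from both nc4 files have already been read into Python dicts (e.g., with a netCDF library), requires exact equality for every field except lat/lon, tolerates roundoff in lat/lon via a relative tolerance (the value 1e-6 is an assumption), and ignores the dummy "time" variable that only the direct MAPL output contains.

```python
import math

# Coordinates that are allowed to differ by roundoff (EASE grid factory).
COORD_VARS = {"lat", "lon"}
# Variable present only in the direct MAPL nc4 output.
IGNORED_VARS = {"time"}


def compare_outputs(old, new, rel_tol=1e-6):
    """Return the sorted names of variables that do not match.

    old, new: dicts mapping variable name -> list of floats, assumed to
    have been read from the tile_bin2nc4 output and the direct MAPL output.
    """
    mismatches = []
    names = (set(old) | set(new)) - IGNORED_VARS
    for name in sorted(names):
        if name not in old or name not in new:
            # Variable missing from one of the two files.
            mismatches.append(name)
            continue
        a, b = old[name], new[name]
        if len(a) != len(b):
            mismatches.append(name)
        elif name in COORD_VARS:
            # Tolerate the known roundoff differences in the coordinates.
            if not all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)):
                mismatches.append(name)
        elif a != b:
            # All other fields must be bit-for-bit identical (zero-diff).
            mismatches.append(name)
    return mismatches
```

An empty return value corresponds to the "zero-diff except lat/lon" result reported above.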

@weiyuan-jiang weiyuan-jiang requested a review from a team as a code owner October 3, 2025 18:08
@gmao-rreichle changed the title from "test history" to "Change default format of tile-space HISTORY output to nc4" on Nov 14, 2025
@gmao-rreichle (Collaborator)

@weiyuan-jiang, thanks for putting this together. A couple of things:

  1. There are six additional instances of ".bin" in GEOSldas_HIST.rc. We should probably change them as well.

  2. Have you verified that the output is 0-diff for the two collections that were changed to nc4? I'm not sure if Matt's standard testing script works here, since the baseline has binary output and the experiment produces nc4. This probably needs to be done manually.

@weiyuan-jiang (Contributor, Author)

I have tested that they are zero-diff when bit shaving is turned off for the binary output. For some reason, bit shaving is not applied to the nc4 output.

@gmao-rreichle (Collaborator)

> I have tested they are zero-diff if there is no bit shave for the binary output. Somehow bit shave is not used in nc4.

Thanks. While it's much better to have nc4 output, for production runs we can't really afford to skip bit shaving. I'm surprised that bit shaving isn't working for nc4. I assume this is true only for EASE and/or tile-space output. I would be very surprised if bit shaving wasn't working for 2-d CF output. What does it take to have bit shaving work for tile-space output and EASE (1d & 2d)?
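For readers unfamiliar with bit shaving: it zeroes the low-order mantissa bits of each float32 value so that the output compresses better, which is why binary output written with bit shaving cannot be zero-diff against nc4 output written without it. The sketch below is a minimal, truncating illustration of the idea, assuming IEEE 754 float32 values; the function name and the `keep_bits` parameter are hypothetical, and MAPL's own implementation may differ (for example, by rounding rather than truncating).

```python
import struct


def shave_float32(x, keep_bits):
    """Keep only the top `keep_bits` of the 23-bit float32 mantissa.

    Hypothetical illustration of bit shaving by truncation; the sign and
    exponent bits are left untouched.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be in [0, 23]")
    # Reinterpret the float32 value as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - keep_bits
    # Mask that clears the lowest `drop` mantissa bits.
    mask = (0xFFFFFFFF >> drop) << drop
    (shaved,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return shaved
```

For example, 1.75 is 1.11 in binary; keeping one mantissa bit yields 1.5, while keeping all 23 bits returns the value unchanged.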

@weiyuan-jiang (Contributor, Author)

Bit shaving would work with this PR (GEOS-ESM/MAPL#4190). The final HISTORY outputs are zero-diff except for some metadata (attributes, long names, ...) and the lat and lon values, which nevertheless agree to 6 significant digits.

@gmao-rreichle gmao-rreichle requested a review from a team as a code owner November 21, 2025 22:58
@gmao-rreichle (Collaborator)

@weiyuan-jiang :

Here's an update:

  1. I verified that the direct nc4 output via MAPL from your land+landice simulation is binary identical to the binary output (after applying tile_bin2nc4.x) from your separate land and landice simulations. As noted earlier, there is a roundoff difference in the values of the lat and lon coordinates, but this is a known feature of the EASE grid factory and not related to the present PR.

  2. I removed the 1d "lfs" collection. It no longer makes sense to group the variables in this way in the 1d format.

  3. I updated the HISTORY templates to use nc4 output for all 1d collections. Here, I still need to verify that the "increments" output in the coupled land-atm DAS is ingested by the GCM in nc4 format.

  4. I removed tile_bin2nc4.F90. I added an "exit" statement into lenkf_j_template.py when binary output is encountered, but this might not work if, say, tilecoord.bin is part of the set of files that are moved. This probably needs more thought.

  5. I still need to update the documentation.

  6. I noticed that the landice ("glc") collections are moved into the ./cat output directory. They should probably get their own ./glc directory, although this would require maintaining lists of collection names. Maybe we should just rename the ./cat directory to something more general that reflects "ldas" diagnostic output.
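A guard of the kind described in item 4 might look like the hypothetical sketch below. The function names are illustrative only and do not come from lenkf_j_template.py; the sketch assumes the caller passes the names of the HISTORY output files about to be moved out of the scratch directory, and it stops the job with `sys.exit` if any of them is still in binary (`.bin`) format, because tile_bin2nc4.x is no longer available to convert them.

```python
import sys
from pathlib import Path


def find_binary_outputs(filenames):
    """Return the HISTORY files that still use the binary (.bin) format."""
    return [f for f in filenames if Path(f).suffix == ".bin"]


def check_no_binary_output(filenames):
    """Abort the job if any binary tile-space output is found.

    Hypothetical guard: tile_bin2nc4.x has been removed, so binary
    files can no longer be converted to nc4 during post-processing.
    """
    binary = find_binary_outputs(filenames)
    if binary:
        sys.exit("ERROR: binary HISTORY output is no longer supported: "
                 + ", ".join(binary))
```

Passing the message string to `sys.exit` prints it to stderr and exits with status 1, so the batch job fails visibly instead of silently leaving unconverted files behind.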

@weiyuan-jiang (Contributor, Author) commented on Nov 24, 2025

> 4. I removed tile_bin2nc4.F90. I added an "exit" statement into lenkf_j_template.py when binary output is encountered, but this might not work if, say, tilecoord.bin is part of the set of files that are moved. This probably needs more thought.

tilecoord.bin is not in the scratch directory but in the rc_out directory, so it will not be converted or moved.

@gmao-rreichle (Collaborator)

@weiyuan-jiang: Thanks for pointing out that tilecoord.bin is in ./rc_out. Here are more updates:

> 3. I updated the HISTORY templates to use nc4 output for all 1d collections. Here, I still need to verify that the "increments" output in the coupled land-atm DAS is ingested by the GCM in nc4 format.

I verified that the "increments" files in the coupled land-atm DAS are in nc4 format, so no further action should be needed on this item.

> 5. I still need to update the documentation.

See GEOS-ESM/GEOSldas#841.

@weiyuan-jiang, please review my recent changes, especially 1cc676d and the updated documentation.

If everything looks ok to you, I'll test this PR. Thanks!

@weiyuan-jiang (Contributor, Author)

It looks good to me.

@gmao-rreichle gmao-rreichle merged commit b29481e into develop Nov 25, 2025
15 checks passed
@gmao-rreichle gmao-rreichle deleted the feature/wjiang/Tile_nc4 branch November 25, 2025 15:28