Skip to content

Match coordinate names against list of vertical levels from eccodes in extract_dataset_chunk_index#552

Merged
martindurant merged 2 commits intofsspec:mainfrom
krlor17:extract_dataset_chunk_use_level_list
Apr 19, 2025
Merged

Match coordinate names against list of vertical levels from eccodes in extract_dataset_chunk_index#552
martindurant merged 2 commits intofsspec:mainfrom
krlor17:extract_dataset_chunk_use_level_list

Conversation

@krlor17
Copy link
Contributor

@krlor17 krlor17 commented Apr 10, 2025

The issue

The current logic for grib files in extract_dataset_chunk_index assumes that any coordinate that doesn't match valid_time , time, step, latitude or longitude must correspond to a vertical level. From the current implementation :

if grib:
        # Grib data has only one level coordinate
        cname = (
            cname
            if cname
            in ("valid_time", "time", "step", "latitude", "longitude")
            else "level"
        )

This has the unfortunate effect of overwriting level values if non-compliant coordinates are processed after the vertical level.

For instance, building a map from a gefs forecast:

gefs_files = [
    "s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000",
]

# Rebuild the reference store to ensure correct offsets and lengths
gefs_scans = []
for f in gefs_files:
    gefs_scans.extend(scan_grib(f, storage_options=dict(anon=True)))

gefs_grib_tree_store = grib_tree(gefs_scans, remote_options=dict(anon=True))

gefs_dt = xr.open_datatree(fsspec.filesystem("reference", fo=gefs_grib_tree_store).get_mapper(""), engine="zarr", consolidated=False)

gefs_kind = extract_datatree_chunk_index(gefs_dt, gefs_grib_tree_store, grib=True)

deflated_gfs_grib_tree_store = copy.deepcopy(gefs_grib_tree_store)
strip_datavar_chunks(deflated_gfs_grib_tree_store)

gefs_kind[gefs_kind["varname"] == "gh"]

yields the kerchunk index:

Unnamed: 0 varname typeOfLevel stepType name level step time valid_time uri offset length inline_value
0 18 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 0 214455 nan
1 19 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 650074 224949 nan
2 20 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 1366141 220337 nan
3 21 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 2343152 216114 nan
4 22 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 3209101 214109 nan
5 23 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 4095294 211932 nan
6 24 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 4977091 235125 nan
7 25 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 6079183 246148 nan
8 26 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 7181107 255680 nan
9 27 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 8632924 264666 nan
10 28 gh isobaricInhPa instant Geopotential height 0 0 days 00:00:00 2024-01-01 2024-01-01 00:00:00 s3://noaa-gefs-pds/gefs.20240101/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000 10664797 276236 nan

Note that every level value is 0!

The issue stems from the "number" coordinate being renamed to "level" in extract_dataset_chunk_index and overwriting the level value with 0.

gefs_dt.gh.instant.isobaricInhPa.coords
Coordinates:
  * isobaricInhPa  (isobaricInhPa) float64 88B 10.0 50.0 100.0 ... 925.0 1e+03
  * latitude       (latitude) float64 3kB 90.0 89.5 89.0 ... -89.0 -89.5 -90.0
  * longitude      (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.5 359.0 359.5
    number         (time, step, isobaricInhPa) int64 264B 0 0 0 0 0 ... 0 0 0 0
  * step           (step) timedelta64[ns] 24B 00:00:00 03:00:00 06:00:00
  * time           (time) datetime64[ns] 8B 2024-01-01
    valid_time     (time, step, isobaricInhPa) datetime64[ns] 264B 2024-01-01...

Suggested solution

I suggest only renaming coordinate names matching the list of valid vertical coordinates used by eccodes. I couldn't find any way of accessing this through the eccodes python interface, so I just parsed https://github.com/ecmwf/eccodes/blob/develop/definitions/grib2/typeOfLevelConcept.def to a list ECCODES_VERTICAL_LEVEL_LIST.

The relevant logic in extract_dataset_chunk_index can then be changed to:

cname = "level" if cname in ECCODES_VERTICAL_LEVEL_LIST else cname

This does however introduce the restriction that the vertical coordinate must either correspond to an entry on the list of be "level".

@krlor17
Copy link
Contributor Author

krlor17 commented Apr 10, 2025

This might also address issue 546

@martindurant
Copy link
Member

@emfdavid , any thoughts on this? I think it's OK...

Do we have any datasets to hand to serve for testing?

in ("valid_time", "time", "step", "latitude", "longitude")
else "level"
)
cname = "level" if cname in ECCODES_VERTICAL_LEVEL_LIST else cname
Copy link
Member

@martindurant martindurant Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this inversed? If cname="time", the previous code would return "time", but now it is "level".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the suggested implementation correct? If cname=time then it is not in the list ECCODES_VERTICAL_LEVEL_LIST and consequently cname itself will be returned i.e. "time" will map to "time"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so none of the names I had were in this list. Sorry for the confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries man :)

@martindurant martindurant merged commit a619238 into fsspec:main Apr 19, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants