
Coordinates lost with GPM-IMERG file #342

@rabernat

I'm working with the GPM-IMERG files from NASA. Here's an example:

import xarray as xr
import fsspec

url = "https://earthmover-sample-data.s3.us-east-1.amazonaws.com/hdf5/3B-HHR.MS.MRG.3IMERG.19980101-S000000-E002959.0000.V07B.HDF5"
ds = xr.open_dataset(fsspec.open(url).open(), engine="h5netcdf", group="Grid", decode_coords="all")
print(ds)
<xarray.Dataset> Size: 104MB
Dimensions:                         (time: 1, lon: 3600, lat: 1800, nv: 2,
                                     lonv: 2, latv: 2)
Coordinates:
  * time                            (time) object 8B 1998-01-01 00:00:00
  * lon                             (lon) float32 14kB -179.9 -179.9 ... 179.9
  * lat                             (lat) float32 7kB -89.95 -89.85 ... 89.95
    time_bnds                       (time, nv) object 16B ...
    lon_bnds                        (lon, lonv) float32 29kB ...
    lat_bnds                        (lat, latv) float32 14kB ...
Dimensions without coordinates: nv, lonv, latv
Data variables:
    precipitation                   (time, lon, lat) float32 26MB ...
    randomError                     (time, lon, lat) float32 26MB ...
    probabilityLiquidPrecipitation  (time, lon, lat) float32 26MB ...
    precipitationQualityIndex       (time, lon, lat) float32 26MB ...
Attributes:
    GridHeader:  BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeRe...

When opening this dataset virtually, many of the coordinates are lost.

from virtualizarr import open_virtual_dataset

dsv = open_virtual_dataset(url, indexes={}, group="Grid")
print(dsv)

This spits out the following warning:

/srv/conda/envs/notebook/lib/python3.12/site-packages/kerchunk/hdf.py:547: UserWarning: The following excepion was caught and quashed while traversing HDF5
Can't get fill value (fill value is undefined)
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/kerchunk/hdf.py", line 363, in _translator
    fill = h5obj.fillvalue
           ^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/h5py/_hl/dataset.py", line 622, in fillvalue
    self._dcpl.get_fill_value(arr)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
RuntimeError: Can't get fill value (fill value is undefined)

  warnings.warn(msg)

and returns a dataset with most of the coordinates missing:

<xarray.Dataset> Size: 91MB
Dimensions:                         (time: 1, lon: 3600, lat: 1800, nv: 2)
Coordinates:
    time                            (time) int32 4B ManifestArray<shape=(1,),...
Dimensions without coordinates: lon, lat, nv
Data variables:
    precipitation                   (time, lon, lat) float32 26MB ManifestArr...
    precipitationQualityIndex       (time, lon, lat) float32 26MB ManifestArr...
    probabilityLiquidPrecipitation  (time, lon, lat) int16 13MB ManifestArray...
    randomError                     (time, lon, lat) float32 26MB ManifestArr...
    time_bnds                       (time, nv) int32 8B ManifestArray<shape=(...
Attributes:
    GridHeader:   BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeR...
    coordinates:  time

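Two things seem to be happening here: coordinate variables whose HDF5 metadata trips kerchunk (the quashed fill-value exception) are silently skipped, and the variables that do survive (e.g. time_bnds) are not promoted to coordinates because only coordinates: time made it into the attributes. A minimal in-memory sketch (synthetic data and hypothetical names, not the IMERG file) of how xarray's decode_coords="all" relies on the CF coordinates and bounds attributes to do that promotion:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the IMERG layout: coordinate and bounds
# variables stored as plain data variables, linked to the data only
# through CF attributes.
raw = xr.Dataset(
    {
        "precip": ("x", np.zeros(4), {"coordinates": "longitude"}),
        "longitude": ("x", np.linspace(-179.95, 179.95, 4), {"bounds": "lon_bnds"}),
        "lon_bnds": (("x", "nv"), np.zeros((4, 2))),
    }
)

# decode_coords="all" promotes every variable named in a "coordinates"
# attribute, and also variables referenced via "bounds", "grid_mapping",
# etc.  Without those attributes, nothing gets promoted.
decoded = xr.decode_cf(raw, decode_coords="all")
print(sorted(decoded.coords))  # longitude and lon_bnds become coordinates
```

If the virtual reader drops the variables or their linking attributes, xarray has nothing to promote, which would match the `Dimensions without coordinates: lon, lat, nv` output above.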
I also tried with the new kerchunk-free backend and got an error:

from virtualizarr.readers.hdf import HDFVirtualBackend

open_virtual_dataset(url, indexes={}, group="Grid", drop_variables=["Intermediate"], backend=HDFVirtualBackend)
File /srv/conda/envs/notebook/lib/python3.12/site-packages/virtualizarr/readers/hdf/hdf.py:129, in HDFVirtualBackend._dataset_chunk_manifest(path, dataset)
    127 num_chunks = dsid.get_num_chunks()
    128 if num_chunks == 0:
--> 129     raise ValueError("The dataset is chunked but contains no chunks")
    130 shape = tuple(
    131     math.ceil(a / b) for a, b in zip(dataset.shape, dataset.chunks)
    132 )
    133 paths = np.empty(shape, dtype=np.dtypes.StringDType)  # type: ignore

ValueError: The dataset is chunked but contains no chunks
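So where kerchunk quashes, this backend raises. The condition it raises on is easy to reproduce with plain h5py (a synthetic file with an arbitrary name, not the IMERG data): a chunked dataset to which nothing has been written has zero allocated chunks, because HDF5 defers chunk allocation until the first write. Presumably some IMERG variable is all fill value and stored this way.

```python
import h5py

# Synthetic reproduction: create a chunked dataset but never write to it,
# so HDF5 never allocates any chunk storage.
with h5py.File("empty_chunks.h5", "w") as f:
    dset = f.create_dataset("var", shape=(100, 100), chunks=(10, 10), dtype="f4")
    num_chunks = dset.id.get_num_chunks()  # low-level chunk-index query

print(num_chunks)  # -> 0: "chunked but contains no chunks"
```

Reading such a dataset is still well defined (every value is the fill value), so this looks like a case the backend could represent with an empty chunk manifest rather than an error.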


Labels: CF conventions, Kerchunk, bug, parsers
