I'm working with the GPM-IMERG files from NASA. Here's an example:
import xarray as xr
import fsspec
url = "https://earthmover-sample-data.s3.us-east-1.amazonaws.com/hdf5/3B-HHR.MS.MRG.3IMERG.19980101-S000000-E002959.0000.V07B.HDF5"
ds_nc = xr.open_dataset(fsspec.open(url).open(), engine="h5netcdf", group="Grid", decode_coords="all")
print(ds_nc)
<xarray.Dataset> Size: 104MB
Dimensions: (time: 1, lon: 3600, lat: 1800, nv: 2,
lonv: 2, latv: 2)
Coordinates:
* time (time) object 8B 1998-01-01 00:00:00
* lon (lon) float32 14kB -179.9 -179.9 ... 179.9
* lat (lat) float32 7kB -89.95 -89.85 ... 89.95
time_bnds (time, nv) object 16B ...
lon_bnds (lon, lonv) float32 29kB ...
lat_bnds (lat, latv) float32 14kB ...
Dimensions without coordinates: nv, lonv, latv
Data variables:
precipitation (time, lon, lat) float32 26MB ...
randomError (time, lon, lat) float32 26MB ...
probabilityLiquidPrecipitation (time, lon, lat) float32 26MB ...
precipitationQualityIndex (time, lon, lat) float32 26MB ...
Attributes:
GridHeader: BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeRe...
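For reference, those coordinate variables are clearly present in the file itself. Below is a small diagnostic sketch (assuming h5py can read from the fsspec file object, and reusing the url defined above) that lists the datasets in the Grid group along with their chunk layout:

import fsspec
import h5py

# List every dataset directly under "Grid" with its shape, dtype, and chunking.
# Subgroups are skipped here.
with fsspec.open(url).open() as f:
    with h5py.File(f, "r") as h5:
        for name, obj in h5["Grid"].items():
            if isinstance(obj, h5py.Dataset):
                print(name, obj.shape, obj.dtype, "chunks:", obj.chunks)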
When opening this dataset virtually, many of the coordinates are lost.
from virtualizarr import open_virtual_dataset
dsv = open_virtual_dataset(url, indexes={}, group="Grid")
print(dsv)
This spits out the following warning:
/srv/conda/envs/notebook/lib/python3.12/site-packages/kerchunk/hdf.py:547: UserWarning: The following excepion was caught and quashed while traversing HDF5
Can't get fill value (fill value is undefined)
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/kerchunk/hdf.py", line 363, in _translator
    fill = h5obj.fillvalue
           ^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/h5py/_hl/dataset.py", line 622, in fillvalue
    self._dcpl.get_fill_value(arr)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
RuntimeError: Can't get fill value (fill value is undefined)
warnings.warn(msg)
and returns a dataset with most of the coordinates missing:
<xarray.Dataset> Size: 91MB
Dimensions: (time: 1, lon: 3600, lat: 1800, nv: 2)
Coordinates:
time (time) int32 4B ManifestArray<shape=(1,),...
Dimensions without coordinates: lon, lat, nv
Data variables:
precipitation (time, lon, lat) float32 26MB ManifestArr...
precipitationQualityIndex (time, lon, lat) float32 26MB ManifestArr...
probabilityLiquidPrecipitation (time, lon, lat) int16 13MB ManifestArray...
randomError (time, lon, lat) float32 26MB ManifestArr...
time_bnds (time, nv) int32 8B ManifestArray<shape=(...
Attributes:
GridHeader: BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeR...
coordinates: time
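The quashed exception in the warning above is h5py raising RuntimeError when a dataset's fill value is undefined in its creation property list. A minimal sketch of a guarded read (just an illustration of the failure mode, not kerchunk's actual code):

import h5py

def read_fillvalue(h5obj: h5py.Dataset):
    # h5py raises RuntimeError("Can't get fill value (fill value is undefined)")
    # when the dataset was written without a fill value; treat that as "no fill".
    try:
        return h5obj.fillvalue
    except RuntimeError:
        return None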
I also tried with the new kerchunk-free backend and got an error:
from virtualizarr.readers.hdf import HDFVirtualBackend
open_virtual_dataset(url, indexes={}, group="Grid", drop_variables=["Intermediate"], backend=HDFVirtualBackend)
File /srv/conda/envs/notebook/lib/python3.12/site-packages/virtualizarr/readers/hdf/hdf.py:129, in HDFVirtualBackend._dataset_chunk_manifest(path, dataset)
127 num_chunks = dsid.get_num_chunks()
128 if num_chunks == 0:
--> 129 raise ValueError("The dataset is chunked but contains no chunks")
130 shape = tuple(
131 math.ceil(a / b) for a, b in zip(dataset.shape, dataset.chunks)
132 )
133 paths = np.empty(shape, dtype=np.dtypes.StringDType) # type: ignore
ValueError: The dataset is chunked but contains no chunks
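To narrow down which dataset hits that branch, the allocated chunk count can be checked directly with h5py's low-level API. A sketch, assuming h5py >= 2.10 (for DatasetID.get_num_chunks) and reusing the url from above:

import fsspec
import h5py

with fsspec.open(url).open() as f:
    with h5py.File(f, "r") as h5:
        def report(name, obj):
            # Only chunked datasets report an allocated-chunk count; a chunked
            # dataset with 0 allocated chunks would trigger the ValueError above.
            if isinstance(obj, h5py.Dataset) and obj.chunks is not None:
                print(name, "num_chunks =", obj.id.get_num_chunks())
        h5["Grid"].visititems(report)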