DMRParser fails to parse time values correctly for netcdf4 file #904

@chuckwondo

Description

In virtualizarr/tests/test_parsers/test_dmrpp.py, the test_parse_dataset test fails on Linux (it is currently marked as xfail there). This appears to be specific to Linux: the error does not occur on macOS, and I don't know about Windows. For an example failure, see https://github.com/zarr-developers/VirtualiZarr/actions/runs/22743577652/job/65962396672#step:5:881.

The issue is that the netcdf4 fixture file for the test includes a time coordinate with dtype datetime64[ns], which is stored as a 64-bit int. However, the associated dmrpp metadata (see the value for the "netcdf4_file" key in the DMRPP_XML_STRINGS dict in virtualizarr/tests/test_parsers/test_dmrpp.py) describes time as Float32.

At some point during parsing, the 64-bit int values are converted to 32-bit floats, producing negative values. These eventually fail to be converted into timedelta values, raising a pandas.errors.OutOfBoundsTimedelta error (see https://github.com/zarr-developers/VirtualiZarr/actions/runs/22743577652/job/65962396672#step:5:934).
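
As a rough illustration of how a dtype mismatch like this could turn positive integers into negative floats, here is a minimal NumPy sketch. It is not the parser's actual code path, which has not been pinned down here. The sample value is hypothetical and was chosen only so the result is easy to verify by hand. It assumes the mismatch shows up as a reinterpretation of the raw bytes rather than a value-preserving cast.

```python
import numpy as np

# Hypothetical sample: a positive 64-bit integer, stored little-endian.
# 0xBF800000 is used because its low 4 bytes are the float32 bit pattern for -1.0.
as_int64 = np.array([0xBF800000], dtype="<i8")

# Reinterpreting the same 8 bytes as float32 yields two values instead of one,
# and the first is negative even though the integer was positive.
reinterpreted = as_int64.view("<f4")

# A value-preserving cast keeps the sign (though it may lose precision
# for large values, since float32 has only a 24-bit significand).
cast = as_int64.astype("<f4")

print(reinterpreted.shape, reinterpreted[0])
print(cast[0] > 0)
```

If the failing values look like the reinterpreted case (twice as many elements, arbitrary signs), that would point at the declared type being applied to raw chunk bytes; if they look like the cast case, the problem is likely elsewhere.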

I'm not particularly familiar with all the moving parts, but it seems to me that the generated XML metadata is incorrect, since it describes time as Float32 rather than Int64. It's unclear to me why the metadata generator did this, but that seems to be the culprit, as far as I can tell. Oddly though, the metadata for "hdf5_groups_file" within DMRPP_XML_STRINGS appears to be effectively identical, but the test for the hdf file does not fail.
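
One way to check for this kind of mismatch is to compare the type declared in the XML against the dtype expected from the file. The sketch below uses only the standard library and NumPy. The XML snippet is a hypothetical, heavily trimmed stand-in and is not copied from DMRPP_XML_STRINGS; the helper name declared_types is also made up for illustration.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical DMR++-style snippet (assumption: variables appear as
# elements whose tag is the DAP type and whose "name" attribute is the variable).
DMR_SNIPPET = """
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" name="example.nc">
    <Float32 name="time"/>
</Dataset>
""".strip()


def declared_types(xml_text):
    """Map each top-level variable name to the type given by its element tag."""
    root = ET.fromstring(xml_text)
    # Tags look like "{namespace}Float32"; keep only the part after the brace.
    return {child.get("name"): child.tag.split("}")[-1] for child in root}


declared = declared_types(DMR_SNIPPET)
expected = np.dtype("datetime64[ns]")  # dtype reported for the time coordinate

# Float32 occupies 4 bytes per value, while datetime64[ns] occupies 8.
print(declared["time"], expected.itemsize)
```

Running the same comparison against both the "netcdf4_file" and "hdf5_groups_file" entries might help show whether the two really are equivalent for the time variable.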
