Support for pathlib paths in earthkit xarray engine #737

@andreas-grafberger

Is your feature request related to a problem? Please describe.

Hi! I was using the earthkit xarray engine to read GRIB files directly via xr.open_dataset and noticed that, on the latest develop branch (8af12aa), it fails when I pass a pathlib.Path or pathlib.PosixPath object instead of a string. For now I work around it by calling my_path.as_posix() on my path, but I thought it would be nice to support pathlib natively.

The error I'm getting is the following:

>>> ds = xr.open_dataset(pathlib.Path("t_time_series.grib"), engine='earthkit')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/etc/ecmwf/nfs/dh2_home_a/ecm0724/workspace/earthkit-data/.venv/lib/python3.12/site-packages/xarray/backends/api.py", line 687, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/etc/ecmwf/nfs/dh2_home_a/ecm0724/workspace/earthkit-data/src/earthkit/data/utils/xarray/engine.py", line 309, in open_dataset
    fieldlist = self._fieldlist(filename_or_obj, source_type)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/etc/ecmwf/nfs/dh2_home_a/ecm0724/workspace/earthkit-data/src/earthkit/data/utils/xarray/engine.py", line 384, in _fieldlist
    return ds
           ^^
UnboundLocalError: cannot access local variable 'ds' where it is not associated with a value
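
For context, an UnboundLocalError of this kind typically appears when a local variable is assigned only inside a type-specific branch and the input matches none of the branches. The snippet below is a minimal sketch of that pattern, assuming the engine checks for a plain string; `fieldlist_sketch` is a hypothetical stand-in and not the actual earthkit-data code.

```python
import pathlib


def fieldlist_sketch(filename_or_obj):
    # Hypothetical stand-in, NOT the actual earthkit-data implementation.
    # 'ds' is only assigned when the input is a plain string.
    if isinstance(filename_or_obj, str):
        ds = f"fieldlist from {filename_or_obj}"
    # There is no else branch, so any other type reaches the return
    # statement with 'ds' never having been assigned.
    return ds


# A string takes the assigning branch and works.
print(fieldlist_sketch("t_time_series.grib"))

# A pathlib.Path is not a str, so the function falls through.
try:
    fieldlist_sketch(pathlib.Path("t_time_series.grib"))
except UnboundLocalError as err:
    print(type(err).__name__)
```

If the real code follows a similar shape, this would explain why the string form works while the pathlib form fails at the return statement.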

Describe the solution you'd like

The following code snippets should work:

ds = xr.open_dataset(pathlib.PosixPath("t_time_series.grib"), engine='earthkit')
ds = xr.open_dataset(pathlib.Path("t_time_series.grib"), engine='earthkit')

Describe alternatives you've considered

No response

Additional context

Here's a small script to reproduce the issue I'm facing

import pathlib
import urllib.request
import earthkit.data as ekd
import xarray as xr

# Download test file
urllib.request.urlretrieve(
    "https://github.com/ecmwf/earthkit-data/raw/refs/heads/develop/tests/data/t_time_series.grib", 
    "t_time_series.grib"
)

# This all works
ekd.from_source("file", "t_time_series.grib").to_xarray() 
ekd.from_source("file", pathlib.Path("t_time_series.grib")).to_xarray()
xr.open_dataset("t_time_series.grib")
xr.open_dataset("t_time_series.grib", engine='earthkit')

# This fails
xr.open_dataset(pathlib.Path("t_time_series.grib"), engine='earthkit')

I suspect this could amount to simply changing this condition here to also allow pathlib.Path objects, but I haven't looked at it in more detail. If you agree with this feature/change, I'm happy to create a PR for it.
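
One possible shape for such a change, sketched under the assumption that the engine only needs a string path: accept anything implementing `os.PathLike` and convert it with `os.fspath` before the existing string handling. The helper name `normalise_path` is hypothetical and not part of earthkit-data.

```python
import os
import pathlib


def normalise_path(filename_or_obj):
    # Hypothetical helper, not part of earthkit-data.
    # os.PathLike covers pathlib.Path and its subclasses (PosixPath, WindowsPath).
    if isinstance(filename_or_obj, (str, os.PathLike)):
        # os.fspath returns a str unchanged and calls __fspath__ on path objects.
        return os.fspath(filename_or_obj)
    # Anything else (e.g. an already-loaded object) is passed through untouched.
    return filename_or_obj


print(normalise_path(pathlib.Path("t_time_series.grib")))
print(normalise_path("t_time_series.grib"))
```

Checking against `os.PathLike` rather than `pathlib.Path` specifically would also cover third-party path types that implement `__fspath__`.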

Organisation

No response

Labels

enhancement: New feature or request
