-
Couldn't load subscription status.
- Fork 43
Zarr support (backend, in esmvalcore.preprocessor._io.py)
#2785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 55 commits
e347f40
5c32b55
682f46d
81c254c
6a02757
84412ab
1f8e127
5b97169
8bcc15f
c0b049c
e5f8c4e
9265b0d
4be6152
28f647f
6da4183
fb7712a
95a92c9
872be18
971cf34
0eeeb50
f5d13c8
fa8b90a
cccdb39
1618076
e2ed41c
fe7326e
39df34e
2b44ac9
caff216
d48418c
0909770
e87b12b
37d8a31
0d446af
37fcfff
94d8677
7ac7b45
caa3657
b4c6b6f
48db5f3
0afcec7
0c4a16f
72d79c2
8e54f1e
1572fff
b1fe4b8
f5c5979
8cddb55
0d71de7
2ab8fc0
b01b578
ea9377a
72d87bc
76b32b4
90c8963
7af2ec4
c151b57
ab78052
b5c3301
d514b67
2852381
49fb643
6a554d8
683b6e8
f2923e6
8c49e20
8b6f221
63411cb
84a33f2
8909b7d
eff8956
a2e31ab
a387558
37266da
e13a19e
cef79ce
63b817f
71ebe4e
171ea74
464c9f3
66f9811
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -9,7 +9,9 @@ | |||||||||||||||
| from itertools import groupby | ||||||||||||||||
| from pathlib import Path | ||||||||||||||||
| from typing import TYPE_CHECKING, Any | ||||||||||||||||
| from urllib.parse import urlparse | ||||||||||||||||
|
|
||||||||||||||||
| import fsspec | ||||||||||||||||
| import iris | ||||||||||||||||
| import ncdata | ||||||||||||||||
| import xarray as xr | ||||||||||||||||
|
|
@@ -75,6 +77,7 @@ | |||||||||||||||
| def load( | ||||||||||||||||
| file: str | Path | Cube | CubeList | xr.Dataset | ncdata.NcData, | ||||||||||||||||
| ignore_warnings: list[dict[str, Any]] | None = None, | ||||||||||||||||
| backend_kwargs: dict[str, Any] | None = None, | ||||||||||||||||
| ) -> CubeList: | ||||||||||||||||
| """Load Iris cubes. | ||||||||||||||||
|
|
@@ -83,10 +86,18 @@ | |||||||||||||||
| file: | ||||||||||||||||
| File to be loaded. If ``file`` is already a loaded dataset, return it | ||||||||||||||||
| as a :class:`~iris.cube.CubeList`. | ||||||||||||||||
| File as ``Path`` object could be a Zarr store. | ||||||||||||||||
| ignore_warnings: | ||||||||||||||||
| Keyword arguments passed to :func:`warnings.filterwarnings` used to | ||||||||||||||||
| ignore warnings issued by :func:`iris.load_raw`. Each list element | ||||||||||||||||
| corresponds to one call to :func:`warnings.filterwarnings`. | ||||||||||||||||
| backend_kwargs: | ||||||||||||||||
schlunma marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||
| Dict to hold info needed by storage backend e.g. to access | ||||||||||||||||
| a PRIVATE S3 bucket containing object stores (e.g. netCDF4 files); | ||||||||||||||||
| needed by ``fsspec`` and its extensions e.g. ``s3fs``, so | ||||||||||||||||
| most of the times it is a ``storage_options`` dict. Note that Zarr | ||||||||||||||||
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||
| files are opened via ``http`` extension of ``fsspec``, so no need | ||||||||||||||||
| for ``storage_options`` in that case (ie anon/anon). | ||||||||||||||||
| Returns | ||||||||||||||||
| ------- | ||||||||||||||||
|
|
@@ -102,7 +113,19 @@ | |||||||||||||||
| """ | ||||||||||||||||
| if isinstance(file, (str, Path)): | ||||||||||||||||
| cubes = _load_from_file(file, ignore_warnings=ignore_warnings) | ||||||||||||||||
| extension = ( | ||||||||||||||||
| file.suffix | ||||||||||||||||
| if isinstance(file, Path) | ||||||||||||||||
| else os.path.splitext(file)[1] | ||||||||||||||||
| ) | ||||||||||||||||
| if "zarr" not in extension: | ||||||||||||||||
| cubes = _load_from_file(file, ignore_warnings=ignore_warnings) | ||||||||||||||||
schlunma marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||
| else: | ||||||||||||||||
| cubes = _load_zarr( | ||||||||||||||||
| file, | ||||||||||||||||
| ignore_warnings=ignore_warnings, | ||||||||||||||||
| backend_kwargs=backend_kwargs, | ||||||||||||||||
| ) | ||||||||||||||||
| elif isinstance(file, Cube): | ||||||||||||||||
| cubes = CubeList([file]) | ||||||||||||||||
| elif isinstance(file, CubeList): | ||||||||||||||||
|
|
@@ -134,6 +157,50 @@ | |||||||||||||||
| return cubes | ||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| def _load_zarr( | ||||||||||||||||
| file: str | Path | Cube | CubeList | xr.Dataset | ncdata.NcData, | ||||||||||||||||
| ignore_warnings: list[dict[str, Any]] | None = None, | ||||||||||||||||
| backend_kwargs: dict[str, Any] | None = None, | ||||||||||||||||
| ) -> CubeList: | ||||||||||||||||
| if isinstance(file, Path): | ||||||||||||||||
| zarr_xr = xr.open_dataset( | ||||||||||||||||
| file, | ||||||||||||||||
| consolidated=False, | ||||||||||||||||
| engine="zarr", | ||||||||||||||||
| ) | ||||||||||||||||
|
||||||||||||||||
| if isinstance(file, Path): | |
| zarr_xr = xr.open_dataset( | |
| file, | |
| consolidated=False, | |
| engine="zarr", | |
| ) |
This is the part which is supposed to treat files from the local filesystem, right? You also need to consider the case where file is a str and points to an object in the local filesystem. Currently, this will just run into the else part and return an empty list.
I would actually put this in the else condition below (see my comments below) and let xarray raise any potential errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent point! This is how it looks now d514b67
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and it's also tested fully now 🍺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I cannot find the test. Would you point me to it? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one and the one below it
| def test_load_zarr_local_not_file(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a nice test, but not exactly what I meant. I think it would be nice to simply replicate test_load_zarr2_local with a str as file arguement, i.e., simply use cubes = load(str(zarr_path)) in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that I missed, plopped in 84a33f2
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
valeriupredoi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you try to read files that are not Zarr2 or Zarr3? Will this raise an error? If yes, I would simply let the underlying code raise that error and not build in this logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are no other Zarr files than Zarr2 and Zarr3: Zarr used to be just Zarr, before the major super release of 3.0, that contained a lot of structural changes, also a lot of breaking changes, after that it's sort of lore that people call Zarr2 and Zarr3 to differentiate between the two formats, it's not an official format per se, it's still "Zarr". Whatever it's not Zarr is caught by this test, also this catches issues with the network or temporary unavailability of the object store (that's why I phrased the exception "at the moment" 😉 ). Here is the Zarr3 mega release (note that they had some serious issues with 3.0, so they had to yank it) https://zarr.dev/blog/zarr-python-3-release/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, but what happens if a non-Zarr2/non-Zarr3 is passed to xr.open_dataset. I would hope that this raises an error. In that case, I think we can remove all the code here that is just there to raise error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bingo! That's why the test here: Xarray uses Zarr that uses fsspec and aiohttp to access files over the network, and indeed you will get an aiohttp error if one tries to access a Zarr file that is not Zarr, but that error usually comes after a good few minutes of waiting, and with a huge stracktrace - we want to avoid exactly that and literally poke the file before trying to open it. Telling you, man, it's only a few lines of code in this PR, but that's been a quite a bit of sweat and guts before these 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also if the network's crap or there are privacy restrictions on the file - the very quick call to fsspec resolves those a lot faster than trying to open the entire file/store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points! If that's a robust way to identify Zarr files then this is fine for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is, even cf-python stole this test 😁
Uh oh!
There was an error while loading. Please reload this page.