Commit 4d85a03

abarciauskas-bgse, TomNicholas, pre-commit-ci[bot], and mpiannucci authored
Append to icechunk stores (#272)
* Initial attempt at appending
* Working on tests for generate chunk key function
* Linting
* Refactor gen virtual dataset method
* Fix spelling
* Linting
* Linting
* Linting
* Passing compression test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* linting
* Fix test failing due to incorrect dtype
* linting
* Linting
* Remove obsolete test file for appending
* Create netcdf4 files factor in conftest
* Linting
* Refactor to use combineable zarr arrays
* linting
* Implement no append dim test
* Add test for when append dim is not in dims
* Fix mypy errors
* type ignore import untyped zarr
* Use Union type for check_combineable_zarr_arrays arg
* Fix import
* Fix imports for get_codecs
* use new factory in test
* Remove need for dask in fixture
* Fix for when zarr is not installed
* Address test failures
* Add get_codecs file
* Add dask to upstream
* Remove dependency on dask and h5netcdf engine
* Remove obsolete comment
* Remove duplicate zarr array type check
* Move codecs module and type output
* Actually add codecs file
* Fix merge mistake
* Ignore import untyped
* Add tests for codecs
* Resolve mypy errors
* Fix test
* Import zarr in function
* Use existing importorskip function
* Modify comments
* Comment updates and spelling of combinable
* Revert change to check compatible encoding
* Ignore zarr untyped import errors
* Implement a manifest.utils module
* pass the array into resize_array
  Co-authored-by: Tom Nicholas <tom@cworthy.org>
* Refactor resize_array
* Remove unnecessary zarr imports
* Add pinned version of icechunk as an optional dependency
* Add append_dim in docstring
* Kludgy solution to v2 v3 codecs difference
* Add normalize to v3 parameter
* Add more info to docstring
* Fix typing issues
* Add decorator for zarr python v3 test
* Fix mypy and ruff errors
* Only append if append_dim in dims
* Add example notebook
* Add a runtime
* Add failing test
* Fix multiple appends
* Fix test error message
* Add new cell to notebook to display original time chunk
* Upgrade icechunk to 1.0.0a5
* Upgrade icechunk in upstream.yml
* Updated notebook with kechunk comment an upgraded icechunk version
* Modify test so it fails without updated icechunk
* Update icechunk dependency
* Fix mypy errors
* update icechunk version in pyproject
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Remove obsolete comment
* Use icechunk 0.1.0a7
* Updated notebook
* Updated notebook
* print store
* Update notebook (#327)
  Co-authored-by: Aimee Barciauskas <aimee@developmentseed.org>
* Add append to examples
* Add to releases.rst
* Revert change to .gitignore
* Update ci/upstream.yml
  Co-authored-by: Tom Nicholas <tom@cworthy.org>
* Update pyproject.toml
  Co-authored-by: Tom Nicholas <tom@cworthy.org>
* Update virtualizarr/tests/test_writers/test_icechunk.py
  Co-authored-by: Tom Nicholas <tom@cworthy.org>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update virtualizarr/accessor.py
  Co-authored-by: Tom Nicholas <tom@cworthy.org>
* Separate out multiple arrays test

---------

Co-authored-by: Tom Nicholas <tom@cworthy.org>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Iannucci <matthew@earthmover.io>
1 parent 20dd9dc commit 4d85a03

15 files changed, +2412 -178 lines changed

ci/upstream.yml

Lines changed: 3 additions & 3 deletions
@@ -28,6 +28,6 @@ dependencies:
   - fsspec
   - pip
   - pip:
-    - icechunk # Installs zarr v3 as dependency
-    # - git+https://github.com/fsspec/kerchunk@main # kerchunk is currently incompatible with zarr-python v3 (https://github.com/fsspec/kerchunk/pull/516)
-    - imagecodecs-numcodecs==2024.6.1
+    - icechunk>=0.1.0a7 # Installs zarr v3 as dependency
+    # - git+https://github.com/fsspec/kerchunk@main # kerchunk is currently incompatible with zarr-python v3 (https://github.com/fsspec/kerchunk/pull/516)
+    - imagecodecs-numcodecs==2024.6.1

conftest.py

Lines changed: 28 additions & 20 deletions
@@ -1,3 +1,5 @@
+from typing import Any, Dict, Optional
+
 import h5py
 import numpy as np
 import pytest
@@ -35,6 +37,32 @@ def netcdf4_file(tmpdir):
     return filepath


+@pytest.fixture
+def netcdf4_files_factory(tmpdir) -> callable:
+    def create_netcdf4_files(
+        encoding: Optional[Dict[str, Dict[str, Any]]] = None,
+    ) -> tuple[str, str]:
+        ds = xr.tutorial.open_dataset("air_temperature")
+
+        # Split dataset into two parts
+        ds1 = ds.isel(time=slice(None, 1460))
+        ds2 = ds.isel(time=slice(1460, None))
+
+        # Save datasets to disk as NetCDF in the temporary directory with the provided encoding
+        filepath1 = f"{tmpdir}/air1.nc"
+        filepath2 = f"{tmpdir}/air2.nc"
+        ds1.to_netcdf(filepath1, encoding=encoding)
+        ds2.to_netcdf(filepath2, encoding=encoding)
+
+        # Close datasets
+        ds1.close()
+        ds2.close()
+
+        return filepath1, filepath2
+
+    return create_netcdf4_files
+
+
 @pytest.fixture
 def netcdf4_file_with_2d_coords(tmpdir):
     ds = xr.tutorial.open_dataset("ROMS_example")
@@ -71,26 +99,6 @@ def hdf5_groups_file(tmpdir):
     return filepath


-@pytest.fixture
-def netcdf4_files(tmpdir):
-    # Set up example xarray dataset
-    ds = xr.tutorial.open_dataset("air_temperature")
-
-    # split inrto equal chunks so we can concatenate them back together later
-    ds1 = ds.isel(time=slice(None, 1460))
-    ds2 = ds.isel(time=slice(1460, None))
-
-    # Save it to disk as netCDF (in temporary directory)
-    filepath1 = f"{tmpdir}/air1.nc"
-    filepath2 = f"{tmpdir}/air2.nc"
-    ds1.to_netcdf(filepath1)
-    ds2.to_netcdf(filepath2)
-    ds1.close()
-    ds2.close()
-
-    return filepath1, filepath2
-
-
 @pytest.fixture
 def hdf5_empty(tmpdir):
     filepath = f"{tmpdir}/empty.nc"
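
The conftest.py change above replaces the fixed `netcdf4_files` fixture with `netcdf4_files_factory`, which returns a function so that each test can choose its own `encoding`. The sketch below shows the same closure shape without pytest or xarray. It is only an illustration: plain JSON files stand in for NetCDF, and the names `make_files_factory` and `create_files` are hypothetical, not part of the repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def make_files_factory(tmpdir: str) -> Callable[..., tuple[str, str]]:
    """Return a function that writes two files into tmpdir on demand.

    In conftest.py the outer function would carry @pytest.fixture and
    receive pytest's own tmpdir; here it is called directly.
    """

    def create_files(
        encoding: Optional[Dict[str, Dict[str, Any]]] = None,
    ) -> tuple[str, str]:
        records = list(range(10))  # stand-in for the dataset's time axis
        split = 5                  # stand-in for the split at index 1460
        path1 = Path(tmpdir) / "part1.json"
        path2 = Path(tmpdir) / "part2.json"
        # Write each half with whatever encoding the caller asked for.
        for path, part in ((path1, records[:split]), (path2, records[split:])):
            path.write_text(json.dumps({"data": part, "encoding": encoding or {}}))
        return str(path1), str(path2)

    return create_files


with tempfile.TemporaryDirectory() as tmp:
    factory = make_files_factory(tmp)
    # Each caller decides the encoding at call time, not at fixture definition.
    file1, file2 = factory(encoding={"air": {"dtype": "float32"}})
    first = json.loads(Path(file1).read_text())
    second = json.loads(Path(file2).read_text())
```

A fixture written this way lets one test request compressed output and another request defaults from the same fixture, which the removed `netcdf4_files` fixture could not do because it took no arguments.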

docs/releases.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ New Features

 - Add a ``virtual_backend_kwargs`` keyword argument to file readers and to ``open_virtual_dataset``, to allow reader-specific options to be passed down.
   (:pull:`315`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Added append functionality to `to_icechunk` (:pull:`272`) By `Aimee Barciauskas <https://github.com/abarciauskas-bgse>`_.

 Breaking changes
 ~~~~~~~~~~~~~~~~
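
The release note above, together with the commit message's mentions of a "generate chunk key function" and `append_dim`, suggests that appending places the new chunks after the existing ones along one dimension. The sketch below illustrates that idea only. It is not the code from this commit: the function `shift_chunk_keys`, the `"/"`-joined key format, and the reference strings are all assumptions made for the example.

```python
from typing import Dict, Tuple


def shift_chunk_keys(
    new_chunks: Dict[Tuple[int, ...], str],
    append_axis: int,
    existing_chunks: int,
) -> Dict[str, str]:
    """Offset chunk indices along append_axis by the number of chunks already stored.

    Hypothetical helper: keys are chunk grid indices, values are opaque
    references to where each chunk's bytes live.
    """
    shifted: Dict[str, str] = {}
    for index, ref in new_chunks.items():
        moved = list(index)
        moved[append_axis] += existing_chunks  # land after the existing chunks
        shifted["/".join(str(i) for i in moved)] = ref
    return shifted


# Suppose the store already holds 2 chunks along axis 0 (e.g. time).
incoming = {(0, 0, 0): "air2.nc@0", (1, 0, 0): "air2.nc@4096"}
appended = shift_chunk_keys(incoming, append_axis=0, existing_chunks=2)
# appended == {"2/0/0": "air2.nc@0", "3/0/0": "air2.nc@4096"}
```

Only the index along the append axis changes. Indices along the other axes stay put, which is why the arrays being combined must agree on their other dimensions.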
