Change default loadable_variables (and indexes) to match xarray's behaviour #477
…icholas/VirtualiZarr into refactor_loadable_variables
I just removed
FYI @maxrjones @sharkinsspatial this PR has got to the point where I think the only failing tests are those which use a kerchunk-based reader, as I haven't ported the kerchunk translation code yet. So you could maybe build off this branch already...
  from virtualizarr import open_virtual_dataset

- with open_virtual_dataset(netcdf4_file, indexes={}) as ds:
+ with open_virtual_dataset(netcdf4_file, loadable_variables=[]) as ds:
Required otherwise we get inlined variables in the kerchunk file which we don't know how to read (#489)
Oh I also moved this to a new api.py file.
maxrjones
left a comment
Looks good, thank you @TomNicholas! Just had a few nits
* need latest version of xarray to import internals correctly * Fix metadata equality for nan fill value (#502) * add check that works for fill_values too * note about removing once merged upstream * type hint * regression test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove accidental changes to pyproject.toml * Update pyproject.toml * ignore mypy --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Setup intersphinx mapping for docs (#503) * Setup intersphinx mapping for docs --------- Co-authored-by: Kyle Barron <[email protected]> * Change default loadable_variables (and indexes) to match xarray's behaviour (#477) * draft refactor * sketch of simplified handling of loadable_variables * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get at least some tests working * separate VirtualBackend api definition from common utilities * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove indexes={} everywhere in tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * stop passing through loadable_variables to where it isn't used * implement logic to load 1D dimension coords by default * remove more instances of indexes={} * remove more indexes={} * refactor logic for choosing loadable_variables * fix more tets * xfail Aimee's test that I don't understand * xfail test that explicitly specifies no indexes * made a bunch more stuff pass * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix netcdf3 reader * fix bad import in FITS reader * fix import in tiff reader * fix import in icechunk test * release note * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update docstring * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix fits reader * xfail on empty dict for indexes * linting * actually test new expected behaviour * fix logic for setting loadable_variables * update docs page to reflect new behaviour * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix expected behaviour in another tests * additional assert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use encode_dataset_coordinates in kerchunk writer * Encode zarr vars * fix some mypy errors * move drop_variables implmentation to the end of every reader * override loadable_variables and raise warning * fix failing test by not creating loadable variables that would get inlined by default * improve error message * remove some more occurrences of indexes={} * skip slow test * slay mypy errors * docs typos * should fix dmrpp test * Delete commented-out code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unecessary test skip --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <[email protected]> * Update pyproject.toml deps (#504) * re-add icechunk to upstream tests * add pytest-asyncio to test envs * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <[email protected]> Co-authored-by: Kyle Barron <[email protected]>
* copy implementation from xarray * sketch idea for lithops parallelization * standardize naming of variables * add to public API * fix errors caused by trying to import xarray types * start writing tests * passing test for combining in serial * requires_kerchunk * test for lithops with default LocalHost executor * notes on confusing AssertionError * ensure lithops is installed * remove uneeded fixture * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Additions to `open_virtual_mfdataset` (#508) * need latest version of xarray to import internals correctly * Fix metadata equality for nan fill value (#502) * add check that works for fill_values too * note about removing once merged upstream * type hint * regression test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove accidental changes to pyproject.toml * Update pyproject.toml * ignore mypy --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Setup intersphinx mapping for docs (#503) * Setup intersphinx mapping for docs --------- Co-authored-by: Kyle Barron <[email protected]> * Change default loadable_variables (and indexes) to match xarray's behaviour (#477) * draft refactor * sketch of simplified handling of loadable_variables * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get at least some tests working * separate VirtualBackend api definition from common utilities * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove indexes={} everywhere in tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * stop passing through loadable_variables to where it isn't used * implement logic to load 1D dimension coords by default * remove more instances of indexes={} * remove more indexes={} * 
refactor logic for choosing loadable_variables * fix more tets * xfail Aimee's test that I don't understand * xfail test that explicitly specifies no indexes * made a bunch more stuff pass * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix netcdf3 reader * fix bad import in FITS reader * fix import in tiff reader * fix import in icechunk test * release note * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update docstring * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix fits reader * xfail on empty dict for indexes * linting * actually test new expected behaviour * fix logic for setting loadable_variables * update docs page to reflect new behaviour * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix expected behaviour in another tests * additional assert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use encode_dataset_coordinates in kerchunk writer * Encode zarr vars * fix some mypy errors * move drop_variables implmentation to the end of every reader * override loadable_variables and raise warning * fix failing test by not creating loadable variables that would get inlined by default * improve error message * remove some more occurrences of indexes={} * skip slow test * slay mypy errors * docs typos * should fix dmrpp test * Delete commented-out code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unecessary test skip --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <[email protected]> * Update pyproject.toml deps (#504) * re-add icechunk to upstream tests * add pytest-asyncio to test envs * passing serial open_virtual_mfdataset test * 
passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <[email protected]> Co-authored-by: Kyle Barron <[email protected]> * Additions to `open_virtual_mfdataset` (#509) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * Add ManifestStore for loading data from ManifestArrays (#490) * Draft ManifestStore implementation --------- Co-authored-by: Tom Nicholas <[email protected]> Co-authored-by: Kyle Barron <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make it work for dask delayed * correct docstring --------- Co-authored-by: Max Jones <[email protected]> Co-authored-by: Kyle Barron <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * More open_virtual_mfdataset (#510) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask 
distributed to parallelize * make it work for dask delayed * correct docstring * added compliant executor for lithops * add links to lithops issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Final fixes for open_virtual_mfdataset (#517) * need latest version of xarray to import internals correctly * passing serial open_virtual_mfdataset test * passes with lithops but only for the HDF backend * add test for dask * refactored serial and lithops codepaths to use an executor pattern * xfail lithops * consolidate tests by parametrizing over parallel kwarg * re-enable lithops test * remove unneeded get_executor function * add test for using dask distributed to parallelize * make it work for dask delayed * correct docstring * added compliant executor for lithops * add links to lithops issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * specify dask and lithops executors with a string again * fix easy typing stuff * fix typing errors by aligning executor signatures * remove open_virtual_mfdataset from public API for now * release note * refactor construction of expected result * implement preprocess arg, and dodge lithops bug * update comment --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply suggestions from code reviewRemRemove new deps * remove rogue print statement --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Max Jones <[email protected]> Co-authored-by: Kyle Barron <[email protected]>
This is a second attempt at addressing #335, being more brutal about removing options that aren't used. It is also intended to make implementing #473 easier.
The idea is that no-one really cares about all the complexity of distinguishing between 1D coordinate variables with and without indexes. Instead we should just default to the same index-creation behaviour as xarray uses, and the easiest way to do that is just to use `xr.open_dataset` and drop variables the user didn't actually want to load.
This will be inefficient right now (in the same way that the current implementation is inefficient) because we fully scan over the whole file twice. But this sets up for #473, which will avoid scanning over the file more than once.
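To make the intended default concrete, here is a minimal sketch of how the choice of loadable variables might be made. It is not VirtualiZarr's actual code: the helper `choose_loadable_variables` and the `var_dims` mapping are hypothetical names used only for illustration, and the sketch assumes that "xarray's behaviour" means loading the 1D dimension coordinates (variables whose only dimension is their own name), which are the ones xarray builds default indexes for.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Optional


def choose_loadable_variables(
    var_dims: Mapping[str, tuple[str, ...]],
    loadable_variables: Optional[Iterable[str]] = None,
) -> list[str]:
    """Pick which variables to load into memory rather than keep virtual.

    Hypothetical helper for illustration only, not VirtualiZarr's API.
    `var_dims` maps each variable name to the names of its dimensions.
    """
    if loadable_variables is None:
        # Assumed default: load only 1D dimension coordinates, i.e.
        # variables whose single dimension has the same name as the variable.
        return [name for name, dims in var_dims.items() if dims == (name,)]

    # An explicit list (including an empty one) overrides the default.
    requested = list(loadable_variables)
    missing = [name for name in requested if name not in var_dims]
    if missing:
        raise ValueError(f"variables not found in file: {missing}")
    return requested


# Example file layout: three dimension coordinates and one data variable.
var_dims = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "air": ("time", "lat", "lon"),
}

default_choice = choose_loadable_variables(var_dims)
explicit_none = choose_loadable_variables(var_dims, loadable_variables=[])
```

Under these assumptions, passing `loadable_variables=[]` keeps every variable virtual, which is what the updated kerchunk test in this PR relies on, while omitting the argument loads only the dimension coordinates.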
docs/releases.rst
New functions/methods are listed in api.rst