Skip to content

Add chunks='auto' support for cftime datasets #10527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
eb1a967
All works, just need to satisfy mypy and whatnot now
charles-turner-1 Jul 13, 2025
852476d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2025
c921c59
Merge branch 'main' into autochunk-cftime
charles-turner-1 Jul 13, 2025
1aba531
Fix moving import to be optional
charles-turner-1 Jul 13, 2025
9429c3d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2025
3c9d27e
Make mypy happy
charles-turner-1 Jul 13, 2025
5153d2d
Add some clarifying comments about what we need to do to optimise this
charles-turner-1 Jul 13, 2025
62e71e6
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 14, 2025
cfdc31b
@dcherian's suggestions. Just need to update chunking strategy to res…
charles-turner-1 Jul 14, 2025
2f16bc7
Merge branch 'main' of https://github.com/charles-turner-1/xarray
charles-turner-1 Jul 14, 2025
ce720fa
Merge branch 'main' into autochunk-cftime
charles-turner-1 Jul 14, 2025
4fa58c1
Merge branch 'main' into autochunk-cftime
charles-turner-1 Jul 23, 2025
e58d6d7
Can now load cftime arrays with auto-chunking. Implementation still k…
charles-turner-1 Jul 23, 2025
590e503
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 23, 2025
f953976
Test for autochunking when reading from disk
charles-turner-1 Jul 25, 2025
6706524
replace `build_chunkspec` with faking the dtype of a cftime array & a…
charles-turner-1 Jul 25, 2025
4e56acd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 25, 2025
0d008cd
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 25, 2025
49c4e9c
Merge branch 'main' into autochunk-cftime
charles-turner-1 Jul 25, 2025
4594099
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 25, 2025
5d00b0a
Remove redundant comments, rename things to make them clearer, add mo…
charles-turner-1 Jul 25, 2025
80421ef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 25, 2025
d1f7ad3
Refactor to move most of the changes into the DaskManager
charles-turner-1 Jul 25, 2025
1b7de62
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 25, 2025
4407185
bare-min tests should pass now?
charles-turner-1 Jul 25, 2025
d8f45b2
Deepak's suggestions (think mypy is still going to be angry for now)
charles-turner-1 Jul 28, 2025
20226c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 28, 2025
11ac9f0
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Jul 28, 2025
8485df5
Fix errant line
charles-turner-1 Jul 28, 2025
2c27877
Clean up `DaskManager.rechunk` a bit - maybe possible to remove more …
charles-turner-1 Jul 28, 2025
0983261
Remove unused import
charles-turner-1 Jul 28, 2025
c4ec31f
Merge branch 'main' into autochunk-cftime
charles-turner-1 Aug 8, 2025
adbf5b2
Merge branch 'autochunk-cftime' of https://github.com/charles-turner-…
charles-turner-1 Aug 8, 2025
6c93bc4
Fix a couple of type errors
charles-turner-1 Aug 8, 2025
74bc0ea
Mypy & tests passing locally
charles-turner-1 Aug 8, 2025
0b9bbd0
Merge branch 'main' into autochunk-cftime
charles-turner-1 Aug 12, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions xarray/structure/chunks.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
from typing import TYPE_CHECKING, Any, Literal, TypeVar, Union, overload

from xarray.core import utils
from xarray.core.common import _contains_cftime_datetimes
from xarray.core.utils import emit_user_level_warning
from xarray.core.variable import IndexVariable, Variable
from xarray.namedarray.parallelcompat import (
Expand Down Expand Up @@ -83,9 +84,27 @@ def _get_chunk(var: Variable, chunks, chunkmanager: ChunkManagerEntrypoint):
for dim, preferred_chunk_sizes in zip(dims, preferred_chunk_shape, strict=True)
)

chunk_shape = chunkmanager.normalize_chunks(
chunk_shape, shape=shape, dtype=var.dtype, previous_chunks=preferred_chunk_shape
)
if _contains_cftime_datetimes(var):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _contains_cftime_datetimes(var):
if _contains_cftime_datetimes(var) and chunks == "auto":

# If we have cftime datetimes, need to preprocess them - we can't pass
# an object dtype into DaskManager.normalize_chunks.
from dask import config as dask_config
from dask.utils import parse_bytes

from xarray.namedarray.utils import build_chunkspec

target_chunksize = parse_bytes(dask_config.get("array.chunk-size"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding get_auto_chunk_size to the ChunkManager class; and put the dask-specific stuff in the DaskManager.

cc @TomNicholas

chunk_shape = build_chunkspec(var, target_chunksize=target_chunksize)

chunk_shape = chunkmanager.normalize_chunks(
chunk_shape, shape=shape, previous_chunks=preferred_chunk_shape
)
else:
chunk_shape = chunkmanager.normalize_chunks(
chunk_shape,
shape=shape,
dtype=var.dtype,
previous_chunks=preferred_chunk_shape,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chunk_shape = chunkmanager.normalize_chunks(
chunk_shape, shape=shape, previous_chunks=preferred_chunk_shape
)
else:
chunk_shape = chunkmanager.normalize_chunks(
chunk_shape,
shape=shape,
dtype=var.dtype,
previous_chunks=preferred_chunk_shape,
)
chunk_shape = chunkmanager.normalize_chunks(
chunk_shape,
shape=shape,
dtype=var.dtype,
previous_chunks=preferred_chunk_shape,
)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no dtype=var.dtype in the if _contains_cftime_datetimes(var) clause. We could do this:

if _contains_cftime_datetimes(var):
    ...
    chunk_shape = build_chunkspec(...)
    var_dtype = None
else:
    var_dtype = var.dtype

chunk_shape = chunkmanager.normalize_chunks(
            chunk_shape,
            shape=shape,
            dtype=var_dtype,
            previous_chunks=preferred_chunk_shape,
        )

which seems cleaner than what I've currently got?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore that, I've changed how this works to allow us to use the dask native chunk normalization - by computing a ratio of sizes to a np.float64 and then adjusting our limit by that so we get the correct size chunks.


# Warn where requested chunks break preferred chunks, provided that the variable
# contains data.
Expand Down