Skip to content

Commit 98cfee5

Browse files
dcherianmax-sixtyclaudepre-commit-ci[bot]Claude
authored
Allow mean with time dtypes (#10227)
* Allow `mean` with time dtypes Closes #5897 Closes #6995 Closes #10217 * Allow mean() to work with datetime dtypes - Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR #10227 while preventing errors from string variables. * Skip cftime test when cftime not available Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed. * Fix string/datetime handling in mean() aggregations - Revert numeric_only back to True for mean() to prevent strings from being included - Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path - Also check for object arrays containing datetime-like objects (cftime dates) - This allows mean() to work with datetime types while excluding strings that would cause errors * Trigger CI workflow * Format groupby.py numeric_only condition Auto-formatted the multi-line condition for better readability * Apply formatting changes to dataset.py and test_groupby.py - Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix numeric_only parameter in groupby mean for non-flox path The generated mean() methods for DatasetGroupBy and ResampleGroupBy were incorrectly passing numeric_only=False in the non-flox path, causing string variables to fail during reduction. Changed to numeric_only=True to match the flox path behavior. This fixes test_groupby_dataset_reduce_ellipsis failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify dtype checking with _is_numeric_aggregatable_dtype helper Created a canonical helper function to check if a dtype can be used in numeric aggregations. This replaces complex repeated conditionals in dataset.py and groupby.py with a single, well-documented function. The helper checks for: - Numeric types (int, float, complex) - Boolean type - Datetime types (datetime64, timedelta64) - Object arrays containing datetime-like objects (e.g., cftime) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review feedback from @dcherian - Document datetime mean behavior in generate_aggregations.py - Simplify test assertion using isnull().item() instead of pd.notna - Remove redundant test_mean_dataarray_datetime test - Add comprehensive tests for linked issues: - Issue #5897: Test mean with cftime objects - Issue #6995: Test groupby_bins with datetime64 mean - Issue #10217: Test groupby_bins mean on time series data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Fix test_mean_with_cftime_objects for non-dask environments The test was unconditionally using dask operations which caused failures in the "all-but-dask" CI environment. Now the dask-specific tests are only run when dask is available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Use standard @requires_dask decorator for dask-specific tests Split test_mean_with_cftime_objects into two tests: - Base test runs without dask - Dask-specific test uses @requires_dask decorator This follows xarray's standard pattern for dependency-specific tests and is cleaner than using if has_dask conditionals. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Add testing guidelines and use proper decorators - Created xarray/tests/CLAUDE.md with comprehensive guidelines for handling optional dependencies in tests - Updated cftime tests to use @requires_cftime decorator instead of pytest.importorskip, following xarray's standard patterns - This ensures consistent handling across CI environments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CLAUDE.md blackdoc formatting Ensure all Python code blocks have complete, valid syntax for blackdoc. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify cftime check in _is_numeric_aggregatable_dtype As suggested by @dcherian in review comment r2337966239, directly use _contains_cftime_datetimes(var._data) instead of the more complex check. This is cleaner since _contains_cftime_datetimes already handles the object dtype check internally. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Streamline CLAUDE.md and move import to top of file - Made CLAUDE.md more concise by removing verbose decorator listings - Moved _is_numeric_aggregatable_dtype import to top of groupby.py as suggested by @dcherian in review comment r2337968851 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review comments: clarify numeric_only scope and merge tests - Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant (addresses comment r2337970143) - Merge test_mean_preserves_non_string_object_arrays into test_mean_with_cftime_objects (addresses comment r2337128692) - Both changes address reviewer feedback about simplification * Clarify that @requires decorators should be used instead of skipif patterns Update testing guidelines to explicitly discourage pytest.mark.skipif in parametrize, recommending splitting tests or using @requires decorators * Fix CLAUDE.md blackdoc formatting Co-authored-by: Claude <[email protected]> --------- Co-authored-by: Maximilian Roos <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claude <[email protected]>
1 parent dff84af commit 98cfee5

File tree

8 files changed

+387
-25
lines changed

8 files changed

+387
-25
lines changed

xarray/core/_aggregations.py

Lines changed: 0 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1776,10 +1776,6 @@ def mean(
17761776
:ref:`agg`
17771777
User guide on reduction or aggregation operations.
17781778
1779-
Notes
1780-
-----
1781-
Non-numeric variables will be removed prior to reducing.
1782-
17831779
Examples
17841780
--------
17851781
>>> da = xr.DataArray(
@@ -2948,10 +2944,6 @@ def mean(
29482944
:ref:`agg`
29492945
User guide on reduction or aggregation operations.
29502946
2951-
Notes
2952-
-----
2953-
Non-numeric variables will be removed prior to reducing.
2954-
29552947
Examples
29562948
--------
29572949
>>> da = xr.DataArray(
@@ -4231,8 +4223,6 @@ def mean(
42314223
Pass flox-specific keyword arguments in ``**kwargs``.
42324224
See the `flox documentation <https://flox.readthedocs.io>`_ for more.
42334225
4234-
Non-numeric variables will be removed prior to reducing.
4235-
42364226
Examples
42374227
--------
42384228
>>> da = xr.DataArray(
@@ -5729,8 +5719,6 @@ def mean(
57295719
Pass flox-specific keyword arguments in ``**kwargs``.
57305720
See the `flox documentation <https://flox.readthedocs.io>`_ for more.
57315721
5732-
Non-numeric variables will be removed prior to reducing.
5733-
57345722
Examples
57355723
--------
57365724
>>> da = xr.DataArray(
@@ -7188,8 +7176,6 @@ def mean(
71887176
Pass flox-specific keyword arguments in ``**kwargs``.
71897177
See the `flox documentation <https://flox.readthedocs.io>`_ for more.
71907178
7191-
Non-numeric variables will be removed prior to reducing.
7192-
71937179
Examples
71947180
--------
71957181
>>> da = xr.DataArray(
@@ -8578,8 +8564,6 @@ def mean(
85788564
Pass flox-specific keyword arguments in ``**kwargs``.
85798565
See the `flox documentation <https://flox.readthedocs.io>`_ for more.
85808566
8581-
Non-numeric variables will be removed prior to reducing.
8582-
85838567
Examples
85848568
--------
85858569
>>> da = xr.DataArray(

xarray/core/common.py

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2108,3 +2108,21 @@ def _contains_datetime_like_objects(var: T_Variable) -> bool:
21082108
np.datetime64, np.timedelta64, or cftime.datetime)
21092109
"""
21102110
return is_np_datetime_like(var.dtype) or contains_cftime_datetimes(var)
2111+
2112+
2113+
def _is_numeric_aggregatable_dtype(var: T_Variable) -> bool:
2114+
"""Check if a variable's dtype can be used in numeric aggregations like mean().
2115+
2116+
This includes:
2117+
- Numeric types (int, float, complex)
2118+
- Boolean type
2119+
- Datetime types (datetime64, timedelta64)
2120+
- Object arrays containing datetime-like objects (e.g., cftime)
2121+
"""
2122+
return (
2123+
np.issubdtype(var.dtype, np.number)
2124+
or (var.dtype == np.bool_)
2125+
or np.issubdtype(var.dtype, np.datetime64)
2126+
or np.issubdtype(var.dtype, np.timedelta64)
2127+
or _contains_cftime_datetimes(var._data)
2128+
)

xarray/core/dataset.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,7 @@
4040
from xarray.core.common import (
4141
DataWithCoords,
4242
_contains_datetime_like_objects,
43+
_is_numeric_aggregatable_dtype,
4344
get_chunksizes,
4445
)
4546
from xarray.core.coordinates import (
@@ -6847,8 +6848,7 @@ def reduce(
68476848
and (
68486849
not reduce_dims
68496850
or not numeric_only
6850-
or np.issubdtype(var.dtype, np.number)
6851-
or (var.dtype == np.bool_)
6851+
or _is_numeric_aggregatable_dtype(var)
68526852
)
68536853
):
68546854
# prefer to aggregate over axis=None rather than

xarray/core/groupby.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,11 @@
2222
DataArrayGroupByAggregations,
2323
DatasetGroupByAggregations,
2424
)
25-
from xarray.core.common import ImplementsArrayReduce, ImplementsDatasetReduce
25+
from xarray.core.common import (
26+
ImplementsArrayReduce,
27+
ImplementsDatasetReduce,
28+
_is_numeric_aggregatable_dtype,
29+
)
2630
from xarray.core.coordinates import Coordinates, coordinates_from_variable
2731
from xarray.core.duck_array_ops import where
2832
from xarray.core.formatting import format_array_flat
@@ -1068,7 +1072,7 @@ def _flox_reduce(
10681072
name: var
10691073
for name, var in variables.items()
10701074
if (
1071-
not (np.issubdtype(var.dtype, np.number) or (var.dtype == np.bool_))
1075+
not _is_numeric_aggregatable_dtype(var)
10721076
# this avoids dropping any levels of a MultiIndex, which raises
10731077
# a warning
10741078
and name not in midx_grouping_vars

xarray/namedarray/_aggregations.py

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -352,10 +352,6 @@ def mean(
352352
:ref:`agg`
353353
User guide on reduction or aggregation operations.
354354
355-
Notes
356-
-----
357-
Non-numeric variables will be removed prior to reducing.
358-
359355
Examples
360356
--------
361357
>>> from xarray.namedarray.core import NamedArray

xarray/tests/CLAUDE.md

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
# Testing Guidelines for xarray
2+
3+
## Handling Optional Dependencies
4+
5+
xarray has many optional dependencies that may not be available in all testing environments. Always use the standard decorators and patterns when writing tests that require specific dependencies.
6+
7+
### Standard Decorators
8+
9+
**ALWAYS use decorators** like `@requires_dask`, `@requires_cftime`, etc. instead of conditional `if` statements.
10+
11+
All available decorators are defined in `xarray/tests/__init__.py` (look for `requires_*` decorators).
12+
13+
### DO NOT use conditional imports or skipif
14+
15+
**WRONG - Do not do this:**
16+
17+
```python
18+
def test_mean_with_cftime():
19+
if has_dask: # WRONG!
20+
ds = ds.chunk({})
21+
result = ds.mean()
22+
```
23+
24+
**ALSO WRONG - Avoid pytest.mark.skipif in parametrize:**
25+
26+
```python
27+
@pytest.mark.parametrize(
28+
"chunk",
29+
[
30+
pytest.param(
31+
True, marks=pytest.mark.skipif(not has_dask, reason="requires dask")
32+
),
33+
False,
34+
],
35+
)
36+
def test_something(chunk): ...
37+
```
38+
39+
**CORRECT - Do this instead:**
40+
41+
```python
42+
def test_mean_with_cftime():
43+
# Test without dask
44+
result = ds.mean()
45+
46+
47+
@requires_dask
48+
def test_mean_with_cftime_dask():
49+
# Separate test for dask functionality
50+
ds = ds.chunk({})
51+
result = ds.mean()
52+
```
53+
54+
**OR for parametrized tests, split them:**
55+
56+
```python
57+
def test_something_without_dask():
58+
# Test the False case
59+
...
60+
61+
62+
@requires_dask
63+
def test_something_with_dask():
64+
# Test the True case with dask
65+
...
66+
```
67+
68+
### Multiple dependencies
69+
70+
When a test requires multiple optional dependencies:
71+
72+
```python
73+
@requires_dask
74+
@requires_scipy
75+
def test_interpolation_with_dask(): ...
76+
```
77+
78+
### Importing optional dependencies in tests
79+
80+
For imports within test functions, use `pytest.importorskip`:
81+
82+
```python
83+
def test_cftime_functionality():
84+
cftime = pytest.importorskip("cftime")
85+
# Now use cftime
86+
```
87+
88+
### Common patterns
89+
90+
1. **Split tests by dependency** - Don't mix optional dependency code with base functionality:
91+
92+
```python
93+
def test_base_functionality():
94+
# Core test without optional deps
95+
result = ds.mean()
96+
assert result is not None
97+
98+
99+
@requires_dask
100+
def test_dask_functionality():
101+
# Dask-specific test
102+
ds_chunked = ds.chunk({})
103+
result = ds_chunked.mean()
104+
assert result is not None
105+
```
106+
107+
2. **Use fixtures for dependency-specific setup**:
108+
109+
```python
110+
@pytest.fixture
111+
def dask_array():
112+
pytest.importorskip("dask.array")
113+
import dask.array as da
114+
115+
return da.from_array([1, 2, 3], chunks=2)
116+
```
117+
118+
3. **Check available implementations**:
119+
120+
```python
121+
from xarray.core.duck_array_ops import available_implementations
122+
123+
124+
@pytest.mark.parametrize("implementation", available_implementations())
125+
def test_with_available_backends(implementation): ...
126+
```
127+
128+
### Key Points
129+
130+
- CI environments intentionally exclude certain dependencies (e.g., `all-but-dask`, `bare-minimum`)
131+
- A test failing in "all-but-dask" because it uses dask is a test bug, not a CI issue
132+
- Look at similar existing tests for patterns to follow

0 commit comments

Comments
 (0)