Commit 97d408e

dcherian and claude authored
Update minimum Python version from 3.10 to 3.11 (#453)
Co-authored-by: Claude <[email protected]>
1 parent 84c7a7f commit 97d408e

11 files changed: +176 -29 lines changed

.github/workflows/ci.yaml

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ jobs:
       matrix:
         os: ["ubuntu-latest"]
         env: ["environment"]
-        python-version: ["3.10", "3.13"]
+        python-version: ["3.11", "3.13"]
         include:
           - os: "windows-latest"
             env: "environment"
@@ -36,10 +36,10 @@ jobs:
             python-version: "3.13"
           - os: "ubuntu-latest"
             env: "minimal-requirements"
-            python-version: "3.10"
+            python-version: "3.11"
           - os: "windows-latest"
             env: "env-numpy1"
-            python-version: "3.10"
+            python-version: "3.11"
     steps:
       - uses: actions/checkout@v4
         with:

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -110,3 +110,6 @@ venv.bak/
 .mypy_cache/
 
 .DS_Store
+
+# Git worktrees
+worktrees/

CLAUDE.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**flox** is a Python library providing fast GroupBy reduction operations for `dask.array`. It implements parallel-friendly GroupBy reductions using the MapReduce paradigm and integrates with xarray for labeled multidimensional arrays.
+
+## Development Commands
+
+### Environment Setup
+
+```bash
+# Create and activate development environment
+mamba env create -f ci/environment.yml
+conda activate flox-tests
+python -m pip install --no-deps -e .
+```
+
+### Testing
+
+```bash
+# Run full test suite (as used in CI)
+pytest --durations=20 --durations-min=0.5 -n auto --cov=./ --cov-report=xml --hypothesis-profile ci
+
+# Run tests without coverage
+pytest -n auto
+
+# Run single test file
+pytest tests/test_core.py
+
+# Run specific test
+pytest tests/test_core.py::test_function_name
+```
+
+### Code Quality
+
+```bash
+# Run all pre-commit hooks
+pre-commit run --all-files
+
+# Format code with ruff
+ruff format .
+
+# Lint and fix with ruff
+ruff check --fix .
+
+# Type checking
+mypy flox/
+
+# Spell checking
+codespell
+```
+
+### Benchmarking
+
+```bash
+# Performance benchmarking (from asv_bench/ directory)
+cd asv_bench
+asv run
+asv publish
+asv preview
+```
+
+## CI Configuration
+
+### GitHub Workflows (`.github/workflows/`)
+
+- **`ci.yaml`** - Main CI pipeline with test matrix across Python versions (3.11, 3.13) and operating systems (Ubuntu, Windows)
+- **`ci-additional.yaml`** - Additional CI jobs including doctests and mypy type checking
+- **`upstream-dev-ci.yaml`** - Tests against development versions of upstream dependencies
+- **`pypi.yaml`** - PyPI publishing workflow
+- **`testpypi-release.yaml`** - Test PyPI release workflow
+- **`benchmarks.yml`** - Performance benchmarking workflow
+
+### Environment Files (`ci/`)
+
+- **`environment.yml`** - Main test environment with all dependencies
+- **`minimal-requirements.yml`** - Minimal requirements testing (pandas==1.5, numpy==1.22, etc.)
+- **`no-dask.yml`** - Testing without dask dependency
+- **`no-numba.yml`** - Testing without numba dependency
+- **`no-xarray.yml`** - Testing without xarray dependency
+- **`env-numpy1.yml`** - Testing with numpy\<2 constraint
+- **`docs.yml`** - Documentation building environment
+- **`upstream-dev-env.yml`** - Development versions of dependencies
+- **`benchmark.yml`** - Benchmarking environment
+
+### ReadTheDocs Configuration
+
+- **`.readthedocs.yml`** - ReadTheDocs configuration using `ci/docs.yml` environment
+
## Code Architecture
93+
94+
### Core Modules (`flox/`)
95+
96+
- **`core.py`** - Main reduction logic, central orchestrator of groupby operations
97+
- **`aggregations.py`** - Defines the `Aggregation` class and built-in aggregation operations
98+
- **`xarray.py`** - Primary integration with xarray, provides `xarray_reduce()` API
99+
- **`dask_array_ops.py`** - Dask-specific array operations and optimizations
100+
101+
### Aggregation Backends (`flox/aggregate_*.py`)
102+
103+
- **`aggregate_flox.py`** - Native flox implementation
104+
- **`aggregate_npg.py`** - numpy-groupies backend
105+
- **`aggregate_numbagg.py`** - numbagg backend for JIT-compiled operations
106+
- **`aggregate_sparse.py`** - Support for sparse arrays
107+
108+
### Utilities
109+
110+
- **`cache.py`** - Caching mechanisms for performance
111+
- **`visualize.py`** - Tools for visualizing groupby operations
112+
- **`lib.py`** - General utility functions
113+
- **`xrutils.py`** & **`xrdtypes.py`** - xarray-specific utilities and types
114+
115+
### Main APIs
116+
117+
- `flox.groupby_reduce()` - Pure dask array interface
118+
- `flox.xarray.xarray_reduce()` - Pure xarray interface
119+
120+
## Key Design Patterns
121+
122+
**Engine Selection**: The library supports multiple computation backends ("flox", "numpy", "numbagg") that can be chosen based on data characteristics and performance requirements.
123+
124+
**MapReduce Strategy**: Implements groupby reductions using a two-stage approach (blockwise + tree reduction) to avoid expensive sort/shuffle operations in parallel computing.
125+
126+
**Chunking Intelligence**: Automatically rechunks data to optimize groupby operations, particularly important for the current `auto-blockwise-rechunk` branch.
127+
128+
**Integration Testing**: Extensive testing against xarray's groupby functionality to ensure compatibility with the broader scientific Python ecosystem.
129+
130+
## Testing Configuration
131+
132+
- **Framework**: pytest with coverage, parallel execution (pytest-xdist), and property-based testing (hypothesis)
133+
- **Coverage Target**: 95%
134+
- **Test Environments**: Multiple conda environments test optional dependencies (no-dask, no-numba, no-xarray)
135+
- **CI Matrices**: Tests across Python 3.11-3.13, Ubuntu/Windows, multiple dependency configurations
136+
137+
## Dependencies
138+
139+
**Core**: pandas>=1.5, numpy>=1.22, numpy_groupies>=0.9.19, scipy>=1.9, toolz, packaging>=21.3
140+
141+
**Optional**: cachey, dask, numba, numbagg, xarray (enable with `pip install flox[all]`)
142+
143+
## Development Notes
144+
145+
- Uses `setuptools_scm` for automatic versioning from git tags
146+
- Heavy emphasis on performance with ASV benchmarking infrastructure
147+
- Type hints throughout with mypy checking
148+
- Pre-commit hooks enforce code quality (ruff, prettier, codespell)
149+
- Integration testing with xarray upstream development branch
150+
- **Python Support**: Minimum version 3.11 (updated from 3.10)
151+
- **Git Worktrees**: `worktrees/` directory is ignored for development workflows
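
The two-stage MapReduce strategy that the new CLAUDE.md describes (a blockwise reduction followed by a tree reduction) can be illustrated in plain NumPy. This is only a sketch of the idea, not flox's implementation: `blockwise_sum`, `tree_combine`, and the hand-picked chunk boundaries are hypothetical names and choices made for this example.

```python
import numpy as np


def blockwise_sum(values, labels, ngroups):
    """Stage 1: reduce a single chunk to one partial sum per group."""
    # bincount with weights sums `values` by integer group label;
    # minlength keeps every partial result the same shape.
    return np.bincount(labels, weights=values, minlength=ngroups)


def tree_combine(partials):
    """Stage 2: combine partial results pairwise until one remains."""
    partials = list(partials)
    while len(partials) > 1:
        combined = [
            partials[i] + partials[i + 1]
            for i in range(0, len(partials) - 1, 2)
        ]
        if len(partials) % 2:
            # An odd leftover is carried to the next round unchanged.
            combined.append(partials[-1])
        partials = combined
    return partials[0]


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 1, 0, 1, 2, 2])
ngroups = 3

# Hypothetical chunking: three chunks of two elements each.
chunks = [slice(0, 2), slice(2, 4), slice(4, 6)]
partials = [blockwise_sum(values[c], labels[c], ngroups) for c in chunks]
result = tree_combine(partials)
print(result)
```

Because every chunk is reduced to a fixed-size per-group array before anything is combined, no chunk ever needs to see another chunk's raw data, which is what lets this approach avoid a sort or shuffle.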

ci/docs.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ dependencies:
   - dask-core
   - pip
   - xarray
-  - numpy>=1.22
+  - numpy>=1.26
   - scipy
   - numpydoc
   - numpy_groupies>=0.9.19

ci/environment.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ dependencies:
   - cubed>=0.20.0
   - dask-core
   - pandas
-  - numpy>=1.22
+  - numpy>=1.26
   - scipy
   - sparse
   - lxml # for mypy coverage report

ci/minimal-requirements.yml

Lines changed: 3 additions & 3 deletions
@@ -10,9 +10,9 @@ dependencies:
   - pytest-pretty
   - pytest-xdist
   - syrupy
-  - numpy==1.22
-  - scipy==1.9.0
+  - numpy==1.26
+  - scipy==1.12
   - numpy_groupies==0.9.19
-  - pandas==1.5
+  - pandas==2.1
   - pooch
   - toolz

ci/no-dask.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ dependencies:
   - pandas
   - hypothesis
   - cftime
-  - numpy>=1.22
+  - numpy>=1.26
   - scipy
   - sparse
   - pip

ci/no-numba.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ dependencies:
   - dask-core
   - hypothesis
   - pandas
-  - numpy>=1.22
+  - numpy>=1.26
   - scipy
   - sparse
   - lxml # for mypy coverage report

ci/no-xarray.yml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ dependencies:
   - codecov
   - syrupy
   - pandas
-  - numpy>=1.22
+  - numpy>=1.26
   - scipy
   - sparse
   - pip

flox/core.py

Lines changed: 1 addition & 9 deletions
@@ -6,7 +6,6 @@
 import logging
 import math
 import operator
-import sys
 import warnings
 from collections import namedtuple
 from collections.abc import Callable, Sequence
@@ -71,13 +70,6 @@
 HAS_SPARSE = module_available("sparse")
 
 if TYPE_CHECKING:
-    try:
-        if sys.version_info < (3, 11):
-            from typing_extensions import Unpack
-        else:
-            from typing import Unpack
-    except (ModuleNotFoundError, ImportError):
-        Unpack: Any  # type: ignore[no-redef]
     from .types import CubedArray, DaskArray, Graph
 
 T_DuckArray: TypeAlias = np.ndarray | DaskArray | CubedArray  # Any ?
@@ -2500,7 +2492,7 @@ def groupby_reduce(
     engine: T_EngineOpt = None,
     reindex: ReindexStrategy | bool | None = None,
     finalize_kwargs: dict[Any, Any] | None = None,
-) -> tuple[DaskArray, Unpack[tuple[np.ndarray | DaskArray, ...]]]:
+) -> tuple[DaskArray, *tuple[np.ndarray | DaskArray, ...]]:
     """
     GroupBy reductions using tree reductions for dask.array
