
Commit 86e101c

Add Overlapping trick to docs. (#130)
1 parent bed52b1 commit 86e101c


2 files changed (+62 -0 lines)

docs/source/user-stories.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 .. toctree::
    :maxdepth: 1

+   user-stories/overlaps.md
    user-stories/climatology.ipynb
    user-stories/climatology-hourly.ipynb
    user-stories/custom-aggregations.ipynb

docs/source/user-stories/overlaps.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
---
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3
---

```{eval-rst}
.. currentmodule:: flox
```

# Overlapping Groups

Group-by problems generally involve non-overlapping groups, where each element belongs to exactly one group. Consider the following labels:

```{code-cell}
import numpy as np
import xarray as xr

from flox.xarray import xarray_reduce

labels = xr.DataArray(
    [1, 2, 3, 1, 2, 3, 0, 0, 0],
    dims="x",
    name="label",
)
labels
```

These labels are non-overlapping. Let's reduce a data array of ones over them along `x`:

```{code-cell}
da = xr.ones_like(labels)
da
```

Since `da` is all ones, the sum counts the elements in each group (note that the reduction over `x` is implicit here):

```{code-cell}
xarray_reduce(da, labels, func="sum")
```
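For intuition, the same non-overlapping sum can be reproduced with plain NumPy. This is a sketch, not how flox computes the reduction internally: it uses `np.bincount` with `weights` as a stand-in, which only works directly because these labels are small non-negative integers.

```python
import numpy as np

# Same labels and data as above, as plain NumPy arrays
labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
values = np.ones_like(labels, dtype=float)

# bincount adds each value into the bin given by its label,
# i.e. a group-by sum for non-negative integer labels 0..max(labels)
sums = np.bincount(labels, weights=values)
print(sums.tolist())  # [3.0, 2.0, 2.0, 2.0], the sums for labels 0, 1, 2, 3
```

Each element lands in exactly one bin, which is what "non-overlapping" means in practice.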

Now let's calculate the `sum` over all elements where `labels` is either `1` or `2`. This group overlaps with groups `1` and `2`, so it cannot be expressed with a single label per element. The trick is to add a new dimension `y` of size 2 to `labels`: the first slice holds the original labels, and the second assigns a new label `4` (any value not already in use) wherever `labels` is `1` or `2`, and `-1` elsewhere.

```{code-cell}
# assign the new label 4 where label is 1 or 2, and -1 everywhere else
newlabels = xr.where(labels.isin([1, 2]), 4, -1)

# stack the original and new labels along a new dimension "y" (size 2)
expanded = xr.concat([labels, newlabels], dim="y")
expanded
```

Now we reduce over `x` _and_ `y` (again implicitly). The sum for the combined group appears under `label=4`, alongside the original per-label sums. An extra value also accumulates under `label=-1`; it is a by-product of the trick and can be discarded afterwards.

```{code-cell}
xarray_reduce(da, expanded, func="sum")
```
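To make the mechanics explicit, here is the same computation in plain NumPy, as a sketch rather than flox's actual implementation. The data has no `y` axis, so it is broadcast against the stacked labels, and `np.unique(..., return_inverse=True)` factorizes the labels so that the negative `-1` label can still be counted with `np.bincount`.

```python
import numpy as np

# Same labels and data as above, as plain NumPy arrays
labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
values = np.ones(labels.shape)

# New label 4 where labels is 1 or 2, and the throwaway label -1 elsewhere
newlabels = np.where(np.isin(labels, [1, 2]), 4, -1)

# Stack along a new leading axis (playing the role of "y"): shape (2, 9)
expanded = np.stack([labels, newlabels])

# The data has no "y" axis, so broadcast it against the stacked labels
values_2d = np.broadcast_to(values, expanded.shape)

# Factorize the labels (this also handles -1) and sum over both axes at once
uniques, codes = np.unique(expanded, return_inverse=True)
sums = np.bincount(codes.ravel(), weights=values_2d.ravel())

result = dict(zip(uniques.tolist(), sums.tolist()))
result.pop(-1)  # discard the value accumulated under the throwaway label
print(result)  # {0: 3.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 4.0}
```

The sum under label `4` equals the sums under labels `1` and `2` combined, while the per-label sums are unchanged, because the extra row only contributes to labels `4` and `-1`.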

This technique generalizes to more complicated aggregations. The trick is to:

- generate appropriate labels,
- concatenate these new labels along a new dimension (`y`) absent from the object being reduced (`da`), and
- reduce over that new dimension in addition to any others.
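The recipe above can be sketched in plain NumPy for several overlapping groups at once. `overlapping_group_reduce` and its `extra_groups` and `fill` arguments are hypothetical names used only for illustration; `np.unique` plus `np.bincount` stand in for `xarray_reduce`, and a mean is computed alongside the sum to show that the trick carries over to other aggregations. The sketch assumes the new labels and the `fill` value do not collide with any existing label.

```python
import numpy as np


def overlapping_group_reduce(values, labels, extra_groups, fill=-1):
    """Hypothetical helper: sum and mean of `values` over the original groups
    plus extra, possibly overlapping, groups.

    `extra_groups` maps a new label to the original labels it covers, e.g.
    {4: [1, 2], 5: [2, 3]}. New labels and `fill` must not clash with
    existing labels.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    # 1. Generate one row of labels per extra group. Separate rows are needed
    #    because a single element may belong to several extra groups.
    rows = [labels]
    for new_label, members in extra_groups.items():
        rows.append(np.where(np.isin(labels, members), new_label, fill))

    # 2. Concatenate along a new leading axis that `values` does not have,
    #    then broadcast the data to match.
    expanded = np.stack(rows)
    values_2d = np.broadcast_to(values, expanded.shape)

    # 3. Reduce over the new axis and the original axis in one pass.
    uniques, codes = np.unique(expanded, return_inverse=True)
    codes = codes.ravel()
    sums = np.bincount(codes, weights=values_2d.ravel())
    counts = np.bincount(codes)

    out = {}
    for label, s, n in zip(uniques.tolist(), sums.tolist(), counts.tolist()):
        if label != fill:  # discard the throwaway group
            out[label] = {"sum": s, "mean": s / n}
    return out


result = overlapping_group_reduce(
    values=np.arange(1.0, 10.0),
    labels=[1, 2, 3, 1, 2, 3, 0, 0, 0],
    extra_groups={4: [1, 2], 5: [2, 3]},
)
print(result[4], result[5])  # {'sum': 12.0, 'mean': 3.0} {'sum': 16.0, 'mean': 4.0}
```

Here groups `4` and `5` both contain the elements labelled `2`, which a single row of labels could not express; giving each extra group its own row along the new dimension sidesteps that.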