
Commit 11f5c7d

doc tweaks

1 parent 4fc3d11

3 files changed (+42, -84 lines)

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -42,3 +42,6 @@ pip-wheel-metadata
 
 # numba
 */__pycache__/*
+
+# docs
+site
```

README.md

Lines changed: 7 additions & 83 deletions

````diff
@@ -22,94 +22,18 @@ from xarray_multiscale import multiscale, windowed_mean
 import numpy as np
 
 data = np.arange(4)
-multiscale(data, windowed_mean, 2)
+print(*multiscale(data, windowed_mean, 2), sep='\n')
 """
-[<xarray.DataArray (dim_0: 4)>
+<xarray.DataArray 's0' (dim_0: 4)> Size: 32B
 array([0, 1, 2, 3])
 Coordinates:
-  * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0,
-<xarray.DataArray (dim_0: 2)>
+  * dim_0    (dim_0) float64 32B 0.0 1.0 2.0 3.0
+
+<xarray.DataArray 's1' (dim_0: 2)> Size: 16B
 array([0, 2])
 Coordinates:
-  * dim_0    (dim_0) float64 0.5 2.5]
+  * dim_0    (dim_0) float64 16B 0.5 2.5
 """
 ```
 
-
-By default, the values of the downsampled arrays are cast to the same data type as the input. This behavior can be changed with the ``preserve_dtype`` keyword argument to ``multiscale``.
-
-Generate a multiscale representation of an ``xarray.DataArray``:
-
-
-```python
-from xarray_multiscale import multiscale, windowed_mean
-from xarray import DataArray
-import numpy as np
-
-data = np.arange(16).reshape((4,4))
-coords = (DataArray(np.arange(data.shape[0]), dims=('y',), attrs={'units' : 'm'}),
-          DataArray(np.arange(data.shape[0]), dims=('x',), attrs={'units' : 'm'}))
-
-arr = DataArray(data, coords)
-multiscale(arr, windowed_mean, (2,2))
-"""
-[<xarray.DataArray (y: 4, x: 4)>
-array([[ 0,  1,  2,  3],
-       [ 4,  5,  6,  7],
-       [ 8,  9, 10, 11],
-       [12, 13, 14, 15]])
-Coordinates:
-  * y        (y) int64 0 1 2 3
-  * x        (x) int64 0 1 2 3, <xarray.DataArray (y: 2, x: 2)>
-array([[ 2,  4],
-       [10, 12]])
-Coordinates:
-  * y        (y) float64 0.5 2.5
-  * x        (x) float64 0.5 2.5]
-"""
-```
-
-Dask arrays work too. Note the control over output chunks via the ``chunks`` keyword argument.
-
-```python
-from xarray_multiscale import multiscale, windowed_mean
-import dask.array as da
-
-arr = da.random.randint(0, 255, (10,10,10))
-multiscale(arr, windowed_mean, 2, chunks=2)
-"""
-[<xarray.DataArray 'randint-f83260ed51a44f24aeccd95bc23e73ae' (dim_0: 10,
-                                                               dim_1: 10,
-                                                               dim_2: 10)>
-dask.array<rechunk-merge, shape=(10, 10, 10), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
-Coordinates:
-  * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
-  * dim_1    (dim_1) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
-  * dim_2    (dim_2) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0,
-<xarray.DataArray 'astype-0c3c3e397345ddeedff07ecf2d9fad17' (dim_0: 5,
-                                                             dim_1: 5, dim_2: 5)>
-dask.array<rechunk-merge, shape=(5, 5, 5), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
-Coordinates:
-  * dim_0    (dim_0) float64 0.5 2.5 4.5 6.5 8.5
-  * dim_1    (dim_1) float64 0.5 2.5 4.5 6.5 8.5
-  * dim_2    (dim_2) float64 0.5 2.5 4.5 6.5 8.5,
-<xarray.DataArray 'astype-675175a39bec4fea06b8668053458285' (dim_0: 2,
-                                                             dim_1: 2, dim_2: 2)>
-dask.array<astype, shape=(2, 2, 2), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
-Coordinates:
-  * dim_0    (dim_0) float64 1.5 5.5
-  * dim_1    (dim_1) float64 1.5 5.5
-  * dim_2    (dim_2) float64 1.5 5.5]
-"""
-```
-
-### Caveats
-
-* Arrays that are not evenly divisible by the downsampling factors will be trimmed as needed. If this behavior is undesirable, consider padding your array appropriately prior to downsampling.
-* For chunked arrays (e.g., dask arrays), the current implementation divides the input data into *contiguous* chunks. This means that attempting to use downsampling schemes based on sliding windowed smoothing will produce edge artifacts.
-
-### Development
-
-This project is developed using [`hatch`](https://hatch.pypa.io/latest/).
-Run tests with `hatch run test:pytest`.
-Serve docs with `hatch run docs:serve`.
+Read more in the [project documentation](https://JaneliaSciComp.github.io/xarray-multiscale/).
````
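Note that the `s1` values in the retained README example read `array([0, 2])` rather than `[0.5, 2.5]` because, as the removed prose stated, downsampled values are cast back to the input dtype by default (see the ``preserve_dtype`` keyword). A minimal numpy-only sketch of that arithmetic, for illustration only (this is not the library's implementation):

```python
import numpy as np

data = np.arange(4)  # [0, 1, 2, 3], integer dtype

# 2x windowed mean: average non-overlapping windows of length 2
windowed = data.reshape(-1, 2).mean(axis=1)  # [0.5, 2.5], float64

# Casting back to the input dtype truncates 0.5 -> 0 and 2.5 -> 2,
# matching array([0, 2]) in the README output above.
print(windowed.astype(data.dtype))  # [0 2]

# The downsampled coordinates are the window centers:
print(np.arange(4).reshape(-1, 2).mean(axis=1))  # [0.5 2.5]
```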

docs/index.md

Lines changed: 32 additions & 1 deletion

````diff
@@ -10,7 +10,38 @@ Simple tools for creating multiscale representations of large images.
 
 Many image processing applications benefit from representing images at multiple scales (also known as [image pyramids](https://en.wikipedia.org/wiki/Pyramid_(image_processing))). This package provides tools for generating lazy multiscale representations of N-dimensional data using [`xarray`](http://xarray.pydata.org/en/stable/) to ensure that the downsampled images have the correct coordinates.
 
-Why are coordinates important for this application? Because a downsampled image is typically scaled and *translated* relative to the source image. Without a coordinate-aware representation of the data, the scaling and translation information is easily lost.
+### Coordinates matter when you downsample images
+
+It's obvious that downsampling an image applies a scaling transformation, i.e., downsampling increases the distance between image samples. This is the whole purpose of downsampling the image. But it is less obvious that most downsampling operations also apply a *translation transformation*: downsampling an image (generally) shifts the origin of the output relative to the source.
+
+In signal processing terms, image downsampling combines an image filtering step (blending local intensities) with a resampling step (sampling intensities at a set of positions in the signal). When you resample an image, you get to choose which points to resample on, and the best choice for most simple downsampling routines is to resample on points that are slightly translated relative to the original image. For simple windowed downsampling, this means that the first element of the downsampled image lies
+at the center (i.e., the mean) of the coordinate values of the window.
+
+We can illustrate this with some simple examples:
+
+```
+2x windowed downsampling, in one dimension:
+
+source coordinates:      | 0 | 1 |
+downsampled coordinates: |  0.5  |
+```
+
+```
+3x windowed downsampling, in two dimensions:
+
+source coordinates:      | (0,0) | (0,1) | (0,2) |
+                         | (1,0) | (1,1) | (1,2) |
+                         | (2,0) | (2,1) | (2,2) |
+
+downsampled coordinates: |                       |
+                         |         (1,1)         |
+                         |                       |
+
+```
+
+Another way of thinking about this is that if you downsample an arbitrarily large image to a single value, then the only sensible place to localize that value is at the center of the image. Thus, incremental downsampling slightly shifts the downsampled image toward that point.
+
+Why should you care? If you work with images where the coordinates matter (for example, images recorded from scientific instruments), then you should care about keeping track of those coordinates. Tools like numpy or scikit-image make it very easy to ignore the coordinates of your image. These tools model images as simple arrays, and from the array perspective `data[0,0]` and `downsampled_data[0,0]` lie on the same position in space because they take the same array index. However, `downsampled_data[0,0]` is almost certainly shifted relative to `data[0,0]`. Coordinate-blind tools like `scikit-image` force you to track the coordinates on your own, which is a recipe for mistakes. This is the value of `xarray`. By explicitly modelling coordinates alongside data values, `xarray` ensures that you never lose track of where your data comes from, which is why `xarray-multiscale` uses it.
 
 ### Who needs this
 
````
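To make the coordinate shift in the new docs text concrete, here is a minimal sketch of the same idea using plain `xarray` (its built-in `coarsen` method, not `xarray-multiscale` itself): windowed-mean downsampling places each output sample at the center of its source window, exactly as the diagrams in the diff above describe.

```python
import numpy as np
import xarray as xr

# A 1-D "image" with explicit coordinates 0..3
data = xr.DataArray(np.arange(4.0), dims=('x',), coords={'x': np.arange(4.0)})

# 2x windowed mean; coarsen averages the coordinates as well,
# so each output sample sits at the center of its source window.
down = data.coarsen(x=2).mean()

print(data.x.values)  # [0. 1. 2. 3.]
print(down.x.values)  # [0.5 2.5], shifted relative to the source origin

# From the bare-array perspective the first elements "line up",
# but they do not occupy the same position in space:
print(data.values[0], down.values[0])  # 0.0 (at x=0.0) vs 0.5 (at x=0.5)
```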