Description
Is your feature request related to a problem?
To enable chipping/batching of datasets with different spatial resolutions, each dataset (either an `xarray.DataArray` or `xarray.Dataset`) currently needs to be sliced separately in xbatcher v0.1.0. The key limitation is that xbatcher assumes every `xarray.DataArray` 'layer' has the same resolution, and `xbatcher.BatchGenerator` uses xarray's `.isel` method to index and slice along the specified dimensions.
xbatcher/xbatcher/generators.py, lines 41 to 43 (commit 72ce00f):

```python
for slices in itertools.product(*dim_slices):
    selector = {key: slice for key, slice in zip(dims, slices)}
    yield ds.isel(**selector)
```
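To illustrate the limitation, here is a minimal sketch on synthetic data (not taken from xbatcher) that applies one positional `isel` selector, as the loop above does, to a 10 m band and a 20 m band covering the same 1200 m area. The helpers `make_band` and `extent_m` are hypothetical. The same 30-pixel window covers 300 m on the 10 m grid but 600 m on the 20 m grid, so the two chips no longer line up on the ground.

```python
import numpy as np
import xarray as xr


def make_band(res_m, size_m=1200):
    """Hypothetical helper: synthetic square band of size_m metres at res_m m/pixel."""
    n = size_m // res_m
    centres = (np.arange(n) + 0.5) * res_m  # pixel-centre coordinates in metres
    return xr.DataArray(
        np.zeros((n, n)), dims=("y", "x"), coords={"y": centres, "x": centres}
    )


def extent_m(da):
    """Hypothetical helper: ground width (metres) covered along x, edge to edge."""
    x = da["x"].values
    res = x[1] - x[0]
    return float(x[-1] - x[0] + res)


band_10m = make_band(10)  # 120 x 120 pixels
band_20m = make_band(20)  # 60 x 60 pixels

# One positional selector reused for every layer, as in the loop above.
selector = {"y": slice(0, 30), "x": slice(0, 30)}
chip_10m = band_10m.isel(**selector)
chip_20m = band_20m.isel(**selector)

print(chip_10m.shape, extent_m(chip_10m))  # (30, 30) 300.0
print(chip_20m.shape, extent_m(chip_20m))  # (30, 30) 600.0
```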
However, this is not always the case, for example:
- Sentinel-2's optical bands have spatial resolutions of 10 m, 20 m or 60 m (unfortunately, people usually resample the 20 m and 60 m bands to 10 m just to make chipping 'easy')
- For super-resolution tasks, where a low-resolution image is passed through a model to produce a high-resolution image, low-resolution chips need to be paired with the high-resolution chips covering the same area (e.g. https://github.com/carbonplan/cmip6-downscaling)
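For the super-resolution case, one possible approach (a sketch, not an existing xbatcher feature) is to step through the low-resolution grid and derive each matching high-resolution window by multiplying the pixel offsets by an integer scale factor. It assumes both grids share the same origin and that the high-resolution grid is exactly `scale` times finer; `paired_chips` is a hypothetical name and the data are synthetic.

```python
import itertools

import numpy as np
import xarray as xr


def paired_chips(lowres, highres, chip_size, scale, dims=("y", "x")):
    """Hypothetical helper: yield (low, high) chip pairs covering the same area.

    Assumes both arrays share an origin and highres is exactly `scale` times
    finer than lowres along every dim in `dims`.
    """
    # Non-overlapping chip start offsets on the low-resolution grid.
    starts = [range(0, lowres.sizes[d] - chip_size + 1, chip_size) for d in dims]
    for offsets in itertools.product(*starts):
        lo_sel = {d: slice(o, o + chip_size) for d, o in zip(dims, offsets)}
        # Scale the same window into high-resolution pixel units.
        hi_sel = {
            d: slice(o * scale, (o + chip_size) * scale) for d, o in zip(dims, offsets)
        }
        yield lowres.isel(**lo_sel), highres.isel(**hi_sel)


# Synthetic example: a 60 x 60 grid at 20 m paired with a 120 x 120 grid at 10 m.
low = xr.DataArray(np.zeros((60, 60)), dims=("y", "x"))
high = xr.DataArray(np.zeros((120, 120)), dims=("y", "x"))

pairs = list(paired_chips(low, high, chip_size=16, scale=2))
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 9 (16, 16) (32, 32)
```

Partial chips at the far edges are simply skipped in this sketch.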
Describe the solution you'd like
Ideally, there would be:
- A way to store multi-resolution datasets in a single datacube-like structure, with each layer stacked in the same geographical (or n-dimensional) space. This is something `datatree` might be able to handle, i.e. each data layer with a different resolution would sit on a separate node of the datatree:
---- Sentinel-2 10m bands
|--- 20m bands
|--- 60m bands
- From the multi-resolution data object, `xbatcher` would then need a way of slicing these multi-resolution datasets. Maybe `DataTree.isel` could work?
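A rough sketch of what slicing such a multi-resolution object could look like, assuming each node holds bands on its own grid with x/y coordinates in the same units (metres here). To stay runnable without a datatree dependency, the nodes are a plain dict keyed by hypothetical node paths; recent xarray releases also provide `xarray.DataTree.from_dict` for building a tree from a mapping of node paths to datasets. Instead of one positional `isel` per chip (which reproduces the resolution mismatch described above), every node is cut with label-based `.sel` over the same coordinate bounds, so all chips cover the same area. `make_node` and `select_box` are hypothetical helpers, and the band names are only illustrative labels.

```python
import numpy as np
import xarray as xr


def make_node(res_m, bands, size_m=1200):
    """Hypothetical helper: synthetic Dataset of `bands` on a res_m-metre grid."""
    n = size_m // res_m
    centres = (np.arange(n) + 0.5) * res_m  # pixel centres in metres
    return xr.Dataset(
        {b: (("y", "x"), np.zeros((n, n))) for b in bands},
        coords={"y": centres, "x": centres},
    )


# Datatree-like layout (hypothetical paths): one node per native resolution.
nodes = {
    "/10m": make_node(10, ["B02", "B03", "B04", "B08"]),
    "/20m": make_node(20, ["B05", "B06", "B07", "B8A", "B11", "B12"]),
    "/60m": make_node(60, ["B01", "B09"]),
}


def select_box(nodes, xmin, xmax, ymin, ymax):
    """Hypothetical helper: cut every node to the same coordinate box."""
    return {
        path: ds.sel(x=slice(xmin, xmax), y=slice(ymin, ymax))
        for path, ds in nodes.items()
    }


chips = select_box(nodes, 0, 600, 0, 600)
for path, ds in chips.items():
    print(path, ds.sizes["y"], ds.sizes["x"])
# /10m 60 60
# /20m 30 30
# /60m 10 10
```

Real rasters often store y in descending order, in which case the label slice along y has to be written as `slice(ymax, ymin)`.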
Describe alternatives you've considered
Keep `xbatcher` focused on `xarray.DataArray` and `xarray.Dataset` only (and not bring in `xarray.DataTree`). Users would then need to slice multi-resolution datasets themselves in an ad-hoc way.
Additional context
There was some earlier discussion in microsoft/torchgeo#279 about sampling in pixel/image units versus coordinate reference system (CRS) units. With multi-resolution datasets, though, sampling in pixel/image units would require some arithmetic (e.g. 20 pixels on a 500 m resolution grid would be 10 pixels on a 1000 m resolution grid, both covering 10 km). The CRS-based indexing method, however, would require something like https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_dataset.RasterDataset.clip_box.
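The pixel-unit arithmetic mentioned above amounts to keeping the ground extent fixed: a chip of n pixels at resolution r covers n × r map units, so the equivalent size at resolution r′ is n × r / r′ pixels. A minimal sketch (the function name `equivalent_pixels` is hypothetical) that rejects sizes which do not map to a whole number of pixels:

```python
from fractions import Fraction


def equivalent_pixels(n_pixels, res_from, res_to):
    """Hypothetical helper: convert a chip size in pixels between grid resolutions.

    Both resolutions must be in the same map units (e.g. metres per pixel).
    """
    # Exact rational arithmetic avoids float rounding in the check below.
    size = Fraction(n_pixels) * Fraction(res_from) / Fraction(res_to)
    if size.denominator != 1:
        raise ValueError(
            f"{n_pixels} px at {res_from} does not map to whole pixels at {res_to}"
        )
    return int(size)


print(equivalent_pixels(20, 500, 1000))  # 10
print(equivalent_pixels(60, 10, 60))     # 10
```

For instance, `equivalent_pixels(25, 10, 20)` raises, because 25 pixels at 10 m (250 m) would be 12.5 pixels at 20 m; chip sizes have to be chosen so they divide evenly across every resolution involved.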
Other references: