
Slicing multiple data layers or band channels with different spatial resolutions #93

@weiji14


Is your feature request related to a problem?

To enable chipping/batching of datasets with different spatial resolutions, each dataset (either an xarray.DataArray or an xarray.Dataset) currently needs to be sliced separately in xbatcher v0.1.0. The key limitation is that xbatcher assumes every xarray.DataArray 'layer' has the same resolution, and xbatcher.BatchGenerator uses xarray's .isel method to index and slice positionally along the specified dimensions:

```python
for slices in itertools.product(*dim_slices):
    # Pair each dimension name with its positional slice, then index by position
    selector = dict(zip(dims, slices))
    yield ds.isel(**selector)
```
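To make the limitation concrete, here is a minimal sketch (assuming numpy and xarray, with synthetic 10m and 20m grids over the same 120m x 120m area; `make_layer` and `ground_width` are hypothetical helpers) showing that applying one positional `isel` window to both layers covers different ground extents:

```python
import numpy as np
import xarray as xr


def make_layer(resolution_m: float, extent_m: float = 120.0) -> xr.DataArray:
    """Synthetic square raster with pixel-centre x/y coordinates in metres."""
    n = int(extent_m / resolution_m)
    coords = (np.arange(n) + 0.5) * resolution_m
    return xr.DataArray(
        np.zeros((n, n)), dims=("y", "x"), coords={"y": coords, "x": coords}
    )


band_10m = make_layer(10.0)  # 12 x 12 pixels
band_20m = make_layer(20.0)  # 6 x 6 pixels

# The same positional selector applied to every layer
selector = {"x": slice(0, 4), "y": slice(0, 4)}
chip_10m = band_10m.isel(**selector)
chip_20m = band_20m.isel(**selector)


def ground_width(chip: xr.DataArray, resolution_m: float) -> float:
    """Ground distance covered by a chip along x, in metres."""
    return chip.sizes["x"] * resolution_m


print(ground_width(chip_10m, 10.0), ground_width(chip_20m, 20.0))  # 40.0 80.0
```

Both chips are 4 x 4 pixels, yet the 20m chip spans twice the ground distance, so the two layers no longer line up.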

However, layers do not always share the same resolution, for example:

  • Sentinel-2's optical bands have a spatial resolution of 10m, 20m or 60m (unfortunately, people usually resample the 20m and 60m bands to 10m just to make chipping 'easy')
  • For super-resolution tasks, where a low spatial resolution image is passed through a model to produce a high spatial resolution image, low-resolution chips need to be paired with the matching high-resolution chips (e.g. https://github.com/carbonplan/cmip6-downscaling)
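For the super-resolution case, one way to keep chip pairs aligned in pixel units is to scale the fine-grid window by the integer resolution ratio. This is a sketch under stated assumptions (numpy/xarray, dims named "y" and "x", both grids sharing the same origin and extent, an integer `scale`); `paired_chips` is a hypothetical helper, not part of xbatcher:

```python
from collections.abc import Iterator

import numpy as np
import xarray as xr


def paired_chips(
    low_res: xr.DataArray, high_res: xr.DataArray, chip_size: int, scale: int
) -> Iterator[tuple[xr.DataArray, xr.DataArray]]:
    """Yield (low-res, high-res) chip pairs covering the same ground area.

    Assumes dims ("y", "x"), a shared origin and extent, and that high_res has
    exactly `scale` times as many pixels as low_res along each axis.
    """
    ny, nx = low_res.sizes["y"], low_res.sizes["x"]
    if high_res.sizes["y"] != ny * scale or high_res.sizes["x"] != nx * scale:
        raise ValueError("high_res must have `scale` times the pixels of low_res")
    # Non-overlapping windows on the coarse grid
    for y0 in range(0, ny - chip_size + 1, chip_size):
        for x0 in range(0, nx - chip_size + 1, chip_size):
            lr = low_res.isel(y=slice(y0, y0 + chip_size), x=slice(x0, x0 + chip_size))
            # Multiply the window bounds by `scale` so the fine chip has the same footprint
            hr = high_res.isel(
                y=slice(y0 * scale, (y0 + chip_size) * scale),
                x=slice(x0 * scale, (x0 + chip_size) * scale),
            )
            yield lr, hr


rng = np.random.default_rng(0)
low = xr.DataArray(rng.random((8, 8)), dims=("y", "x"))  # coarse grid
high = xr.DataArray(rng.random((32, 32)), dims=("y", "x"))  # 4x finer grid
pairs = list(paired_chips(low, high, chip_size=4, scale=4))
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 4 (4, 4) (16, 16)
```

This only works cleanly when the ratio is an integer and the grids are perfectly nested; anything else needs resampling or coordinate-based selection.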

Describe the solution you'd like

Ideally, there would be:

  1. A way to store multi-resolution datasets in a single datacube-like structure, with each layer stacked in the same geographical (or n-dimensional) space. This is something datatree might be able to handle, i.e. by placing each data layer with a different resolution on a separate node of the datatree.
     Sentinel-2
     |--- 10m bands
     |--- 20m bands
     |--- 60m bands
  2. From the multi-resolution data object, xbatcher would then need a way to slice these multi-resolution datasets consistently. Maybe DataTree.isel could work?

Describe alternatives you've considered

Keep xbatcher focused on xarray.DataArray and xarray.Dataset only (and not bring in xarray.DataTree). Users would then need to slice multi-resolution datasets themselves, in an ad-hoc way.

Additional context

There was some earlier discussion at microsoft/torchgeo#279 about whether to sample in pixel (image) units or in coordinate reference system (CRS) units. With multi-resolution datasets, however, sampling in pixel units requires some arithmetic (e.g. 20 pixels on a 500m resolution grid corresponds to 10 pixels on a 1000m resolution grid). The CRS-based indexing method, on the other hand, would require something like https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_dataset.RasterDataset.clip_box.
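The pixel-unit arithmetic amounts to preserving ground length (pixels x resolution) across grids. A minimal pure-Python sketch; `convert_pixels` is a hypothetical helper, and rejecting sizes that are not a whole number of destination pixels is a design choice of this sketch, not something xbatcher or torchgeo is known to do:

```python
from fractions import Fraction


def convert_pixels(n_pixels: int, src_res: float, dst_res: float) -> int:
    """Convert a chip size in pixels from one grid resolution to another.

    Preserves the ground length n_pixels * src_res and raises if the result
    is not a whole number of pixels on the destination grid.
    """
    # Exact rational arithmetic avoids float rounding in the ratio
    ratio = Fraction(str(src_res)) / Fraction(str(dst_res))
    converted = n_pixels * ratio
    if converted.denominator != 1:
        raise ValueError(
            f"{n_pixels} px at {src_res} m is not a whole number of px at {dst_res} m"
        )
    return int(converted)


print(convert_pixels(20, 500, 1000))  # 10
print(convert_pixels(30, 10, 60))  # 5 (Sentinel-2: 10m -> 60m bands)
```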

Other references:

Labels: enhancement (New feature or request)
