Allow regridding for very large grids

Regridding to and/or from a very large grid (in terms of number of grid points) can fail due to running out of memory.

This occurs during the call to `esmpy.Regrid` which we use to calculate the regridding weights (https://github.com/NCAS-CMS/cf-python/blob/v3.18.0/cf/regrid/regrid.py#L2434). The problem isn't (so much) the size of the final weights matrix (although that can be quite large), more that `esmpy.Regrid` uses up a lot of memory for temporary arrays.

A real use case using 'linear' regridding has destination grid size 181440000, resulting in a sparse matrix representation of the weights taking up $$8.79\ GiB \equiv 4 \times 181440000 \times 3.25 \times 4 / 2^{30}\ GiB$$ (have a poke round inside the guts of a `scipy` CSR sparse array to see where that number comes from). The peak memory memory required by `esmpy` to create this is many times that! 

A work around for this is to calculate weights separately for non-overlapping partitions of the destination (not the source) grid, and then combining the weights from each of these partitions to create the final weights  to partition the final weights matrix.

We can add this as an option to the regrid API:

```python
>>> # Calculate the weights using 10 separate partitions
>>> f.regrids(g, 'linear', dst_grid_partitions=10)
>>> # Calculate the weights using the maximum possible number of partitions
>>> f.regrids(g, 'linear', dst_grid_partitions='maximum')
```

Why not do this always? Because it is often not required, and there can be a large speed performance hit for doing this. (Therefore it could be recommended that when using `dst_grid_partitions` you also save the resulting weights to a file for future use.)

----

To see how this works, consider a nearest neighbour regridding from a size 6 source grid to a size 5 destination grid. The `5 x 6` weights matrix for this could look like:

```
             I = 6
        + - - - - - - 
        | 1 0 0 0 0 0 
        | 0 0 0 0 1 0
  J = 5 | 0 0 1 0 0 0
        | 0 0 0 1 0 0
        | 0 0 0 0 0 1
```
This is clearly just a concatenation of the following `3 x 6` and `2 x 6` weights matrices:
```
             I = 6
        + - - - - - - 
        | 1 0 0 0 0 0 
 J0 = 3 | 0 0 0 0 1 0
        | 0 0 1 0 0 0
        + - - - - - - 
 J1 = 2 | 0 0 0 1 0 0
        | 0 0 0 0 0 1
```
The key is that two calls to `esmpy.Regrid` (one for the `J0` partition, one of the `J1` partition) create the correct rows of this matrix for any subset of the destination grid, provided:
1. the source grid is left un-partitioned, 
2. the regridded result at destination grid cell _j_ depends only on grid cell _j_, and not any other destination grid cells. If this were the case, as happens for `conservative_2d` and `nearest_dtos` methods, then some weights would be incorrect, as when they are calculated they "don't know" about destination grid cells in any of the other partitions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Allow regridding for very large grids #878

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Allow regridding for very large grids #878

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions