Skip to content

Allow regridding for very large gridsΒ #878

@davidhassell

Description

@davidhassell

Regridding to and/or from a very large grid (in terms of number of grid points) can fail due to running out of memory.

This occurs during the call to esmpy.Regrid which we use to calculate the regridding weights (https://github.com/NCAS-CMS/cf-python/blob/v3.18.0/cf/regrid/regrid.py#L2434). The problem isn't (so much) the size of the final weights matrix (although that can be quite large), more that esmpy.Regrid uses up a lot of memory for temporary arrays.

A real use case using 'linear' regridding has destination grid size 181440000, resulting in a sparse matrix representation of the weights taking up $$8.79\ GiB \equiv 4 \times 181440000 \times 3.25 \times 4 / 2^{30}\ GiB$$ (have a poke round inside the guts of a scipy CSR sparse array to see where that number comes from). The peak memory memory required by esmpy to create this is many times that!

A work around for this is to calculate weights separately for non-overlapping partitions of the destination (not the source) grid, and then combining the weights from each of these partitions to create the final weights to partition the final weights matrix.

We can add this as an option to the regrid API:

>>> # Calculate the weights using 10 separate partitions
>>> f.regrids(g, 'linear', dst_grid_partitions=10)
>>> # Calculate the weights using the maximum possible number of partitions
>>> f.regrids(g, 'linear', dst_grid_partitions='maximum')

Why not do this always? Because it is often not required, and there can be a large speed performance hit for doing this. (Therefore it could be recommended that when using dst_grid_partitions you also save the resulting weights to a file for future use.)


To see how this works, consider a nearest neighbour regridding from a size 6 source grid to a size 5 destination grid. The 5 x 6 weights matrix for this could look like:

             I = 6
        + - - - - - - 
        | 1 0 0 0 0 0 
        | 0 0 0 0 1 0
  J = 5 | 0 0 1 0 0 0
        | 0 0 0 1 0 0
        | 0 0 0 0 0 1

This is clearly just a concatenation of the following 3 x 6 and 2 x 6 weights matrices:

             I = 6
        + - - - - - - 
        | 1 0 0 0 0 0 
 J0 = 3 | 0 0 0 0 1 0
        | 0 0 1 0 0 0
        + - - - - - - 
 J1 = 2 | 0 0 0 1 0 0
        | 0 0 0 0 0 1

The key is that two calls to esmpy.Regrid (one for the J0 partition, one of the J1 partition) create the correct rows of this matrix for any subset of the destination grid, provided:

  1. the source grid is left un-partitioned,
  2. the regridded result at destination grid cell j depends only on grid cell j, and not any other destination grid cells. If this were the case, as happens for conservative_2d and nearest_dtos methods, then some weights would be incorrect, as when they are calculated they "don't know" about destination grid cells in any of the other partitions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestregriddingRelating to regridding operations

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions