Support subspacing to nearest neighbour of provided value #835

@sadielbartholomew

A feature request for new functionality for the subspace and indices methods. (xarray supports this functionality for subspacing but we don't. However, it seems too useful not to want to steal!)

Picture the scene. You want to subspace on a specific data value for some coordinate but, due to the natural complexity of real-life data, it is a float that takes many digits to define, as is often the case. At the moment in cf-python you have to specify that value exactly, e.g. f.subspace(grid_latitude=7.480000078678131) to subspace down to the grid_latitude of 7.480000078678131. (This assumes you want to subspace by metadata; you could use indexing instead, but that involves knowing or calculating the appropriate index.) It would be nice to also be able to request, by some means, 'the value nearest to' a specified value: in this case, to provide only 7.48 for the grid_latitude and so avoid having to put in the exact float, just a close-enough approximation.

xarray supports such 'nearest neighbour' lookups, where you specify 'nearest' via a method keyword, e.g. 7.48 resolves to 7.480000078678131 as the nearest neighbour. It also supports a tolerance on inexact look-up, e.g. 7.5 with a +/- 0.1 tolerance catches this same value.
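To make the intended semantics concrete, here is a minimal pure-Python sketch of a nearest-neighbour lookup with an optional tolerance. The helper name nearest_value is hypothetical and is not part of cf-python or xarray; the two coordinate values are just the first and last grid_latitude values from the example field in this issue.

```python
def nearest_value(values, target, tolerance=None):
    """Return the element of `values` closest to `target`.

    Hypothetical helper for illustration only. If `tolerance` is given
    and the closest element is further than that from `target`, raise
    KeyError rather than silently returning a poor match.
    """
    if not values:
        raise ValueError("no coordinate values to search")
    best = min(values, key=lambda v: abs(v - target))
    if tolerance is not None and abs(best - target) > tolerance:
        raise KeyError(f"no value within {tolerance} of {target}")
    return best


# First and last grid_latitude values of the example field.
grid_latitude = [7.480000078678131, -5.279999852180481]

# Plain nearest-neighbour lookup: 7.48 resolves to the exact stored float.
print(nearest_value(grid_latitude, 7.48))

# Inexact lookup: 7.5 still resolves to the same value within +/- 0.1.
print(nearest_value(grid_latitude, 7.5, tolerance=0.1))
```

Raising an error when nothing falls inside the tolerance is one possible design choice here; returning an empty result would be another.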

I would like us to directly support the nearest neighbour match, and to better advertise how to do the inexact subspace in two lines using our tolerance functions in a context manager.

We already have a new-ish 'halo' subspacing approach and are going to add a 'bounding box' query too. It would be good to consider how to fit the 'nearest neighbour' case above in among these new possibilities for more flexible subspacing. I have made some suggestions below to get the conversation started.

Example

Example set-up of subspaces that we'd like the above functionality to simplify:

>>> print(f)
Field: relative_humidity (ncvar%UM_m01s16i204_vn405)
----------------------------------------------------
Data            : relative_humidity(air_pressure(17), grid_latitude(30), grid_longitude(24)) %
Cell methods    : time(1): mean
Dimension coords: time(1) = [1978-12-16 12:00:00] gregorian
                : air_pressure(17) = [1000.0000610351562, ..., 10.0] hPa
                : grid_latitude(30) = [7.480000078678131, ..., -5.279999852180481] degrees
                : grid_longitude(24) = [-5.720003664493561, ..., 4.399996280670166] degrees
Auxiliary coords: latitude(grid_latitude(30), grid_longitude(24)) = [[61.004354306111864, ..., 48.51422609871432]] degrees_north
                : longitude(grid_latitude(30), grid_longitude(24)) = [[-13.762685427418687, ..., 4.622216504491947]] degrees_east
Coord references: grid_mapping_name:rotated_latitude_longitude
>>> f1 = f.subspace(air_pressure=1000.0000610351562)
>>> f2 = f.subspace(grid_latitude=7.480000078678131)

Suggestion for API to provide support

Add a new 'mode' with string identifier 'nearest' to do the nearest neighbour case, e.g. for the above example this would work:

f.subspace(air_pressure=1000, mode="nearest")

The other case, inexact look-up within a tolerance, is already controlled by our tolerance functions cf.atol and cf.rtol, but there are no examples in the documentation to advertise how simply one can control the subspacing tolerance with:

with cf.atol(1e-2):
    f.subspace(grid_latitude=7.480000078678131)

so I'd also like us to showcase examples of using a context manager like above as a two-line means to easily do inexact subspacing.
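For documentation purposes it may help to spell out what a tolerance-based match means. The sketch below assumes the common numpy-style closeness test, |x - y| <= atol + rtol * |y|; whether cf-python applies exactly this rule during subspacing should be checked against the cf.atol and cf.rtol documentation. The helper is_close is hypothetical, and its zero defaults are chosen for illustration only and are not cf-python's defaults.

```python
def is_close(x, y, atol=0.0, rtol=0.0):
    """Hypothetical numpy-style closeness test, for illustration only."""
    return abs(x - y) <= atol + rtol * abs(y)


stored = 7.480000078678131  # exact grid_latitude value held in the field

# With no tolerance, the approximate value 7.48 does not match.
print(is_close(7.48, stored))             # False

# With an absolute tolerance of 1e-2, the approximate value does match.
print(is_close(7.48, stored, atol=1e-2))  # True
```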

Metadata

    Labels

    enhancement (New feature or request)
