Description
A feature request for new functionality for the `subspace` and `indices` methods. (xarray supports this functionality for subspacing but we don't, and it seems too useful not to want to steal!)
Picture the scene. You want to subspace on a specific data value of some coordinate but, due to the natural complexity of real-life data, that value is (as it often is) a float that takes many digits to define. At the moment in cf-python you have to specify that value exactly, e.g. `f.subspace(grid_latitude=7.480000078678131)` to subspace down to the `grid_latitude` of 7.480000078678131. (This assumes you want to subspace by metadata, which let's assume you do; you could use indexing instead, but that involves knowing or calculating the appropriate index.) It would be nice to also be able to request, by some means, "the value nearest to" a specified value, in this case providing only 7.48 for the `grid_latitude`, so that you don't have to type in the exact float and can give just a close-enough approximation.
xarray supports such "nearest neighbour" lookups: you can pass `method="nearest"` as a keyword (e.g. to `.sel`), so that 7.48 goes to 7.480000078678131 as the nearest neighbour. It also supports a tolerance on an inexact look-up, e.g. 7.5 with a tolerance of +/- 0.1 catches this same value that way.
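A minimal, dependency-free sketch of the xarray-style lookup described above, assuming nothing about how cf-python would implement it: the hypothetical helper `nearest_index` picks the coordinate value closest to the requested one and, if a `tolerance` is given, rejects a nearest value that lies further away than that. Only 7.480000078678131 comes from the field in this issue; the other coordinate values are illustrative.

```python
from typing import Optional, Sequence


def nearest_index(coords: Sequence[float], value: float,
                  tolerance: Optional[float] = None) -> int:
    """Hypothetical helper: index of the coordinate value nearest to ``value``.

    If ``tolerance`` is given, the nearest value must lie within
    +/- tolerance of ``value``, otherwise KeyError is raised.
    """
    if not coords:
        raise ValueError("no coordinate values to search")
    # Distance from the requested value to every coordinate value
    distances = [abs(c - value) for c in coords]
    # On a tie, min() keeps the first (lowest) index
    index = min(range(len(coords)), key=distances.__getitem__)
    if tolerance is not None and distances[index] > tolerance:
        raise KeyError(f"no coordinate value within {tolerance} of {value}")
    return index


# Illustrative values: only the first matches the field in this issue
grid_latitude = [7.480000078678131, 7.04, 6.6]

print(nearest_index(grid_latitude, 7.48))                 # → 0
print(nearest_index(grid_latitude, 7.5, tolerance=0.1))   # → 0
```

The tolerance check is applied after choosing the nearest value, so it only decides whether that nearest match is accepted; it never changes which value is chosen.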
I would like us to support the nearest-neighbour match directly, and to better advertise how to do an inexact subspace in two lines by using our tolerance functions as a context manager.
We already have a new-ish "halo" subspacing approach and are going to add a "bounding box" query too. It would be good to consider how to add the above "nearest neighbour" option to these new possibilities for more flexible subspacing. I have made some suggestions below to get the conversation started.
Example
Example set-up of the subspaces that we'd like the above functionality to simplify:
```
>>> print(f)
Field: relative_humidity (ncvar%UM_m01s16i204_vn405)
----------------------------------------------------
Data            : relative_humidity(air_pressure(17), grid_latitude(30), grid_longitude(24)) %
Cell methods    : time(1): mean
Dimension coords: time(1) = [1978-12-16 12:00:00] gregorian
                : air_pressure(17) = [1000.0000610351562, ..., 10.0] hPa
                : grid_latitude(30) = [7.480000078678131, ..., -5.279999852180481] degrees
                : grid_longitude(24) = [-5.720003664493561, ..., 4.399996280670166] degrees
Auxiliary coords: latitude(grid_latitude(30), grid_longitude(24)) = [[61.004354306111864, ..., 48.51422609871432]] degrees_north
                : longitude(grid_latitude(30), grid_longitude(24)) = [[-13.762685427418687, ..., 4.622216504491947]] degrees_east
Coord references: grid_mapping_name:rotated_latitude_longitude
>>> f1 = f.subspace(air_pressure=1000.0000610351562)
>>> f2 = f.subspace(grid_latitude=7.480000078678131)
```
Suggestion for API to provide support
Add a new mode, with the string identifier "nearest", for the nearest-neighbour case. For the above example, this would then work:
```python
f.subspace(air_pressure=1000, mode="nearest")
```
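To make the proposed keyword concrete, here is a hypothetical sketch of how a `mode` argument could switch between exact matching (the current behaviour this issue complains about) and a nearest-neighbour match when resolving several `name=value` queries to axis indices. `select_indices` and the dict-of-lists coordinates are stand-ins for illustration only, not cf-python API, and the coordinate values after the first on each axis are illustrative.

```python
from typing import Dict, Sequence


def select_indices(coords: Dict[str, Sequence[float]], mode: str = "exact",
                   **queries: float) -> Dict[str, int]:
    """Hypothetical sketch: map each ``name=value`` query to an axis index.

    mode="exact"   -> the value must equal a coordinate value exactly
    mode="nearest" -> the closest coordinate value is chosen
    """
    result = {}
    for name, value in queries.items():
        values = list(coords[name])
        if mode == "exact":
            if value not in values:
                raise KeyError(f"{name}={value!r} not found")
            result[name] = values.index(value)
        elif mode == "nearest":
            # Index of the coordinate value with the smallest distance to the query
            result[name] = min(range(len(values)),
                               key=lambda i: abs(values[i] - value))
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return result


# Illustrative coordinates: only the first value on each axis is from the field above
coords = {
    "air_pressure": [1000.0000610351562, 925.0, 850.0],
    "grid_latitude": [7.480000078678131, 7.04, 6.6],
}

print(select_indices(coords, mode="nearest", air_pressure=1000, grid_latitude=7.48))
# → {'air_pressure': 0, 'grid_latitude': 0}
```

Keeping "exact" as the default means existing calls would behave as before, and "nearest" only changes how a value is turned into an index, so it could in principle sit alongside other modes such as "halo".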
The inexact look-up case (matching to within a tolerance) is already controlled by our tolerance functions `cf.atol` and `cf.rtol`, but there are no examples in the documentation advertising how simply one can control the subspacing tolerance, e.g.:
```python
with cf.atol(1e-2):
    g = f.subspace(grid_latitude=7.48)
```
So I'd also like us to showcase examples of using a context manager like the above as a two-line means of easily doing inexact subspacing.