
Commit 2ef5940

Merge dataless (#6741)
* Initial WIP for dataless merges -- cannot yet merge datafull+dataless.
* Starting tests.
* Functioning backstop: merge can pass-through dataless, but not actually merge them.
* Dataless merge, combine dataless with/without dataful.
* Tidy awkward layout in test.
* Ensure that cube.shape can only be a tuple (or None).
* Make test_merge check against dataless input in all its tests.
* Improve tests, and test for lazy merge result.
* Fix typo.
* Expand documentation.
* Fix broken ref + tweak whatsnew.
* Fixes following implementation of dataless save-and-load (#6739).
* Remove redundant checks.
* Make make_gridcube() dataless, and improve documentation cross-refs.
* Review changes: small fixes to docs.
* Use the intended dtype for data of all-masked arrays.
1 parent 2adb411 commit 2ef5940


12 files changed (+408 additions, -133 deletions)

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+.. _dataless-cubes:
+
+==============
+Dataless Cubes
+==============
+It is possible for a cube to exist without a data payload.
+In this case ``cube.data`` is ``None``, instead of containing an array (real or lazy) as
+usual.
+
+This can be useful when the cube is used purely as a placeholder for metadata, e.g. to
+represent a combination of coordinates.
+
+Most notably, dataless cubes can be used as the target "grid cube" for most regridding
+schemes, since in that case the cube's coordinates are all that the method uses.
+See also :meth:`iris.util.make_gridcube`.
+
+
+Properties of dataless cubes
+----------------------------
+
+* ``cube.shape`` is unchanged
+* ``cube.data`` == ``None``
+* ``cube.dtype`` == ``None``
+* ``cube.core_data()`` == ``cube.lazy_data()`` == ``None``
+* ``cube.is_dataless()`` == ``True``
+* ``cube.has_lazy_data()`` == ``False``
+
+
+Cube creation
+-------------
+You can create a dataless cube with the :meth:`~iris.cube.Cube` constructor
+(i.e. ``__init__`` call), by specifying the ``shape`` keyword in place of ``data``.
+If both are specified, an error is raised (even if data and shape are compatible).
+
+
+Data assignment
+---------------
+You can make an existing cube dataless, by setting ``cube.data = None``.
+The data array is simply discarded.
+
+Likewise, you can add data by assigning any data array of the correct shape, which
+turns it into a 'normal' cube.
+
+Note that ``cube.dtype`` always matches ``cube.data.dtype``. A dataless cube has a
+dtype of ``None``.
+
+
+Cube copy
+---------
+The syntax that allows you to replace data on copying,
+e.g. ``cube2 = cube.copy(new_data)``, has been extended to accept the special value
+:data:`iris.DATALESS`.
+
+So, ``cube2 = cube.copy(iris.DATALESS)`` makes ``cube2`` a
+dataless copy of ``cube``.
+This is equivalent to ``cube2 = cube.copy(); cube2.data = None``.
+
+
+Save and Load
+-------------
+The netcdf file interface can save and re-load dataless cubes correctly.
+See: :ref:`save_load_dataless`.
+
+.. _dataless_merge:
+
+Merging
+-------
+Merging is fully supported for dataless cubes, including combining them with "normal"
+cubes.
+
+* in all cases, the result has the same shape and metadata as if the same cubes had
+  data.
+* Merging multiple dataless cubes produces a dataless result.
+* Merging dataless and non-dataless cubes results in a partially 'missing' data array,
+  i.e. the relevant sections are filled with masked data.
+* Laziness is also preserved.
+
+
+Operations NOT supported
+-------------------------
+Dataless cubes are relatively new, and only partly integrated with Iris cube operations
+generally.
+
+The following are some of the notable features which do *not* support dataless cubes,
+at least as yet :
+
+* plotting
+
+* cube arithmetic
+
+* statistics
+
+* concatenation
+
+* :meth:`iris.cube.CubeList.realise_data`
+
+* various :class:`~iris.cube.Cube` methods, including at least:
+
+  * :meth:`~iris.cube.Cube.convert_units`
+
+  * :meth:`~iris.cube.Cube.subset`
+
+  * :meth:`~iris.cube.Cube.intersection`
+
+  * :meth:`~iris.cube.Cube.slices`
+
+  * :meth:`~iris.cube.Cube.interpolate`
+
+  * :meth:`~iris.cube.Cube.regrid`
+    Note: in this case the target ``grid`` can be dataless, but not the source
+    (``self``) cube.
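The "Cube copy" section of the new page says `cube.copy()` now accepts the special value `iris.DATALESS`. A plain `None` could not play that role, because `copy()` with no replacement already means "copy the existing data". The minimal sketch below illustrates that sentinel pattern under stated assumptions: `ToyCube` and its module-level `DATALESS` marker are hypothetical stand-ins written for this illustration, not Iris code, and Iris's own definition of `iris.DATALESS` may differ.

```python
from copy import deepcopy

# Hypothetical sentinel (not iris.DATALESS): a unique object no data value can be.
DATALESS = object()


class ToyCube:
    """Hypothetical 1-D stand-in for a cube; not Iris's Cube class."""

    def __init__(self, data=None, shape=None):
        # Exactly one of data/shape, mirroring the rule described for Cube().
        if (data is None) == (shape is None):
            raise ValueError('provide exactly one of "data" or "shape"')
        self.data = None if data is None else list(data)
        self.shape = (len(self.data),) if data is not None else tuple(shape)

    def is_dataless(self):
        return self.data is None

    def copy(self, data=None):
        if data is DATALESS:
            # Explicit request for a dataless copy: keep only the shape.
            return ToyCube(shape=self.shape)
        if data is None:
            # No replacement given: copy whatever the original holds.
            if self.is_dataless():
                return ToyCube(shape=self.shape)
            return ToyCube(data=deepcopy(self.data))
        # Replacement data must fit the existing shape.
        if len(data) != self.shape[0]:
            raise ValueError("replacement data has the wrong shape")
        return ToyCube(data=data)


cube = ToyCube(data=[1.0, 2.0, 3.0])
kept = cube.copy()              # same values, independent list
empty = cube.copy(DATALESS)     # dataless, but shape (3,) is kept
```

Comparing by identity (`is`) against a private `object()` means no genuine data payload can ever be mistaken for the "make it dataless" request.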

docs/src/further_topics/index.rst

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ Extra information on specific technical issues.
    lenient_maths
    um_files_loading
    missing_data_handling
+   dataless_cubes
    netcdf_io
    dask_best_practices/index
    ugrid/index

docs/src/further_topics/netcdf_io.rst

Lines changed: 3 additions & 2 deletions
@@ -188,9 +188,10 @@ Deferred Saving
 
 TBC
 
+.. _save_load_dataless:
 
-Dataless Cubes
---------------
+Dataless Cubes in NetCDF files
+------------------------------
 It now possible to have "dataless" cubes, where ``cube.data is None``.
 When these are saved to a NetCDF file interface, this results in a netcdf file variable
 with all-unwritten data (meaning that it takes up no storage space).

docs/src/whatsnew/latest.rst

Lines changed: 8 additions & 3 deletions
@@ -30,9 +30,9 @@ This document explains the changes made to Iris for this release
 ✨ Features
 ===========
 
-#. `@pp-mo`_ added a new utility function for making a test cube with a specified 2D
-   horizontal grid.
-   (:issue:`5770`, :pull:`6581`)
+#. `@pp-mo`_ added the :func:`~iris.util.make_gridcube` utility function, for making a
+   dataless test-cube with a specified 2D horizontal grid.
+   (:issue:`5770`, :pull:`6581`, :pull:`6741`)
 
 #. `@hsteptoe`_ and `@trexfeathers`_ (reviewer) added :func:`iris.util.mask_cube_from_shape`
    to handle additional Point and Line shape types. This change also facilitates the use of
@@ -59,6 +59,11 @@ This document explains the changes made to Iris for this release
    :func:`~iris.cube.Cube.slices` to work with dataless cubes.
    (:issue:`6725`, :pull:`6724`)
 
+#. `@pp-mo`_ added the ability to merge dataless cubes. This also means they can be
+   re-loaded normally with :meth:`iris.load`. See: :ref:`dataless_merge`.
+   Also added a new documentation section on dataless cubes.
+   (:issue:`6740`, :pull:`6741`)
+
 
 🐛 Bugs Fixed
 =============

lib/iris/_data_manager.py

Lines changed: 10 additions & 6 deletions
@@ -34,12 +34,16 @@ def __init__(self, data, shape=None):
             dataless.
 
         """
-        if (shape is None) and (data is None):
-            msg = 'one of "shape" or "data" should be provided; both are None'
-            raise ValueError(msg)
-        elif (shape is not None) and (data is not None):
-            msg = '"shape" should only be provided if "data" is None'
-            raise ValueError(msg)
+        if shape is None:
+            if data is None:
+                msg = 'one of "shape" or "data" should be provided; both are None'
+                raise ValueError(msg)
+        else:
+            if data is not None:
+                msg = '"shape" should only be provided if "data" is None'
+                raise ValueError(msg)
+            # Normalise how shape is recorded
+            shape = tuple(shape)
 
         # Initialise the instance.
         self._shape = shape
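Lifting the restructured checks into a stand-alone function makes their effect easy to see. This is a sketch: `resolve_shape` is a hypothetical name, not an Iris function. It reproduces the two error cases and the new `tuple(shape)` normalisation. That normalisation is relevant because the merge changes in this same commit pass `ProtoCube._shape`, which is built with `list.extend()`, as the shape of a dataless result; assuming `Cube` forwards its `shape` argument to this `DataManager`, the list ends up recorded as a tuple.

```python
def resolve_shape(data, shape=None):
    """Hypothetical stand-alone copy of the new DataManager argument checks.

    Returns the normalised shape (a tuple) when no data is given, or None when
    data is supplied.
    """
    if shape is None:
        if data is None:
            raise ValueError('one of "shape" or "data" should be provided; both are None')
    else:
        if data is not None:
            raise ValueError('"shape" should only be provided if "data" is None')
        # Normalise how shape is recorded: any iterable of ints becomes a tuple.
        shape = tuple(shape)
    return shape


resolve_shape(None, [2, 3])   # returns (2, 3)
resolve_shape([[1.0, 2.0]])   # returns None: the data defines the shape
```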

lib/iris/_merge.py

Lines changed: 56 additions & 16 deletions
@@ -12,6 +12,7 @@
 from collections import OrderedDict, namedtuple
 from copy import deepcopy
 
+import dask.array as da
 import numpy as np
 
 from iris._lazy_data import (
@@ -430,7 +431,13 @@ def match(self, other, error_on_mismatch):
         if self.data_shape != other.data_shape:
             msg = "cube.shape differs: {} != {}"
             msgs.append(msg.format(self.data_shape, other.data_shape))
-        if self.data_type != other.data_type:
+        if (
+            self.data_type is not None
+            and other.data_type is not None
+            and self.data_type != other.data_type
+        ):
+            # N.B. allow "None" to match any other dtype: this means that dataless
+            # cubes can merge with 'dataful' ones.
             msg = "cube data dtype differs: {} != {}"
             msgs.append(msg.format(self.data_type, other.data_type))
         # Both cell_measures_and_dims and ancillary_variables_and_dims are
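The relaxed dtype test in `match()` treats a `None` dtype, which is what a dataless cube's signature carries, as compatible with any other dtype. The hypothetical helper below restates that condition on its own so the cases can be checked directly; it uses NumPy dtypes but is not part of Iris.

```python
import numpy as np


def dtypes_clash(dtype_a, dtype_b):
    """Hypothetical restatement of the relaxed dtype test in match().

    A dtype of None (the signature of a dataless cube) never clashes.
    """
    return (
        dtype_a is not None
        and dtype_b is not None
        and dtype_a != dtype_b
    )


print(dtypes_clash(None, np.dtype("float32")))                  # False
print(dtypes_clash(np.dtype("float32"), np.dtype("float64")))   # True
```

Only two known dtypes that differ are reported as a mismatch; any comparison involving a dataless signature passes.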
@@ -1109,8 +1116,6 @@ def __init__(self, cube):
             source-cube.
 
         """
-        if cube.is_dataless():
-            raise iris.exceptions.DatalessError("merge")
         # Default hint ordering for candidate dimension coordinates.
         self._hints = [
             "time",
@@ -1240,7 +1245,10 @@ def merge(self, unique=True):
             # their data loaded then at the end we convert the stack back
             # into a plain numpy array.
             stack = np.empty(self._stack_shape, "object")
-            all_have_data = True
+            all_have_real_data = True
+            some_are_dataless = False
+            part_shape: tuple = None
+            part_dtype: np.dtype = None
             for nd_index in nd_indexes:
                 # Get the data of the current existing or last known
                 # good source-cube
@@ -1249,18 +1257,45 @@ def merge(self, unique=True):
                 data = self._skeletons[group[offset]].data
                 # Ensure the data is represented as a dask array and
                 # slot that array into the stack.
-                if is_lazy_data(data):
-                    all_have_data = False
+                if data is None:
+                    some_are_dataless = True
                 else:
-                    data = as_lazy_data(data)
+                    # We have (at least one) array content : Record the shape+dtype
+                    part_shape = data.shape
+                    part_dtype = data.dtype
+                    # ensure lazy (we make the result real, later, if all were real)
+                    if is_lazy_data(data):
+                        all_have_real_data = False
+                    else:
+                        data = as_lazy_data(data)
                 stack[nd_index] = data
 
-            merged_data = multidim_lazy_stack(stack)
-            if all_have_data:
-                # All inputs were concrete, so turn the result back into a
-                # normal array.
-                merged_data = as_concrete_data(merged_data)
-            merged_cube = self._get_cube(merged_data)
+            if part_shape is None:
+                # NO parts had data : the result will also be dataless
+                merged_data = None
+                merged_shape = self._shape
+            else:
+                # At least some inputs had data : the result will have a data array.
+                if some_are_dataless:
+                    # Some parts were dataless: fill these with a lazy all-missing array.
+                    missing_part = da.ma.masked_array(
+                        data=da.zeros(part_shape, dtype=part_dtype),
+                        mask=da.ones(part_shape, dtype=bool),
+                        dtype=part_dtype,
+                    )
+                    for inds in np.ndindex(stack.shape):
+                        if stack[inds] is None:
+                            stack[inds] = missing_part
+
+                # Make a single lazy merged result array
+                merged_data = multidim_lazy_stack(stack)
+                merged_shape = None
+                if all_have_real_data:
+                    # All inputs were concrete, so turn the result back into a
+                    # normal array.
+                    merged_data = as_concrete_data(merged_data)
+
+            merged_cube = self._get_cube(merged_data, shape=merged_shape)
             merged_cubes.append(merged_cube)
 
         return merged_cubes
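The new branch in `merge()` has three outcomes: every part dataless (a dataless result with the recorded shape), every part dataful (ordinary stacked data), or a mix (dataless slots filled with an all-masked block using the last dataful part's shape and dtype). The sketch below reproduces that decision eagerly with NumPy on a 1-D list of parts. It is an illustration under assumptions: `merge_parts` is a hypothetical name, and the real code instead works on an N-dimensional object array of Dask arrays via `multidim_lazy_stack` and `da.ma.masked_array`, so laziness is not modelled here.

```python
import numpy as np


def merge_parts(parts, dataless_shape):
    """Stack equally-shaped parts along a new leading axis (eager NumPy sketch).

    parts: list of arrays, with None standing in for a dataless source cube.
    dataless_shape: full result shape, only used when every part is None.
    Returns (data, shape): exactly one of the two is None.
    """
    part_shape = part_dtype = None
    for part in parts:
        if part is not None:
            # Like the diff, keep the shape and dtype of the last dataful part.
            part_shape, part_dtype = part.shape, part.dtype

    if part_shape is None:
        # No part had data, so the merged result is dataless as well.
        return None, tuple(dataless_shape)

    if all(part is not None for part in parts):
        # Nothing to fill: an ordinary stack of the parts.
        return np.stack(parts), None

    # Fill each dataless slot with an all-masked block of the recorded shape/dtype.
    missing = np.ma.masked_array(
        np.zeros(part_shape, dtype=part_dtype),
        mask=np.ones(part_shape, dtype=bool),
    )
    filled = [missing if part is None else part for part in parts]
    # np.ma.stack stacks both the data and the masks of its inputs.
    return np.ma.stack(filled), None


a = np.array([1.0, 2.0], dtype=np.float32)
data, shape = merge_parts([a, None, a + 1], dataless_shape=[3, 2])
```

Building the placeholder with the recorded dtype, rather than a default float, keeps the merged array's dtype the same as it would be if every source cube had data.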
@@ -1291,8 +1326,6 @@ def register(self, cube, error_on_mismatch=False):
             this :class:`ProtoCube`.
 
         """
-        if cube.is_dataless():
-            raise iris.exceptions.DatalessError("merge")
         cube_signature = self._cube_signature
         other = self._build_signature(cube)
         match = cube_signature.match(other, error_on_mismatch)
@@ -1545,12 +1578,18 @@ def name_in_independents():
         # deferred loading, this does NOT change the shape.
         self._shape.extend(signature.data_shape)
 
-    def _get_cube(self, data):
+    def _get_cube(self, data, shape=None):
         """Generate fully constructed cube.
 
         Return a fully constructed cube for the given data, containing
         all its coordinates and metadata.
 
+        Parameters
+        ----------
+        data : array_like
+            Cube data content. If None, `shape` must be set and the result is dataless.
+        shape : tuple, optional
+            Cube data shape, only used if data is None.
         """
         signature = self._cube_signature
         dim_coords_and_dims = [
@@ -1573,6 +1612,7 @@ def _get_cube(self, data):
             aux_coords_and_dims=aux_coords_and_dims,
             cell_measures_and_dims=cms_and_dims,
             ancillary_variables_and_dims=avs_and_dims,
+            shape=shape,
             **kwargs,
         )
 
lib/iris/cube.py

Lines changed: 1 addition & 1 deletion
@@ -5160,7 +5160,7 @@ def interpolate(
 
         """
         if self.is_dataless():
-            raise iris.exceptions.DatalessError("interoplate")
+            raise iris.exceptions.DatalessError("interpolate")
         coords, points = zip(*sample_points)
         interp = scheme.interpolator(self, coords) # type: ignore[arg-type]
         return interp(points, collapse_scalar=collapse_scalar)

lib/iris/fileformats/pp.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
     "save_pairs_from_cube",
 ]
 
-
+#: Standard spherical earth radius, as defined for MetOffice Unified Model.
 EARTH_RADIUS = 6371229.0
 
 