From a48c6c0999fda79069566582909bb95f319e8e2a Mon Sep 17 00:00:00 2001
From: Tonio Fincke
Date: Thu, 22 Jul 2021 17:10:17 +0200
Subject: [PATCH 1/7] created cube convention document

---
 docs/source/cubeconv.md | 180 ++++++++++++++++++++++++++++++++++++++++
 docs/source/cubespec.md | 157 ----------------------------------------
 2 files changed, 180 insertions(+), 157 deletions(-)
 create mode 100644 docs/source/cubeconv.md
 delete mode 100644 docs/source/cubespec.md

diff --git a/docs/source/cubeconv.md b/docs/source/cubeconv.md
new file mode 100644
index 000000000..4f68f4617
--- /dev/null
+++ b/docs/source/cubeconv.md
@@ -0,0 +1,180 @@
+# xcube Dataset Convention
+
+This document describes a convention for *xcube datasets*, which are data cubes
+in the xcube sense. Any dataset can be considered a data cube as long as at
+least a subset of its data variables are cube-like, i.e., meet the requirements
+listed in this document.
+
+The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
+“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
+document are to be interpreted as described in
+[RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+## Document Status
+
+This is the latest version, which is still in development.
+
+Version: 1.0, draft
+
+Updated: 21.07.2021
+
+
+## Motivation
+
+For many users of Earth observation data, common operations such as
+multivariate co-registration, extraction, comparison, and analysis of
+different data sources are difficult, because data is provided in various
+formats and at different spatio-temporal resolutions.
+
+## High-level requirements
+
+xcube datasets
+
+* SHALL be time series of gridded, geo-spatial, geo-physical variables.
+* SHALL use a common, equidistant, global or regional geo-spatial grid.
+* SHALL be easy to read, write, process, generate.
+* SHALL conform to the requirements of analysis ready data (ARD).
+* SHALL be compatible with existing tools and APIs.
+* SHALL conform to standards or common practices and follow a common
+  data model.
+* SHALL be formatted as self-contained datasets.
+* SHALL be "cloud ready", in the sense that subsets of the data can be
+  accessed by individual URIs.
+
+ARD links:
+
+* http://ceos.org/ard/
+* https://www.usgs.gov/core-science-systems/nli/landsat/us-landsat-analysis-ready-data
+* https://medium.com/planet-stories/analysis-ready-data-defined-5694f6f48815
+
+
+## xcube Dataset Schemas
+
+### Basic Schema
+
+* Attributes:
+  * SHALL be [CF](http://cfconventions.org/) >= 1.7
+  * SHOULD adhere to
+    [Attribute Convention for Data Discovery](http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery)
+* Dimensions:
+  * SHALL all be greater than zero.
+  * SHALL include two spatial dimensions
+  * SHOULD include a dimension `time`
+  * SHOULD include a dimension `bnds` of size 2 that may be used by bounding
+    coordinate variables
+* Coordinate Variables:
+  * SHALL contain labels for a dimension
+  * SHOULD be 1-dimensional
+  * MAY be 2-dimensional if, e.g., they are bounds coordinate variables (see
+    below) or they carry `latitude`/`longitude` values in case of a generic,
+    non-geographic grid (see Generic Schema below)
+  * 1-dimensional coordinate variables SHOULD be named like the dimension they
+    describe
+  * For each dimension of a data variable, a coordinate variable MUST exist
+* Temporal coordinate variables:
+  * SHALL provide time coordinates for a given time index.
+  * MAY be non-equidistant or equidistant.
+  * SHOULD be named `time`
+  * One variable value SHALL provide observation or average time of
+    *cell centers*.
+  * Attributes:
+    * Temporal coordinate variables MUST have the attributes `units` and
+      `standard_name`, and MAY have any others.
+    * `standard_name` MUST be `"time"`, `units` MUST have format
+      `"<deltatime> since <datetime>"`, where `<datetime>` must be given in
+      ISO format. `calendar` MAY be given; if not, `"gregorian"` is
+      assumed.
+* Spatial coordinate variables:
+  * SHALL provide spatial coordinates for a given spatial index.
+  * SHALL be equidistant in either angular or metric units
+  * Different spatial coordinate variables MAY have different spatial
+    resolutions
+* Bounds coordinate variables:
+  * SHOULD be included for any spatial or temporal coordinate variable
+  * SHALL consist of two dimensions: the one of the respective coordinate
+    variable and another one of length 2, which SHOULD be named `bnds`
+  * SHOULD be named `<dim>_bnds`
+  * `<dim>_bnds[<dim>, 0]` SHALL provide the *lower cell boundary*,
+    `<dim>_bnds[<dim>, 1]` SHALL provide the *upper cell boundary*
+* Data variables:
+  * MAY have any dimensionality, including no dimensions at all.
+  * SHALL have the spatial dimensions at the innermost position in case it has
+    spatial dimensions (e.g., [..., y, x])
+  * SHALL have its time dimension at the outermost position in case it has a
+    time dimension (e.g., [time, ...])
+  * MAY have extra dimensions, e.g. `layer` (of the atmosphere) or
+    `band` (of a spectrum). These extra dimensions MUST be positioned between
+    the time and the spatial coordinates
+  * SHALL provide *cube cells* with the dimensions as index.
+  * SHOULD specify the `units` metadata attribute.
+  * SHOULD specify metadata attributes that are used to identify
+    missing values, namely `_FillValue` and / or `valid_min`,
+    `valid_max`, see notes in CF conventions on these attributes.
+  * MAY specify metadata attributes that can be used to visualise the
+    data:
+    * `color_bar_name`: Name of a predefined colour mapping.
+      The colour bar is applied between a minimum and a maximum value.
+    * `color_value_min`, `color_value_max`: Minimum and maximum value
+      for applying the colour bar. If not provided, minimum and maximum
+      default to `valid_min`, `valid_max`. If neither are provided,
+      minimum and maximum default to `0` and `1`.
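+
+The following example is informative rather than normative. It sketches how a
+minimal dataset conforming to the Basic Schema might be constructed with
+[xarray](http://xarray.pydata.org/); the variable name `chl`, the grid, and
+all values are arbitrary placeholders:
+
+```python
+import numpy as np
+import pandas as pd
+import xarray as xr
+
+time = pd.date_range("2021-01-01", periods=3, freq="D")
+lat = 50.0 + 0.1 * (np.arange(30) + 0.5)  # equidistant cell centers
+lon = 10.0 + 0.1 * (np.arange(20) + 0.5)
+
+dataset = xr.Dataset(
+    data_vars={
+        # time dimension outermost, spatial dimensions innermost
+        "chl": (("time", "lat", "lon"),
+                np.zeros((time.size, lat.size, lon.size)),
+                {"units": "mg m-3"}),
+    },
+    coords={
+        "time": ("time", time),
+        "lat": ("lat", lat,
+                {"standard_name": "latitude", "units": "degrees_north"}),
+        "lon": ("lon", lon,
+                {"standard_name": "longitude", "units": "degrees_east"}),
+        # bounds coordinate variable for `lat`; `lon_bnds` and `time_bnds`
+        # would be defined analogously
+        "lat_bnds": (("lat", "bnds"),
+                     np.stack([lat - 0.05, lat + 0.05], axis=-1)),
+    },
+    attrs={"Conventions": "CF-1.7"},
+)
+```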
+
+### WGS84 Schema (extends Basic)
+
+* Dimensions:
+  * SHALL include two spatial dimensions, which SHOULD be named `lat` and `lon`
+* Spatial coordinate variables:
+  * SHALL use WGS84 (EPSG:4326) CRS.
+  * One entry of the variable describing the latitude SHALL provide the
+    observation or average latitude of *cell centers*. It SHOULD have the
+    attributes `standard_name="latitude"` and `units="degrees_north"`.
+  * One entry of the variable describing the longitude SHALL provide the
+    observation or average longitude of *cell centers*. It SHOULD have the
+    attributes `standard_name="longitude"` and `units="degrees_east"`.
+
+### Generic Schema (extends Basic)
+
+* Dimensions:
+  * SHALL include two spatial dimensions, which SHOULD be named `y` and `x`
+* Spatial coordinate variables:
+  * MAY use any spatial grid and CRS.
+  * SHOULD have attributes `standard_name`, `units`
+  * MAY have `lat[y,x]`: latitude of *cell centers*.
+    * Attributes: `standard_name="latitude"`, `units="degrees_north"`.
+  * MAY have `lon[y,x]`: longitude of *cell centers*.
+    * Attributes: `standard_name="longitude"`, `units="degrees_east"`.
+* Grid Mapping variable:
+  * SHALL be included in case the CRS is not WGS84.
+  * SHALL NOT carry any data, therefore it MAY be of any type
+  * SHOULD be named `crs`
+  * MUST have attributes that describe a CF Grid Mapping v1.8 (see
+    http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#grid-mappings-and-projections
+    ). This means that there MUST be one of the following:
+    * an attribute `crs_wkt` that describes a CRS in WKT format
+    * an attribute `spatial_ref` (e.g., an EPSG code)
+    * an attribute `grid_mapping_name`. If this is given, more attributes
+      MAY be required, depending on the grid mapping.
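+
+The following sketch is informative, not part of the convention. It shows one
+way to attach such a grid mapping variable with
+[pyproj](https://pyproj4.github.io/pyproj/), assuming a dataset `dataset`
+whose spatial coordinates are given in the (arbitrarily chosen) EPSG:3035
+CRS:
+
+```python
+import pyproj
+import xarray as xr
+
+# CF grid mapping attributes, including `crs_wkt` and `grid_mapping_name`
+cf_attrs = pyproj.CRS.from_epsg(3035).to_cf()
+
+# a dataless variable named `crs` that carries the grid mapping attributes
+dataset = dataset.assign(crs=xr.DataArray(0, attrs=cf_attrs))
+
+# per the CF conventions, data variables refer to the grid mapping by name
+for var_name in dataset.data_vars:
+    if var_name != "crs":
+        dataset[var_name].attrs["grid_mapping"] = "crs"
+```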
+
+
+## xcube EO Processing Levels
+
+This section provides an attempt to characterize xcube datasets
+generated from Earth Observation (EO) data according to their
+processing levels as they are commonly used in EO data processing.
+
+### Level-1C and Level-2C
+
+* Generated from Level-1A, -1B, -2A, -2B EO data.
+* Spatially resampled to common grid
+  * Typically resampled at original resolution.
+  * May be down-sampled: aggregation/integration.
+  * May be upsampled: interpolation.
+* No temporal aggregation/integration.
+* Temporally non-equidistant.
+
+### Level-3
+
+* Generated from Level-2C or -3 by temporal aggregation.
+* No spatial processing.
+* Temporally equidistant.
+* Temporally integrated/aggregated.
diff --git a/docs/source/cubespec.md b/docs/source/cubespec.md
deleted file mode 100644
index a5a69c486..000000000
--- a/docs/source/cubespec.md
+++ /dev/null
@@ -1,157 +0,0 @@
-# xcube Dataset Specification
-
-This document provides a technical specification of the protocol and
-format for *xcube datasets*, data cubes in the xcube sense.
-
-The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
-“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
-document are to be interpreted as described in
-[RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
-
-## Document Status
-
-This is the latest version, which is still in development.
-
-Version: 1.0, draft
-
-Updated: 31.05.2018
-
-
-## Motivation
-
-For many users of Earth observation data, multivariate coregistration,
-extraction, comparison, and analysis of different data sources is
-difficult, while data is provided in various formats and at different
-spatio-temporal resolutions.
-
-## High-level requirements
-
-xcube datasets
-
-* SHALL be time series of gridded, geo-spatial, geo-physical variables.
-* SHALL use a common, equidistant, global or regional geo-spatial grid.
-* SHALL shall be easy to read, write, process, generate.
-* SHALL conform to the requirements of analysis ready data (ARD).
-* SHALL be compatible with existing tools and APIs.
-* SHALL conform to standards or common practices and follow a common
-  data model.
-* SHALL be formatted as self-contained datasets.
-* SHALL be "cloud ready", in the sense that subsets of the data can be
-  accessed by individual URIs.
-
-ARD links:
-
-* http://ceos.org/ard/
-* https://landsat.usgs.gov/ard
-* https://medium.com/planet-stories/analysis-ready-data-defined-5694f6f48815
-
-
-## xcube Dataset Schemas
-
-### Basic Schema
-
-* Attributes metadata convention
-  * SHALL be [CF](http://cfconventions.org/) >= 1.7
-  * SHOULD adhere to
-    [Attribute Convention for Data Discovery](http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery)
-* Dimensions:
-  * SHALL be at least `time`, `bnds`, and MAY be any others.
-  * SHALL all be greater than zero, but `bnds` must always be two.
-* Temporal coordinate variables:
-  * SHALL provide time coordinates for given time index.
-  * MAY be non-equidistant or equidistant.
-  * `time[time]` SHALL provide observation or average time of
-    *cell centers*.
-  * `time_bnds[time, bnds]` SHALL provide observation or integration
-    time of *cell boundaries*.
-  * Attributes:
-    * Temporal coordinate variables MUST have `units`, `standard_name`,
-      and any others.
-    * `standard_name` MUST be `"time"`, `units` MUST have format
-      `"<deltatime> since <datetime>"`, where `datetime` must have
-      ISO-format. `calendar` may be given, if not, `"gregorian"` is
-      assumed.
-* Spatial coordinate variables
-  * SHALL provide spatial coordinates for given spatial index.
-  * SHALL be equidistant in either angular or metric units
-* Cube variables:
-  * SHALL provide *cube cells* with the dimensions as index.
-  * SHALL have shape
-    * `[time, ..., lat, lon]` (see WGS84 schema) or
-    * `[time, ..., y, x]` (see Generic schema)
-  * MAY have extra dimensions, e.g. `layer` (of the atmosphere),
-    `band` (of a spectrum).
-  * SHALL specify the `units` metadata attribute.
-  * SHOULD specify metadata attributes that are used to identify
-    missing values, namely `_FillValue` and / or `valid_min`,
-    `valid_max`, see notes in CF conventions on these attributes.
-  * MAY specify metadata attributes that can be used to visualise the
-    data:
-    * `color_bar_name`: Name of a predefined colour mapping.
-      The colour bar is applied between a minimum and a maximum value.
-    * `color_value_min`, `color_value_max`: Minimum and maximum value
-      for applying the colour bar. If not provided, minimum and maximum
-      default to `valid_min`, `valid_max`. If neither are provided,
-      minimum and maximum default to `0` and `1`.
-
-### WGS84 Schema (extends Basic)
-
-* Dimensions:
-  * SHALL be at least `time`, `lat`, `lon`, `bnds`, and MAY be any
-    others.
-* Spatial coordinate variables:
-  * SHALL use WGS84 (EPSG:4326) CRS.
-  * SHALL have `lat[lat]` that provides observation or average latitude
-    of *cell centers*
-    with attributes: `standard_name="latitude"` `units="degrees_north"`.
-  * SHALL have `lon[lon]` that provides observation or average longitude
-    of *cell centers* with attributes: `standard_name="longitude"` and
-    `units="degrees_east"`.
-  * SHOULD HAVE `lat_bnds[lat, bnds]`, `lon_bnds[lon, bnds]`: provide
-    geodetic observation or integration coordinates of
-    *cell boundaries*.
-* Cube variables:
-  * SHALL have shape `[time, ..., lat, lon]`.
-
-### Generic Schema (extends Basic)
-
-* Dimensions: `time`, `y`, `x`, `bnds`, and any others.
-  * SHALL be at least `time`, `y`, `x`, `bnds`, and MAY be any others.
-* Spatial coordinate variables:
-  * Any spatial grid and CRS.
-  * `y[y]`, `x[x]`: provide spatial observation or average coordinates
-    of *cell centers*.
-    * Attributes: `standard_name`, `units`, other units describe the
-      CRS / projections, see CF.
-  * `y_bnds[y, bnds]`, `x_bnds[x, bnds]`: provide spatial observation
-    or integration coordinates of *cell boundaries*.
-  * MAY have `lat[y,x]`: latitude of *cell centers*.
-    * Attributes: `standard_name="latitude"`, `units="degrees_north"`.
-  * `lon[y,x]`: longitude of *cell centers*.
-    * Attributes: `standard_name="longitude"`, `units="degrees_east"`.
-* Cube variables:
-  * MUST have shape `[time, ..., y, x]`.
-
-
-## xcube EO Processing Levels
-
-This section provides an attempt to characterize xcube datasets
-generated from Earth Observation (EO) data according to their
-processing levels as they are commonly used in EO data processing.
-
-### Level-1C and Level-2C
-
-* Generated from Level-1A, -1B, -2A, -2B EO data.
-* Spatially resampled to common grid
-  * Typically resampled at original resolution.
-  * May be down-sampled: aggregation/integration.
-  * May be upsampled: interpolation.
-* No temporal aggregation/integration.
-* Temporally non-equidistant.
-
-### Level-3
-
-* Generated from Level-2C or -3 by temporal aggregation.
-* No spatial processing.
-* Temporally equidistant.
-* Temporally integrated/aggregated.

From b9de2acf7033abe6b22ff56eb0d1358adcede8ba Mon Sep 17 00:00:00 2001
From: Tonio Fincke
Date: Fri, 23 Jul 2021 11:47:55 +0200
Subject: [PATCH 2/7] formatting

---
 docs/source/cubeconv.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/cubeconv.md b/docs/source/cubeconv.md
index 4f68f4617..c8adae265 100644
--- a/docs/source/cubeconv.md
+++ b/docs/source/cubeconv.md
@@ -98,9 +98,9 @@ ARD links:
 * Data variables:
   * MAY have any dimensionality, including no dimensions at all.
   * SHALL have the spatial dimensions at the innermost position in case it has
-    spatial dimensions (e.g., [..., y, x])
+    spatial dimensions (e.g., `[..., y, x]`)
   * SHALL have its time dimension at the outermost position in case it has a
-    time dimension (e.g., [time, ...])
+    time dimension (e.g., `[time, ...]`)
   * MAY have extra dimensions, e.g. `layer` (of the atmosphere) or
     `band` (of a spectrum). These extra dimensions MUST be positioned between
     the time and the spatial coordinates

From f1e6a15f159d910d35995a4292522c34f8eafa61 Mon Sep 17 00:00:00 2001
From: Tonio Fincke
Date: Tue, 27 Jul 2021 15:13:03 +0200
Subject: [PATCH 3/7] updated link

---
 docs/source/devguide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/devguide.md b/docs/source/devguide.md
index 3bd0b43f1..226c24315 100644
--- a/docs/source/devguide.md
+++ b/docs/source/devguide.md
@@ -181,7 +181,7 @@ Create new module in `xcube.core` and add your functions.
 For any functions added make sure naming is in line with other API.
 Add clear doc-string to the new API. Use Sphinx RST format.
 
-Decide if your API methods requires [xcube datasets](./cubespec.md) as
+Decide if your API methods requires [xcube datasets](./cubeconv.md) as
 inputs, if so, name the primary dataset argument `cube` and add a
 keyword parameter `cube_asserted: bool = False`.
 Otherwise name the primary dataset argument `dataset`.

From cb6b381782c7e13ad31b83b4a122ebcacafcaf0d Mon Sep 17 00:00:00 2001
From: Tonio Fincke
Date: Tue, 27 Jul 2021 15:14:33 +0200
Subject: [PATCH 4/7] added option to only normalize non-spatial properties

---
 xcube/core/normalize.py | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/xcube/core/normalize.py b/xcube/core/normalize.py
index 82f4e6ceb..96634a879 100644
--- a/xcube/core/normalize.py
+++ b/xcube/core/normalize.py
@@ -49,7 +49,9 @@ def cubify_dataset(ds: xr.Dataset) -> xr.Dataset:
 
 
 def normalize_dataset(ds: xr.Dataset,
-                      reverse_decreasing_lat: bool = False
+                      *,
+                      reverse_decreasing_lat: bool = False,
+                      do_not_normalize_spatial_dims: bool = False
                       ) -> xr.Dataset:
     """
    Normalize the geo- and time-coding upon opening the given dataset w.r.t. a common
@@ -74,12 +76,15 @@ def normalize_dataset(ds: xr.Dataset,
        are increasing
+    :param do_not_normalize_spatial_dims: If True, normalization steps that
+        affect the spatial dimensions are skipped
    :return: The normalized dataset, or the original dataset, if it is already "normal".
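+
+    Example (a sketch; assumes a dataset ``ds`` whose latitudes are stored
+    in decreasing order)::
+
+        normalized = normalize_dataset(ds, reverse_decreasing_lat=True)
+        # only normalize properties not related to the spatial dimensions
+        partly_normalized = normalize_dataset(
+            ds, do_not_normalize_spatial_dims=True)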
""" - ds = _normalize_zonal_lat_lon(ds) + if not do_not_normalize_spatial_dims: + ds = _normalize_zonal_lat_lon(ds) ds = normalize_coord_vars(ds) - ds = _normalize_lat_lon(ds) - ds = _normalize_lat_lon_2d(ds) + if not do_not_normalize_spatial_dims: + ds = _normalize_lat_lon(ds) + ds = _normalize_lat_lon_2d(ds) ds = _normalize_dim_order(ds) - ds = _normalize_lon_360(ds) + if not do_not_normalize_spatial_dims: + ds = _normalize_lon_360(ds) if reverse_decreasing_lat: ds = _reverse_decreasing_lat(ds) ds = normalize_missing_time(ds) From 9773c92b0b563b9a0555f26c9e39b64a7785d71a Mon Sep 17 00:00:00 2001 From: Tonio Fincke Date: Tue, 27 Jul 2021 15:27:55 +0200 Subject: [PATCH 5/7] added split and merge --- test/core/test_treatascube.py | 74 ++++++++++++++++++++++++++ xcube/core/treatascube.py | 99 +++++++++++++++++++++++++++++++++++ 2 files changed, 173 insertions(+) create mode 100644 test/core/test_treatascube.py create mode 100644 xcube/core/treatascube.py diff --git a/test/core/test_treatascube.py b/test/core/test_treatascube.py new file mode 100644 index 000000000..27222f2a8 --- /dev/null +++ b/test/core/test_treatascube.py @@ -0,0 +1,74 @@ +from xcube.core.treatascube import merge_cube +from xcube.core.treatascube import split_cube +from xcube.core.treatascube import verify_cube_subset +from xcube.core.new import new_cube +from xcube.core.verify import assert_cube + +import numpy as np +import xarray as xr +import unittest + + +class VerifyCubSubsetTest(unittest.TestCase): + + def test_all_well(self): + cube = new_cube(variables=dict(x=1, y=2)) + try: + verify_cube_subset(cube) + except ValueError as ve: + self.fail(f'No value error expected: {ve}') + + def test_no_vars(self): + cube = new_cube(variables=None) + with self.assertRaises(ValueError) as ve: + verify_cube_subset(cube) + self.assertEqual('Not at least one data variable ' + 'has spatial dimensions.', + f'{ve.exception}') + + def test_no_grid_mapping(self): + cube = new_cube(variables=dict(x=1, y=2)) + cube = cube.drop_dims('lat') + with self.assertRaises(ValueError) as ve: + verify_cube_subset(cube) + self.assertEqual('cannot find any grid mapping in dataset', + f'{ve.exception}') + + def test_no_time_info(self): + cube = new_cube(drop_bounds=True, variables=dict(x=1, y=2)) + cube = cube.drop_vars('time') + with self.assertRaises(ValueError) as ve: + verify_cube_subset(cube) + self.assertEqual('Dataset has no temporal information.', + f'{ve.exception}') + + +class SplitAndMergeTest(unittest.TestCase): + + def test_split(self): + cube = new_cube(variables=dict(x=1, y=2)) + splitcube, removed_data_vars = split_cube(cube) + self.assertEqual(dict(), removed_data_vars) + self.assertEqual(cube.data_vars.keys(), splitcube.data_vars.keys()) + + def test_split_remove_vars_and_merge(self): + cube = new_cube(variables=dict(x=1, y=2)) + non_cube_dims = {} + non_cube_dims['no_spatial_dims'] = \ + xr.DataArray([0.1, 0.2, 0.3, 0.4, 0.5], + dims=('time')) + non_cube_dims['no_dims'] = np.array(b'', dtype='|S1') + cube = cube.assign(non_cube_dims) + + with self.assertRaises(ValueError): + assert_cube(cube) + + splitcube, removed_data_vars = split_cube(cube) + self.assertEqual(non_cube_dims.keys(), removed_data_vars.keys()) + self.assertEqual(['x', 'y'], list(splitcube.data_vars.keys())) + + assert_cube(splitcube) + + merged_cube = merge_cube(splitcube, removed_data_vars) + self.assertEqual(['x', 'y', 'no_spatial_dims', 'no_dims'], + list(merged_cube.data_vars.keys())) diff --git a/xcube/core/treatascube.py b/xcube/core/treatascube.py new file mode 
diff --git a/xcube/core/treatascube.py b/xcube/core/treatascube.py
new file mode 100644
index 000000000..3e70a14fa
--- /dev/null
+++ b/xcube/core/treatascube.py
@@ -0,0 +1,99 @@
+# The MIT License (MIT)
+# Copyright (c) 2021 by the xcube development team and contributors
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from typing import Mapping, Tuple
+
+import xarray as xr
+from xcube.core.gridmapping import GridMapping
+from xcube.core.normalize import normalize_dataset
+from xcube.core.timecoord import get_time_range_from_data
+from xcube.core.verify import assert_cube
+
+
+def verify_cube_subset(dataset: xr.Dataset):
+    """
+    Verifies that the dataset fulfils the minimum requirements for a dataset
+    that either is a cube or can be converted into one. In order to do so,
+    the dataset
+
+    * must have two spatial dimensions
+    * must have at least one data variable that uses the spatial dimensions
+    * must have either a temporal dimension or temporal information in its
+      attributes
+
+    :param dataset: The dataset to be validated.
+    :raise: ValueError, if dataset contains no subset that is a valid xcube
+        dataset.
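+
+    Example (a sketch): a dataset freshly created with
+    :func:`xcube.core.new.new_cube` passes this check::
+
+        from xcube.core.new import new_cube
+
+        verify_cube_subset(new_cube(variables=dict(chl=0.5)))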
+    """
+    grid_mapping = GridMapping.from_dataset(dataset)
+    # if a grid mapping exists, the dataset contains spatial dimensions;
+    # if no grid mapping exists, a ValueError is raised
+    at_least_one_valid_var = False
+    for data_var in dataset.data_vars.values():
+        if grid_mapping.xy_dim_names[0] in data_var.dims and \
+                grid_mapping.xy_dim_names[1] in data_var.dims:
+            at_least_one_valid_var = True
+            break
+    if not at_least_one_valid_var:
+        raise ValueError('No data variable has spatial dimensions.')
+    start_time, end_time = get_time_range_from_data(dataset)
+    if start_time is None and end_time is None:
+        raise ValueError('Dataset has no temporal information.')
+
+
+def split_cube(dataset: xr.Dataset) \
+        -> Tuple[xr.Dataset, Mapping[str, xr.DataArray]]:
+    """
+    Creates a subset of a dataset that meets all hard requirements of a cube.
+    To this end, all data variables that do not include spatial dimensions
+    will be removed and returned in a mapping from variable name to data
+    array.
+
+    :param dataset: The dataset from which the subset shall be built
+    :raise: ValueError, if dataset contains no subset that is a valid xcube
+        dataset.
+    :return: a tuple, consisting of (a) a subset of the input dataset that has
+        been normalized to conform to strict cube requirements and (b) a
+        mapping of the names of removed data variables to these data variables
+    """
+    verify_cube_subset(dataset)
+
+    non_cube_data_vars = dict()
+    grid_mapping = GridMapping.from_dataset(dataset)
+
+    for data_var_name, data_var in dataset.data_vars.items():
+        if grid_mapping.xy_dim_names[0] not in data_var.dims \
+                and grid_mapping.xy_dim_names[1] not in data_var.dims:
+            non_cube_data_vars[data_var_name] = data_var
+    dataset = dataset.drop_vars(list(non_cube_data_vars.keys()))
+    dataset = normalize_dataset(dataset, do_not_normalize_spatial_dims=True)
+    return dataset, non_cube_data_vars
+
+
+def merge_cube(dataset: xr.Dataset,
+               data_vars: Mapping[str, xr.DataArray]) -> xr.Dataset:
+    """
+    Merges data_vars into a dataset.
+
+    :param dataset: The dataset into which the data variables shall be merged
+    :param data_vars: The data variables that shall be merged into the dataset
+    :raise: ValueError, if dataset is not a valid xcube dataset
+    :return: The dataset with the given data variables merged in
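+
+    Example (a sketch of the intended round trip with :func:`split_cube`)::
+
+        cube, non_cube_data_vars = split_cube(dataset)
+        # ... operate on the strict cube ...
+        merged = merge_cube(cube, non_cube_data_vars)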
""" if vectorize is not None: @@ -147,16 +145,18 @@ def cube_func(*input_vars: np.ndarray, # receives variables as vectors (with extra dim) raise NotImplementedError('vectorize is not supported yet') - if not cube_asserted: - for cube in input_cubes: - assert_cube(cube) + # TODO resample all input cubes to WGS84 + split_input_cubes = [] + for cube in input_cubes: + cube, _ = split_cube(cube) + assert_cube(cube) + split_input_cubes.append(cube) + input_cubes = tuple(split_input_cubes) # Check compatibility of inputs if input_cubes: input_cube_schema = CubeSchema.new(input_cubes[0]) for cube in input_cubes: - if not cube_asserted: - assert_cube(cube) if cube != input_cubes[0]: # noinspection PyUnusedLocal other_schema = CubeSchema.new(cube) diff --git a/xcube/core/extract.py b/xcube/core/extract.py index 3018cdbc2..71b9a3749 100644 --- a/xcube/core/extract.py +++ b/xcube/core/extract.py @@ -6,7 +6,7 @@ import pandas as pd import xarray as xr -from xcube.core.verify import assert_cube +from xcube.core.treatascube import split_cube DEFAULT_INDEX_NAME_PATTERN = '{name}_index' DEFAULT_REF_NAME_PATTERN = '{name}_ref' @@ -41,8 +41,7 @@ def get_cube_values_for_points( index_name_pattern: str = DEFAULT_INDEX_NAME_PATTERN, include_refs: bool = False, ref_name_pattern: str = DEFAULT_REF_NAME_PATTERN, - method: str = DEFAULT_INTERP_POINT_METHOD, - cube_asserted: bool = False + method: str = DEFAULT_INTERP_POINT_METHOD ) -> xr.Dataset: """ Extract values from *cube* variables at given @@ -77,8 +76,7 @@ def get_cube_values_for_points( :return: A new data frame whose columns are values from *cube* variables at given *points*. """ - if not cube_asserted: - assert_cube(cube) + cube, other_vars = split_cube(cube) index_dtype = np.int64 \ if method == POINT_INTERP_METHOD_NEAREST else np.float64 @@ -87,8 +85,7 @@ def get_cube_values_for_points( cube, points, index_name_pattern=index_name_pattern, - index_dtype=index_dtype, - cube_asserted=True + index_dtype=index_dtype ) cube_values = get_cube_values_for_indexes( @@ -98,8 +95,7 @@ def get_cube_values_for_points( include_bounds, data_var_names=var_names, index_name_pattern=index_name_pattern, - method=method, - cube_asserted=True + method=method ) if include_indexes: @@ -131,8 +127,7 @@ def get_cube_values_for_indexes( include_bounds: bool = False, data_var_names: Sequence[str] = None, index_name_pattern: str = DEFAULT_INDEX_NAME_PATTERN, - method: str = DEFAULT_INTERP_POINT_METHOD, - cube_asserted: bool = False + method: str = DEFAULT_INTERP_POINT_METHOD ) -> xr.Dataset: """ Get values from the *cube* at given *indexes*. @@ -155,8 +150,7 @@ def get_cube_values_for_indexes( :return: A new data frame whose columns are values from *cube* variables at given *indexes*. """ - if not cube_asserted: - assert_cube(cube) + cube, other_vars = split_cube(cube) if method not in {POINT_INTERP_METHOD_NEAREST, POINT_INTERP_METHOD_LINEAR}: raise ValueError(f"invalid method {method!r}") @@ -263,8 +257,7 @@ def get_cube_point_indexes( points: PointsLike, dim_name_mapping: Mapping[str, str] = None, index_name_pattern: str = DEFAULT_INDEX_NAME_PATTERN, - index_dtype=np.float64, - cube_asserted: bool = False + index_dtype=np.float64 ) -> xr.Dataset: """ Get indexes of given point coordinates *points* into the given *dataset*. @@ -288,8 +281,7 @@ def get_cube_point_indexes( it is expected to be a valid cube. :return: A dataset containing the index columns. 
""" - if not cube_asserted: - assert_cube(cube) + cube, _ = split_cube(cube) dim_name_mapping = dim_name_mapping if dim_name_mapping is not None else {} dim_names = _get_cube_data_var_dims(cube) diff --git a/xcube/core/mldataset.py b/xcube/core/mldataset.py index 9b973e8ee..420836137 100644 --- a/xcube/core/mldataset.py +++ b/xcube/core/mldataset.py @@ -18,7 +18,6 @@ from xcube.core.dsio import parse_s3_fs_and_root from xcube.core.dsio import write_cube from xcube.core.geom import get_dataset_bounds -from xcube.core.verify import assert_cube from xcube.util.perf import measure_time from xcube.util.tilegrid import TileGrid @@ -287,7 +286,7 @@ def _get_dataset_lazily(self, index: int, parameters: Dict[str, Any]) -> xr.Data base_dir = os.path.dirname(self._dir_path) level_path = os.path.join(base_dir, level_path) with measure_time(tag=f"opened local dataset {level_path} for level {index}"): - return assert_cube(xr.open_zarr(level_path, **parameters), name=level_path) + return xr.open_zarr(level_path, **parameters) def _get_tile_grid_lazily(self): """ @@ -386,7 +385,7 @@ def _get_dataset_lazily(self, index: int, parameters: Dict[str, Any]) -> xr.Data store = zarr.LRUStoreCache(store, max_size=max_size) with measure_time(tag=f"opened remote dataset {level_path} for level {index}"): consolidated = self._s3_file_system.exists(f'{level_path}/.zmetadata') - return assert_cube(xr.open_zarr(store, consolidated=consolidated, **parameters), name=level_path) + return xr.open_zarr(store, consolidated=consolidated, **parameters) def _get_tile_grid_lazily(self): """ @@ -510,7 +509,7 @@ def _get_dataset_lazily(self, index: int, parameters: Dict[str, Any]) -> xr.Data raise self._exception_type(f"Failed to compute in-memory dataset {self.ds_id!r} at level {index} " f"from function {self._callable_name!r}: " f"expected an xarray.Dataset but got {type(computed_value)}") - return assert_cube(computed_value, name=self.ds_id) + return computed_value def get_dataset_tile_grid(dataset: xr.Dataset, num_levels: int = None) -> TileGrid: @@ -663,7 +662,7 @@ def open_ml_dataset_from_object_storage(path: str, store = zarr.LRUStoreCache(store, max_size=chunk_cache_capacity) with measure_time(tag=f"opened remote zarr dataset {path}"): consolidated = s3.exists(f'{root}/.zmetadata') - ds = assert_cube(xr.open_zarr(store, consolidated=consolidated, **kwargs)) + ds = xr.open_zarr(store, consolidated=consolidated, **kwargs) return BaseMultiLevelDataset(ds, ds_id=ds_id) elif data_format == FORMAT_NAME_LEVELS: with measure_time(tag=f"opened remote levels dataset {path}"): @@ -686,11 +685,11 @@ def open_ml_dataset_from_local_fs(path: str, if data_format == FORMAT_NAME_NETCDF4: with measure_time(tag=f"opened local NetCDF dataset {path}"): - ds = assert_cube(xr.open_dataset(path, **kwargs)) + ds = xr.open_dataset(path, **kwargs) return BaseMultiLevelDataset(ds, ds_id=ds_id) elif data_format == FORMAT_NAME_ZARR: with measure_time(tag=f"opened local zarr dataset {path}"): - ds = assert_cube(xr.open_zarr(path, **kwargs)) + ds = xr.open_zarr(path, **kwargs) return BaseMultiLevelDataset(ds, ds_id=ds_id) elif data_format == FORMAT_NAME_LEVELS: with measure_time(tag=f"opened local levels dataset {path}"): diff --git a/xcube/core/resampling/temporal.py b/xcube/core/resampling/temporal.py index 454d4caea..8b405312e 100644 --- a/xcube/core/resampling/temporal.py +++ b/xcube/core/resampling/temporal.py @@ -26,7 +26,8 @@ from xcube.core.schema import CubeSchema from xcube.core.select import select_variables_subset -from xcube.core.verify import 
-from xcube.core.verify import assert_cube
+from xcube.core.treatascube import merge_cube
+from xcube.core.treatascube import split_cube
 
 
 def resample_in_time(dataset: xr.Dataset,
@@ -38,8 +39,7 @@
                      interp_kind=None,
                      time_chunk_size=None,
                      var_names: Sequence[str] = None,
-                     metadata: Dict[str, Any] = None,
-                     cube_asserted: bool = False) -> xr.Dataset:
+                     metadata: Dict[str, Any] = None) -> xr.Dataset:
     """
     Resample a dataset in the time dimension.
 
@@ -84,8 +84,7 @@ def resample_in_time(dataset: xr.Dataset,
        otherwise it is expected to be a valid cube.
     :return: A new xcube dataset resampled in time.
     """
-    if not cube_asserted:
-        assert_cube(dataset)
+    dataset, other_data_vars = split_cube(dataset)
 
     if frequency == 'all':
         time_gap = np.array(dataset.time[-1]) - np.array(dataset.time[0])
@@ -152,6 +151,9 @@ def resample_in_time(dataset: xr.Dataset,
     if isinstance(time_chunk_size, int) and time_chunk_size >= 0:
         chunk_sizes['time'] = time_chunk_size
 
+    # TODO consider cases where a data var in other_data_vars has time dimension
+    resampled_cube = merge_cube(resampled_cube, other_data_vars)
+
     return resampled_cube.chunk(chunk_sizes)
 
 
diff --git a/xcube/core/timeseries.py b/xcube/core/timeseries.py
index 882d482af..548f3b3a9 100644
--- a/xcube/core/timeseries.py
+++ b/xcube/core/timeseries.py
@@ -29,7 +29,7 @@
 
 from xcube.core.geom import mask_dataset_by_geometry, convert_geometry, GeometryLike, get_dataset_geometry
 from xcube.core.select import select_variables_subset
-from xcube.core.verify import assert_cube
+from xcube.core.treatascube import split_cube
 
 Date = Union[np.datetime64, str]
 
@@ -61,8 +61,7 @@ def get_time_series(cube: xr.Dataset,
                     agg_methods: Union[str, Sequence[str], AbstractSet[str]] = AGG_MEAN,
                     include_count: bool = False,
                     include_stdev: bool = False,
-                    use_groupby: bool = False,
-                    cube_asserted: bool = False) -> Optional[xr.Dataset]:
+                    use_groupby: bool = False) -> Optional[xr.Dataset]:
     """
     Get a time series dataset from a data *cube*.
 
@@ -97,8 +96,7 @@ def get_time_series(cube: xr.Dataset,
 
     :return: A new dataset with time-series for each variable.
     """
-    if not cube_asserted:
-        assert_cube(cube)
+    cube, other_data_vars = split_cube(cube)
 
     geometry = convert_geometry(geometry)
 
diff --git a/xcube/core/vars2dim.py b/xcube/core/vars2dim.py
index 7e3178c11..c90240e90 100644
--- a/xcube/core/vars2dim.py
+++ b/xcube/core/vars2dim.py
@@ -21,25 +21,33 @@
 
 import xarray as xr
 
-from xcube.core.verify import assert_cube
+from xcube.core.treatascube import merge_cube
+from xcube.core.treatascube import split_cube
 
 
 def vars_to_dim(cube: xr.Dataset,
                 dim_name: str = 'var',
                 var_name='data',
-                cube_asserted: bool = False):
+                consider_cube_data_vars_only: bool = False):
     """
     Convert data variables into a dimension.
 
     :param cube: The xcube dataset.
-    :param dim_name: The name of the new dimension and coordinate variable. Defaults to 'var'.
-    :param var_name: The name of the new, single data variable. Defaults to 'data'.
-    :param cube_asserted: If False, *cube* will be verified, otherwise it is expected to be a valid cube.
-    :return: A new xcube dataset with data variables turned into a new dimension.
+    :param dim_name: The name of the new dimension and coordinate variable.
+        Defaults to 'var'.
+    :param var_name: The name of the new, single data variable.
+        Defaults to 'data'.
+    :param consider_cube_data_vars_only: If True, only the data variables
+        that carry spatial dimensions are turned into the new dimension;
+        all other data variables are merged back into the result unchanged.
+    :return: A new xcube dataset with data variables turned into a new
+        dimension.
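+
+    Example (a sketch; assumes a cube with data variables ``chl`` and
+    ``tsm``)::
+
+        stacked = vars_to_dim(cube, dim_name='var', var_name='data')
+        # 'stacked' has a single variable 'data' with a new dimension
+        # 'var', whose labels are 'chl' and 'tsm'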
     """
-    if not cube_asserted:
-        assert_cube(cube)
+    other_data_vars = {}
+    if consider_cube_data_vars_only:
+        cube, other_data_vars = split_cube(cube)
 
     if var_name == dim_name:
         raise ValueError("var_name must be different from dim_name")
@@ -48,8 +56,14 @@ def vars_to_dim(cube: xr.Dataset,
     if not data_var_names:
         raise ValueError("cube must not be empty")
 
-    da = xr.concat([cube[data_var_name] for data_var_name in data_var_names], dim_name)
+    da = xr.concat([cube[data_var_name] for data_var_name in data_var_names],
+                   dim_name)
     new_coord_var = xr.DataArray(data_var_names, dims=[dim_name])
     da = da.assign_coords(**{dim_name: new_coord_var})
 
-    return xr.Dataset(dict(**{var_name: da}))
+    dataset = xr.Dataset(dict(**{var_name: da}))
+
+    if consider_cube_data_vars_only:
+        dataset = merge_cube(dataset, other_data_vars)
+
+    return dataset

From be8bbf9d488c7010feb1fe7910d1a3af637cba94 Mon Sep 17 00:00:00 2001
From: Tonio Fincke
Date: Fri, 30 Jul 2021 15:54:14 +0200
Subject: [PATCH 7/7] updated cubeconv.md

---
 docs/source/cubeconv.md | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/docs/source/cubeconv.md b/docs/source/cubeconv.md
index c8adae265..925a49204 100644
--- a/docs/source/cubeconv.md
+++ b/docs/source/cubeconv.md
@@ -58,8 +58,7 @@ ARD links:
     [Attribute Convention for Data Discovery](http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery)
 * Dimensions:
   * SHALL all be greater than zero.
-  * SHALL include two spatial dimensions
-  * SHOULD include a dimension `time`
+  * SHALL include one temporal and two spatial dimensions
   * SHOULD include a dimension `bnds` of size 2 that may be used by bounding
     coordinate variables
 * Coordinate Variables:
@@ -70,10 +69,10 @@ ARD links:
     describe
   * For each dimension of a data variable, a coordinate variable MUST exist
-* Temporal coordinate variables:
+* Temporal coordinate variable:
   * SHALL provide time coordinates for a given time index.
   * MAY be non-equidistant or equidistant.
-  * SHOULD be named `time`
+  * SHALL be named `time`
   * One variable value SHALL provide observation or average time of
     *cell centers*.
   * Attributes:
@@ -95,12 +94,10 @@ ARD links:
   * SHOULD be named `<dim>_bnds`
   * `<dim>_bnds[<dim>, 0]` SHALL provide the *lower cell boundary*,
     `<dim>_bnds[<dim>, 1]` SHALL provide the *upper cell boundary*
-* Data variables:
-  * MAY have any dimensionality, including no dimensions at all.
-  * SHALL have the spatial dimensions at the innermost position in case it has
-    spatial dimensions (e.g., `[..., y, x]`)
-  * SHALL have its time dimension at the outermost position in case it has a
-    time dimension (e.g., `[time, ...]`)
+* Cube Data variables:
+  * SHALL have its time dimension at the outermost position and the
+    spatial dimensions at the innermost positions, in this order:
+    `[time, ..., y, x]`, where `y` and `x` denote the spatial dimensions
   * MAY have extra dimensions, e.g. `layer` (of the atmosphere) or
     `band` (of a spectrum). These extra dimensions MUST be positioned between
     the time and the spatial coordinates
@@ -117,6 +114,10 @@ ARD links:
       for applying the colour bar. If not provided, minimum and maximum
       default to `valid_min`, `valid_max`. If neither are provided,
       minimum and maximum default to `0` and `1`.
+* Non-Cube Data variables:
+  * Consist of all data variables that are not cube data variables as
+    described above
+  * MAY have any dimensionality, including no dimensions at all
 
 ### WGS84 Schema (extends Basic)
 
@@ -134,7 +135,7 @@ ARD links:
 
 ### Generic Schema (extends Basic)
 
 * Dimensions:
-  * SHALL include two spatial dimensions, which SHOULD be named `y` and `x`
+  * SHALL include two spatial dimensions; the names `y` and `x` are
+    RECOMMENDED
 * Spatial coordinate variables:
   * MAY use any spatial grid and CRS.
   * SHOULD have attributes `standard_name`, `units`
@@ -143,7 +144,7 @@ ARD links:
   * MAY have `lon[y,x]`: longitude of *cell centers*.
     * Attributes: `standard_name="longitude"`, `units="degrees_east"`.
 * Grid Mapping variable:
-  * SHALL be included in case the CRS is not WGS84.
+  * MUST be included in case the CRS is not WGS84.
   * SHALL NOT carry any data, therefore it MAY be of any type
   * SHOULD be named `crs`
   * MUST have attributes that describe a CF Grid Mapping v1.8 (see