Skip to content
This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Commit 9a6f674

Browse files
TomNicholasmalmans2pre-commit-ci[bot]
authored
Replace .ds with immutable DatasetView (#99)
* sketching out changes needed to integrate variables into DataTree * fixed some other basic conflicts * fix mypy errors * can create basic datatree node objects again * child-variable name collisions dectected correctly * in-progres * add _replace method * updated tests to assert identical instead of check .ds is expected_ds * refactor .ds setter to use _replace * refactor init to use _replace * refactor test tree to avoid init * attempt at copy methods * rewrote implementation of .copy method * xfailing test for deepcopying * pseudocode implementation of DatasetView * Revert "pseudocode implementation of DatasetView" This reverts commit 52ef23b. * pseudocode implementation of DatasetView * removed duplicated implementation of copy * reorganise API docs * expose data_vars, coords etc. properties * try except with calculate_dimensions private import * add keys/values/items methods * don't use has_data when .variables would do * change asserts to not fail just because of differing types * full sketch of how DatasetView could work * added tests for DatasetView * remove commented pseudocode * explanation of basic properties * add data structures page to index * revert adding documentation in favour of that going in a different PR * correct deepcopy tests * use .data_vars in copy tests * add test for arithmetic with .ds * remove reference to wrapping node in DatasetView * clarify type through renaming variables * remove test for out-of-node access * make imports depend on most recent version of xarray Co-authored-by: Mattia Almansi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove try except for internal import * depend on latest pre-release of xarray * correct name of version * xarray pre-release under pip in ci envs * correct methods * whatsnews * fix fixture in test * whatsnew * improve docstrings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Mattia Almansi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent c2b6400 commit 9a6f674

File tree

4 files changed

+204
-34
lines changed

4 files changed

+204
-34
lines changed

datatree/datatree.py

Lines changed: 137 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919
Set,
2020
Tuple,
2121
Union,
22+
overload,
2223
)
2324

2425
import pandas as pd
@@ -86,6 +87,135 @@ def _check_for_name_collisions(
8687
)
8788

8889

90+
class DatasetView(Dataset):
91+
"""
92+
An immutable Dataset-like view onto the data in a single DataTree node.
93+
94+
In-place operations modifying this object should raise an AttributeError.
95+
96+
Operations returning a new result will return a new xarray.Dataset object.
97+
This includes all API on Dataset, which will be inherited.
98+
99+
This requires overriding all inherited private constructors.
100+
"""
101+
102+
# TODO what happens if user alters (in-place) a DataArray they extracted from this object?
103+
104+
def __init__(
105+
self,
106+
data_vars: Mapping[Any, Any] = None,
107+
coords: Mapping[Any, Any] = None,
108+
attrs: Mapping[Any, Any] = None,
109+
):
110+
raise AttributeError("DatasetView objects are not to be initialized directly")
111+
112+
@classmethod
113+
def _from_node(
114+
cls,
115+
wrapping_node: DataTree,
116+
) -> DatasetView:
117+
"""Constructor, using dataset attributes from wrapping node"""
118+
119+
obj: DatasetView = object.__new__(cls)
120+
obj._variables = wrapping_node._variables
121+
obj._coord_names = wrapping_node._coord_names
122+
obj._dims = wrapping_node._dims
123+
obj._indexes = wrapping_node._indexes
124+
obj._attrs = wrapping_node._attrs
125+
obj._close = wrapping_node._close
126+
obj._encoding = wrapping_node._encoding
127+
128+
return obj
129+
130+
def __setitem__(self, key, val) -> None:
131+
raise AttributeError(
132+
"Mutation of the DatasetView is not allowed, please use __setitem__ on the wrapping DataTree node, "
133+
"or use `DataTree.to_dataset()` if you want a mutable dataset"
134+
)
135+
136+
def update(self, other) -> None:
137+
raise AttributeError(
138+
"Mutation of the DatasetView is not allowed, please use .update on the wrapping DataTree node, "
139+
"or use `DataTree.to_dataset()` if you want a mutable dataset"
140+
)
141+
142+
# FIXME https://github.com/python/mypy/issues/7328
143+
@overload
144+
def __getitem__(self, key: Mapping) -> Dataset: # type: ignore[misc]
145+
...
146+
147+
@overload
148+
def __getitem__(self, key: Hashable) -> DataArray: # type: ignore[misc]
149+
...
150+
151+
@overload
152+
def __getitem__(self, key: Any) -> Dataset:
153+
...
154+
155+
def __getitem__(self, key) -> DataArray:
156+
# TODO call the `_get_item` method of DataTree to allow path-like access to contents of other nodes
157+
# For now just call Dataset.__getitem__
158+
return Dataset.__getitem__(self, key)
159+
160+
@classmethod
161+
def _construct_direct(
162+
cls,
163+
variables: dict[Any, Variable],
164+
coord_names: set[Hashable],
165+
dims: dict[Any, int] = None,
166+
attrs: dict = None,
167+
indexes: dict[Any, Index] = None,
168+
encoding: dict = None,
169+
close: Callable[[], None] = None,
170+
) -> Dataset:
171+
"""
172+
Overriding this method (along with ._replace) and modifying it to return a Dataset object
173+
should hopefully ensure that the return type of any method on this object is a Dataset.
174+
"""
175+
if dims is None:
176+
dims = calculate_dimensions(variables)
177+
if indexes is None:
178+
indexes = {}
179+
obj = object.__new__(Dataset)
180+
obj._variables = variables
181+
obj._coord_names = coord_names
182+
obj._dims = dims
183+
obj._indexes = indexes
184+
obj._attrs = attrs
185+
obj._close = close
186+
obj._encoding = encoding
187+
return obj
188+
189+
def _replace(
190+
self,
191+
variables: dict[Hashable, Variable] = None,
192+
coord_names: set[Hashable] = None,
193+
dims: dict[Any, int] = None,
194+
attrs: dict[Hashable, Any] | None | Default = _default,
195+
indexes: dict[Hashable, Index] = None,
196+
encoding: dict | None | Default = _default,
197+
inplace: bool = False,
198+
) -> Dataset:
199+
"""
200+
Overriding this method (along with ._construct_direct) and modifying it to return a Dataset object
201+
should hopefully ensure that the return type of any method on this object is a Dataset.
202+
"""
203+
204+
if inplace:
205+
raise AttributeError("In-place mutation of the DatasetView is not allowed")
206+
207+
return Dataset._replace(
208+
self,
209+
variables=variables,
210+
coord_names=coord_names,
211+
dims=dims,
212+
attrs=attrs,
213+
indexes=indexes,
214+
encoding=encoding,
215+
inplace=inplace,
216+
)
217+
218+
89219
class DataTree(
90220
TreeNode,
91221
MappedDatasetMethodsMixin,
@@ -217,10 +347,13 @@ def parent(self: DataTree, new_parent: DataTree) -> None:
217347
self._set_parent(new_parent, self.name)
218348

219349
@property
220-
def ds(self) -> Dataset:
221-
"""The data in this node, returned as a Dataset."""
222-
# TODO change this to return only an immutable view onto this node's data (see GH #80)
223-
return self.to_dataset()
350+
def ds(self) -> DatasetView:
351+
"""
352+
An immutable Dataset-like view onto the data in this node.
353+
354+
For a mutable Dataset containing the same data as in this node, use `.to_dataset()` instead.
355+
"""
356+
return DatasetView._from_node(self)
224357

225358
@ds.setter
226359
def ds(self, data: Union[Dataset, DataArray] = None) -> None:

datatree/mapping.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -189,10 +189,10 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]:
189189
*args_as_tree_length_iterables,
190190
*list(kwargs_as_tree_length_iterables.values()),
191191
):
192-
node_args_as_datasets = [
192+
node_args_as_datasetviews = [
193193
a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args]
194194
]
195-
node_kwargs_as_datasets = dict(
195+
node_kwargs_as_datasetviews = dict(
196196
zip(
197197
[k for k in kwargs_as_tree_length_iterables.keys()],
198198
[
@@ -204,7 +204,7 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]:
204204

205205
# Now we can call func on the data in this particular set of corresponding nodes
206206
results = (
207-
func(*node_args_as_datasets, **node_kwargs_as_datasets)
207+
func(*node_args_as_datasetviews, **node_kwargs_as_datasetviews)
208208
if not node_of_first_tree.is_empty
209209
else None
210210
)

datatree/tests/test_datatree.py

Lines changed: 56 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
import pytest
55
import xarray as xr
66
import xarray.testing as xrt
7-
from xarray.tests import source_ndarray
7+
from xarray.tests import create_test_data, source_ndarray
88

99
import datatree.testing as dtt
1010
from datatree import DataTree
@@ -16,7 +16,7 @@ def test_empty(self):
1616
assert dt.name == "root"
1717
assert dt.parent is None
1818
assert dt.children == {}
19-
xrt.assert_identical(dt.ds, xr.Dataset())
19+
xrt.assert_identical(dt.to_dataset(), xr.Dataset())
2020

2121
def test_unnamed(self):
2222
dt = DataTree()
@@ -66,7 +66,7 @@ class TestStoreDatasets:
6666
def test_create_with_data(self):
6767
dat = xr.Dataset({"a": 0})
6868
john = DataTree(name="john", data=dat)
69-
xrt.assert_identical(john.ds, dat)
69+
xrt.assert_identical(john.to_dataset(), dat)
7070

7171
with pytest.raises(TypeError):
7272
DataTree(name="mary", parent=john, data="junk") # noqa
@@ -75,7 +75,7 @@ def test_set_data(self):
7575
john = DataTree(name="john")
7676
dat = xr.Dataset({"a": 0})
7777
john.ds = dat
78-
xrt.assert_identical(john.ds, dat)
78+
xrt.assert_identical(john.to_dataset(), dat)
7979
with pytest.raises(TypeError):
8080
john.ds = "junk"
8181

@@ -100,16 +100,9 @@ def test_assign_when_already_child_with_variables_name(self):
100100
dt.ds = xr.Dataset({"a": 0})
101101

102102
dt.ds = xr.Dataset()
103+
new_ds = dt.to_dataset().assign(a=xr.DataArray(0))
103104
with pytest.raises(KeyError, match="names would collide"):
104-
dt.ds = dt.ds.assign(a=xr.DataArray(0))
105-
106-
@pytest.mark.xfail
107-
def test_update_when_already_child_with_variables_name(self):
108-
# See issue #38
109-
dt = DataTree(name="root", data=None)
110-
DataTree(name="a", data=None, parent=dt)
111-
with pytest.raises(KeyError, match="names would collide"):
112-
dt.ds["a"] = xr.DataArray(0)
105+
dt.ds = new_ds
113106

114107

115108
class TestGet:
@@ -275,13 +268,13 @@ def test_setitem_new_empty_node(self):
275268
john["mary"] = DataTree()
276269
mary = john["mary"]
277270
assert isinstance(mary, DataTree)
278-
xrt.assert_identical(mary.ds, xr.Dataset())
271+
xrt.assert_identical(mary.to_dataset(), xr.Dataset())
279272

280273
def test_setitem_overwrite_data_in_node_with_none(self):
281274
john = DataTree(name="john")
282275
mary = DataTree(name="mary", parent=john, data=xr.Dataset())
283276
john["mary"] = DataTree()
284-
xrt.assert_identical(mary.ds, xr.Dataset())
277+
xrt.assert_identical(mary.to_dataset(), xr.Dataset())
285278

286279
john.ds = xr.Dataset()
287280
with pytest.raises(ValueError, match="has no name"):
@@ -292,21 +285,21 @@ def test_setitem_dataset_on_this_node(self):
292285
data = xr.Dataset({"temp": [0, 50]})
293286
results = DataTree(name="results")
294287
results["."] = data
295-
xrt.assert_identical(results.ds, data)
288+
xrt.assert_identical(results.to_dataset(), data)
296289

297290
@pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes")
298291
def test_setitem_dataset_as_new_node(self):
299292
data = xr.Dataset({"temp": [0, 50]})
300293
folder1 = DataTree(name="folder1")
301294
folder1["results"] = data
302-
xrt.assert_identical(folder1["results"].ds, data)
295+
xrt.assert_identical(folder1["results"].to_dataset(), data)
303296

304297
@pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes")
305298
def test_setitem_dataset_as_new_node_requiring_intermediate_nodes(self):
306299
data = xr.Dataset({"temp": [0, 50]})
307300
folder1 = DataTree(name="folder1")
308301
folder1["results/highres"] = data
309-
xrt.assert_identical(folder1["results/highres"].ds, data)
302+
xrt.assert_identical(folder1["results/highres"].to_dataset(), data)
310303

311304
def test_setitem_named_dataarray(self):
312305
da = xr.DataArray(name="temp", data=[0, 50])
@@ -341,7 +334,7 @@ def test_setitem_dataarray_replace_existing_node(self):
341334
p = xr.DataArray(data=[2, 3])
342335
results["pressure"] = p
343336
expected = t.assign(pressure=p)
344-
xrt.assert_identical(results.ds, expected)
337+
xrt.assert_identical(results.to_dataset(), expected)
345338

346339

347340
class TestDictionaryInterface:
@@ -355,16 +348,16 @@ def test_data_in_root(self):
355348
assert dt.name is None
356349
assert dt.parent is None
357350
assert dt.children == {}
358-
xrt.assert_identical(dt.ds, dat)
351+
xrt.assert_identical(dt.to_dataset(), dat)
359352

360353
def test_one_layer(self):
361354
dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2})
362355
dt = DataTree.from_dict({"run1": dat1, "run2": dat2})
363-
xrt.assert_identical(dt.ds, xr.Dataset())
356+
xrt.assert_identical(dt.to_dataset(), xr.Dataset())
364357
assert dt.name is None
365-
xrt.assert_identical(dt["run1"].ds, dat1)
358+
xrt.assert_identical(dt["run1"].to_dataset(), dat1)
366359
assert dt["run1"].children == {}
367-
xrt.assert_identical(dt["run2"].ds, dat2)
360+
xrt.assert_identical(dt["run2"].to_dataset(), dat2)
368361
assert dt["run2"].children == {}
369362

370363
def test_two_layers(self):
@@ -373,13 +366,13 @@ def test_two_layers(self):
373366
assert "highres" in dt.children
374367
assert "lowres" in dt.children
375368
highres_run = dt["highres/run"]
376-
xrt.assert_identical(highres_run.ds, dat1)
369+
xrt.assert_identical(highres_run.to_dataset(), dat1)
377370

378371
def test_nones(self):
379372
dt = DataTree.from_dict({"d": None, "d/e": None})
380373
assert [node.name for node in dt.subtree] == [None, "d", "e"]
381374
assert [node.path for node in dt.subtree] == ["/", "/d", "/d/e"]
382-
xrt.assert_identical(dt["d/e"].ds, xr.Dataset())
375+
xrt.assert_identical(dt["d/e"].to_dataset(), xr.Dataset())
383376

384377
def test_full(self, simple_datatree):
385378
dt = simple_datatree
@@ -409,8 +402,44 @@ def test_roundtrip_unnamed_root(self, simple_datatree):
409402
assert roundtrip.equals(dt)
410403

411404

412-
class TestBrowsing:
413-
...
405+
class TestDatasetView:
406+
def test_view_contents(self):
407+
ds = create_test_data()
408+
dt = DataTree(data=ds)
409+
assert ds.identical(
410+
dt.ds
411+
) # this only works because Dataset.identical doesn't check types
412+
assert isinstance(dt.ds, xr.Dataset)
413+
414+
def test_immutability(self):
415+
# See issue #38
416+
dt = DataTree(name="root", data=None)
417+
DataTree(name="a", data=None, parent=dt)
418+
419+
with pytest.raises(
420+
AttributeError, match="Mutation of the DatasetView is not allowed"
421+
):
422+
dt.ds["a"] = xr.DataArray(0)
423+
424+
with pytest.raises(
425+
AttributeError, match="Mutation of the DatasetView is not allowed"
426+
):
427+
dt.ds.update({"a": 0})
428+
429+
# TODO are there any other ways you can normally modify state (in-place)?
430+
# (not attribute-like assignment because that doesn't work on Dataset anyway)
431+
432+
def test_methods(self):
433+
ds = create_test_data()
434+
dt = DataTree(data=ds)
435+
assert ds.mean().identical(dt.ds.mean())
436+
assert type(dt.ds.mean()) == xr.Dataset
437+
438+
def test_arithmetic(self, create_test_datatree):
439+
dt = create_test_datatree()
440+
expected = create_test_datatree(modify=lambda ds: 10.0 * ds)["set1"]
441+
result = 10.0 * dt["set1"].ds
442+
assert result.identical(expected)
414443

415444

416445
class TestRestructuring:

docs/source/whats-new.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,12 +29,20 @@ New Features
2929
Breaking changes
3030
~~~~~~~~~~~~~~~~
3131

32+
- The ``DataTree.ds`` attribute now returns a view onto an immutable Dataset-like object, instead of an actual instance
33+
of ``xarray.Dataset``. This make break existing ``isinstance`` checks or ``assert`` comparisons. (:pull:`99`)
34+
By `Tom Nicholas <https://github.com/TomNicholas>`_.
35+
3236
Deprecations
3337
~~~~~~~~~~~~
3438

3539
Bug fixes
3640
~~~~~~~~~
3741

42+
- Modifying the contents of a ``DataTree`` object via the ``DataTree.ds`` attribute is now forbidden, which prevents
43+
any possibility of the contents of a ``DataTree`` object and its ``.ds`` attribute diverging. (:issue:`38`, :pull:`99`)
44+
By `Tom Nicholas <https://github.com/TomNicholas>`_.
45+
3846
Documentation
3947
~~~~~~~~~~~~~
4048

0 commit comments

Comments
 (0)