Commit 63f19e2

initial draft of the datatree computation tutorial
1 parent 8e69353 commit 63f19e2

1 file changed

+219
-0
lines changed

@@ -0,0 +1,219 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "0",
6+
"metadata": {},
7+
"source": [
8+
"# Hierarchical computations\n",
9+
"\n",
10+
"In this lesson, we extend what we learned about [basic computation](#basic-computation) to hierarchical datasets. By the end of the lesson, we will be able to:\n",
11+
"\n",
12+
"- Apply basic arithmetic and label-aware reductions to xarray DataTree objects\n",
13+
"- Apply arbitrary functions to all nodes of a tree"
14+
]
15+
},
16+
{
17+
"cell_type": "code",
18+
"execution_count": null,
19+
"id": "1",
20+
"metadata": {},
21+
"outputs": [],
22+
"source": [
23+
"import xarray as xr\n",
24+
"import numpy as np\n",
25+
"\n",
26+
"xr.set_options(keep_attrs=True, display_expand_attrs=False, display_expand_data=False)"
27+
]
28+
},
29+
{
30+
"cell_type": "markdown",
31+
"id": "2",
32+
"metadata": {},
33+
"source": [
34+
"## Example dataset\n",
35+
"\n",
36+
"First we load the NMC reanalysis air temperature dataset and arrange it to form a hierarchy of temporal resolutions:"
37+
]
38+
},
39+
{
40+
"cell_type": "code",
41+
"execution_count": null,
42+
"id": "3",
43+
"metadata": {},
44+
"outputs": [],
45+
"source": [
46+
"ds = xr.tutorial.open_dataset(\"air_temperature\")\n",
47+
"\n",
48+
"ds_daily = (\n",
49+
" ds.resample(time=\"D\").mean(\"time\").assign(mask=lambda ds: ds[\"air\"].isel(time=0) >= 293.15)\n",
50+
")\n",
51+
"ds_weekly = (\n",
52+
" ds.resample(time=\"W\").mean(\"time\").assign(mask=lambda ds: ds[\"air\"].isel(time=0) >= 293.15)\n",
53+
")\n",
54+
"ds_monthly = (\n",
55+
" ds.resample(time=\"ME\").mean(\"time\").assign(mask=lambda ds: ds[\"air\"].isel(time=0) >= 293.15)\n",
56+
")\n",
57+
"\n",
58+
"tree = xr.DataTree.from_dict({\"daily\": ds_daily, \"weekly\": ds_weekly, \"monthly\": ds_monthly})\n",
59+
"tree"
60+
]
61+
},
62+
{
63+
"cell_type": "markdown",
64+
"id": "4",
65+
"metadata": {},
66+
"source": [
67+
"## Arithmetic\n",
68+
"\n",
69+
"As an extension to `Dataset`, `DataTree` objects automatically apply arithmetic to all variables within all nodes:"
70+
]
71+
},
72+
{
73+
"cell_type": "code",
74+
"execution_count": null,
75+
"id": "5",
76+
"metadata": {},
77+
"outputs": [],
78+
"source": [
79+
"tree - 273.15"
80+
]
81+
},
82+
{
83+
"cell_type": "markdown",
84+
"id": "6",
85+
"metadata": {},
86+
"source": [
87+
"## Reductions\n",
88+
"\n",
89+
"In a similar way, we can reduce all nodes in the datatree at once:"
90+
]
91+
},
92+
{
93+
"cell_type": "code",
94+
"execution_count": null,
95+
"id": "7",
96+
"metadata": {},
97+
"outputs": [],
98+
"source": [
99+
"tree.mean(dim=[\"lat\", \"lon\"])"
100+
]
101+
},
102+
{
103+
"cell_type": "markdown",
104+
"id": "8",
105+
"metadata": {},
106+
"source": [
107+
"## Applying functions designed for `Dataset` with `map_over_datasets`\n",
108+
"\n",
109+
"What if we wanted to convert the data to log space? For a `Dataset` or `DataArray`, we could just use {py:func}`xarray.ufuncs.log`, but that does not yet support `DataTree` objects:"
110+
]
111+
},
112+
{
113+
"cell_type": "code",
114+
"execution_count": null,
115+
"id": "9",
116+
"metadata": {},
117+
"outputs": [],
118+
"source": [
119+
"xr.ufuncs.log(tree)"
120+
]
121+
},
122+
{
123+
"cell_type": "markdown",
124+
"id": "10",
125+
"metadata": {},
126+
"source": [
127+
"Note how the result is an empty `Dataset`.\n",
128+
"\n",
129+
"To map a function over all nodes, we can use either {py:func}`xarray.map_over_datasets` or {py:meth}`xarray.DataTree.map_over_datasets`:"
130+
]
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": null,
135+
"id": "11",
136+
"metadata": {},
137+
"outputs": [],
138+
"source": [
139+
"tree.map_over_datasets(xr.ufuncs.log)"
140+
]
141+
},
142+
{
143+
"cell_type": "markdown",
144+
"id": "12",
145+
"metadata": {},
146+
"source": [
147+
"We can also use a custom function to perform more complex operations, like subtracting a group mean:"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": null,
153+
"id": "13",
154+
"metadata": {},
155+
"outputs": [],
156+
"source": [
157+
"def demean(ds):\n",
158+
" return ds.groupby(\"time.day\") - ds.groupby(\"time.day\").mean()"
159+
]
160+
},
161+
{
162+
"cell_type": "markdown",
163+
"id": "14",
164+
"metadata": {},
165+
"source": [
166+
"Applying that to the tree raises an error, though:"
167+
]
168+
},
169+
{
170+
"cell_type": "code",
171+
"execution_count": null,
172+
"id": "15",
173+
"metadata": {},
174+
"outputs": [],
175+
"source": [
176+
"tree.map_over_datasets(demean)"
177+
]
178+
},
179+
{
180+
"cell_type": "markdown",
181+
"id": "16",
182+
"metadata": {},
183+
"source": [
184+
"The error occurs because the root node has no variables and therefore no `\"time\"` coordinate. To avoid it, the function has to return such nodes unchanged instead of computing on them:"
185+
]
186+
},
187+
{
188+
"cell_type": "code",
189+
"execution_count": null,
190+
"id": "17",
191+
"metadata": {},
192+
"outputs": [],
193+
"source": [
194+
"def demean(ds):\n",
195+
" if \"time\" not in ds.coords:\n",
196+
" return ds\n",
197+
" return ds.groupby(\"time.day\") - ds.groupby(\"time.day\").mean()\n",
198+
"\n",
199+
"\n",
200+
"tree.map_over_datasets(demean)"
201+
]
202+
}
203+
],
204+
"metadata": {
205+
"language_info": {
206+
"codemirror_mode": {
207+
"name": "ipython",
208+
"version": 3
209+
},
210+
"file_extension": ".py",
211+
"mimetype": "text/x-python",
212+
"name": "python",
213+
"nbconvert_exporter": "python",
214+
"pygments_lexer": "ipython3"
215+
}
216+
},
217+
"nbformat": 4,
218+
"nbformat_minor": 5
219+
}
