Commit 2c1b2bf

data storage
1 parent 407542e commit 2c1b2bf

1 file changed

+311
-14
lines changed

intermediate/intro-to-zarr.ipynb

Lines changed: 311 additions & 14 deletions
@@ -27,35 +27,332 @@
2727
]
2828
},
2929
{
30-
"cell_type": "code",
31-
"execution_count": null,
32-
"id": "ae9c38ed",
30+
"cell_type": "markdown",
31+
"id": "89a8f0ec",
3332
"metadata": {
3433
"vscode": {
3534
"languageId": "plaintext"
3635
}
3736
},
38-
"outputs": [],
39-
"source": []
37+
"source": [
38+
"### Zarr Fundamenals\n",
39+
"A Zarr array has the following important properties:\n",
40+
"- **Shape**: The dimensions of the array.\n",
41+
"- **Dtype**: The data type of each element (e.g., float32).\n",
42+
"- **Attributes**: Metadata stored as key-value pairs (e.g., units, description.\n",
43+
"- **Compressors**: Algorithms used to compress each chunk (e.g., Blosc, Zlib).\n",
44+
"\n",
45+
"\n",
46+
"#### Example: Creating and Inspecting a Zarr Array"
47+
]
4048
},
4149
{
4250
"cell_type": "code",
43-
"execution_count": null,
44-
"id": "233640b0",
45-
"metadata": {
46-
"vscode": {
47-
"languageId": "plaintext"
51+
"execution_count": 1,
52+
"id": "ae9c38ed",
53+
"metadata": {},
54+
"outputs": [
55+
{
56+
"data": {
57+
"text/plain": [
58+
"<zarr.core.Array (40, 50) float64>"
59+
]
60+
},
61+
"execution_count": 1,
62+
"metadata": {},
63+
"output_type": "execute_result"
4864
}
49-
},
50-
"outputs": [],
65+
],
5166
"source": [
52-
"The Xarray library provides a rich API for working with Zarr data, slicing, and selecting data. \n"
67+
"import zarr\n",
68+
"z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')\n",
69+
"z"
5370
]
71+
},
72+
{
73+
"cell_type": "code",
74+
"execution_count": 2,
75+
"id": "0f39867a",
76+
"metadata": {},
77+
"outputs": [
78+
{
79+
"data": {
80+
"text/html": [
81+
"<table class=\"zarr-info\"><tbody><tr><th style=\"text-align: left\">Type</th><td style=\"text-align: left\">zarr.core.Array</td></tr><tr><th style=\"text-align: left\">Data type</th><td style=\"text-align: left\">float64</td></tr><tr><th style=\"text-align: left\">Shape</th><td style=\"text-align: left\">(40, 50)</td></tr><tr><th style=\"text-align: left\">Chunk shape</th><td style=\"text-align: left\">(10, 10)</td></tr><tr><th style=\"text-align: left\">Order</th><td style=\"text-align: left\">C</td></tr><tr><th style=\"text-align: left\">Read-only</th><td style=\"text-align: left\">False</td></tr><tr><th style=\"text-align: left\">Compressor</th><td style=\"text-align: left\">Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)</td></tr><tr><th style=\"text-align: left\">Store type</th><td style=\"text-align: left\">zarr.storage.DirectoryStore</td></tr><tr><th style=\"text-align: left\">No. bytes</th><td style=\"text-align: left\">16000 (15.6K)</td></tr><tr><th style=\"text-align: left\">No. bytes stored</th><td style=\"text-align: left\">337</td></tr><tr><th style=\"text-align: left\">Storage ratio</th><td style=\"text-align: left\">47.5</td></tr><tr><th style=\"text-align: left\">Chunks initialized</th><td style=\"text-align: left\">0/20</td></tr></tbody></table>"
82+
],
83+
"text/plain": [
84+
"Type : zarr.core.Array\n",
85+
"Data type : float64\n",
86+
"Shape : (40, 50)\n",
87+
"Chunk shape : (10, 10)\n",
88+
"Order : C\n",
89+
"Read-only : False\n",
90+
"Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n",
91+
"Store type : zarr.storage.DirectoryStore\n",
92+
"No. bytes : 16000 (15.6K)\n",
93+
"No. bytes stored : 337\n",
94+
"Storage ratio : 47.5\n",
95+
"Chunks initialized : 0/20"
96+
]
97+
},
98+
"execution_count": 2,
99+
"metadata": {},
100+
"output_type": "execute_result"
101+
}
102+
],
103+
"source": [
104+
"z.info"
105+
]
106+
},
107+
{
108+
"cell_type": "code",
109+
"execution_count": 3,
110+
"id": "dbe47985",
111+
"metadata": {},
112+
"outputs": [
113+
{
114+
"data": {
115+
"text/plain": [
116+
"0.0"
117+
]
118+
},
119+
"execution_count": 3,
120+
"metadata": {},
121+
"output_type": "execute_result"
122+
}
123+
],
124+
"source": [
125+
"z.fill_value"
126+
]
127+
},
128+
{
129+
"cell_type": "markdown",
130+
"id": "f5dcee68",
131+
"metadata": {},
132+
"source": [
133+
"No data has been written to the array yet. If we try to access the data, we will get a fill value: "
134+
]
135+
},
136+
{
137+
"cell_type": "code",
138+
"execution_count": 4,
139+
"id": "7d905f06",
140+
"metadata": {},
141+
"outputs": [
142+
{
143+
"data": {
144+
"text/plain": [
145+
"0.0"
146+
]
147+
},
148+
"execution_count": 4,
149+
"metadata": {},
150+
"output_type": "execute_result"
151+
}
152+
],
153+
"source": [
154+
"z[0, 0]\n"
155+
]
156+
},
157+
{
158+
"cell_type": "markdown",
159+
"id": "a6091ba5",
160+
"metadata": {},
161+
"source": [
162+
"This is how we assign data to the array. When we do this it gets written immediately."
163+
]
164+
},
165+
{
166+
"cell_type": "code",
167+
"execution_count": 6,
168+
"id": "1ccc28b6",
169+
"metadata": {},
170+
"outputs": [
171+
{
172+
"data": {
173+
"text/html": [
174+
"<table class=\"zarr-info\"><tbody><tr><th style=\"text-align: left\">Type</th><td style=\"text-align: left\">zarr.core.Array</td></tr><tr><th style=\"text-align: left\">Data type</th><td style=\"text-align: left\">float64</td></tr><tr><th style=\"text-align: left\">Shape</th><td style=\"text-align: left\">(40, 50)</td></tr><tr><th style=\"text-align: left\">Chunk shape</th><td style=\"text-align: left\">(10, 10)</td></tr><tr><th style=\"text-align: left\">Order</th><td style=\"text-align: left\">C</td></tr><tr><th style=\"text-align: left\">Read-only</th><td style=\"text-align: left\">False</td></tr><tr><th style=\"text-align: left\">Compressor</th><td style=\"text-align: left\">Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)</td></tr><tr><th style=\"text-align: left\">Store type</th><td style=\"text-align: left\">zarr.storage.DirectoryStore</td></tr><tr><th style=\"text-align: left\">No. bytes</th><td style=\"text-align: left\">16000 (15.6K)</td></tr><tr><th style=\"text-align: left\">No. bytes stored</th><td style=\"text-align: left\">1277 (1.2K)</td></tr><tr><th style=\"text-align: left\">Storage ratio</th><td style=\"text-align: left\">12.5</td></tr><tr><th style=\"text-align: left\">Chunks initialized</th><td style=\"text-align: left\">20/20</td></tr></tbody></table>"
175+
],
176+
"text/plain": [
177+
"Type : zarr.core.Array\n",
178+
"Data type : float64\n",
179+
"Shape : (40, 50)\n",
180+
"Chunk shape : (10, 10)\n",
181+
"Order : C\n",
182+
"Read-only : False\n",
183+
"Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n",
184+
"Store type : zarr.storage.DirectoryStore\n",
185+
"No. bytes : 16000 (15.6K)\n",
186+
"No. bytes stored : 1277 (1.2K)\n",
187+
"Storage ratio : 12.5\n",
188+
"Chunks initialized : 20/20"
189+
]
190+
},
191+
"execution_count": 6,
192+
"metadata": {},
193+
"output_type": "execute_result"
194+
}
195+
],
196+
"source": [
197+
"z[:] = 1\n",
198+
"z.info"
199+
]
200+
},
201+
{
202+
"cell_type": "markdown",
203+
"id": "c6a059cc",
204+
"metadata": {},
205+
"source": [
206+
"##### Attributes\n",
207+
"\n",
208+
"We can attach arbitrary metadata to our Array via attributes:"
209+
]
210+
},
211+
{
212+
"cell_type": "code",
213+
"execution_count": 8,
214+
"id": "859c9cfe",
215+
"metadata": {},
216+
"outputs": [
217+
{
218+
"name": "stdout",
219+
"output_type": "stream",
220+
"text": [
221+
"{'standard_name': 'wind_speed', 'units': 'm/s'}\n"
222+
]
223+
}
224+
],
225+
"source": [
226+
"z.attrs['units'] = 'm/s'\n",
227+
"z.attrs['standard_name'] = 'wind_speed'\n",
228+
"print(dict(z.attrs))"
229+
]
230+
},
231+
{
232+
"cell_type": "markdown",
233+
"id": "23885ea0",
234+
"metadata": {},
235+
"source": [
236+
"### Zarr Data Storage\n",
237+
"\n",
238+
"Zarr can be stored in memory, on disk, or in cloud storage systems like Amazon S3.\n",
239+
"\n",
240+
"Let's look under the hood. _The ability to look inside a Zarr store and understand what is there is a deliberate design decision._"
241+
]
242+
},
243+
{
244+
"cell_type": "code",
245+
"execution_count": 9,
246+
"id": "1bbc935c",
247+
"metadata": {},
248+
"outputs": [
249+
{
250+
"data": {
251+
"text/plain": [
252+
"<zarr.storage.DirectoryStore at 0x107530650>"
253+
]
254+
},
255+
"execution_count": 9,
256+
"metadata": {},
257+
"output_type": "execute_result"
258+
}
259+
],
260+
"source": [
261+
"z.store"
262+
]
263+
},
264+
{
265+
"cell_type": "code",
266+
"execution_count": 10,
267+
"id": "51953f01",
268+
"metadata": {},
269+
"outputs": [
270+
{
271+
"name": "stdout",
272+
"output_type": "stream",
273+
"text": [
274+
"\u001b[01;34mtest.zarr\u001b[0m\n",
275+
"├── \u001b[00m.zarray\u001b[0m\n",
276+
"├── \u001b[00m.zattrs\u001b[0m\n",
277+
"├── \u001b[00m0.0\u001b[0m\n",
278+
"├── \u001b[00m0.1\u001b[0m\n",
279+
"├── \u001b[00m0.2\u001b[0m\n",
280+
"├── \u001b[00m0.3\u001b[0m\n",
281+
"├── \u001b[00m0.4\u001b[0m\n",
282+
"├── \u001b[00m1.0\u001b[0m\n",
283+
"├── \u001b[00m1.1\u001b[0m\n"
284+
]
285+
}
286+
],
287+
"source": [
288+
"!tree -a test.zarr | head"
289+
]
290+
},
291+
{
292+
"cell_type": "code",
293+
"execution_count": 11,
294+
"id": "9a6365b7",
295+
"metadata": {},
296+
"outputs": [
297+
{
298+
"name": "stdout",
299+
"output_type": "stream",
300+
"text": [
301+
"{'chunks': [10, 10], 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'dtype': '<f8', 'fill_value': 0.0, 'filters': None, 'order': 'C', 'shape': [40, 50], 'zarr_format': 2}\n"
302+
]
303+
}
304+
],
305+
"source": [
306+
"import json\n",
307+
"with open('test.zarr/.zarray') as fp:\n",
308+
" print(json.load(fp))"
309+
]
310+
},
311+
{
312+
"cell_type": "code",
313+
"execution_count": 12,
314+
"id": "d8f05ea3",
315+
"metadata": {},
316+
"outputs": [
317+
{
318+
"name": "stdout",
319+
"output_type": "stream",
320+
"text": [
321+
"{'standard_name': 'wind_speed', 'units': 'm/s'}\n"
322+
]
323+
}
324+
],
325+
"source": [
326+
"with open('test.zarr/.zattrs') as fp:\n",
327+
" print(json.load(fp))"
328+
]
329+
},
330+
{
331+
"cell_type": "code",
332+
"execution_count": null,
333+
"id": "1e0d1a8e",
334+
"metadata": {},
335+
"outputs": [],
336+
"source": []
54337
}
55338
],
56339
"metadata": {
340+
"kernelspec": {
341+
"display_name": "ERA5_interactive",
342+
"language": "python",
343+
"name": "python3"
344+
},
57345
"language_info": {
58-
"name": "python"
346+
"codemirror_mode": {
347+
"name": "ipython",
348+
"version": 3
349+
},
350+
"file_extension": ".py",
351+
"mimetype": "text/x-python",
352+
"name": "python",
353+
"nbconvert_exporter": "python",
354+
"pygments_lexer": "ipython3",
355+
"version": "3.11.9"
59356
}
60357
},
61358
"nbformat": 4,
