
Commit acd78f9

add compute diagrams
1 parent 7f36400 commit acd78f9

4 files changed: +35 -36 lines changed


slides/scipy-2019.md

Lines changed: 35 additions & 36 deletions
@@ -15,7 +15,13 @@ Alistair Miles ([@alimanfoo](https://github.com/alimanfoo)) - SciPy 2019
 
 ===
 
-@@TODO image of tensor -> compute -> tensor
+### Problem statement
+
+<p class="stretch"><img src="scipy-2019-files/compute1.png"></p>
+
+There is some computation we want to perform.
+
+Inputs and outputs are tensors.
 
 5 key features...

@@ -26,22 +32,20 @@ Alistair Miles ([@alimanfoo](https://github.com/alimanfoo)) - SciPy 2019
 Input and/or output tensors are too big to fit comfortably in main
 memory.
 
-@@TODO image of larger than memory
-
 ===
 
 ### (2) Computation can be parallelised
 
+<p class="stretch"><img src="scipy-2019-files/compute2.png"></p>
+
 Some part of the computation can be parallelised by processing data in
 chunks.
 
-@@TODO image of tensor -> parallel compute -> compute -> parallel compute -> tensor
-
 ===
 
 ### E.g., embarrassingly parallel
 
-@@TODO image of tensor -> parallel compute -> tensor
+<p class="stretch"><img src="scipy-2019-files/compute3.png"></p>
 
 ===
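
The chunk-wise pattern named on the "(2) Computation can be parallelised" slide can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the function names (`split_into_chunks`, `chunk_sum_of_squares`, `parallel_sum_of_squares`), the chunk size, and the sum-of-squares workload are hypothetical and do not come from the slides. Threads are used here to show the structure of the pattern; they are not expected to speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def chunk_sum_of_squares(chunk):
    """Per-chunk work: depends only on the chunk it is given."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(data, chunk_size=1000, max_workers=4):
    """Process every chunk independently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(chunk_sum_of_squares,
                            split_into_chunks(data, chunk_size))
        # Consume the results while the pool is still open.
        return sum(partials)


data = list(range(10_000))
total = parallel_sum_of_squares(data)
```

Because no chunk needs another chunk's result, this is the embarrassingly parallel case; only the final `sum` over the partial results is serial.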

@@ -50,8 +54,6 @@ chunks.
 Computational complexity is moderate &rarr; significant amount of time is
 spent in reading and/or writing data.
 
-@@TODO image of tensor -> bottleneck -> parallel compute -> bottleneck -> tensor
-
 N.B., bottleneck may be due to (a) limited I/O bandwidth, (b) I/O is
 not parallel.

@@ -60,11 +62,8 @@ not parallel.
 ### (4) Data are compressible
 
 * Compression is a very active area of innovation.
-
 * Modern compressors achieve good compression ratios with high speed.
-
 * Opportunity to trade I/O for computation.
-
 * Compression can increase effective I/O bandwidth, sometimes
 dramatically.
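
The "trade I/O for computation" point on the "(4) Data are compressible" slide can be made concrete with a small round trip. This is a minimal sketch, assuming `zlib` from the Python standard library as a stand-in compressor (the hunk above does not name one) and an artificial, highly repetitive integer buffer; real data may compress far less well.

```python
import array
import zlib

# Hypothetical data: one million small integers in a repeating pattern.
values = array.array('i', (i % 4 for i in range(1_000_000)))
raw = values.tobytes()

# Spend CPU time compressing so that fewer bytes need to be written or read.
compressed = zlib.compress(raw, 1)

# Ratio of bytes held in memory to bytes that would hit storage.
ratio = len(raw) / len(compressed)

# Decompression recovers the original bytes exactly.
restored = zlib.decompress(compressed)
```

If storage delivers a fixed number of bytes per second, a ratio greater than 1 means more array data arrives per second than the raw bandwidth suggests, provided decompression keeps up.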

@@ -75,17 +74,17 @@ not parallel.
 * Rich datasets &rarr; exploratory science &rarr; interactive analysis
 &rarr; many rounds of summarise, visualise, hypothesise, model,
 test, repeat.
-
+
 * E.g., genome sequencing.
 
-* Each genome is a complete molecular blueprint for an organism.
-
-* Each genome is a history book handed down through the ages, with
-each generation making its mark.
-
 * Modern experiments sequence genomes from 1000s of individuals and
 compare them.
 
+* Each genome is a complete molecular blueprint for an organism.
+
+* Each genome is a history book handed down from the beginning of
+life on Earth, with each generation making its mark.
+
 ===
 
 ### Problem: key features
@@ -207,11 +206,11 @@ object stores?
 ### Zarr Python
 
 ```bash
-pip install zarr
+$ pip install zarr
 ```
 
 ```bash
-conda install -c conda-forge zarr
+$ conda install -c conda-forge zarr
 ```
 
 ```python
@@ -231,20 +230,20 @@ conda install -c conda-forge zarr
 <zarr.hierarchy.Group '/'>
 ```
 
-Using DirectoryStore the data will be stored on the local file
-system.
+Using DirectoryStore the data will be stored in a directory on the
+local file system.
 
 ===
 
 ### Creating an array
 
 ```python
->>> x = root.zeros('x',
-... shape=(10000, 10000),
-... chunks=(1000, 1000),
-... dtype='<i4')
->>> x
-<zarr.core.Array '/x' (10000, 10000) int32>
+>>> hello = root.zeros('hello',
+... shape=(10000, 10000),
+... chunks=(1000, 1000),
+... dtype='<i4')
+>>> hello
+<zarr.core.Array '/hello' (10000, 10000) int32>
 ```
 
 * Creates a 2-dimensional array of 32-bit integers with 10,000 rows
@@ -259,12 +258,12 @@ and 10,000 columns.
 ### Creating an array (h5py-style API)
 
 ```python
->>> x = root.create_dataset('x',
-... shape=(10000, 10000),
-... chunks=(1000, 1000),
-... dtype='<i4')
->>> x
-<zarr.core.Array '/x' (10000, 10000) int32>
+>>> hello = root.create_dataset('hello',
+... shape=(10000, 10000),
+... chunks=(1000, 1000),
+... dtype='<i4')
+>>> hello
+<zarr.core.Array '/hello' (10000, 10000) int32>
 ```
 
 ===
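
The `shape`, `chunks` and `dtype` arguments in the two "Creating an array" slides imply a fixed chunk grid, which can be worked out without Zarr installed. The following is a sketch of that arithmetic only: the variable names and the `chunk_key` helper are hypothetical, and the dot-separated key format is an assumption based on names such as `0.1` and `1.0` in the directory listing further down the diff.

```python
import math

shape = (10000, 10000)   # rows, columns, as passed to root.zeros(...)
chunks = (1000, 1000)    # chunk shape
itemsize = 4             # bytes per element for dtype '<i4'

# Number of chunks along each dimension, and in total.
grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(grid)

# Uncompressed size of one chunk and of the whole array, in bytes.
chunk_nbytes = math.prod(chunks) * itemsize
total_nbytes = math.prod(shape) * itemsize


def chunk_key(index):
    """Return the dot-separated grid position of the chunk holding `index`."""
    return ".".join(str(i // c) for i, c in zip(index, chunks))
```

For these parameters the grid is 10 by 10, so the array is split into 100 chunks of 4,000,000 bytes each before compression, and element `(1500, 250)` falls in the chunk at grid position `1.0`.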
@@ -365,7 +364,7 @@ example.zarr
 │   ├── 0.1
 │   ├── 1.0
 │   └── .zarray
-├── x
+├── hello
 │   └── .zarray
 └── .zgroup

@@ -452,7 +451,7 @@ MemoryError
 
 ===
 
-### DirectoryStore
+### DirectoryStore (reminder)
 
 ```bash
 $ tree -a example.zarr
@@ -462,7 +461,7 @@ example.zarr
 │   ├── 0.1
 │   ├── 1.0
 │   └── .zarray
-├── x
+├── hello
 │   └── .zarray
 └── .zgroup
