Commit 6b8d11a

Merge pull request #146 from TomAugspurger/cleanup-2
Cleanup 2
2 parents 281059b + 3fa3ecd commit 6b8d11a

6 files changed: 110 additions and 93 deletions

00_overview.ipynb

Lines changed: 2 additions & 3 deletions
@@ -103,13 +103,12 @@
  "metadata": {},
  "source": [
  "* Reference\n",
- " * [Documentation](https://dask.pydata.org/en/latest/)\n",
+ " * [Documentation](https://docs.dask.org/)\n",
  " * [Code](https://github.com/dask/dask/)\n",
- " * [Blog](http://matthewrocklin.com/blog/)\n",
+ " * [Blog](https://blog.dask.org)\n",
  "* Ask for help\n",
  " * [dask](http://stackoverflow.com/questions/tagged/dask) tag on Stack Overflow\n",
  " * [github issues](https://github.com/dask/dask/issues/new) for bug reports and feature requests\n",
- " * [gitter](https://gitter.im/dask/dask) for quasi-realtime conversation\n",
  " "
  ]
  },

01_dask.delayed.ipynb

Lines changed: 15 additions & 3 deletions
@@ -13,7 +13,15 @@
  "\n",
  "In this section we parallelize simple for-loop style code with Dask and `dask.delayed`. Often, this is the only function that you will need to convert functions for use with Dask.\n",
  "\n",
- "This is a simple way to use `dask` to parallelize existing codebases or build [complex systems](http://matthewrocklin.com/blog/work/2018/02/09/credit-models-with-dask). This will also help us to develop an understanding for later sections."
+ "This is a simple way to use `dask` to parallelize existing codebases or build [complex systems](https://blog.dask.org/2018/02/09/credit-models-with-dask). This will also help us to develop an understanding for later sections.\n",
+ "\n",
+ "**Related Documentation**\n",
+ "\n",
+ "* [Delayed documentation](https://docs.dask.org/en/latest/delayed.html)\n",
+ "* [Delayed screencast](https://www.youtube.com/watch?v=SHqFmynRxVU)\n",
+ "* [Delayed API](https://docs.dask.org/en/latest/delayed-api.html)\n",
+ "* [Delayed examples](https://examples.dask.org/delayed.html)\n",
+ "* [Delayed best practices](https://docs.dask.org/en/latest/delayed-best-practices.html)"
  ]
  },
  {
@@ -31,7 +39,7 @@
  "source": [
  "from dask.distributed import Client\n",
  "\n",
- "client = Client()"
+ "client = Client(n_workers=4)"
  ]
  },
  {
@@ -698,7 +706,11 @@
  "- How much speedup did you get? Is this how much speedup you'd expect?\n",
  "- Experiment with where to call `compute`. What happens when you call it on `sums` and `counts`? What happens if you wait and call it on `mean`?\n",
  "- Experiment with delaying the call to `sum`. What does the graph look like if `sum` is delayed? What does the graph look like if it isn't?\n",
- "- Can you think of any reason why you'd want to do the reduction one way over the other?"
+ "- Can you think of any reason why you'd want to do the reduction one way over the other?\n",
+ "\n",
+ "### Learn More\n",
+ "\n",
+ "Visit the [Delayed documentation](https://docs.dask.org/en/latest/delayed.html). In particular, this [delayed screencast](https://www.youtube.com/watch?v=SHqFmynRxVU) will reinforce the concepts you learned here and the [delayed best practices](https://docs.dask.org/en/latest/delayed-best-practices.html) document collects advice on using `dask.delayed` well."
  ]
  },
  {
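
Note on the exercise questions in 01_dask.delayed.ipynb: the notebook cell that defines `sums`, `counts`, and `mean` is not part of this diff, so the following is only a minimal sketch of what "waiting and calling `compute` on `mean`" could look like. The `load` function and its data are hypothetical stand-ins, not the tutorial's code.

```python
from dask import delayed


@delayed
def load(i):
    # Hypothetical stand-in for reading one chunk of data
    return list(range(i * 10, (i + 1) * 10))


chunks = [load(i) for i in range(4)]

# Per-chunk reductions stay lazy: nothing has run yet
sums = [delayed(sum)(c) for c in chunks]
counts = [delayed(len)(c) for c in chunks]

# Delaying the final reduction too keeps the whole computation in one graph
mean = delayed(sum)(sums) / delayed(sum)(counts)

# A single compute call at the end executes the full graph
result = mean.compute()
print(result)
```

Calling `compute` once on `mean` lets Dask see the full graph, whereas computing `sums` and `counts` separately would run the hypothetical `load` tasks twice.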

02_bag.ipynb

Lines changed: 17 additions & 3 deletions
@@ -31,8 +31,10 @@
  " \n",
  "**Related Documentation**\n",
  "\n",
- "* [Bag Documenation](http://dask.pydata.org/en/latest/bag.html)\n",
- "* [Bag API](http://dask.pydata.org/en/latest/bag-api.html)"
+ "* [Bag documentation](https://docs.dask.org/en/latest/bag.html)\n",
+ "* [Bag screencast](https://youtu.be/-qIiJ1XtSv0)\n",
+ "* [Bag API](https://docs.dask.org/en/latest/bag-api.html)\n",
+ "* [Bag examples](https://examples.dask.org/bag.html)"
  ]
  },
  {
@@ -50,7 +52,7 @@
  "source": [
  "from dask.distributed import Client\n",
  "\n",
- "client = Client()"
+ "client = Client(n_workers=4)"
  ]
  },
  {
@@ -626,6 +628,18 @@
  " a normalised dataframe."
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Learn More\n",
+ "\n",
+ "* [Bag documentation](https://docs.dask.org/en/latest/bag.html)\n",
+ "* [Bag screencast](https://youtu.be/-qIiJ1XtSv0)\n",
+ "* [Bag API](https://docs.dask.org/en/latest/bag-api.html)\n",
+ "* [Bag examples](https://examples.dask.org/bag.html)"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},

03_array.ipynb

Lines changed: 27 additions & 29 deletions
@@ -25,10 +25,15 @@
  "* **Larger-than-memory**: Lets you work on datasets that are larger than your available memory by breaking up your array into many small pieces, operating on those pieces in an order that minimizes the memory footprint of your computation, and effectively streaming data from disk.\n",
  "* **Blocked Algorithms**: Perform large computations by performing many smaller computations\n",
  "\n",
+ "In this notebook, we'll build some understanding by implementing some blocked algorithms from scratch.\n",
+ "We'll then use Dask Array to analyze large datasets, in parallel, using a familiar NumPy-like API.\n",
+ "\n",
  "**Related Documentation**\n",
  "\n",
- "* [Documentation](http://dask.readthedocs.io/en/latest/array.html)\n",
- "* [API reference](http://dask.readthedocs.io/en/latest/array-api.html)"
+ "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n",
+ "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n",
+ "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n",
+ "* [Array examples](https://examples.dask.org/array.html)"
  ]
  },
  {
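
Note on the "Blocked Algorithms" bullet in the hunk above: the idea of performing a large computation as many smaller ones can be sketched in plain Python. This is an illustrative sketch only; `blocked_mean` and its arguments are hypothetical names, not code from the notebook.

```python
def blocked_mean(values, block_size):
    """Compute a mean by reducing fixed-size blocks, then combining the partial results."""
    partial_sums = []
    partial_counts = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]  # only one block is handled at a time
        partial_sums.append(sum(block))
        partial_counts.append(len(block))
    # Combine the small per-block results into the final answer
    return sum(partial_sums) / sum(partial_counts)


data = list(range(1, 101))
result = blocked_mean(data, block_size=30)
print(result)
```

Each block's sum and count is independent of the others, which is what makes this style of computation easy to run in parallel or on data that does not fit in memory.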
@@ -39,7 +44,7 @@
  "source": [
  "from dask.distributed import Client\n",
  "\n",
- "client = Client(processes=False)"
+ "client = Client(n_workers=4, processes=False)"
  ]
  },
  {
@@ -298,7 +303,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "And the variance, std, etc.. This should be a trivial change to the example above.\n",
+ "And the variance, std, etc.. This should be a small change to the example above.\n",
  "\n",
  "Look at what other operations you can do with the Jupyter notebook's tab-completion."
  ]
@@ -639,11 +644,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {
- "jupyter": {
- "source_hidden": true
- }
- },
+ "metadata": {},
  "outputs": [],
  "source": [
  "result = x.mean(axis=0)\n",
@@ -668,11 +669,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {
- "jupyter": {
- "source_hidden": true
- }
- },
+ "metadata": {},
  "outputs": [],
  "source": [
  "result = x[0] - x.mean(axis=0)\n",
@@ -732,11 +729,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {
- "jupyter": {
- "source_hidden": true
- }
- },
+ "metadata": {},
  "outputs": [],
  "source": [
  "import h5py\n",
@@ -919,24 +912,29 @@
  "Limitations\n",
  "-----------\n",
  "\n",
- "Dask.array does not implement the entire numpy interface. Users expecting this\n",
- "will be disappointed. Notably dask.array has the following failings:\n",
+ "Dask Array does not implement the entire numpy interface. Users expecting this\n",
+ "will be disappointed. Notably Dask Array has the following failings:\n",
  "\n",
  "1. Dask does not implement all of ``np.linalg``. This has been done by a\n",
  " number of excellent BLAS/LAPACK implementations and is the focus of\n",
  " numerous ongoing academic research projects.\n",
- "2. Dask.array does not support any operation where the resulting shape\n",
- " depends on the values of the array. In order to form the Dask graph we\n",
- " must be able to infer the shape of the array before actually executing the\n",
- " operation. This precludes operations like indexing one Dask array with\n",
- " another or operations like ``np.where``.\n",
- "3. Dask.array does not attempt operations like ``sort`` which are notoriously\n",
+ "2. Dask Array does not support some operations where the resulting shape\n",
+ " depends on the values of the array. For those that it does support\n",
+ " (for example, masking one Dask Array with another boolean mask),\n",
+ " the chunk sizes will be unknown, which may cause issues with other\n",
+ " operations that need to know the chunk sizes.\n",
+ "3. Dask Array does not attempt operations like ``sort`` which are notoriously\n",
  " difficult to do in parallel and are of somewhat diminished value on very\n",
  " large data (you rarely actually need a full sort).\n",
  " Often we include parallel-friendly alternatives like ``topk``.\n",
  "4. Dask development is driven by immediate need, and so many lesser used\n",
- " functions, like ``np.full_like`` have not been implemented purely out of\n",
- " laziness. These would make excellent community contributions."
+ " functions, like ``np.sometrue`` have not been implemented purely out of\n",
+ " laziness. These would make excellent community contributions.\n",
+ " \n",
+ "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n",
+ "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n",
+ "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n",
+ "* [Array examples](https://examples.dask.org/array.html)"
  ]
  },
  {
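
Note on the reworded limitation 2 in 03_array.ipynb: the unknown chunk sizes that result from boolean masking can be seen in a small sketch like the one below. The array contents are made up for illustration, and the final step assumes a Dask version that provides `Array.compute_chunk_sizes`.

```python
import numpy as np
import dask.array as da

# Small illustrative array split into two chunks of five elements
x = da.from_array(np.arange(10), chunks=5)

# Masking with a boolean Dask Array: the result length depends on the data
y = x[x > 4]

# The shape is reported as NaN because it is unknown until the mask is evaluated
shape_before = y.shape
print(shape_before)

# Computing the values works even though the chunk sizes are unknown
values = y.compute()
print(values)

# Assumes compute_chunk_sizes is available: it evaluates the graph to fill in the sizes
known = y.compute_chunk_sizes()
print(known.shape)
```

Operations that need concrete chunk sizes, such as slicing by position, may fail on `y` until its chunk sizes have been computed.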
