|
25 | 25 | "* **Larger-than-memory**: Lets you work on datasets that are larger than your available memory by breaking up your array into many small pieces, operating on those pieces in an order that minimizes the memory footprint of your computation, and effectively streaming data from disk.\n", |
26 | 26 | "* **Blocked Algorithms**: Perform large computations by performing many smaller computations\n", |
27 | 27 | "\n", |
| 28 | + "In this notebook, we'll build some understanding by implementing some blocked algorithms from scratch.\n", |
| 29 | + "We'll then use Dask Array to analyze large datasets, in parallel, using a familiar NumPy-like API.\n", |
| 30 | + "\n", |
28 | 31 | "**Related Documentation**\n", |
29 | 32 | "\n", |
30 | | - "* [Documentation](http://dask.readthedocs.io/en/latest/array.html)\n", |
31 | | - "* [API reference](http://dask.readthedocs.io/en/latest/array-api.html)" |
| 33 | + "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n", |
| 34 | + "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n", |
| 35 | + "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n", |
| 36 | + "* [Array examples](https://examples.dask.org/array.html)" |
32 | 37 | ] |
33 | 38 | }, |
34 | 39 | { |
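Note on the "Blocked Algorithms" bullet in the hunk above: the idea can be sketched without Dask at all. The following is a minimal illustration, not the notebook's own code. The helper name `blocked_mean` and the `block_size` parameter are hypothetical, and a plain Python list stands in for an on-disk array.

```python
def blocked_mean(values, block_size):
    """Compute a mean from per-block partial sums and counts.

    Hypothetical helper for illustration only; it is not part of Dask.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    partial_sums = []
    partial_counts = []
    # Visit one block at a time, so only `block_size` items are "live" per step.
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        partial_sums.append(sum(block))
        partial_counts.append(len(block))

    total_count = sum(partial_counts)
    if total_count == 0:
        raise ValueError("cannot take the mean of an empty sequence")

    # Combine the small intermediate results into the final answer.
    return sum(partial_sums) / total_count


data = list(range(1_000))
result = blocked_mean(data, block_size=100)
```

Each block yields a small intermediate result, and only those intermediates are combined at the end. Dask Array applies the same pattern to chunks of a larger array, with the added ability to schedule the per-block work in parallel.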
|
39 | 44 | "source": [ |
40 | 45 | "from dask.distributed import Client\n", |
41 | 46 | "\n", |
42 | | - "client = Client(processes=False)" |
| 47 | + "client = Client(n_workers=4, processes=False)" |
43 | 48 | ] |
44 | 49 | }, |
45 | 50 | { |
|
298 | 303 | "cell_type": "markdown", |
299 | 304 | "metadata": {}, |
300 | 305 | "source": [ |
301 | | - "And the variance, std, etc.. This should be a trivial change to the example above.\n", |
 | 306 | + "And the variance, std, etc. This should be a small change to the example above.\n", |
302 | 307 | "\n", |
303 | 308 | "Look at what other operations you can do with the Jupyter notebook's tab-completion." |
304 | 309 | ] |
|
639 | 644 | { |
640 | 645 | "cell_type": "code", |
641 | 646 | "execution_count": null, |
642 | | - "metadata": { |
643 | | - "jupyter": { |
644 | | - "source_hidden": true |
645 | | - } |
646 | | - }, |
| 647 | + "metadata": {}, |
647 | 648 | "outputs": [], |
648 | 649 | "source": [ |
649 | 650 | "result = x.mean(axis=0)\n", |
|
668 | 669 | { |
669 | 670 | "cell_type": "code", |
670 | 671 | "execution_count": null, |
671 | | - "metadata": { |
672 | | - "jupyter": { |
673 | | - "source_hidden": true |
674 | | - } |
675 | | - }, |
| 672 | + "metadata": {}, |
676 | 673 | "outputs": [], |
677 | 674 | "source": [ |
678 | 675 | "result = x[0] - x.mean(axis=0)\n", |
|
732 | 729 | { |
733 | 730 | "cell_type": "code", |
734 | 731 | "execution_count": null, |
735 | | - "metadata": { |
736 | | - "jupyter": { |
737 | | - "source_hidden": true |
738 | | - } |
739 | | - }, |
| 732 | + "metadata": {}, |
740 | 733 | "outputs": [], |
741 | 734 | "source": [ |
742 | 735 | "import h5py\n", |
|
919 | 912 | "Limitations\n", |
920 | 913 | "-----------\n", |
921 | 914 | "\n", |
922 | | - "Dask.array does not implement the entire numpy interface. Users expecting this\n", |
923 | | - "will be disappointed. Notably dask.array has the following failings:\n", |
| 915 | + "Dask Array does not implement the entire numpy interface. Users expecting this\n", |
| 916 | + "will be disappointed. Notably Dask Array has the following failings:\n", |
924 | 917 | "\n", |
925 | 918 | "1. Dask does not implement all of ``np.linalg``. This has been done by a\n", |
926 | 919 | " number of excellent BLAS/LAPACK implementations and is the focus of\n", |
927 | 920 | " numerous ongoing academic research projects.\n", |
928 | | - "2. Dask.array does not support any operation where the resulting shape\n", |
929 | | - " depends on the values of the array. In order to form the Dask graph we\n", |
930 | | - " must be able to infer the shape of the array before actually executing the\n", |
931 | | - " operation. This precludes operations like indexing one Dask array with\n", |
932 | | - " another or operations like ``np.where``.\n", |
933 | | - "3. Dask.array does not attempt operations like ``sort`` which are notoriously\n", |
| 921 | + "2. Dask Array does not support some operations where the resulting shape\n", |
| 922 | + " depends on the values of the array. For those that it does support\n", |
| 923 | + " (for example, masking one Dask Array with another boolean mask),\n", |
| 924 | + " the chunk sizes will be unknown, which may cause issues with other\n", |
| 925 | + " operations that need to know the chunk sizes.\n", |
| 926 | + "3. Dask Array does not attempt operations like ``sort`` which are notoriously\n", |
934 | 927 | " difficult to do in parallel and are of somewhat diminished value on very\n", |
935 | 928 | " large data (you rarely actually need a full sort).\n", |
936 | 929 | " Often we include parallel-friendly alternatives like ``topk``.\n", |
937 | 930 | "4. Dask development is driven by immediate need, and so many lesser used\n", |
938 | | - " functions, like ``np.full_like`` have not been implemented purely out of\n", |
939 | | - " laziness. These would make excellent community contributions." |
| 931 | + " functions, like ``np.sometrue`` have not been implemented purely out of\n", |
| 932 | + " laziness. These would make excellent community contributions.\n", |
| 933 | + " \n", |
| 934 | + "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n", |
| 935 | + "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n", |
| 936 | + "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n", |
| 937 | + "* [Array examples](https://examples.dask.org/array.html)" |
940 | 938 | ] |
941 | 939 | }, |
942 | 940 | { |
|
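Note on limitations 2 and 3 in the hunk above: the sketch below shows what "unknown chunk sizes" looks like after boolean masking, and how `topk` can replace a full sort. It assumes a recent Dask release with `dask.array` installed. The toy array, the chunk size, and the use of the synchronous scheduler are illustrative choices, not taken from the notebook.

```python
import dask
import dask.array as da
import numpy as np

# Toy data: 20 integers split into 4 chunks of 5 elements each.
x = da.from_array(np.arange(20), chunks=5)

# Boolean masking is supported, but the result's size depends on the values,
# so Dask cannot know the shape or chunk sizes until it computes them.
masked = x[x % 2 == 0]
unknown_shape = masked.shape  # the length is reported as nan at this point

with dask.config.set(scheduler="synchronous"):
    # Explicitly compute the chunk sizes so later operations can rely on them.
    masked.compute_chunk_sizes()
    values = masked.compute()

    # Parallel-friendly alternative to a full sort: the 3 largest elements.
    largest = da.topk(x, 3).compute()
```

Calling `compute_chunk_sizes()` triggers a computation, so it is worth doing only when a later operation actually needs concrete chunk sizes.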