Commit 6161900

Fix some linter issues
1 parent 788dc77 commit 6161900

File tree

1 file changed: +30 -22 lines

_episodes/11-dask-configuration.md

Lines changed: 30 additions & 22 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Configuring Dask"
 teaching: 20 (+ optional 10)
-exercises: 40 (+ optional 20)
+exercises: 40 (+ optional 20)
 compatibility: ESMValCore v2.10.0

 questions:
@@ -27,22 +27,24 @@ keypoints:
 When processing larger amounts of data, and especially when the tool crashes
 when running a recipe because there is not enough memory available, it is
 usually beneficial to change the default
-[Dask configuration](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html#dask-configuration).
+[Dask configuration](https://docs.esmvaltool.org/
+projects/ESMValCore/en/latest/quickstart/configure.html#dask-configuration).

 The preprocessor functions in ESMValCore use the
 [Iris](https://scitools-iris.readthedocs.io) library, which in turn uses Dask
 Arrays to be able to process datasets that are larger than the available memory.
 It is not necesary to understand how these work exactly to use the ESMValTool,
 but if you are interested there is a
 [Dask Array Tutorial](https://tutorial.dask.org/02_array.html) as a well as a
-[guide to "Lazy Data"](https://scitools-iris.readthedocs.io/en/stable/userguide/real_and_lazy_data.html)
+[guide to "Lazy Data"](https://scitools-iris.readthedocs.io/
+en/stable/userguide/real_and_lazy_data.html)
 available. Lazy data is the term the Iris library uses for Dask Arrays.


 ### Workers
 The most important concept to understand when using Dask Arrays is the concept
-of a Dask "worker". With Dask, computations are run in parallel by little programs
-that are called "workers". These could be on running on the
+of a Dask "worker". With Dask, computations are run in parallel by little
+programs that are called "workers". These could be on running on the
 same machine that you are running ESMValTool on, or they could be on one or
 more other computers. Dask workers typically require 2 to 4 gigabytes (GiB) of
 memory (RAM) each. In order to avoid running out of memory, it is important
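To put the 2 to 4 GiB guideline from the text above into practice, the number of workers and the memory per worker can be set in ``~/.esmvaltool/dask.yml``. A minimal sketch of such a ``cluster`` section (the values are illustrative and not taken from this commit; the key names assume the ESMValCore v2.10 Dask configuration format):

```yaml
cluster:
  type: distributed.LocalCluster
  n_workers: 2          # number of Dask workers started on this machine
  threads_per_worker: 2 # threads available to each worker
  memory_limit: 4 GiB   # memory available to each worker
```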
@@ -66,15 +68,15 @@ package is more suitable for larger computations.
 >
 > In the config-user.yml file, there is a setting called ``max_parallel_tasks``.
 > Any variable or diagnostic script in the recipe is considered a 'task' in this
-> context and when settings this to a value larger than 1, these will be processed
-> in parallel on the computer running the ``esmvaltool`` command.
+> context and when settings this to a value larger than 1, these will be
+> processed in parallel on the computer running the ``esmvaltool`` command.
 >
 > With the Dask Distributed scheduler, all the tasks running in parallel
 > can use the same workers, but with the default scheduler each task will
-> start its own workers. If a recipe does not run with ``max_parallel_tasks`` set
-> to a value larger than 1, try reducing the value or setting it to 1. This is
-> especially the case for recipes with high resolution data or many datasets
-> per variable.
+> start its own workers. If a recipe does not run with ``max_parallel_tasks``
+> set to a value larger than 1, try reducing the value or setting it to 1.
+> This is especially the case for recipes with high resolution data or many
+> datasets per variable.
 >
 {: .callout}
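The ``max_parallel_tasks`` option discussed in this callout is a single top-level entry in ``config-user.yml``; for reference, a minimal sketch (the value shown is illustrative):

```yaml
# Run up to two recipe tasks (variables or diagnostic scripts) in parallel.
max_parallel_tasks: 2
```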

@@ -133,8 +135,8 @@ Open the Dashboard link in a browser to see the Dask Dashboard website.
 When the recipe has finished running, the Dashboard website will stop working.
 The top left panel shows the memory use of each of the workers, the panel on the
 right shows one row for each thread that is doing work, and the panel at the
-bottom shows the progress of all work that the scheduler currently has been asked
-to do.
+bottom shows the progress of all work that the scheduler currently has been
+asked to do.

 > ## Explore what happens if workers do not have enough memory
 >
@@ -156,8 +158,8 @@ to do.
 >> orange as the worker reaches the maximum amount of memory it is
 >> allowed to use and it starts 'spilling' (writing data temporarily) to disk.
 >> The red blocks in the top right panel represent time spent reading/writing
->> to disk. While 2 GiB per worker may be enough in other cases, it is apparently
->> not enough for this recipe.
+>> to disk. While 2 GiB per worker may be enough in other cases, it is
+>> apparently not enough for this recipe.
 >>
 > {: .solution}
 {: .challenge}
@@ -195,9 +197,12 @@ to do.
 ## Using an existing Dask Distributed cluster

 In some cases, it can be useful to start the Dask Distributed cluster before
-running the ``esmvaltool`` command. For example, if you would like to keep the Dashboard available for further investigation after the recipe completes running, or if you are working from a Jupyter notebook environment, see
+running the ``esmvaltool`` command. For example, if you would like to keep the
+Dashboard available for further investigation after the recipe completes
+running, or if you are working from a Jupyter notebook environment, see
 [dask-labextension](https://github.com/dask/dask-labextension) and
-[dask_jobqueue interactive use](https://jobqueue.dask.org/en/latest/interactive.html)
+[dask_jobqueue interactive use](https://jobqueue.dask.org/
+en/latest/interactive.html)
 for more information.

 To use a cluster that was started in some other way, the following configuration
@@ -208,7 +213,10 @@ client:
   address: "tcp://127.0.0.1:33041"
 ```
 where the address depends on the Dask cluster. Code to start a
-[``distributed.LocalCluster``](https://distributed.dask.org/en/stable/api.html#distributed.LocalCluster) that automatically scales between 0 and 2 workers, depending on demand, could look like this:
+[``distributed.LocalCluster``](https://distributed.dask.org/
+en/stable/api.html#distributed.LocalCluster)
+that automatically scales between 0 and 2 workers, depending on demand, could
+look like this:

 ```python
 from time import sleep
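The Python block that follows here in the lesson file is cut off by the hunk after the first import. A minimal sketch of a script that does what the surrounding text describes, assuming only the ``distributed`` package and not reproducing the lesson's exact code, could look like this:

```python
from time import sleep

from distributed import LocalCluster

if __name__ == "__main__":
    # Start a local cluster that adaptively scales between 0 and 2 workers.
    cluster = LocalCluster(threads_per_worker=2, memory_limit="4GiB")
    cluster.adapt(minimum=0, maximum=2)

    # The scheduler address is what goes under "client:" in ~/.esmvaltool/dask.yml.
    print("Scheduler address:", cluster.scheduler_address)
    print("Dashboard link:", cluster.dashboard_link)

    # Keep the cluster alive until interrupted, so ESMValTool can connect to it.
    try:
        while True:
            sleep(60)
    except KeyboardInterrupt:
        cluster.close()
```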
@@ -257,8 +265,8 @@ Dashboard remains available after the recipe completes.
 >> to the screen, edit the file ``~/.esmvaltool/dask.yml`` so it contains the
 lines
 >> ```yaml
->> client:
->> address: "tcp://127.0.0.1:34827"
+>> client:
+>> address: "tcp://127.0.0.1:34827"
 >> ```
 >> open the link "http://127.0.0.1:8787/status" in your browser and
 >> run the recipe again with the command ``esmvaltool run recipe_easy_ipcc_short.yml``.
@@ -273,7 +281,7 @@ compute hours you are not using.

 It is recommended to use the Distributed scheduler explained above for
 processing larger amounts of data. However, in many cases the default scheduler
-is good enough. Note that it does not provide a Dashboard, so it is less
+is good enough. Note that it does not provide a Dashboard, so it is less
 instructive and that is why we did not use it earlier in this tutorial.

 To use the default scheduler, comment out all the contents of
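As a side note to the paragraph above: independent of ESMValTool's own configuration files, the default threaded scheduler can also be selected and exercised directly through the ordinary Dask API; an illustrative sketch (not part of the lesson's instructions):

```python
import dask
import dask.array as da

# Explicitly select Dask's default threaded scheduler.
dask.config.set(scheduler="threads")

# A small lazy computation; num_workers limits the number of threads used.
x = da.ones((1000, 1000), chunks=(250, 250))
print(x.mean().compute(num_workers=4))
```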
@@ -359,7 +367,7 @@ in order to find the optimal configuration for your situation.
 >
 >> ## Solution
 >>
->> The best configuration depends on the HPC system that you are using.
+>> The best configuration depends on the HPC system that you are using.
 >> Discuss your answer with the instructor and the class if possible. If you are
 >> taking this course by yourself, you can have a look at the [Dask configuration examples in the ESMValCore documentation](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html#dask-distributed-configuration).
 >>

0 commit comments
