_episodes/11-dask-configuration.md
Lines changed: 11 additions & 5 deletions
@@ -336,7 +336,8 @@ cluster:
 n_workers: 64 # total number of workers to start
 ```

-In this example we use the popular SLURM scheduduler, but other schedulers are also supported, see [this list](https://jobqueue.dask.org/en/latest/api.html).
+In this example we use the popular SLURM scheduduler, but other schedulers are
+also supported, see [this list](https://jobqueue.dask.org/en/latest/api.html).

 In the above example, ESMValCore will start 64 Dask workers
 (with 128 / 64 = 2 threads each) and for that it will need to launch a single
@@ -345,16 +346,21 @@ e.g. 256, it would launch 4 SLURM batch jobs which would each start 64 workers
 for a total of 4 x 64 = 256 workers. In the above configuration, each worker is
 allowed to use 240 GiB per job / 64 workers per job = ~4 GiB per worker.

-It is important to read the documentation about your HPC system and answer questions such as
+It is important to read the documentation about your HPC system and answer
+questions such as:
 - Which batch scheduler does my HPC system use?
 - How many CPU cores are available per node (a computer in an HPC system)?
 - How much memory is available for use per node?
 - What is the fastest network interface (infiniband is much faster than ethernet)?
-- What path should I use for storing temporary files on the nodes (try to avoid slower network storage if possible)?
+- What path should I use for storing temporary files on the nodes (try to
+avoid slower network storage if possible)?
 - Which computing queue has the best availability?
 - Can I use part of a node or do I need to use the full node?
-- If you are always charged for using the full node, asking for only part of a node is wasteful of computational resources.
-- If you can ask for part of a node, make sure the amount of memory you request matches the number of CPU cores if possible, or you will be charged for a larger fraction of the node.
+- If you are always charged for using the full node, asking for only part of
+a node is wasteful of computational resources.
+- If you can ask for part of a node, make sure the amount of memory you
+request matches the number of CPU cores if possible, or you will be charged
+for a larger fraction of the node.

 in order to find the optimal configuration for your situation.
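The resource arithmetic described in the changed text (threads per worker, memory per worker, and number of SLURM jobs) can be checked with a small sketch. It assumes that the 128 in "128 / 64" is the number of CPU cores requested per batch job, and that each job starts 64 worker processes and requests 240 GiB of memory. `JobResources` and `plan_workers` are hypothetical helpers for illustration only, not part of ESMValCore or dask-jobqueue.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResources:
    """Hypothetical summary of one batch job's resources (not an ESMValCore API)."""
    cores: int          # assumed: CPU cores requested per batch job
    memory_gib: float   # assumed: memory requested per batch job, in GiB
    processes: int      # assumed: Dask worker processes started per batch job


def plan_workers(job: JobResources, n_workers: int) -> dict:
    """Derive per-worker resources and how many batch jobs are needed."""
    # Cores are shared evenly between the worker processes of one job.
    threads_per_worker = job.cores // job.processes
    # Memory is likewise split evenly between the workers of one job.
    memory_per_worker_gib = job.memory_gib / job.processes
    # Each job starts `processes` workers, so round up to whole jobs.
    n_jobs = math.ceil(n_workers / job.processes)
    return {
        "threads_per_worker": threads_per_worker,
        "memory_per_worker_gib": memory_per_worker_gib,
        "n_jobs": n_jobs,
        "total_workers": n_jobs * job.processes,
    }


job = JobResources(cores=128, memory_gib=240, processes=64)
print(plan_workers(job, n_workers=64))
# {'threads_per_worker': 2, 'memory_per_worker_gib': 3.75, 'n_jobs': 1, 'total_workers': 64}
print(plan_workers(job, n_workers=256))
# {'threads_per_worker': 2, 'memory_per_worker_gib': 3.75, 'n_jobs': 4, 'total_workers': 256}
```

The exact value of 3.75 GiB per worker is what the text rounds to "~4 GiB per worker"; the 256-worker case reproduces the "4 x 64 = 256 workers" figure.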