Commit 5d30c21: Update README.md
1 parent 3441f44

1 file changed: README.md (0 additions, 130 deletions)

@@ -50,133 +50,3 @@ and these elements can be used for the menu options:
- `RiskAversionRecipe`: Add risk aversion certainty equivalent to consumption calculations, valuing uncertainty over econometric and climate draws.
- `EquityRecipe`: Add risk aversion and equity to the consumption calculations. Equity includes taking a certainty equivalent over spatial impact regions.

## Requirements

The library runs on Python 3.8+ and expects all requirements to be installed
before running any code (see Installation). The integration process stacks
damage outcomes from several sectors at the impact region level, so you will
need a few tricks to handle the data I/O.

## Computing

### Computing introduction

One of the tricks we rely on is extensive use of `Dask` and `xarray` to read
raw damage data in `nc4` or `zarr` format (the latter is how coastal damages
are provided). Hence, you will need a `Dask` `distributed.Client` to harness
the power of distributed computing. Computing requirements vary with the menu
options you execute and the number of sectors you aggregate. These are some
general rules about computational intensity:
1. For recipes, `EquityRecipe > RiskAversionRecipe > AddingUpRecipe`.
2. For discounting, `euler_gwr > euler_ramsey > naive_gwr > naive_ramsey > constant > constant_model_collapsed`.
3. More options (i.e., a greater number of SSPs or sectors) means more computing resources required.
4. `Dask` does not perfectly release memory after each menu run. If you are running several menu options, in loops or otherwise, you may need to execute `client.restart()` partway through to force `Dask` to empty its memory.
5. Including coastal damages increases memory usage dramatically (due to the 500 batches and 10 GMSL bins against which other sectors' damages must be broadcast). Be careful and smart when running this option, and don't be afraid to reconsider the chunking of the files being read in.

### Setting up a Dask client

Ensure that the following packages are installed and up to date:
[Dask](https://docs.dask.org/en/latest/install.html), [distributed](https://distributed.dask.org/en/latest/install.html), the [Jupyter Dask extension](https://github.com/dask/dask-labextension), and `dask_jobqueue`.

Ensure that your Jupyter Lab has extensions enabled so that you can access Dask as an extension.

You have two options for setting up a Dask client.

#### Local client
<details><summary>Click to expand</summary>

If your local node has sufficient memory and computational power, you will only need to create a local Dask client.

_If you are operating on Midway3, you should be able to run the menu in its entirety.
Each `caslake` computing node on Midway3 has 193 GB of memory and 48 CPUs, which is sufficient for all options._

- Open the Dask tab on the left side of your Jupyter Lab page.
- Click `New +` and wait for a cluster to appear.
- Drag and drop the cluster into your notebook and execute the cell.
- You now have a new Dask client!
- Click the `CPU`, `Worker Memory`, and `Progress` tabs to track progress. You can arrange them in a side bar of your Jupyter notebook to keep them all visible at the same time.
- Note that opening 2 or 3 local clients does _not_ give you 2 or 3 times the compute capacity. These clients share the same node, so computing may actually be slower as they fight for resources. (_check this, it's a hypothesis_)

![](images/dask_example.png)
</details>
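If you prefer to create the local client in code rather than through the JupyterLab extension, a minimal sketch (the worker counts are illustrative; `processes=False` keeps everything in one process, which is the simplest single-node setup):

```
from dask.distributed import Client

# A small local client; all workers live in this process.
client = Client(n_workers=4, threads_per_worker=2, processes=False)
print(client.dashboard_link)  # URL for the CPU / Worker Memory / Progress views
client.close()
```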
#### Distributed client
<details><summary>Click to expand</summary>

If your local node does not have sufficient computational power, you will need to manually request separate
nodes with `dask_jobqueue`:
```
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()
print(cluster.job_script())
cluster.scale(10)
client = Client(cluster)
client
```
You can adjust the number of workers by changing the integer passed to `cluster.scale()`. You can adjust the CPUs
and memory per worker in `~/.config/dask/jobqueue.yaml`.

To track the progress of this client, copy the "Dashboard" IP address and tunnel to it over SSH. Example:
```
ssh -N -f -L 8787:10.50.250.7:8510 [email protected]
```
Then go to `localhost:8787` in your browser to watch the magic.
</details>
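For reference, a minimal sketch of the SLURM section of `~/.config/dask/jobqueue.yaml`; all values are illustrative, and the partition and network-interface names are hypothetical:

```
jobqueue:
  slurm:
    cores: 8              # CPUs per worker
    memory: 32GB          # memory per worker
    walltime: '02:00:00'
    queue: caslake        # hypothetical partition name
    interface: ib0        # hypothetical network interface
```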

### Dask troubleshooting

Most Dask issues in the menu come from one of two sources:
1. Asking Dask to compute too many tasks (your chunks are too small), which results in a sort of "hung state" and an empty progress bar.
2. Asking Dask to compute too _large_ tasks (your chunks are too big). In this case, you will see memory in the `Worker Memory` tab shoot off the charts, and then your kernel will likely be killed by SLURM.

How can you avoid these situations?
1. Start with `client.restart()`. Sometimes Dask does not properly release tasks from memory, which plugs up the client. A fresh restart (and perhaps a fresh restart of your notebook) will fix the problem.
2. Next, check your chunks! Ensure that any `xr.open_dataset()` or `xr.open_mfdataset()` calls are passed a `chunks` argument. If not, Dask's default is to load the entire file into memory before rechunking later. This is very bad news for impact-region-level damages, which are 10 TB of data.
3. Execute the menu object by object. Call an object, select a small slice of it, and add `.compute()`. If the object computes successfully without overloading memory, it's not the memory leak. Keep moving through the menu until you find the source of the error. _Hot tip: it's usually the initial reading-in of files where nasty things happen._ Check each object in the menu to ensure three things:
    - Chunks should be a reasonable size ('reasonable' is relative, but approximately 250-750 MB is typically successful on a Midway3 `caslake` computing node).
    - Not too many chunks! Again, this is relative, but more than 10,000 likely means you should reconsider your chunk size.
    - Not too many tasks per chunk. Again, relative, but more than 300,000 tasks early in the menu is unusual and should be checked to make sure no unnecessary rechunking operations are being forced upon the menu.
4. Consider rechunking your inputs. If your inputs are chunked in a manner that's orthogonal to your first few operations, Dask will have a nasty time trying to rechunk all those files before executing anything on them. Rechunking and resaving usually takes a few minutes; rechunking in the middle of an operation can take hours.
5. If all of this has been done and you are still getting large memory errors, it's possible that Dask isn't correctly separating and applying operations to chunks. If so, consider adding a `map_blocks` call, which explicitly tells Dask to apply the operation to each chunk.
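Steps 2, 3, and 5 above can be sketched as follows; a small in-memory dataset stands in for the real damages files, and the file name in the comment is hypothetical:

```
import numpy as np
import xarray as xr

# Stand-in for an impact-region damages file. With real files, pass
# `chunks` at open time, e.g. xr.open_dataset("damages.nc4", chunks={"region": 1000}),
# so the data is opened lazily as Dask arrays.
ds = xr.Dataset(
    {"damages": (("region", "year"), np.arange(50.0).reshape(10, 5))},
    coords={"region": np.arange(10), "year": np.arange(2020, 2025)},
).chunk({"region": 5})

# Step 3: inspect the chunking and compute a small slice to isolate problems.
print(ds["damages"].data.chunksize)           # size of one chunk, in elements
small = ds["damages"].isel(region=0).compute()

# Step 5: map_blocks applies a function to each chunk explicitly.
def to_billions(block):
    return block / 1e9

result = ds.map_blocks(to_billions).compute()
```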

For more information about running `Dask` and the `job-queue` library (in case you are on a computing
cluster), refer to the [Dask Distributed][3] and [job-queue][4] documentation.
You can find several use-case examples in the notebooks under `examples`.

### Priority

Maintaining priority is important when you face tight deadlines to run menu options. To learn more about
priority, click [here](https://rcc.uchicago.edu/docs/tutorials/rcc-tips-and-tricks.html#priority).

In general, following these hygiene rules will keep priority high:
1. Kill all notebooks/clusters when not in use.
2. Only request what you need (in terms of `WALLTIME`, `WORKERS`, and `WORKER MEMORY`).
3. Run things right the first time around. Your notebook text is worth an extra double check :)

[3]: https://distributed.dask.org/en/latest/
[4]: https://jobqueue.dask.org/en/latest/
[5]: https://sylabs.io/guides/3.5/user-guide/quick_start.html
[6]: https://sylabs.io/
[7]: https://pangeo.io/setup_guides/hpc.html
[8]: https://climateimpactlab.gitlab.io/Impacts/integration/
