and these elements can be used for the menu options:
- `RiskAversionRecipe`: Add a risk-aversion certainty equivalent to the consumption calculations, valuing uncertainty over econometric and climate draws.
- `EquityRecipe`: Add risk aversion and equity to the consumption calculations. Equity includes taking a certainty equivalent over spatial impact regions.


## Requirements

The library runs on Python 3.8+ and expects all requirements to be
installed before running any code; check Installation. The integration
process stacks damage outcomes from several sectors
at the impact region level, so you will need several tricks to deal with
the data I/O.

## Computing

### Computing introduction

One of the tricks we rely on is the extensive use of `Dask` and `xarray` to
read raw damage data in `nc4` or `zarr` format (the latter is how coastal damages are provided).
Hence, you will need a `Dask` `distributed.Client` to harness the power of distributed computing.
The computing requirements will vary depending on which menu options you execute
and the number of sectors you are aggregating. These are some general rules about
computational intensity (a minimal data-reading sketch follows the list):

1. For recipes: `EquityRecipe > RiskAversionRecipe > AddingUpRecipe`.
2. For discounting: `euler_gwr > euler_ramsey > naive_gwr > naive_ramsey > constant > constant_model_collapsed`.
3. More options (i.e., a greater number of SSPs or a greater number of sectors) means more computing resources required.
4. `Dask` does not perfectly release memory after each menu run. Thus, if you are running
several menu options, in loops or otherwise, you may need to execute a `client.restart()` partway through
to force `Dask` to empty memory.
5. Including coastal damages substantially increases memory usage (due to the 500 batches and 10 GMSL bins against which
other sectors' damages must be broadcast). Be careful and smart when running this option,
and don't be afraid to reconsider chunking for the files being read in.
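
For context, here is a minimal sketch of what reading damages lazily with a client attached looks like. The file path is a hypothetical placeholder, not one of the menu's actual inputs:

```python
import xarray as xr
from dask.distributed import Client

client = Client()  # local client; see "Setting up a Dask client" below

# Lazily open zarr-formatted damages (hypothetical path). By default,
# open_zarr uses the on-disk chunking as Dask chunks, so nothing is
# loaded into memory until you call .compute().
coastal = xr.open_zarr("coastal_damages.zarr")
print(coastal)
```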

### Setting up a Dask client

Ensure that the following packages are installed and updated:
[Dask](https://docs.dask.org/en/latest/install.html), [distributed](https://distributed.dask.org/en/latest/install.html), the [Jupyter Dask extension](https://github.com/dask/dask-labextension), and `dask_jobqueue`.

Ensure that your Jupyter Lab has add-ons enabled so that you can access Dask as an extension.

You have two options for setting up a Dask client.
#### Local client
<details><summary>Click to expand</summary>

If your local node has sufficient memory and computational power, you will only need to create a local Dask client.

_If you are operating on Midway3, you should be able to run the menu in its entirety.
Each `caslake` computing node on Midway3 has 193 GB of memory and 48 CPUs, which is sufficient for all options._

- Open the Dask tab on the left side of your Jupyter Lab page.
- Click `New +` and wait for a cluster to appear.
- Drag and drop the cluster into your notebook and execute the cell.
- You now have a new Dask client!
- Click on the `CPU`, `Worker Memory`, and `Progress` tabs to track progress. You can arrange them in a sidebar of your
Jupyter notebook to keep them all visible at the same time.
- Note that opening 2 or 3 local clients does _not_ get you 2 or 3 times the compute capacity. These clients share
the same node, so computing may actually be slower as they fight for resources. (_check this, it's a hypothesis_)

![](images/dask_example.png)
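
If you prefer creating the client in code instead of through the extension, a minimal sketch follows; the worker count and memory limit are illustrative values, not recommendations:

```python
from dask.distributed import Client, LocalCluster

# Illustrative sizing only: 24 workers x 2 threads = 48 CPUs, and
# 24 x 8 GB roughly matches a 193 GB `caslake` node. Tune to your machine.
cluster = LocalCluster(n_workers=24, threads_per_worker=2, memory_limit="8GB")
client = Client(cluster)
client  # in a notebook, this displays the dashboard link
```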
</details>

#### Distributed client
<details><summary>Click to expand</summary>

If your local node does not have sufficient computational power, you will need to manually request separate
nodes with `dask_jobqueue`'s `SLURMCluster` and `dask.distributed`:

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()
print(cluster.job_script())  # inspect the SLURM job that will be submitted
cluster.scale(10)            # request 10 workers
client = Client(cluster)
client
```

You can adjust the number of workers by changing the integer inside `cluster.scale()`. You can adjust the CPUs
and memory per worker inside `~/.config/dask/jobqueue.yaml`.
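
If you prefer not to rely on the YAML file, the same settings can be passed to the constructor. A sketch; every value below is illustrative (including the queue name) and should match your cluster's policies:

```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="caslake",      # SLURM partition; assumption based on Midway3
    cores=4,              # CPUs per worker
    memory="24GB",        # memory per worker
    walltime="01:00:00",  # a modest walltime helps preserve priority
)
```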

To track the progress of this client, copy the "Dashboard" IP address and open an SSH tunnel to it. Example (the username and host are placeholders):

```
ssh -N -f -L 8787:10.50.250.7:8510 username@cluster-login-address
```

Then go to `localhost:8787` in your browser to watch the magic.
</details>

### Dask troubleshooting

Most Dask issues in the menu come from one of two sources:

1. Asking Dask to compute too _many_ tasks (your chunks are too small), which results in a sort of "hung state"
and an empty progress bar.
2. Asking Dask to compute too _large_ tasks (your chunks are too big). In this case, you will see memory in the
`Worker Memory` panel shoot off the charts, and then your kernel will likely be killed by SLURM.

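A quick way to tell which regime you are in is to estimate a chunk's in-memory footprint before computing. The chunk shape below is purely illustrative:

```python
import numpy as np

# Hypothetical chunk shape (batch, region, year); 8 bytes per float64 value.
chunk_shape = (15, 24378, 33)
size_mb = np.prod(chunk_shape) * np.dtype("float64").itemsize / 1e6
print(f"~{size_mb:.0f} MB per chunk")  # prints "~97 MB per chunk"
```
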
How can you avoid these situations?

1. Start with `client.restart()`. Sometimes Dask does not properly release tasks from memory, and this plugs up
the client. A fresh restart (and perhaps a fresh restart of your notebook) will fix the problem.
2. Next, check your chunks! Ensure that any `xr.open_dataset()` or `xr.open_mfdataset()` calls have a `chunks`
argument passed (see the sketch after this list). If not, Dask's default is to load the entire file into memory before rechunking later. This
is very bad news for impact-region-level damages, which are 10 TB of data.
3. Start executing the menu object by object. Call an object, select a small slice of it, and add `.compute()`. If the object
computes successfully without overloading memory, it's not the memory leak. Keep moving through the menu until you find the
source of the error. _Hot tip: it's usually the initial reading-in of files where nasty things happen._ Check each object in the menu to
ensure three things:
    - Chunks should be a reasonable size ('reasonable' is relative, but approximately 250-750 MB is typically successful
on a Midway3 `caslake` computing node).
    - Not too many chunks! Again, this is relative, but more than 10,000 likely means you should reconsider your chunk size.
    - Not too many tasks per chunk. Again, relative, but more than 300,000 tasks early in the menu is unusual and should be
checked to make sure there aren't any unnecessary rechunking operations being forced upon the menu.
4. Consider rechunking your inputs. If your inputs are chunked in a manner that's orthogonal to your first few operations,
Dask will have a nasty time trying to rechunk all those files before executing things on them. Rechunking and resaving
usually takes a few minutes; rechunking in the middle of an operation can take hours.
5. If all of this has been done and you are still getting large memory errors, it's possible that Dask isn't correctly separating
and applying operations to chunks. If so, consider adding a `map_blocks` call, which explicitly
tells Dask to apply the operation chunk by chunk (also sketched below).
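
The following sketch illustrates points 2 and 5. The file path, dimension names, variable name, and chunk sizes are assumptions for illustration, not the menu's actual layout:

```python
import xarray as xr

# Point 2: always pass `chunks` when opening damages so the data stays lazy.
ds = xr.open_mfdataset(
    "damages/*.nc4",                              # hypothetical path
    chunks={"batch": 15, "region": 24378, "year": 10},  # hypothetical dims
    parallel=True,
)

# Point 5: force an operation to run chunk by chunk with map_blocks.
def rescale(block):
    # Receives one chunk at a time as an in-memory object.
    return block * 1.05

damages_rescaled = ds["damages"].map_blocks(rescale)  # "damages" is hypothetical
```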

For more information about how to
use `Dask` and the `dask-jobqueue` library (in case you are on a computing
cluster), refer to the [Dask Distributed][3] and [Dask-Jobqueue][4] documentation.
You can check several use-case examples in the notebooks under `examples`.

### Priority

Maintaining priority is important when given tight deadlines to run menu options. To learn more about
priority, click [here](https://rcc.uchicago.edu/docs/tutorials/rcc-tips-and-tricks.html#priority).

In general, following these hygiene rules will keep priority high:

1. Kill all notebooks/clusters when not in use (see the sketch below).
2. Only request what you need (in terms of `WALLTIME`, `WORKERS`, and `WORKER MEMORY`).
3. Run things right the first time around. Your notebook text is worth an extra double check :)
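
For rule 1, when a run finishes you can release resources from code as well as from the Dask tab. A minimal sketch, assuming the `client` and `cluster` objects created above:

```python
# Shut down workers and release the scheduler; for a SLURMCluster this
# also cancels the underlying SLURM jobs.
client.close()
cluster.close()
```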

[3]: https://distributed.dask.org/en/latest/
[4]: https://jobqueue.dask.org/en/latest/
[5]: https://sylabs.io/guides/3.5/user-guide/quick_start.html
[6]: https://sylabs.io/
[7]: https://pangeo.io/setup_guides/hpc.html
[8]: https://climateimpactlab.gitlab.io/Impacts/integration/