[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# DSCIM: The Data-driven Spatial Climate Impact Model

This Python library enables the calculation of a sector-integrated social cost of carbon
(SCC) using a variety of valuation methods and assumptions. Its main purpose is to parse
the monetized spatial damages from different sectors and integrate them using different
options (or menu options) that encompass different decisions, such as discount levels,
discounting strategies, and how economic and climate uncertainty are treated.

## Structure and logic

The library is split into several components that implement the hierarchy
defined by the menu options. These components are the main classes used to
call the different menu options.

```mermaid
graph TD

    SubGraph1Flow(Storage and I/O)
    subgraph "Storage utilities"
        SubGraph1Flow --> A[StackedDamages]
        SubGraph1Flow -- Climate Data --> Climate
        SubGraph1Flow -- Economic Data --> EconData
    end

    subgraph "Recipe Book"
        A[StackedDamages] --> B[MainMenu]
        B[MainMenu] --> C[AddingUpRecipe];
        B[MainMenu] --> D[RiskAversionRecipe];
        B[MainMenu] --> E[EquityRecipe]
    end
```

`StackedDamages` parses all monetized damage data from several sectors and
reads the data using a `dask.distributed.Client`. At the same time, this
class ingests the FaIR GMST and GMSL data needed to draw damage functions
and to calculate FaIR marginal damages from an additional emission of
carbon. The data can be read using the following components:

| Class            | Function |
|------------------|----------|
| `Climate`        | Wrapper class to read all things climate, including GMST and GMSL. You can pass a `fair_path` pointing to a NetCDF file with FaIR control and pulse simulations and median FaIR runs, and a `gmst_path` pointing to a CSV file with model and year anomaly data used for fitting the damage functions. |
| `EconVars`       | Class to ingest sector-path-related data, including GDP and population data. Some intermediate variables are also included in this class; check the documentation for more details. |
| `StackedDamages` | Damages wrapper class. This class contains all the elements above and additionally reads all the computed monetized damages. A single path is needed to read all damages, and sectors must be separated by folders. If necessary, the class will save data in `.zarr` format to make chunking operations more efficient. Check the class documentation for more details. |

These elements can then be used for the menu options:
 - `AddingUpRecipe`: Adds up all damages and collapses them to calculate a general SCC without valuing uncertainty.
 - `RiskAversionRecipe`: Adds a risk-aversion certainty equivalent to the consumption calculations, valuing uncertainty over econometric and climate draws.
 - `EquityRecipe`: Adds risk aversion and equity to the consumption calculations. Equity includes taking a certainty equivalent over spatial impact regions.
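
Below is a purely illustrative sketch of how these pieces fit together. The import path and most keyword arguments are assumptions (only `fair_path` and `gmst_path` are documented above), so check each class's documentation for the actual signatures:

```python
from dask.distributed import Client

# NOTE: the import path and most keyword arguments below are assumptions made
# for illustration -- consult the class docstrings for the real signatures.
from dscim import Climate, EconVars, StackedDamages

client = Client()  # damages are read through a dask.distributed.Client

climate = Climate(
    fair_path="climate/fair_simulations.nc",  # FaIR control/pulse and median runs (placeholder path)
    gmst_path="climate/gmst_anomalies.csv",   # model/year anomaly data for fitting damage functions
)
econ = EconVars(path="econ/")                     # keyword name is an assumption
damages = StackedDamages(sector_path="damages/")  # keyword name is an assumption
```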

## Requirements

The library runs on Python 3.8+ and expects all requirements to be installed
before running any code (see Installation). The integration process stacks
different damage outcomes from several sectors at the impact-region level,
so you will need several tricks to deal with the data I/O.

## Computing

### Computing introduction

One of the tricks we rely on is the extensive use of `Dask` and `xarray` to
read raw damage data in `nc4` or `zarr` format (the latter is how coastal damages are provided).
Hence, you will need a Dask `distributed.Client` to harness the power of distributed computing.
The computing requirements will vary depending on which menu options you execute
and the number of sectors you are aggregating. These are some general rules about
computational intensity:

1. For recipes, `EquityRecipe > RiskAversionRecipe > AddingUpRecipe`.
2. For discounting, `euler_gwr > euler_ramsey > naive_gwr > naive_ramsey > constant > constant_model_collapsed`.
3. More options (i.e., a greater number of SSPs or sectors) means more computing resources are required.
4. `Dask` does not perfectly release memory after each menu run. Thus, if you are running
several menu options, in loops or otherwise, you may need to execute a `client.restart()` partway through
to force `Dask` to empty its memory (see the sketch after this list).
5. Including the coastal sector increases memory usage substantially (due to the 500 batches and 10 GMSL bins against which
other sectors' damages must be broadcast). Be careful and smart when running this option,
and don't be afraid to reconsider chunking for the files being read in.
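
As a sketch of point 4, restarting the client between menu runs (the `run_menu_option` helper and the option names are hypothetical placeholders, not part of this library):

```python
from dask.distributed import Client

client = Client()  # local client; on HPC this could be backed by a SLURMCluster


def run_menu_option(name):
    # Hypothetical stand-in for running one menu option (e.g. AddingUpRecipe).
    print(f"running {name}")


for option in ["adding_up", "risk_aversion", "equity"]:
    run_menu_option(option)
    client.restart()  # restart workers so leftover memory is released between runs
```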

### Setting up a Dask client

Ensure that the following packages are installed and updated:
[Dask](https://docs.dask.org/en/latest/install.html), [distributed](https://distributed.dask.org/en/latest/install.html), [Jupyter Dask extension](https://github.com/dask/dask-labextension), `dask_jobqueue`.

Ensure that your Jupyter Lab has add-ons enabled so that you can access Dask as an extension.

You have two options for setting up a Dask client.

#### Local client
<details><summary>Click to expand</summary>
If your local node has sufficient memory and computational power, you will only need to create a local Dask client.

_If you are operating on Midway3, you should be able to run the menu in its entirety.
Each `caslake` computing node on Midway3 has 193 GB of memory and 48 CPUs. This is sufficient for all options._

- Open the Dask tab on the left side of your Jupyter Lab page.
- Click `New +` and wait for a cluster to appear.
- Drag and drop the cluster into your notebook and execute the cell.
- You now have a new Dask client!
- Click on the `CPU`, `Worker Memory`, and `Progress` tabs to track progress. You can arrange them in a side bar of your
Jupyter notebook to keep them all visible at the same time.
- Note that opening 2 or 3 local clients does _not_ get you 2 or 3 times the compute space. These clients will share
the same node, so computing may in fact be slower as they fight for resources. (_check this, it's a hypothesis_)
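
If you prefer to create the client in code rather than through the JupyterLab extension, a minimal sketch (the worker count and memory limit are illustrative; size them to your node):

```python
from dask.distributed import Client

# Spins up a LocalCluster under the hood; adjust to the node you are on.
client = Client(n_workers=8, threads_per_worker=2, memory_limit="20GB")
print(client.dashboard_link)  # open this link to watch progress
```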

</details>

#### Distributed client
<details><summary>Click to expand</summary>
If your local node does not have sufficient computational power, you will need to manually request separate
nodes with `dask.distributed`:
```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()
print(cluster.job_script())
cluster.scale(10)
client = Client(cluster)
client
```
You can adjust the number of workers by changing the integer inside `cluster.scale()`. You can adjust the CPUs
and memory per worker inside `~/.config/dask/jobqueue.yaml`.
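
If you would rather not edit `jobqueue.yaml`, the same settings can be passed directly to `SLURMCluster`. A sketch with placeholder resource values and queue name (adjust to your cluster):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# All values below are placeholders; for reference, a Midway3 caslake node
# has 48 CPUs and 193 GB of memory.
cluster = SLURMCluster(
    cores=8,             # CPUs per worker
    processes=1,         # one Python process per worker
    memory="40GB",       # memory per worker
    walltime="02:00:00",
    queue="caslake",
)
cluster.scale(10)        # request 10 workers (10 SLURM jobs)
client = Client(cluster)
```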

To track the progress of this client, copy the "Dashboard" IP address and open an SSH tunnel to it. Example:
```sh
ssh -N -f -L 8787:10.50.250.7:8510 [email protected]
```
Then go to `localhost:8787` in your browser to watch the magic.
</details>

### Dask troubleshooting

Most Dask issues in the menu come from one of two sources:
1. Requesting Dask to compute too many tasks (your chunks are too small), which results in a sort of "hung state"
and an empty progress bar.
2. Requesting Dask to compute tasks that are too _large_ (your chunks are too big). In this case, you will see memory under
the `Worker Memory` tab shoot off the charts, and your kernel will likely be killed by SLURM.

How can you avoid these situations?
1. Start with `client.restart()`. Sometimes Dask does not properly release tasks from memory, and this plugs up
the client. A fresh restart (and perhaps a fresh restart of your notebook) will fix the problem.
2. Next, check your chunks! Ensure that any `xr.open_dataset()` or `xr.open_mfdataset()` call is passed a `chunks`
argument (see the sketch after this list). If not, Dask's default is to load the entire file into memory before rechunking later. This
is very bad news for impact-region-level damages, which are 10 TB of data.
3. Start executing the menu object by object. Call an object, select a small slice of it, and add `.compute()`. If the object
computes successfully without overloading memory, it's not the source of the memory leak. Keep moving through the menu until you find the
source of the error. _Hot tip: it's usually the initial reading-in of files where nasty things happen._ Check each object in the menu to
ensure three things:
- chunks should be a reasonable size ("reasonable" is relative, but approximately 250-750 MB typically works well
on a Midway3 `caslake` computing node)
- not too many chunks! Again, this is relative, but more than 10,000 likely means you should reconsider your chunk size.
- not too many tasks per chunk. Again, relative, but more than 300,000 tasks early in the menu is unusual and should be
checked to make sure no unnecessary rechunking operations are being forced upon the menu.
4. Consider rechunking your inputs. If your inputs are chunked in a manner that's orthogonal to your first few operations,
Dask will have a nasty time trying to rechunk all those files before executing anything on them. Rechunking and resaving
usually takes a few minutes; rechunking in the middle of an operation can take hours.
5. If all of this has been done and you are still getting large memory errors, it's possible that Dask isn't correctly separating
and applying operations to chunks. If this is the case, consider using `map_blocks`, which explicitly
tells Dask to apply the operation to each chunk individually.
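
As a minimal sketch of point 2, opening damages with an explicit `chunks` argument (the file pattern, dimension names, and chunk sizes are placeholders):

```python
import xarray as xr

# Passing `chunks` makes xarray build lazy Dask arrays instead of loading
# everything into memory; dimension names and sizes below are illustrative.
damages = xr.open_mfdataset(
    "damages/sector_*.nc4",
    chunks={"batch": 15, "region": 5000},
    parallel=True,  # open the files in parallel with Dask
)
print(damages.chunks)
```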

For more information about how to
use `Dask` and the `job-queue` library (in case you are on a computing
cluster), refer to the [Dask Distributed][3] and [job-queue][4] documentation.
You can find several use-case examples in the notebooks under `examples`.

### Priority

Maintaining priority is important when given tight deadlines to run menu options. To learn more about
priority, click [here](https://rcc.uchicago.edu/docs/tutorials/rcc-tips-and-tricks.html#priority).

In general, following these hygiene rules will keep priority high:
1. Kill all notebooks/clusters when not in use.
2. Only request what you need (in terms of `WALLTIME`, `WORKERS`, and `WORKER MEMORY`).
3. Run things right the first time around. Your notebook text is worth an extra double check :)

[3]: https://distributed.dask.org/en/latest/
[4]: https://jobqueue.dask.org/en/latest/
[5]: https://sylabs.io/guides/3.5/user-guide/quick_start.html
[6]: https://sylabs.io/
[7]: https://pangeo.io/setup_guides/hpc.html
[8]: https://climateimpactlab.gitlab.io/Impacts/integration/