Commit 3407b91

Merge pull request #31 from microsoft/download_cmip6
Download and preprocess CMIP6 data
2 parents 5533b8c + 3f97ae7 commit 3407b91

44 files changed: +1049 -2 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -133,3 +133,6 @@ dmypy.json
 
 # experiments
 exps
+
+# snakemake logs
+.snakemake

docs/usage.md

Lines changed: 15 additions & 1 deletion
@@ -4,7 +4,21 @@
 
 ### Data Preparation
 
-The code for downloading and preprocessing CMIP6 data is coming soon
+First install `snakemake` following [these instructions](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html)
+
+To download and regrid a CMIP6 dataset to a common resolution (e.g., 1.40625 degrees), go to the corresponding directory inside `snakemake_configs` and run
+```bash
+snakemake all --configfile config_2m_temperature.yml --cores 8
+```
+This command downloads and regrids the `2m_temperature` data in parallel using 8 CPU cores. Change `--configfile` for other variables. After downloading and regridding, run the following script to preprocess the `.nc` files into `.npz` format for pretraining ClimaX
+```bash
+python src/data_preprocessing/nc2np_equally_cmip6.py \
+    --dataset mpi \
+    --path /data/CMIP6/MPI-ESM/1.40625deg/ \
+    --num_shards 10 \
+    --save_dir /data/CMIP6/MPI-ESM/1.40625deg_np_10shards
+```
+in which `num_shards` denotes the number of chunks to break each `.nc` file into.
 
 ### Training
 
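The `num_shards` option in the docs change breaks each `.nc` file into chunks. The sketch below illustrates one way such a split could work, assuming the chunks are contiguous, near-equal slices along the time axis. The actual logic of `nc2np_equally_cmip6.py` is not visible in this diff, and `shard_bounds` is a hypothetical helper, not a function from the repository.

```python
from typing import List, Tuple


def shard_bounds(num_steps: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split num_steps time steps into num_shards contiguous, near-equal chunks.

    Returns (start, end) index pairs with end exclusive. This is an assumed
    splitting scheme, used only for illustration.
    """
    base, extra = divmod(num_steps, num_shards)
    bounds = []
    start = 0
    for shard in range(num_shards):
        # The first `extra` shards take one additional step each.
        size = base + (1 if shard < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Example: 1460 time steps (365 days of 6-hourly data) into 10 shards.
for start, end in shard_bounds(1460, 10):
    print(start, end)
```

Each `(start, end)` pair could then be used to slice an array along its first axis before saving one `.npz` file per shard.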

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+
+year_strings = [f'{y}01010600-{y+1}01010000' for y in range(1850, 2015, 1)]
+
+print(config)
+
+rule download:
+    output:
+        "{dataset}/raw/{name}/{name}_{year_str}_raw.nc",
+    shell:
+        "wget https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/"
+        "{config[run]}/6hrPlevPt/"
+        "{config[cmip_name]}/gn/v20200212/"
+        "{config[cmip_name]}_6hrPlevPt_AWI-ESM-1-1-LR_historical_{config[run]}_gn_{wildcards.year_str}.nc "
+        "-O {wildcards.dataset}/raw/{config[name]}/{config[name]}_{wildcards.year_str}_raw.nc"
+
+rule regrid:
+    input:
+        "{dataset}/raw/{name}/{name}_{year_str}_raw.nc"
+    output:
+        "{dataset}/{res}deg/{name}/{name}_{year_str}_{res}deg.nc.tmp"
+    shell:
+        "python ../../src/data_preprocessing/regrid.py \
+        --input_fns {input} \
+        --output_dir {wildcards.dataset}/{wildcards.res}deg/{wildcards.name} \
+        --ddeg_out {wildcards.res} \
+        --cmip 1 \
+        --rename {config[cmip_name]} {config[era_name]} \
+        --file_ending nc.tmp"
+
+rule delete:
+    input:
+        expand("{{dataset}}/{res}deg/{{name}}/{{name}}_{{year_str}}_{res}deg.nc.tmp",
+               res=config['res']),
+    output:
+        expand("{{dataset}}/{res}deg/{{name}}/{{name}}_{{year_str}}_{res}deg.nc",
+               res=config['res'])
+    priority: 100
+    run:
+        for i, o in zip(input, output):
+            shell("mv {i} {o}")
+            # shell("rm {wildcards.dataset}/raw/{wildcards.name}/{wildcards.name}_{wildcards.year_str}_raw.nc"),
+
+
+rule all:
+    input:
+        expand("{datadir}/{res}deg/{name}/{name}_{year_str}_{res}deg.nc",
+               datadir=config['datadir'], res=config['res'], name=config['name'], year_str=year_strings)
+
+
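To see what `rule all` asks Snakemake to build, the following plain-Python sketch reproduces the `year_strings` list and the resulting target paths without running Snakemake. It uses `itertools.product` as a stand-in for Snakemake's `expand`, and the `config` dict is copied by hand from the `2m_temperature` config in this commit. Both are assumptions made for illustration.

```python
from itertools import product

# Same list as in the Snakefile: one 6-hourly file per year, 1850 through 2014.
year_strings = [f"{y}01010600-{y + 1}01010000" for y in range(1850, 2015)]

# Values copied from the 2m_temperature config (illustrative, not loaded from disk).
config = {
    "datadir": "/data/CMIP6/AWI-ESM",
    "name": "2m_temperature",
    "res": [1.40625],
}

pattern = "{datadir}/{res}deg/{name}/{name}_{year_str}_{res}deg.nc"

# Stand-in for expand(): one target per (resolution, year string) combination.
targets = [
    pattern.format(
        datadir=config["datadir"], res=res, name=config["name"], year_str=ys
    )
    for res, ys in product(config["res"], year_strings)
]

print(len(targets))
print(targets[0])
```

Each target ends in `.nc`, so Snakemake has to run `delete` (which renames `.nc.tmp` to `.nc`), which in turn needs `regrid`, which needs `download`.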
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: 10m_u_component_of_wind
+cmip_name: uas
+era_name: u10
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
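The `download` rule fills its `wget` URL from `run` and `cmip_name` in the config plus the `year_str` wildcard. As a rough check of what that URL looks like for the `10m_u_component_of_wind` config, here is a sketch in plain Python. `download_url` and `BASE` are hypothetical names, and the code only formats the string; it does not verify that the file exists on the server.

```python
# Server prefix taken from the download rule in this commit.
BASE = (
    "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/"
    "CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical"
)


def download_url(config: dict, year_str: str) -> str:
    """Format the URL the same way the rule's shell string does (illustrative)."""
    return (
        f"{BASE}/{config['run']}/6hrPlevPt/{config['cmip_name']}/gn/v20200212/"
        f"{config['cmip_name']}_6hrPlevPt_AWI-ESM-1-1-LR_historical_"
        f"{config['run']}_gn_{year_str}.nc"
    )


# Values copied from the 10m_u_component_of_wind config.
config = {"name": "10m_u_component_of_wind", "cmip_name": "uas", "run": "r1i1p1f1"}

print(download_url(config, "185001010600-185101010000"))
```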
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: 10m_v_component_of_wind
+cmip_name: vas
+era_name: v10
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: 2m_temperature
+cmip_name: tas
+era_name: t2m
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: geopotential
+cmip_name: zg
+era_name: z
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: specific_humidity
+cmip_name: hus
+era_name: q
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: temperature
+cmip_name: ta
+era_name: t
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+datadir: /data/CMIP6/AWI-ESM
+name: u_component_of_wind
+cmip_name: ua
+era_name: u
+run: r1i1p1f1
+res:
+  - 1.40625
+  # - 5.625
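The seven AWI-ESM configs visible in this view differ only in `name`, `cmip_name`, and `era_name`; the last two are what the `regrid` rule passes to `--rename`. The sketch below collects those mappings exactly as listed in the configs. `VARIABLES` and `rename_args` are hypothetical names introduced here, and the commit may contain further configs that are hidden from this view.

```python
# name -> (cmip_name, era_name), copied from the configs shown in this diff.
VARIABLES = {
    "10m_u_component_of_wind": ("uas", "u10"),
    "10m_v_component_of_wind": ("vas", "v10"),
    "2m_temperature": ("tas", "t2m"),
    "geopotential": ("zg", "z"),
    "specific_humidity": ("hus", "q"),
    "temperature": ("ta", "t"),
    "u_component_of_wind": ("ua", "u"),
}


def rename_args(name: str) -> list:
    """Build the --rename argument list that the regrid rule would pass for a variable."""
    cmip_name, era_name = VARIABLES[name]
    return ["--rename", cmip_name, era_name]


for name in VARIABLES:
    print(name, " ".join(rename_args(name)))
```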
