30 changes: 17 additions & 13 deletions README.md
@@ -164,23 +164,27 @@ The created surface and domain file have negative longitudes that CLM5 does not

## Creation of forcing data from ERA5

A possible source of atmospheric forcing for CLM5 is ERA5.
The folder `mkforcing/` contains two scripts that assist the ERA5 retrieval.
- `download_ERA5.py` contains a prepared retrieval for the cdsapi python module.
By modifying the two loops inside the script it is possible to download ERA5 for any timerange.
However, the script requires that cdsapi is installed with an user specific key.
More information about the installation can be found [here](https://cds.climate.copernicus.eu/api-how-to).
- `prepare_ERA5.sh` prepares ERA5 as an input by changing names and modifying units.
ERA5 has to be regridded to your resolution before the script can be used.
A possible source of atmospheric forcing for CLM (eCLM, CLM5, CLM3.5) is ERA5. It is safer to extract temperature, humidity and wind from the lowermost model level of ERA5 than to mix 2 m and 10 m values. [This internal issue](https://gitlab.jsc.fz-juelich.de/HPSCTerrSys/tsmp-internal-development-tracking/-/issues/36) provides some details. The script `download_ERA5_input.py` can be adapted to download another set of quantities.

`download_ERA5_v2.py`, `prepare_ERA5_v2.sh` and `extract_ERA5_meteocloud.sh` provide an alternative pathway. [This issue](https://gitlab.jsc.fz-juelich.de/HPSCTerrSys/tsmp-internal-development-tracking/-/issues/36) provides some details. Basically it is safer to extract the lowermost level of temperature, humidity and wind of ERA5 instead of taking 2m-values. The workflow goes like this:
The folder `mkforcing/` contains three scripts that assist the ERA5 retrieval.

- `download_ERA5_input.py` contains a prepared retrieval for the cdsapi python module.
The script requires that cdsapi is installed with a user specific key (API access token).
Installation and registration are described [here](https://cds.climate.copernicus.eu/how-to-api); where to put your access token is described [here](https://github.com/ecmwf/cdsapi?tab=readme-ov-file#install).
Usage:
`python download_ERA5_input.py <year> <month> <output_directory>`
Non-JSC users should adapt the download script to include temperature, specific humidity and horizontal wind speed.
- `extract_ERA5_meteocloud.sh` prepares ERA5 as an input by changing names and modifying units (JSC users only).
- `prepare_ERA5_input.sh` prepares ERA5 as an input by remapping the ERA5 data, changing names and modifying units. The script is divided into three parts that can be run separately: remapping, merging the data, and special treatment when forcing data is prepared for CLM3.5. If remapping is to be used, the remapping weights for the ERA5 data as well as the grid definition file of the target domain should be created beforehand. The following commands can be used to create the necessary files:
```
bash extract_ERA5_meteocloud.sh
python download_ERA5_v2.py
regridding
bash prepare_ERA5_v2.sh
cdo gendis,<eclm_domainfile.nc> <era5caf_yyyy_mm.nc> <wgtdis_era5caf_to_domain.nc>
cdo gendis,<eclm_domainfile.nc> <era5meteo_yyyy_mm.nc> <wgtdis_era5meteo_to_domain.nc>
cdo griddes <eclm_domainfile.nc> > <domain_griddef.txt>
```
Usage:
`sh prepare_ERA5_input.sh iyear=<year> imonth=<month> wgtcaf=<wgtcaf> wgtmeteo=<wgtmeteo> griddesfile=<griddesfile>`
More options are available, see script for details.


Note: This workflow is not fully tested.

51 changes: 0 additions & 51 deletions mkforcing/download_ERA5.py

This file was deleted.

80 changes: 80 additions & 0 deletions mkforcing/download_ERA5_input.py
@@ -0,0 +1,80 @@
#!/usr/bin/env python3
import calendar
import cdsapi
Collaborator

Should we include a `pip install cdsapi` in the README?
This would be the first time that this is needed (for the rest we relied on modules).

edit: ah, except for the NCL script, for which they even need to do a `conda install ncl`!
To some extent, some mess may have to be accepted…

Collaborator

Not needed; it is written in the linked documentation.

import sys
import os

def generate_days(year, month):
    # Get the number of days in the given month
    num_days = calendar.monthrange(year, month)[1]

    # Generate the list of days as integers
    days = [day for day in range(1, num_days + 1)]

    return days

def generate_datarequest(year, monthstr, days):

    # active download client for climate data service (cds)
    client = cdsapi.Client()

    # dataset to download from cds
    dataset = "reanalysis-era5-single-levels"
    # request for cds
    request = {
        "product_type": ["reanalysis"],
        "variable": [
            "surface_pressure",
            "mean_surface_downward_long_wave_radiation_flux",
            "mean_surface_downward_short_wave_radiation_flux",
            "mean_total_precipitation_rate"
        ],
        "year": [str(year)],
        "month": [monthstr],
        "day": days,
        "time": [
            "00:00", "01:00", "02:00",
            "03:00", "04:00", "05:00",
            "06:00", "07:00", "08:00",
            "09:00", "10:00", "11:00",
            "12:00", "13:00", "14:00",
            "15:00", "16:00", "17:00",
            "18:00", "19:00", "20:00",
            "21:00", "22:00", "23:00"
        ],
        "data_format": "netcdf",
        "download_format": "unarchived",
        "area": [74, -42, 20, 69]
    }
    # filename of downloaded file
    target = 'download_era5_'+str(year)+'_'+monthstr+'.zip'

    # Get the data from cds
    client.retrieve(dataset, request, target)

if __name__ == "__main__":
    # Check if the correct number of arguments is provided
    if len(sys.argv) != 4:
        print("Usage: python download_ERA5_input.py <year> <month> <output_directory>")
        sys.exit(1)

    # Get the year and month from command-line arguments
    year = int(sys.argv[1])
    month = int(sys.argv[2])
    dirout = sys.argv[3]

    # Ensure the output directory exists, if not, create it
    if not os.path.exists(dirout):
        os.makedirs(dirout)

    # change to output directory
    os.chdir(dirout)

    # Format the month with a leading zero if needed
    monthstr = f"{month:02d}"

    # Get the list of days for the request
    days = generate_days(year, month)

    # do download request
    generate_datarequest(year, monthstr, days)
50 changes: 50 additions & 0 deletions mkforcing/download_ERA5_input_wrapper.sh
Collaborator

The date command parameters are not standard and fail on BSD (and probably macOSX).

Is a wrapper (in Bash) needed? Why not pass the user arguments to download_ERA5_input.py?

Collaborator

Ah, the linked standard is maybe overly restrictive. At least on OpenBSD there is no -I or -d.

Also breaks on OSX: no -I and -d does something else.

I don't know how to make it platform independent in (Bourne) shell. Python would be nice (and better to use python3 and rely on shebang).
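
To illustrate the Python option: a minimal sketch of the month loop without any call to `date`, assuming the same conventions as the wrapper (both dates given as `yyyy-mm`, end date exclusive). The function name `iter_months` is hypothetical and not part of this changeset:

```python
def iter_months(start, end):
    """Yield (year, month) tuples from start (inclusive) to end (exclusive).

    Both arguments are strings of the form 'yyyy-mm'.
    """
    year, month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))

    # Tuple comparison orders by year first, then by month
    while (year, month) < (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


if __name__ == "__main__":
    # Same defaults as in the wrapper script
    for year, month in iter_months("2017-07", "2018-08"):
        print(f"Processing month: {year}-{month:02d}")
```

Because the loop relies only on integer arithmetic, it behaves the same on Linux, BSD and macOS; the download function could then be called directly for each `(year, month)` pair instead of going through a shell wrapper.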

@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# Before using this script CDSAPI has to be configured (see README)
# Needs to be executed at LOGIN node as connection to "outside" is required
set -eo pipefail

# load environment
module load Python

# Settings
start_date="2017-07" # yyyy-mm
end_date="2018-08" # yyyy-mm + 1
out_dir="cdsapidwn"

# Function to parse input
parse_arguments() {
for arg in "$@"; do
key="${arg%%=*}"
value="${arg#*=}"

case "$key" in
start_date) start_date="$value" ;;
end_date) end_date="$value" ;;
out_dir) out_dir="$value" ;;
*) echo "Warning: Unknown parameter: $key" ;;
esac
done
}

# Call the function to parse the input arguments
# Users need to ensure that the input is consistent
parse_arguments "$@"


# create output directory
mkdir -p $out_dir

# loop over months
current_date=$start_date
while [[ "$current_date" < "$end_date" ]]; do
echo "Processing month: $current_date"

year="${current_date%%-*}"
month="${current_date#*-}"

# start download script with data request
./download_ERA5_input.py $year $month $out_dir

# Increment the month
current_date=$(date -I -d "$current_date-01 + 1 month" | cut -d'-' -f1,2)
done
50 changes: 0 additions & 50 deletions mkforcing/download_ERA5_v2.py

This file was deleted.

100 changes: 84 additions & 16 deletions mkforcing/extract_ERA5_meteocloud.sh
Collaborator

I get some warnings:

Process 2017-07-31-23 prun: 744
cdi  warning (gribapiScanTimestep1): Record 1509 (name=z id=4.3.0 lev1=1 lev2=0) timestep 1: Inconsistent verification time!
...

Collaborator

@mvhulten mvhulten Apr 14, 2025

Here I use `years` instead of the faulty `iyear` (to be fixed by @s-poll), so that it is consistent with the upcoming changeset.

  • ./extract_ERA5_meteocloud.sh years=2017 months=07 works
  • ./extract_ERA5_meteocloud.sh years=2017 months=7 fails
  • ./extract_ERA5_meteocloud.sh iyear=2017 imonth=7 ihour=(00 01) fails
  • There is no `days` parameter, but maybe this was intentional
  • A check for meteocloud_${year}_${month}.{grb,nc} at the start of the script would be nice to have (to catch unnecessary recomputation) but is optional
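
The failing cases above come down to input normalisation. A minimal sketch of how `key=value` arguments could be parsed and normalised, written in Python for illustration; all function names are hypothetical and the parameter names are only borrowed from the script:

```python
from pathlib import Path


def parse_key_values(args, defaults):
    """Parse 'key=value' arguments, warning about unknown or malformed ones."""
    params = dict(defaults)
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or key not in params:
            print(f"Warning: Unknown parameter: {arg}")
            continue
        params[key] = value
    return params


def normalize_month(value):
    """Accept '7' as well as '07' and always return two digits."""
    month = int(value)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value}")
    return f"{month:02d}"


def split_hours(value):
    """Turn '00 01' or '0,1' into a list of two-digit hour strings."""
    return [f"{int(hour):02d}" for hour in value.replace(",", " ").split()]


def monthly_output_exists(outdir, year, month):
    """Check for an existing monthly NetCDF file to avoid recomputation."""
    return (Path(outdir) / f"meteocloud_{year}_{month}.nc").is_file()


if __name__ == "__main__":
    params = parse_key_values(
        ["years=2017", "months=7", "hours=00 01"],
        {"years": "2017", "months": "07", "hours": "00"},
    )
    print(params["years"], normalize_month(params["months"]),
          split_hours(params["hours"]))
```

Passing the hours as a single quoted string and splitting them afterwards avoids relying on shell array syntax in the argument.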

@@ -1,37 +1,105 @@
#!/usr/bin/env bash
set -eo pipefail

# load env -> not all CDO are compiled with "-t ecmwf"
# module use $OTHERSTAGES
# ml Stages/2022 NVHPC/22.9 ParaStationMPI/5.5.0-1 CDO/2.0.2

if [ -z "$1" ]
then
iyear=2017
echo "Take the default year "$iyear
else
iyear=$1
echo "Calculate the year "$iyear
fi
function message(){
if [ -z "${quiet}" ];then
echo "$1"
fi # quiet
}

# default values of parameters
iyear=2017
imonth=07
ihour=(00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23)
outdir=${iyear}-${imonth}
runpp=1
area=(-48 74 20 74)

# Function to parse input
parse_arguments() {
for arg in "$@"; do
key="${arg%%=*}"
value="${arg#*=}"

case "$key" in
quiet) quiet=y;;
iyear) iyear="$value" ;;
imonth) imonth="$value" ;;
ihour) ihour="$value" ;;
outdir) outdir="$value" ;;
runpp) runpp="$value" ;;
area) area="$value" ;;
*) echo "Warning: Unknown parameter: $key" ;;
esac
done
}

# Call the function to parse the input arguments
# Users need to ensure that the input is consistent
parse_arguments "$@"

message "=========================="
message "Year: "$iyear
message "Month: "$imonth
message "Hours: "$ihour
message "Selected area W: "${area[0]}
message "Selected area E: "${area[1]}
message "Selected area S: "${area[2]}
message "Selected area N: "${area[3]}
message "Output directory: "$outdir
message "Max running procs: "$runpp
message "=========================="

cd ${outdir}

# start a counter for background jobs
running_jobs=0

for year in ${iyear}
do
for month in 01 #02 03 04 05 06 07 08 09 10 11 12
for month in ${imonth}
do
days_per_month=$(cal ${month} ${year} | awk 'NF {DAYS = $NF}; END {print DAYS}')
for my_date in $(seq -w 1 ${days_per_month})
for day in $(seq -w 1 ${days_per_month})
do
for time in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
for hour in "${ihour[@]}"
do

cdo sellonlatbox,-48,74,20,74 /p/fastdata/slmet/slmet111/met_data/ecmwf/era5/grib/${year}/${month}/${year}${month}${my_date}${time}_ml.grb cut_domain_${year}${month}${my_date}${time}.grb
# increment the running job counter
running_jobs=$((running_jobs+1))

message "Process "$year"-"$month"-"$day"-"$hour" prun: "$running_jobs

cdo sellevel,137 cut_domain_${year}${month}${my_date}${time}.grb lower_level_${year}${month}${my_date}${time}.grb
#cdo -t ecmwf -f nc4 copy lower_level_${month}${my_date}${time}.grb lower_level_${month}${my_date}${time}.nc
cdo -t ecmwf selname,t,u,v,q lower_level_${year}${month}${my_date}${time}.grb variables_lower_level_${year}${month}${my_date}${time}.grb
# select domain area
cdo sellonlatbox,${area[0]},${area[1]},${area[2]},${area[3]} /p/data1/slmet/met_data/ecmwf/era5/grib/${year}/${month}/${year}${month}${day}${hour}_ml.grb cut_domain_${year}${month}${day}${hour}.grb
# select lowermost model level
cdo sellevel,137 cut_domain_${year}${month}${day}${hour}.grb lower_level_${year}${month}${day}${hour}.grb
# select temperature, horizontal wind speed, humidity
cdo -t ecmwf selname,t,u,v,q lower_level_${year}${month}${day}${hour}.grb variables_lower_level_${year}${month}${day}${hour}.grb

# if the max number of parallel tasks is reached, wait for a job to finish
if [[ ${running_jobs} -ge ${runpp} ]]; then
wait -n # wait for one job to finish before starting another
running_jobs=$((running_jobs-1)) # decrement the running job counter
fi

done
done

wait

# merge hourly files to monthly
cdo merge variables_lower_level_${year}*.grb meteocloud_${year}_${month}.grb
rm variables_lower_level_${year}*.grb cut_domain_${year}* lower_level_${year}*
# transform from grib to netcdf format
cdo -t ecmwf -f nc4 copy meteocloud_${year}_${month}.grb meteocloud_${year}_${month}.nc

# clean-up
rm variables_lower_level_${year}*.grb cut_domain_${year}* lower_level_${year}*

done
done
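
The script limits the number of concurrently running jobs with the `runpp` parameter, a job counter and `wait -n`. The same idea can be sketched in Python with a worker pool of fixed size. This is only an illustration under assumptions: `run_bounded` is a hypothetical name, and the placeholder tasks below stand in for the per-hour `cdo` calls, which in practice would be started through `subprocess.run`:

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_workers=1):
    """Run callables with at most max_workers of them in flight at once.

    Results are returned in the order of the input tasks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        # result() blocks until the task has finished and re-raises errors
        return [future.result() for future in futures]


if __name__ == "__main__":
    def make_task(hour):
        # Placeholder for the per-hour processing (assumption)
        return lambda: f"processed 2017-07-01-{hour:02d}"

    tasks = [make_task(hour) for hour in range(24)]
    for line in run_bounded(tasks, max_workers=4):
        print(line)
```

The pool takes over the bookkeeping that the counter does in the shell version, and a failing task surfaces as an exception when its result is collected.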
