-
Notifications
You must be signed in to change notification settings - Fork 4
Description
- xclim version:
- Python version:
- Operating System:
Description
I am trying to use xclim to bias correct precipitation from ECCC's GEPS product (ensemble NWP) and correcting it against CaPA data. The NWP model has 20 ensemble members and 16 lead times in the forecast. The data is applied to N different watersheds (each with a single timeseries and not gridded data). Through some research it was determined that each GEPS lead time has a slightly different bias. The EQM method was selected and parameters for EQM are pre-computed. The current code takes quite a bit of time to run and I would like to know if there are ways to speed it up. Particularly around the nested for loops. Is there a built in xarray function or xclim function?
What I Did
Assume we start with a bit of house keeping, imports, identifying folders, etc.
def process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations):
# define the time slot for this leadtime
ltime_start = ini_time + np.timedelta64(leadtime_length*i_ltime, 'h') + np.timedelta64(1,'h')
ltime_end = ini_time + np.timedelta64(leadtime_length*(i_ltime+1), 'h')
#select geps dataset within the time slot
geps_fcst_ltime = geps_fcst.sel(time=slice(ltime_start, ltime_end)).copy(deep=True)
# read eqm parameters for this lead time
eqm_pars = xr.open_dataset(os.path.join(path_eqm_par_folder,"Leadtime_"+str(i_ltime + 1) + "_EQM.nc"))
eqm_pars.close()
# correct geps for each station and realization
for i_stations in range(0,len(stations)):
i_station = stations[i_stations]
geps_fcst_ltime_station = geps_fcst_ltime.sel(stations=i_station)
stn = station_names[i_station]
with open("../../Export/GEPS_FCST2/log.txt", "a") as f:
print("INFO: processing station", i_station, "or", stn, ",at leadtime ",i_ltime, file=f)
# print(i_station, file=f)
for i_ens in range(0,len(realizations)):
i_en = realizations[i_ens]
# select parameter for this station and this realization
idx = i_stations*22 + i_ens
EMQ_par = eqm_pars.sel(station_ens = idx)
geps_ltime_uncorrect = geps_fcst_ltime_station.sel(realization = i_en)["PC_NWP"]
#create EMQ model based on this parameter
QM_model = sdba.EmpiricalQuantileMapping.from_dataset(EMQ_par)
# correct geps forecast
geps_ltime_corrected = QM_model.adjust(geps_ltime_uncorrect,extrapolation="constant", interp="linear")
geps_ltime_corrected = geps_ltime_corrected.fillna(0)
geps_ltime_corrected = xr.where((geps_ltime_corrected>1000),0,geps_ltime_corrected)
# assign forecast to the dataset
geps_fcst["PC_NWP_CORR"].loc[dict(time=slice(ltime_start, ltime_end),realization = i_en,stations = i_station)] = geps_ltime_corrected.values
return geps_fcst
#loop for lead times
for i_ltime in range(0,n_leadtimes):
start_time = time.time()
geps_fcst = process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations)
print("Total time for each leadtime is ",time.time() - start_time)
geps_fcst["PC_NWP_CORR"].attrs['long_name'] = 'PC_NWP_CORR'
geps_fcst["PC_NWP_CORR"].attrs['ensemble'] = 'GEPS_CORR'
geps_fcst["PC_NWP_CORR"].attrs['units'] = 'mm'
geps_fcst = geps_fcst.drop_vars("PC_NWP")
geps_fcst.to_netcdf(path_output_geps)
geps_fcst.close()What I Received
This function works and provides the expected results. With the parallel code modified below it takes ~25 min to run. This process applied to ~150 sites with only a 14 day forecast lead time.
from joblib import Parallel, delayed
output_ncs = Parallel(n_jobs=4)(delayed(process_each_lead_time)(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_folder,realizations,stations,i_ltime) for i_ltime in range(0,n_leadtimes))