Skip to content

Efficient Application of Bias Correction #142

@kckornelsen

Description

@kckornelsen
  • xclim version:
  • Python version:
  • Operating System:

Description

I am trying to use xclim to bias correct precipitation from ECCC's GEPS product (ensemble NWP) and correcting it against CaPA data. The NWP model has 20 ensemble members and 16 lead times in the forecast. The data is applied to N different watersheds (each with a single timeseries and not gridded data). Through some research it was determined that each GEPS lead time has a slightly different bias. The EQM method was selected and parameters for EQM are pre-computed. The current code takes quite a bit of time to run and I would like to know if there are ways to speed it up. Particularly around the nested for loops. Is there a built in xarray function or xclim function?

What I Did

Assume we start with a bit of house keeping, imports, identifying folders, etc.

def process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations):
     
    # define the time slot for this leadtime 
    ltime_start = ini_time + np.timedelta64(leadtime_length*i_ltime, 'h')  + np.timedelta64(1,'h')
    ltime_end = ini_time + np.timedelta64(leadtime_length*(i_ltime+1), 'h')

    #select geps dataset within the time slot
    geps_fcst_ltime = geps_fcst.sel(time=slice(ltime_start, ltime_end)).copy(deep=True)

    # read eqm parameters for this lead time 
    eqm_pars = xr.open_dataset(os.path.join(path_eqm_par_folder,"Leadtime_"+str(i_ltime + 1) + "_EQM.nc")) 
    eqm_pars.close()

    # correct geps for each station and realization 
    for i_stations in range(0,len(stations)):
        i_station = stations[i_stations]
        geps_fcst_ltime_station = geps_fcst_ltime.sel(stations=i_station)
        
        stn = station_names[i_station]
        with open("../../Export/GEPS_FCST2/log.txt", "a") as f:
            print("INFO: processing station", i_station, "or", stn, ",at leadtime ",i_ltime, file=f)
            # print(i_station, file=f)
        for i_ens in range(0,len(realizations)):
            
            
            i_en = realizations[i_ens]
            
            # select parameter for this station and this realization 
            idx = i_stations*22 + i_ens
            EMQ_par = eqm_pars.sel(station_ens = idx)
            
            geps_ltime_uncorrect = geps_fcst_ltime_station.sel(realization = i_en)["PC_NWP"]

            #create EMQ model based on this parameter
            QM_model = sdba.EmpiricalQuantileMapping.from_dataset(EMQ_par)
            
            # correct geps forecast 
            geps_ltime_corrected = QM_model.adjust(geps_ltime_uncorrect,extrapolation="constant", interp="linear") 
            geps_ltime_corrected = geps_ltime_corrected.fillna(0)
            geps_ltime_corrected = xr.where((geps_ltime_corrected>1000),0,geps_ltime_corrected)
            
            # assign forecast to the dataset 
            geps_fcst["PC_NWP_CORR"].loc[dict(time=slice(ltime_start, ltime_end),realization = i_en,stations = i_station)] = geps_ltime_corrected.values
            
            
    return geps_fcst


#loop for lead times
for i_ltime in range(0,n_leadtimes):
    
    start_time = time.time()

    geps_fcst = process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations)
    
    print("Total time for each leadtime is ",time.time() -  start_time)

geps_fcst["PC_NWP_CORR"].attrs['long_name'] = 'PC_NWP_CORR'
geps_fcst["PC_NWP_CORR"].attrs['ensemble'] = 'GEPS_CORR'
geps_fcst["PC_NWP_CORR"].attrs['units'] = 'mm'
geps_fcst = geps_fcst.drop_vars("PC_NWP")
geps_fcst.to_netcdf(path_output_geps)

geps_fcst.close()

What I Received

This function works and provides the expected results. With the parallel code modified below it takes ~25 min to run. This process applied to ~150 sites with only a 14 day forecast lead time.

from joblib import Parallel, delayed

output_ncs = Parallel(n_jobs=4)(delayed(process_each_lead_time)(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_folder,realizations,stations,i_ltime) for i_ltime in range(0,n_leadtimes))

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions