Efficient Application of Bias Correction

* xclim version:
* Python version:
* Operating System:

### Description
I am trying to use xclim to bias correct precipitation from ECCC's GEPS product (ensemble NWP) and correcting it against CaPA data. The NWP model has 20 ensemble members and 16 lead times in the forecast. The data is applied to N different watersheds (each with a single timeseries and not gridded data). Through some research it was determined that each GEPS lead time has a slightly different bias. The EQM method was selected and parameters for EQM are pre-computed. The current code takes quite a bit of time to run and I would like to know if there are ways to speed it up. Particularly around the nested for loops. Is there a built in xarray function or xclim function?


### What I Did
Assume we start with a bit of house keeping, imports, identifying folders, etc.
```python3
def process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations):
     
    # define the time slot for this leadtime 
    ltime_start = ini_time + np.timedelta64(leadtime_length*i_ltime, 'h')  + np.timedelta64(1,'h')
    ltime_end = ini_time + np.timedelta64(leadtime_length*(i_ltime+1), 'h')

    #select geps dataset within the time slot
    geps_fcst_ltime = geps_fcst.sel(time=slice(ltime_start, ltime_end)).copy(deep=True)

    # read eqm parameters for this lead time 
    eqm_pars = xr.open_dataset(os.path.join(path_eqm_par_folder,"Leadtime_"+str(i_ltime + 1) + "_EQM.nc")) 
    eqm_pars.close()

    # correct geps for each station and realization 
    for i_stations in range(0,len(stations)):
        i_station = stations[i_stations]
        geps_fcst_ltime_station = geps_fcst_ltime.sel(stations=i_station)
        
        stn = station_names[i_station]
        with open("../../Export/GEPS_FCST2/log.txt", "a") as f:
            print("INFO: processing station", i_station, "or", stn, ",at leadtime ",i_ltime, file=f)
            # print(i_station, file=f)
        for i_ens in range(0,len(realizations)):
            
            
            i_en = realizations[i_ens]
            
            # select parameter for this station and this realization 
            idx = i_stations*22 + i_ens
            EMQ_par = eqm_pars.sel(station_ens = idx)
            
            geps_ltime_uncorrect = geps_fcst_ltime_station.sel(realization = i_en)["PC_NWP"]

            #create EMQ model based on this parameter
            QM_model = sdba.EmpiricalQuantileMapping.from_dataset(EMQ_par)
            
            # correct geps forecast 
            geps_ltime_corrected = QM_model.adjust(geps_ltime_uncorrect,extrapolation="constant", interp="linear") 
            geps_ltime_corrected = geps_ltime_corrected.fillna(0)
            geps_ltime_corrected = xr.where((geps_ltime_corrected>1000),0,geps_ltime_corrected)
            
            # assign forecast to the dataset 
            geps_fcst["PC_NWP_CORR"].loc[dict(time=slice(ltime_start, ltime_end),realization = i_en,stations = i_station)] = geps_ltime_corrected.values
            
            
    return geps_fcst


#loop for lead times
for i_ltime in range(0,n_leadtimes):
    
    start_time = time.time()

    geps_fcst = process_each_lead_time(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_geps,realizations,stations)
    
    print("Total time for each leadtime is ",time.time() -  start_time)

geps_fcst["PC_NWP_CORR"].attrs['long_name'] = 'PC_NWP_CORR'
geps_fcst["PC_NWP_CORR"].attrs['ensemble'] = 'GEPS_CORR'
geps_fcst["PC_NWP_CORR"].attrs['units'] = 'mm'
geps_fcst = geps_fcst.drop_vars("PC_NWP")
geps_fcst.to_netcdf(path_output_geps)

geps_fcst.close()
```

### What I Received
This function works and provides the expected results. With the parallel code modified below it takes ~25 min to run. This process applied to ~150 sites with only a 14 day forecast lead time.
```python3
from joblib import Parallel, delayed

output_ncs = Parallel(n_jobs=4)(delayed(process_each_lead_time)(leadtime_length,ini_time,geps_fcst,path_eqm_par_folder,path_output_folder,realizations,stations,i_ltime) for i_ltime in range(0,n_leadtimes))
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient Application of Bias Correction #142

Description

What I Did

What I Received

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Efficient Application of Bias Correction #142

Description

Description

What I Did

What I Received

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions