Choosing threshold in ExtremeValues

### Choosing threshold in ExtremeValues

In xsdba, the ExtremeValues correction procedure fits generalized Pareto distributions (GPD), which are then blended with empirical distributions to form mixture distributions. These mixture distributions are then combined to obtain the bias-correction mapping. The threshold for the GPD fits is chosen as follows (as I understand it):
- For the ref and hist datasets, define clusters as runs of values exceeding 1 mm (or a user-defined threshold `cluster_thresh`), surrounded by values below the threshold.
- For the ref and hist datasets, take the `q_thresh` quantile (default 0.95) of the values at or above `cluster_thresh`, giving one threshold for ref and one for hist.
- Define the final threshold as the mean of the two thresholds above:

  `thresh = (np.nanquantile(ref[ref >= cluster_thresh], q_thresh) + np.nanquantile(hist[hist >= cluster_thresh], q_thresh)) / 2`

Typically the range, and hence the peak values, will differ between the hist and ref data. Using a single common threshold for the GPD fits can therefore be inconvenient, because it may select a different number of data points for ref and hist. In the extreme case, no data points at all are selected for one of the datasets.
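To illustrate the issue, here is a minimal sketch using plain NumPy on synthetic data. It is not the xsdba implementation: the helper names `common_threshold` and `n_exceedances`, the gamma-distributed toy series, and the fixed `cluster_thresh = 1.0` are all assumptions made for this example, and declustering is ignored.

```python
import numpy as np


def common_threshold(ref, hist, cluster_thresh=1.0, q_thresh=0.95):
    """Mean of the per-dataset quantiles, mirroring the formula quoted above."""
    q_ref = np.nanquantile(ref[ref >= cluster_thresh], q_thresh)
    q_hist = np.nanquantile(hist[hist >= cluster_thresh], q_thresh)
    return (q_ref + q_hist) / 2, q_ref, q_hist


def n_exceedances(x, thresh):
    """Number of values strictly above the threshold (no declustering)."""
    return int(np.sum(x > thresh))


# Synthetic "precipitation": hist is systematically weaker than ref.
rng = np.random.default_rng(0)
ref = rng.gamma(shape=0.5, scale=10.0, size=5000)
hist = rng.gamma(shape=0.5, scale=5.0, size=5000)

thresh, q_ref, q_hist = common_threshold(ref, hist)
n_ref = n_exceedances(ref, thresh)
n_hist = n_exceedances(hist, thresh)
print(f"thresh={thresh:.2f}, ref exceedances={n_ref}, hist exceedances={n_hist}")

# Extreme case: a hist series so weak that nothing exceeds the common threshold.
hist_weak = 0.1 * hist
thresh_weak, _, _ = common_threshold(ref, hist_weak)
n_hist_weak = n_exceedances(hist_weak, thresh_weak)
print(f"thresh={thresh_weak:.2f}, weak hist exceedances={n_hist_weak}")
```

Because the common threshold lies between the two per-dataset quantiles, the dataset with the heavier tail always contributes more points to its GPD fit than the other one.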


### A more flexible setup

A more flexible setup would offer:
- the possibility to specify individual thresholds for ref, hist (and sim)
- the possibility to specify an individual number of peaks for ref, hist (and sim)

Such an implementation would cover most practical applications. I would therefore like to hear the community's opinion on this.
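As a rough sketch of what the two options could look like, the snippet below computes a separate quantile-based threshold per dataset, and alternatively a threshold that selects a fixed number of peaks. The function names `per_dataset_threshold` and `threshold_for_n_peaks` are hypothetical and not part of xsdba; the data are synthetic and declustering is again ignored.

```python
import numpy as np


def per_dataset_threshold(x, cluster_thresh=1.0, q_thresh=0.95):
    """Option 1: each dataset gets its own quantile-based threshold."""
    return float(np.nanquantile(x[x >= cluster_thresh], q_thresh))


def threshold_for_n_peaks(x, n_peaks):
    """Option 2: threshold equal to the n_peaks-th largest valid value.

    Selecting values >= this threshold yields n_peaks points (more if ties occur).
    """
    valid = np.sort(x[~np.isnan(x)])
    if not 1 <= n_peaks <= valid.size:
        raise ValueError("n_peaks must be between 1 and the number of valid values")
    return float(valid[-n_peaks])


rng = np.random.default_rng(0)
datasets = {
    "ref": rng.gamma(shape=0.5, scale=10.0, size=5000),
    "hist": rng.gamma(shape=0.5, scale=5.0, size=5000),
}

# Option 1: individual thresholds, each taken from the dataset's own distribution.
own_thresh = {name: per_dataset_threshold(x) for name, x in datasets.items()}

# Option 2: the same number of peaks for every dataset.
n_peaks = 100
peak_thresh = {name: threshold_for_n_peaks(x, n_peaks) for name, x in datasets.items()}
peak_counts = {name: int(np.sum(x >= peak_thresh[name])) for name, x in datasets.items()}

print(own_thresh)
print(peak_thresh, peak_counts)
```

With the second option each GPD fit uses the same sample size by construction, at the cost of the thresholds no longer being comparable across datasets.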


### Additional context

_No response_

### Contribution

- [ ] I would be willing/able to open a Pull Request to contribute this feature.

Choosing threshold in ExtremeValues #245