
Proposal: Extend sampling functions by calibration sampling methods #600

@instantkaffee

I often find myself in data-modeling situations where the existing rsample functions for setting up a proper analysis/assessment or train/test split do not suffice.

Example: a multivariate regression problem in which most observations of the numeric predictors are concentrated around a central region and only a few lie farther out, while the intention is to learn from all the data, especially the effects that appear when moving outside that dense central region.

With purely random train/test or analysis/assessment sampling, or even with univariate stratification, there is a high risk of learning only the effect in the center. The risk of getting inconsistent model performance results is also higher.

I suggest adding functionality to rsample with extended sampling capabilities for these cases: methods that ensure maximum coverage of the data space for both the train/test and the analysis/assessment sets.

The problem is addressed by calibration sampling methods. Have a look here for some:

https://cran.r-project.org/web/packages/prospectr/vignettes/prospectr.html#duplex-duplex
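
To make the idea concrete, here is a minimal sketch of one such method, Kennard-Stone selection, written in plain Python rather than R so it has no package dependencies. It is not the prospectr implementation and not a proposed rsample API: the function name `kennard_stone`, its arguments, and the toy data are all made up for illustration. The rule is to start with the two most distant points, then repeatedly add the point whose nearest already-selected neighbour is farthest away.

```python
import math
import random


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule.

    Hypothetical helper for illustration only.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Start with the two points that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]

    # min_dist[k] = distance from point k to its nearest selected point.
    min_dist = [min(dist[k][i0], dist[k][j0]) for k in range(n)]

    while len(selected) < n_select:
        remaining = [k for k in range(n) if k not in selected]
        # Add the point that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        min_dist = [min(min_dist[k], dist[k][nxt]) for k in range(n)]
    return selected


# Toy data (made up): 95 points packed around the origin plus 5 distant ones.
rng = random.Random(1)
dense = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(95)]
sparse = [(3.0, 3.0), (-3.0, 3.0), (3.0, -3.0), (-3.0, -3.0), (0.0, 4.0)]
data = dense + sparse

train_idx = kennard_stone(data, 20)
test_idx = [i for i in range(len(data)) if i not in train_idx]
```

In this toy case the five distant points (indices 95 to 99) all end up in `train_idx`, whereas a random 20% draw would usually miss most of them. Note that Kennard-Stone only fills one set, so the other set receives no distant points here; DUPLEX, the method the link above points to, is meant to spread coverage over both sets and is not implemented in this sketch.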

Literature:
