
Conversation

hfrick (Member) commented on Oct 13, 2025

I've written about how we make the automatic calibration splits. The article covers the guiding principles of our approach, both the how and the why (although not in academic-paper depth).

The goal is also to let people understand in detail what happens for the sliding resamples. It has gotten relatively lengthy, though. I'm wondering whether it should stay here or, e.g., go into a separate article, similar to how we split out the details of how we deal with censoring for the dynamic survival metrics. I think there's value in working through those details somewhere (other than in the source code directly), but we could also experiment with collapsible text. Do you have any preferences or suggestions? Or do you think the length is fine as it is?
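For the collapsible option: in a Quarto article this could be as simple as a collapsible callout. A minimal sketch (the section title is made up):

```markdown
::: {.callout-note collapse="true"}
## Details: splitting the sliding resamples

The lengthy walk-through would live here, collapsed by default, and readers
can expand it if they want the full details.
:::
```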


EmilHvitfeldt (Member) left a comment


I think this is very valuable and high-quality work.

I think this would be a good place for this content. It would also fit on the rsample pkgdown site, for that matter.

There are small comments and nitpicks, but I find the overall structure and style very nice.

I'm also fine with the length. It is a complicated topic, and without the prose and diagrams it would be hard to understand.

```yaml
description: |
  Learn how tidymodels generates automatic calibration sets for post-processing.
toc: true
toc-depth: 2
```

Suggested change

```diff
-toc-depth: 2
+toc-depth: 3
```

Due to the length of the article, I think upping the TOC depth is nice; otherwise the TOC doesn't update that often as you scroll.


While preprocessing is the transformation of the predictors prior to a model fit, post-processing is the transformation of the predictions after the model fit. This could range from something as straightforward as limiting predictions to a certain range of values to something as complicated as transforming them based on a separate calibration model.

A calibration model is used to model the relationship between the predictions based on the primary model and the true outcomes. An additional model means an additional chance to accidentially overfit. So when working with calibration, this is crucial: we cannot use the same data to fit our calibration model as we use to assess the combination of primary and calibration model. Using the same data to fit the primary model and the calibration model means the predictions used to fit the calibration model are re-predictions of the same observations used to fit the primary model. Hence they are closer to the true values than predictions on new data would be and the calibration model doesn't have accurate information to estimate the right trends (so that they then can be removed).

Suggested change

```diff
-A calibration model is used to model the relationship between the predictions based on the primary model and the true outcomes. An additional model means an additional chance to accidentially overfit. So when working with calibration, this is crucial: we cannot use the same data to fit our calibration model as we use to assess the combination of primary and calibration model. Using the same data to fit the primary model and the calibration model means the predictions used to fit the calibration model are re-predictions of the same observations used to fit the primary model. Hence they are closer to the true values than predictions on new data would be and the calibration model doesn't have accurate information to estimate the right trends (so that they then can be removed).
+A calibration model is used to model the relationship between the predictions based on the primary model and the true outcomes. An additional model means an additional chance to accidentally overfit. So when working with calibration, this is crucial: we cannot use the same data to fit our calibration model as we use to assess the combination of primary and calibration model. Using the same data to fit the primary model and the calibration model means the predictions used to fit the calibration model are re-predictions of the same observations used to fit the primary model. Hence they are closer to the true values than predictions on new data would be and the calibration model doesn't have accurate information to estimate the right trends (so that they then can be removed).
```
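To make the idea of a calibration model concrete, here is a minimal sketch using the probably package and its built-in `segment_logistic` predictions. This is only an illustration, not code from the article; the half/half split is made up.

```r
library(probably)

# segment_logistic ships with probably: held-out predictions from a logistic
# regression, with the true class in `Class` and predicted probabilities in
# `.pred_good` / `.pred_poor`
n <- nrow(segment_logistic)

# Pretend the first half is our calibration set and the second half is the
# data we assess on -- the key point is that they are different rows
cal_preds  <- segment_logistic[seq_len(n %/% 2), ]
test_preds <- segment_logistic[(n %/% 2 + 1):n, ]

# Fit the calibration model on the calibration predictions only
cal_mod <- cal_estimate_logistic(cal_preds, truth = Class)

# Apply it to the remaining predictions to assess the combination of
# primary model and calibration model
calibrated <- cal_apply(test_preds, cal_mod)
```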


While preprocessing is the transformation of the predictors prior to a model fit, post-processing is the transformation of the predictions after the model fit. This could range from something as straightforward as limiting predictions to a certain range of values to something as complicated as transforming them based on a separate calibration model.


Below we use the term primary model, which we just started using. I like the term, but I think it would be nice to properly define it in terms of the pre/model/post diagram/terminology.


A calibration model is used to model the relationship between the predictions based on the primary model and the true outcomes. An additional model means an additional chance to accidentially overfit. So when working with calibration, this is crucial: we cannot use the same data to fit our calibration model as we use to assess the combination of primary and calibration model. Using the same data to fit the primary model and the calibration model means the predictions used to fit the calibration model are re-predictions of the same observations used to fit the primary model. Hence they are closer to the true values than predictions on new data would be and the calibration model doesn't have accurate information to estimate the right trends (so that they then can be removed).

rsample provides a collection of functions to make resamples for empirical validation of prediction models. So far, the assumption was that the prediction model is the only model that needs fitting, i.e., a resample consists of an analysis set and an assessment set.
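As a quick illustration of those two sets (not code from the article), any rsample split exposes them via `analysis()` and `assessment()`:

```r
library(rsample)

# A resample pairs an analysis set (used to fit the model) with an
# assessment set (used to evaluate it)
folds <- vfold_cv(mtcars, v = 5)

analysis(folds$splits[[1]])    # roughly 4/5 of the rows, for fitting
assessment(folds$splits[[1]])  # the remaining rows, for evaluation
```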

Do we have an rsample doc page for the analysis set and the assessment set?


rsample provides a collection of functions to make resamples for empirical validation of prediction models. So far, the assumption was that the prediction model is the only model that needs fitting, i.e., a resample consists of an analysis set and an assessment set.

If we include calibration into our workflow (bundeling preprocessing, (primary) model, and post-processing), we want an analysis set, a calibration set, and an assessment set.

Suggested change

```diff
-If we include calibration into our workflow (bundeling preprocessing, (primary) model, and post-processing), we want an analysis set, a calibration set, and an assessment set.
+If we include calibration into our workflow (bundling preprocessing, (primary) model, and post-processing), we want an analysis set, a calibration set, and an assessment set.
```


Let's start with the row-based splitting done by `sliding_window()`. We'll use a very small example dataset. This will make it easier to illustrate how the different subsets of the data are created but note that it is too small for real-world purposes. Let's use a data frame with 11 rows and say we want to use 5 for the analysis set, 3 for the assessment set, and leave a gap of 2 in between those two sets. We can make two such resamples from our data frame.
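A rough sketch of how that setup might look as a `sliding_window()` call, assuming the usual lookback / assess_start / assess_stop arguments (the article's actual code may differ):

```r
library(rsample)

# 11 rows, as in the diagram below
dat <- data.frame(row = 1:11)

resamples <- sliding_window(
  dat,
  lookback = 4,      # analysis set: the current row plus the 4 rows before it
  assess_start = 3,  # leave a gap of 2 rows after the analysis set
  assess_stop = 5    # assessment set: 3 rows (rows i + 3 to i + 5)
)

# With 11 rows, only two such windows fit, giving the two resamples
analysis(resamples$splits[[1]])
assessment(resamples$splits[[1]])
```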

![](images/calibration-split-window.jpg)

Could we do 4 for the assessment set and a gap of 1?

Right now there is very little air inside the assessment set with regard to the text.


I'm realizing that this would be a huge undertaking.


![](images/calibration-split-index.jpg)

We still get two resamples; however, the analysis set contains only 4 rows because only those fall into the window defined by the index.
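A rough sketch of this with `sliding_index()` (made-up data, not necessarily the article's example; the point is only that one value of the index is unobserved):

```r
library(rsample)

# Index values 1 to 11, but day 3 was never observed
dat <- data.frame(day = setdiff(1:11, 3))

resamples <- sliding_index(
  dat,
  index = day,
  lookback = 4,      # analysis window: index values in [i - 4, i]
  assess_start = 3,
  assess_stop = 5
)

# The window [1, 5] spans five index values, but only four of them are
# observed, so this analysis set contains four rows
analysis(resamples$splits[[1]])
```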

Would it be beneficial to place the missing value at 1 or 5 such that the analysis sets have different lengths?


Or, I guess, that isn't interesting at all, because then we do the same as in the previous section.

```r
analysis(r_split)
```

The sliding splits slide over _the data_, meaning they slide over observed values of the index and they slide only within the boundaries of the observed index values. So here, we can only slide within [3, 6] and thus cannot fit an inner analysis set of three and a calibration set of two into it. As established earlier, we fall back onto an empty calibration set in such a situation.

We mention a couple of times that we fall back. Should we mention that we fall back with a warning?


Calculating the length of the calibration set rather than the gap, together with rounding up when translating proportions to new lengths within the outer analysis set, means that we prioritize allocating observations to the (inner) analysis and calibration set over allocating them to the gap. In this example, this means that we are not leaving a gap between the analysis set and the calibration set.

However, rounding up for both (inner) analysis and calibration set when we don't have a gap could mean we end up allocating more observations than we actually have. So in that case, we try to take from the calibration set if possible and thus prioritzing fitting the prediction model over the calibration model.

Suggested change

```diff
-However, rounding up for both (inner) analysis and calibration set when we don't have a gap could mean we end up allocating more observations than we actually have. So in that case, we try to take from the calibration set if possible and thus prioritzing fitting the prediction model over the calibration model.
+However, rounding up for both (inner) analysis and calibration set when we don't have a gap could mean we end up allocating more observations than we actually have. So in that case, we try to take from the calibration set if possible and thus prioritizing fitting the prediction model over the calibration model.
```
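To make the rounding rule concrete, a made-up example (the numbers are illustrative, not from the article):

```r
# Say the outer analysis set has 5 rows and we want a 75% / 25% split into
# the inner analysis set and the calibration set, with no gap
n_outer <- 5

n_analysis    <- ceiling(0.75 * n_outer)  # 4
n_calibration <- ceiling(0.25 * n_outer)  # 2

# Rounding both up over-allocates: 4 + 2 = 6 > 5 ...
overshoot <- n_analysis + n_calibration - n_outer

# ... so, following the rule described above, the excess is taken from the
# calibration set, prioritizing the prediction model: 4 rows vs. 1 row
n_calibration <- n_calibration - overshoot
c(n_analysis, n_calibration)
```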


![](images/calibration-split-period.jpg)

The principle of how to contruct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.

Suggested change

```diff
-The principle of how to contruct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.
+The principle of how to construct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.
```
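For completeness, a rough `sliding_period()` sketch (made-up data, not the article's example), where the sliding happens over observed months rather than rows or index values:

```r
library(rsample)

# Daily data covering several months
dat <- data.frame(date = as.Date("2025-01-01") + 0:149, y = rnorm(150))

resamples <- sliding_period(
  dat,
  index = date,
  period = "month",
  lookback = 1,      # analysis set: the current month plus the month before
  assess_start = 2,  # leave a gap of one month
  assess_stop = 2    # assessment set: one month
)
```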
