Support for UQ with conformal prediction by sgreenbury · Pull Request #916 · alan-turing-institute/autoemulate

sgreenbury · 2025-11-04T13:52:11Z

Closes #848.
Contributes towards #589.
(merge after #917)

This PR:

- Updates the base emulator API so that validation data can be specified (used here as calibration data); see "Let user pass train/val/test data split" #905
- Adds a base class wrapper for emulators to provide conformal prediction (analogous to Ensemble)
- Adds a ConformalMLP class (analogous to EnsembleMLP). This supports both constant-width and quantile-regression-derived UQ.
  - The quantile regression version uses an MLP with a quantile training loss to fit the upper and lower quantiles, and then applies a constant per-target correction based on performance on the calibration data
- Adds ConformalMLP to the registry and re-exports it (not added as a default emulator)
- Updates the API to support passing n_samples, so that it can be controlled at lower levels of the API from the high-level AutoEmulate class
- Adds an output_to_tensors method on ConversionMixin to avoid repeated mapping of outputs to tensors
- Fixes indentation of docstrings in MLP subclasses
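
For readers unfamiliar with the constant-width variant, here is a minimal sketch of split conformal prediction in plain Python. It is not the code in this PR: the `model` function, the toy data and the helper names are assumptions made for illustration, and it handles a single output (a per-target version would repeat the calibration step for each output column).

```python
import math
import random


def conformal_quantile(scores: list[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank ceil((n + 1) * (1 - alpha)) among the sorted scores (1-indexed).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def constant_width_interval(pred: float, q: float) -> tuple[float, float]:
    """Symmetric interval of half-width q around a point prediction."""
    return pred - q, pred + q


def model(x: float) -> float:
    """Hypothetical fitted point predictor (stands in for an emulator)."""
    return 2.0 * x


# Toy data (assumption): y = 2x + Gaussian noise.
rng = random.Random(0)
x_cal = [rng.uniform(0, 1) for _ in range(500)]
y_cal = [2.0 * x + rng.gauss(0, 0.1) for x in x_cal]

# Calibration scores are absolute residuals on held-out data.
scores = [abs(y - model(x)) for x, y in zip(x_cal, y_cal)]
q = conformal_quantile(scores, alpha=0.1)


def covers(x: float, y: float) -> bool:
    lo, hi = constant_width_interval(model(x), q)
    return lo <= y <= hi


# Empirical coverage on fresh test points should be close to 1 - alpha.
x_test = [rng.uniform(0, 1) for _ in range(2000)]
y_test = [2.0 * x + rng.gauss(0, 0.1) for x in x_test]
coverage = sum(covers(x, y) for x, y in zip(x_test, y_test)) / len(x_test)
print(f"half-width={q:.3f}, coverage={coverage:.3f}")
```

The interval has the same width everywhere, which is why this variant cannot adapt to inputs where the model is less reliable.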

Question:

Currently the two methods for deriving the intervals are part of the same base class, selected by a keyword argument. This is similar to the way the kernel fn is passed for GPs. Would it be preferable to have two classes? Or perhaps a general emulator subclass factory, extending the current one, which only supports GPs?

Next steps:

- Tests for passing validation data
- Check UQ performance of ConformalMLP (e.g. on projectile and other simulators)
  - For projectile, "split" constant-width intervals do not provide good UQ, while "quantile" is an improvement but still worse than e.g. a GP.

radka-j · 2025-11-12T14:01:21Z

I don't think this PR closes #905 since we still don't let the user specify the test data set (and use all x as train data).

radka-j

This looks great! All my comments are really just about naming/docstrings.

sgreenbury · 2025-11-12T15:41:53Z

> I don't think this PR closes #905 since we still don't let the user specify the test data set (and use all x as train data).

Ah I see - #905 is distinct from the updates to the Emulator API required in #589. I'll update the top-level comment to say "contributes towards #589". Are the API changes in #905 specifically for the AutoEmulate class?

sgreenbury · 2025-11-12T17:15:27Z

From discussion with @radka-j:

- Add the reference from the paper above to the notes docstring (we can mention that quantile regression can be done in different ways: the paper uses regression, whereas we use an MLP here)
- Extensions to methods such as scaling can live in the conformal base class: add a comment to explain and open an issue (different methodologies, such as distance to training data or even an MLP, to be explored)
- Change the method string to "constant"
- Update to specific subclasses
- Update the API to support specifying the distribution type, and add docs explaining the assumption
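
To make the quantile-regression variant concrete, here is a minimal sketch of a CQR-style constant correction in plain Python. The function names and the toy calibration numbers are assumptions for illustration and are not taken from this PR; the lower and upper bounds would in practice come from a model trained with a quantile loss.

```python
import math


def cqr_correction(lower, upper, y, alpha):
    """Constant correction for quantile-regression bounds (CQR-style).

    The score for each calibration point measures how far y falls outside
    [lower, upper]; it is negative when y lies strictly inside.
    """
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else scores[rank - 1]


def cqr_interval(lo, hi, q):
    """Widen (q > 0) or shrink (q < 0) the bounds, keeping them ordered."""
    new_lo, new_hi = lo - q, hi + q
    if new_lo > new_hi:  # a large negative q can make the bounds cross
        mid = 0.5 * (new_lo + new_hi)
        new_lo = new_hi = mid
    return new_lo, new_hi


# Hypothetical calibration set: raw quantile bounds that are slightly too narrow.
lower = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
upper = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [0.5, 2.4, 2.5, 3.5, 3.7, 5.5, 6.5, 8.2, 8.5]

q = cqr_correction(lower, upper, y, alpha=0.2)
print(q)
print(cqr_interval(10.0, 11.0, q))
```

Unlike the constant-width variant, the corrected interval inherits the input-dependent width of the underlying quantile predictions; only the correction `q` is constant.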

radka-j · 2025-11-13T09:16:28Z

I was thinking about our discussion yesterday about evaluating conformal intervals using MSLL. I only had a brief look but I don't think we can treat the conformal intervals as a distribution - we don't know if it is normal but also don't have a reason to assume it is uniform. We only have the 2 quantiles we compute (and the mean).

sgreenbury · 2025-11-13T09:46:12Z

> I was thinking about our discussion yesterday about evaluating conformal intervals using MSLL. I only had a brief look but I don't think we can treat the conformal intervals as a distribution - we don't know if it is normal but also don't have a reason to assume it is uniform. We only have the 2 quantiles we compute (and the mean).

Yes good point - adding a distribution is imposing an additional assumption so that we can represent it as a ProbabilisticEmulator. One option could be to provide an API to support specifying / overriding the distributional assumption, and some docs to describe the design choice.
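
As an illustration of why the distributional assumption matters for likelihood-based metrics such as MSLL, the sketch below fits a normal and a uniform distribution to the same interval and compares their log densities. The helper names and the matching rules are assumptions made for this example, not the behaviour of the code in this PR.

```python
import math
from statistics import NormalDist


def normal_from_interval(lo, hi, alpha):
    """Normal whose central (1 - alpha) interval is [lo, hi]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist(mu=0.5 * (lo + hi), sigma=(hi - lo) / (2 * z))


def uniform_log_density_from_interval(lo, hi, alpha, y):
    """Log density at y of the uniform whose central (1 - alpha) mass is [lo, hi]."""
    width = (hi - lo) / (1 - alpha)  # full support width
    centre = 0.5 * (lo + hi)
    inside = abs(y - centre) <= width / 2
    return -math.log(width) if inside else -math.inf


lo, hi, alpha = -1.0, 1.0, 0.05
normal = normal_from_interval(lo, hi, alpha)
for y in (0.0, 0.9, 1.5):
    log_p_normal = math.log(normal.pdf(y))
    log_p_uniform = uniform_log_density_from_interval(lo, hi, alpha, y)
    print(f"y={y}: normal={log_p_normal:.3f}, uniform={log_p_uniform}")
```

Both distributions agree on the interval, yet they assign very different log densities to points near or beyond its edges, so any log-likelihood score reflects the chosen assumption as much as the interval itself.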


radka-j · 2025-11-13T14:41:20Z

> Yes good point - adding a distribution is imposing an additional assumption so that we can represent it as a ProbabilisticEmulator. One option could be to provide an API to support specifying / overriding the distributional assumption, and some docs to describe the design choice.

I think that sounds really good.

This discussion made me think about whether in the case of Ensembles we also shouldn't assume normality. One option would be to introduce something like an EmpiricalDistribution that wraps around the ensemble predictions and can compute all the different empirical quantiles. Sampling in this case could be sampling from the ensemble with replacement. I think assuming normality is common so we can leave it as is but this might be more accurate - maybe we open an issue to discuss?
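
A minimal sketch of what such an `EmpiricalDistribution` could look like, in plain Python for a single scalar prediction. The class is hypothetical: its name comes from the comment above, but the methods, the interpolation rule and the example values are assumptions.

```python
import random
from typing import Optional


class EmpiricalDistribution:
    """Hypothetical wrapper around ensemble member predictions for one input."""

    def __init__(self, member_predictions: list[float]):
        if not member_predictions:
            raise ValueError("need at least one ensemble member")
        self._sorted = sorted(member_predictions)

    @property
    def mean(self) -> float:
        return sum(self._sorted) / len(self._sorted)

    def quantile(self, p: float) -> float:
        """Empirical quantile, interpolating linearly between order statistics."""
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        pos = p * (len(self._sorted) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(self._sorted) - 1)
        frac = pos - lo
        return self._sorted[lo] * (1 - frac) + self._sorted[hi] * frac

    def interval(self, alpha: float = 0.05) -> tuple[float, float]:
        """Central (1 - alpha) interval from the empirical quantiles."""
        return self.quantile(alpha / 2), self.quantile(1 - alpha / 2)

    def sample(self, n: int, rng: Optional[random.Random] = None) -> list[float]:
        """Resample ensemble members with replacement."""
        rng = rng or random.Random()
        return rng.choices(self._sorted, k=n)


dist = EmpiricalDistribution([1.0, 2.0, 3.0, 4.0, 5.0])
print(dist.mean, dist.quantile(0.5), dist.interval(alpha=0.5))
print(dist.sample(3, random.Random(0)))
```

With only a handful of ensemble members the extreme quantiles are coarse, which is one trade-off against the normality assumption.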

radka-j · 2025-11-13T14:50:53Z

Related to some of the discussion here, I also think we might want to update our plots to show 95% intervals (rather than 2 sigma). Can create an issue for that if you agree.

sgreenbury · 2025-11-13T16:53:02Z

> Related to some of the discussion here, I also think we might want to update our plots to show 95% intervals (rather than 2 sigma). Can create an issue for that if you agree.

Yes good point given we have other distributions than normals potentially - let's update this.
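
For reference, under a normal distribution a 2 sigma band and a 95% interval are close but not identical, which the standard library can confirm. This short check assumes normality only to quantify the difference.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Coverage of a +/- 2 sigma band under a normal distribution.
two_sigma_coverage = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# Multiplier z such that +/- z sigma is a central 95% interval.
z_95 = std_normal.inv_cdf(0.975)

print(f"2 sigma covers {two_sigma_coverage:.4f}")
print(f"95% interval uses z = {z_95:.3f}")
```

For non-normal predictive distributions the 2 sigma band has no fixed coverage, whereas a quantile-based 95% interval keeps its meaning.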

EdwinB12 · 2025-11-20T09:43:56Z

Can I check where we are with this PR?

radka-j · 2025-11-20T10:34:17Z

@EdwinB12 waiting for me to do a second review

radka-j · 2025-11-21T13:00:58Z

At the meeting on Wednesday a suggestion was made to plot calibration of the conformal predictor on a test data set to check it is behaving as expected - I think this is standard. In that instance a user would probably want to loop over different $\alpha$ values, construct those intervals and then evaluate how much data is contained within each. Is our current implementation suited to this?
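
The check described above can be sketched as follows for the constant-width case, in plain Python with a hypothetical `model` and simulated data (all names and numbers are assumptions). Here a single set of calibration scores serves every $\alpha$; for the quantile-regression variant, the quantile models would presumably need to be fitted for each $\alpha$ of interest.

```python
import math
import random

rng = random.Random(1)


def model(x):
    """Hypothetical fitted point predictor."""
    return math.sin(x)


def simulate(n):
    """Toy data (assumption): y = sin(x) + Gaussian noise."""
    xs = [rng.uniform(0, 3) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]
    return xs, ys


def half_width(scores, alpha):
    """Split conformal half-width with the finite-sample correction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


x_cal, y_cal = simulate(1000)
x_test, y_test = simulate(2000)
cal_scores = [abs(y - model(x)) for x, y in zip(x_cal, y_cal)]

# For each alpha, build the intervals and measure coverage on the test set.
calibration_curve = []
for alpha in [0.05, 0.1, 0.2, 0.3, 0.5]:
    q = half_width(cal_scores, alpha)
    inside = sum(abs(y - model(x)) <= q for x, y in zip(x_test, y_test))
    calibration_curve.append((1 - alpha, inside / len(y_test)))

for nominal, empirical in calibration_curve:
    print(f"nominal={nominal:.2f} empirical={empirical:.3f}")
```

Plotting empirical against nominal coverage should give points close to the diagonal if the conformal predictor is behaving as expected.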

radka-j · 2025-11-27T17:07:42Z

After a discussion with @sgreenbury we've agreed on the following design:

# Q: possibly a NamedTuple instead, for clarity?
IntervalLike = tuple[TensorLike, TensorLike]  # (lower, upper) interval bounds

# NOTE: predict should call predict_mean
class IntervalEmulator(Emulator):

    # Q: possibly an abstractmethod?
    # NOTE: alpha determines the lower/upper quantile pair
    # (e.g. the 0.05 and 0.95 quantiles for alpha=0.1)
    def predict_interval(self, x: TensorLike, alpha: float = 0.05) -> IntervalLike:
        raise NotImplementedError("Should be implemented in subclass")

class Conformal(IntervalEmulator):
    emulator: Emulator

    # Q: custom fit? could refit here?
    # NOTE: the "..." below stands for the remaining fit arguments (design sketch)
    def _fit(self, x: TensorLike, ..., refit: bool = True):
        ...

    def predict_interval(self, x: TensorLike, alpha: float = 0.05) -> IntervalLike:
        ...

class ConformalConstant(Conformal):
    ...

class ConformalizedQR(Conformal):
    alpha: float | list[float]  # self.alphas = [alpha] if isinstance(alpha, float)

    ...

    # NOTE: alpha has to be in self.alphas
    def predict_interval(self, x: TensorLike, alpha: float = 0.05) -> IntervalLike:
        ...
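
Two details of the design above can be sketched in isolation: mapping `alpha` to a quantile pair, and checking that a requested `alpha` is one the quantile model was fitted for. The helper names below are hypothetical and the tolerance-based comparison is an assumption, not part of the agreed design.

```python
import math


def alpha_to_quantiles(alpha: float) -> tuple[float, float]:
    """Central-interval quantile levels, e.g. alpha=0.1 -> (0.05, 0.95)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return alpha / 2, 1 - alpha / 2


class FittedAlphas:
    """Hypothetical helper mirroring the `alpha: float | list[float]` field."""

    def __init__(self, alpha):
        self.alphas = [alpha] if isinstance(alpha, float) else list(alpha)

    def check(self, alpha: float) -> float:
        # Compare with a tolerance to avoid float equality surprises.
        for fitted in self.alphas:
            if math.isclose(alpha, fitted):
                return fitted
        raise ValueError(f"alpha={alpha} not in fitted alphas {self.alphas}")


print(alpha_to_quantiles(0.1))
fitted = FittedAlphas([0.05, 0.1])
print(fitted.check(0.1))
```

Requesting an `alpha` outside the fitted set raises a `ValueError` here; whether the real class should raise or refit is left open in the design.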

sgreenbury force-pushed the 848-conformal-prediction branch from 2d5adb6 to 6a315d2 Compare November 4, 2025 16:08

sgreenbury added 9 commits November 4, 2025 18:36

Add validation_data to Emulator API

b392472

Add output_to_tensor conversion, update usage

c630317

Add conformal module

87a2a0b

Add initial conformal impl and test

88153bf

Add ConformalMLP to registry and re-export

ef684d3

Add support for case when no cal data provided

e76ed72

Update test_grads for Conformal emulators

6e99ffb

Update LightGBM with validation data

31cbdd3

Update conformal MLP with kwargs

340d1e7

sgreenbury force-pushed the 848-conformal-prediction branch from 2cbf790 to 340d1e7 Compare November 4, 2025 18:36

sgreenbury changed the base branch from main to fix-docs-samples-transformed November 4, 2025 18:36

Update docstring

905a6d6

sgreenbury marked this pull request as draft November 4, 2025 19:41

sgreenbury added 2 commits November 5, 2025 13:54

Add conformal quantile regresssion

051ed55

Add n_samples to APIs

3c609fd

sgreenbury force-pushed the 848-conformal-prediction branch from 08b732c to 3c609fd Compare November 5, 2025 13:54

Add test passing validation data

e1db3c7

sgreenbury marked this pull request as ready for review November 5, 2025 14:24

Base automatically changed from fix-docs-samples-transformed to main November 5, 2025 15:33

sgreenbury commented Nov 7, 2025

Comment thread autoemulate/emulators/conformal.py

sgreenbury and others added 9 commits November 7, 2025 12:01

Add comment

c7146d9

Remove obsolete return type

76ec652

Remove separate args

0be09d5

Merge remote-tracking branch 'origin/main' into 848-conformal-prediction

afefe7e

Update docstring and kwarg order

cbc9821

Fix indentation of dcostring

483309f

Fix docstrings

edbf1d2

Add comment

d1bc291

Fix docstring

52fa84c

radka-j approved these changes Nov 12, 2025

Comment thread autoemulate/data/utils.py

Comment thread autoemulate/data/utils.py Outdated

Comment thread autoemulate/data/utils.py Outdated

Comment thread autoemulate/emulators/conformal.py Outdated

sgreenbury and others added 4 commits November 13, 2025 13:14

Add with_grad to docstring

0728857

Update

bd9b57c

Co-authored-by: Radka Jersakova <r.jersakova@gmail.com>

Revise return docstring

ba2ce49

Update method str to "constant"

183dc76

Also revises comments

sgreenbury added 3 commits November 13, 2025 17:10

Add option to customise distribution

5cf4f6e

Fix error message

65c326a

Extend docstring and add reference

e0cf73c

sgreenbury mentioned this pull request Nov 13, 2025

Extend the methods supported by Conformal #929

Open

sgreenbury added 5 commits November 13, 2025 17:35

Update docstring to explain adding new methods

dcbfe72

Add conformal MLP subclass factory and subclasses

b87c96d

Fix test

0d6bd9f

Fix subclassing and defaults

012776d

Ensure bounds ordering is valid

7e786b5

sgreenbury requested a review from radka-j November 18, 2025 12:16

sgreenbury added 2 commits November 18, 2025 16:45

Merge remote-tracking branch 'origin/main' into 848-conformal-prediction

f5c5f2d

Update tuning to support MetricParams

6489aaa

radka-j marked this pull request as draft March 25, 2026 13:43
