How to do Bayesian calibration for multi-dimensional outputs? #954
Hi everyone, first off thanks for developing AutoEmulate, and for making these very well written tutorials!

I would like to use it to perform Bayesian calibration on spectral data. I have a spectrum made of two Gaussian peaks. Each peak is defined by its mean and height (with a shared standard deviation). I would like to use Bayesian calibration with an emulator (the real simulator can be expensive) to identify the most likely means and heights that reproduce the observation. The Bayesian calibration tutorial shows a case for (x0, x1) -> (y0), but I was wondering how to apply this when y is a curve.

For extra context, this is the real application case. Up to now we have mostly been using simple parametric optimisation, and we would like to look at AutoEmulate instead. Thanks in advance!
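For readers who want to follow along, here is a minimal stand-in for the kind of simulator described above (the function name, grid, and parameter values are illustrative assumptions, not from the original post):

```python
import numpy as np

def two_peak_spectrum(mu0, h0, mu1, h1, sigma=0.05, n_points=50):
    """Toy simulator: a spectrum made of two Gaussian peaks with a shared
    standard deviation. The calibration inputs are the peak positions
    (mu0, mu1) and heights (h0, h1); the output is the full curve."""
    x = np.linspace(0.0, 1.0, n_points)
    peak0 = h0 * np.exp(-0.5 * ((x - mu0) / sigma) ** 2)
    peak1 = h1 * np.exp(-0.5 * ((x - mu1) / sigma) ** 2)
    return peak0 + peak1

# One evaluation: 4 parameters in, a 50-dimensional curve out
y = two_peak_spectrum(mu0=0.3, h0=1.0, mu1=0.7, h1=0.5)
```

This makes the shape of the problem concrete: the map is (x0, x1, x2, x3) -> 50 outputs, rather than the (x0, x1) -> (y0) case in the tutorial.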
Hi @RemDelaporteMathurin! Thank you for your interest in AutoEmulate, it's really great to hear you found the tutorials useful :) Could you please provide a bit more information on what exactly you are after? It doesn't have to be too detailed, just enough to give an intuition (e.g., what are your inputs and outputs like, and what likelihood do you want to fit)?
Hi @radka-j thank you so much for your reply! I'm happy to provide more context, although I think I have found how to do this.

My simulator basically produces a distribution (50 numbers) made of two peaks, and the parameters are the position and height of these peaks. I then followed the dimensionality reduction tutorial. When I initially wrote this post, I thought that tutorial was only for 2D/3D data, but I now understand it is for high-dimensional outputs, which in my case means 50D, I guess.

```python
from autoemulate.core.compare import AutoEmulate
from autoemulate.transforms import StandardizeTransform, PCATransform
import torch
import numpy as np

# Convert to tensors
x, y = torch.Tensor(X).float(), torch.Tensor(Y).float()

# Split into train/test data
test_idx = np.array([13, 39, 30, 45, 17, 48, 26, 25, 32, 19])
train_idx = np.setdiff1d(np.arange(len(x)), test_idx)

# Run AutoEmulate
ae = AutoEmulate(
    x[train_idx],
    y[train_idx],
    models=["GaussianProcessRBF"],
    x_transforms_list=[[StandardizeTransform()]],
    y_transforms_list=[[
        PCATransform(n_components=8),
        StandardizeTransform()
    ]],
    # fewer bootstraps to reduce computation time, though more uncertainty on test score
    n_bootstraps=None,
    log_level="error",
    model_params={"posterior_predictive": True},
    transformed_emulator_params={
        "full_covariance": False,
        "output_from_samples": False
    }
)
```

The results of the emulator are quite satisfactory. I am just struggling with the next task, which is Bayesian calibration. Here is the repo where I have these toy codes: https://github.com/festim-dev/festim-emulate/blob/main/tds_fitting/main.ipynb
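As a side note, one can sanity-check the dimensionality-reduction step outside AutoEmulate: for a family of two-peak spectra like this, a handful of principal components should summarise the 50-dimensional outputs well. A sketch with scikit-learn, using a made-up stand-in simulator (not the real FESTIM one, and with assumed parameter ranges):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)

def spectrum(mu0, h0, mu1, h1, sigma=0.05):
    """Stand-in simulator: two Gaussian peaks on a 50-point grid."""
    return (h0 * np.exp(-0.5 * ((grid - mu0) / sigma) ** 2)
            + h1 * np.exp(-0.5 * ((grid - mu1) / sigma) ** 2))

# 200 random parameter sets -> a 200 x 50 matrix of simulated spectra
params = rng.uniform(low=[0.1, 0.5, 0.6, 0.5], high=[0.4, 1.5, 0.9, 1.5],
                     size=(200, 4))
Y = np.array([spectrum(*p) for p in params])

# Same number of components as in the AutoEmulate call above
pca = PCA(n_components=8).fit(Y)
Z = pca.transform(Y)              # 200 x 8 reduced representation
Y_rec = pca.inverse_transform(Z)  # reconstructed 200 x 50 spectra
```

Inspecting `pca.explained_variance_ratio_` (and the reconstruction error `Y - Y_rec`) gives a quick feel for whether 8 components is enough before fitting the GP.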
Hi @RemDelaporteMathurin! Thank you for getting back to us, the notebook is really helpful.

The modelling approach combining PCA with GPs looks great. You could also try comparing RBF to the Matern kernel, just in case you can get a performance improvement.

I found one bug in your code. For the calibration, you have:

```python
bc = BayesianCalibration(
    em.model,
    ...
)
```

But this should be:

```python
bc = BayesianCalibration(
    em,
    ...
)
```

In your notebook, `em` is an instance of `TransformedEmulator`, which wraps together the underlying GP (which makes predictions in the transformed 8-dimensional space) and the PCA transform (which transforms the 8D predictions back to the 50-dimensional data space).

I tried running the notebook a few times with the above change and found variable but reasonable results. I had to run the MCMC for longer though (I used 1000 warm-up steps and 2000 samples, but you can play around with these values; you might get away with fewer, just keep an eye on the r-hat values). I should note that I have a very good computer, so I found the running times reasonable (~15-20 min with the increased number of samples).

Below are some example results plots I got. The plots were made with this code:

```python
from getdist.arviz_wrapper import arviz_to_mcsamples
from getdist import plots

az_data = bc.to_arviz(mcmc_emu, posterior_predictive=True)
getdist_samples = arviz_to_mcsamples(az_data, dataset_label="Emulator")
g = plots.get_subplot_plotter()
g.triangle_plot([getdist_samples], filled=True)
```

The bimodality in the X0 and X2 parameters makes sense, given there is nothing in the model constraining each parameter to capture only one of the Gaussian means. This problem is called "label switching" and is a commonly known issue in mixture models. One option might be to add ordering constraints on the parameters, but that is not currently implemented in AutoEmulate.

I hope this is helpful. If you have suggestions for functionality that you think would be a helpful extension to our existing API, please feel free to open an issue for it. Also, do let us know if you have any further questions or if we can help with anything.




