Synthetic domains for transfer learning #634
3 comments · 7 replies
-
Would this also include cases where the parameter space is not the same between source and target? E.g. parameters (p1, p2, p3, p4, p5) contribute to the source outcome and (p1, p2, p6, p7) to the target outcome? If this is of interest, I am happy to discuss more.
-
Hi @Hrovatin, thank you for compiling this list! Are we focused on identifying failure cases for our models, even in unrealistic scenarios, or are we aiming to evaluate their robustness against simpler cases? Specifically, what would "failure" mean for us: converging to the wrong optimum, or to the correct one but more slowly than a non-transfer-learning approach?

For now, I would focus on scenarios without constraints. Since we are working with synthetic benchmarks, we could start with simple, low-dimensional synthetic functions to identify failure cases. From there, we can think about what we might encounter in real-world use cases and derive higher-dimensional examples.

I would be most interested in the following relations:
- Shifted Outcomes: I would expect this to be challenging for hierarchical and additive approaches. I'm particularly interested in how the
- Inverted Source: I would like to see if the
- Different Optima Ranking: We could explore this using the Forrester function, which is a well-known benchmark for multifidelity Bayesian optimization.

Second priority would be:
- Noisy Source: While it should be manageable with ample source data, I suggest we test this on a simple 1D example for completeness.
- Wider Source Optima: While this may slow down exploitation, it shouldn't lead us to converge on the wrong optimum. The same reasoning applies to the different outcome scale.

If we're comfortable starting with 1D examples first, I can develop a new Forrester benchmark that addresses the above cases.
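As a starting point for the Forrester idea, here is a minimal sketch in plain NumPy (assuming the standard Forrester function and its commonly used low-fidelity variant), showing that the low-fidelity source already places its optimum at a different location than the target:

```python
import numpy as np

def forrester(x):
    """Standard 1D Forrester function (target / high fidelity).
    Global minimum around x ~ 0.757 on [0, 1]."""
    return (6.0 * x - 2.0) ** 2 * np.sin(12.0 * x - 4.0)

def forrester_low(x, a=0.5, b=10.0, c=-5.0):
    """Commonly used low-fidelity variant (source): a scaled copy of the
    target plus a linear trend, which moves the optimum location."""
    return a * forrester(x) + b * (x - 0.5) + c

# Locate both minima on a dense grid.
xs = np.linspace(0.0, 1.0, 100_001)
x_opt_target = xs[np.argmin(forrester(xs))]
x_opt_source = xs[np.argmin(forrester_low(xs))]
print(x_opt_target, x_opt_source)
```

The linear trend and scaling in the low-fidelity variant already combine several of the inconsistency types listed below (shifted outcomes, different scale), which makes it a convenient base for benchmark variants.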
-
Comment from the team: Maybe not all of these differences are equally severe for the BO outcome.
-
Collection of source-target relationships to consider when designing synthetic TL domains for benchmarking.
The focus of this discussion is on different types of unreliable source information, rather than on other types of mismatch between source and target (such as the non-overlapping parameter spaces suggested by Sara W. below).
Types of useful transfer information
Below is a collection of different types of useful and harmful (inconsistent) information shared between source and target. They are not mutually exclusive. The plots show the parameter on the x-axis and the outcome on the y-axis, with the target in red and the source in black.
Some of the inconsistency types may become more relevant in a multi-parameter setting with parameter interactions or constraints. However, these can often be explained by other types of inconsistencies if the search space is regarded as a whole rather than parameter-wise, as the examples below illustrate. In the multi-parameter plots, the two parameters are on the axes and the optimum is colored yellow, while worse outcomes are blue-violet.
All examples shown assume the optimum is a minimum.
Noisy source
The source follows the same underlying function but is noisier. This should be easy to optimize if plenty of source data is present; the main challenge occurs when little source data is available, as the learned relationships will then be more strongly biased by the noise.
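A minimal sketch of this case (the quadratic target here is an illustrative choice, not from the discussion): the source shares the target's underlying function, but observations carry additive Gaussian noise, so averaging over many source points recovers the target while a few points can be badly off:

```python
import numpy as np

def target(x):
    # Illustrative 1D target with its minimum at x = 0.3.
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)

def noisy_source(x, noise_sd=0.1):
    # Same underlying function, observed with additive Gaussian noise.
    x = np.asarray(x, dtype=float)
    return target(x) + rng.normal(0.0, noise_sd, size=x.shape)

# With many source observations the noise averages out ...
many = noisy_source(np.full(10_000, 0.3))
# ... but a single observation can be far from the true value.
few = noisy_source(np.array([0.3]))
print(many.mean(), few)
```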
Shifted outcomes
Modelling this would be easy if there were an option to add an interaction term between the task parameter and the non-task parameter (the x-axis of the plot).
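One concrete (illustrative) reading of this case: the source has the same shape as the target, but its optimum is shifted along the parameter axis, which a purely additive task effect cannot capture:

```python
import numpy as np

def target(x):
    # Illustrative 1D target with its minimum at x = 0.3.
    return (x - 0.3) ** 2

def shifted_source(x, shift=0.2):
    # Same shape, but the optimum sits at x = 0.5; capturing this requires
    # the task variable to interact with x, not just add an offset.
    return target(np.asarray(x, dtype=float) - shift)

xs = np.linspace(0.0, 1.0, 1001)
print(xs[np.argmin(target(xs))], xs[np.argmin(shifted_source(xs))])
```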
Inverted source
This is what the current TaskKernel was designed to account for.
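A minimal sketch (with an illustrative target function): the source is the negated target, so source minima coincide with target maxima; a task kernel that allows negative cross-task correlation can represent exactly this relationship.

```python
import numpy as np

def target(x):
    # Illustrative target: minimum at x = 0.3 on [0, 1].
    return (x - 0.3) ** 2

def inverted_source(x):
    # Sign-flipped target: good source regions are bad target regions.
    return -target(x)

xs = np.linspace(0.0, 1.0, 1001)
x_min_target = xs[np.argmin(target(xs))]
x_min_source = xs[np.argmin(inverted_source(xs))]
print(x_min_target, x_min_source)
```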
Different optima ranking
This may lead to prolonged exploration of optima that are more prominent in the source, or to missing the global target optimum.
Partially shared optima
Some local optima are shared, but there are also source- or target-specific local optima.
This could be regarded as a special case of different optima ranking.
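This case can be sketched with illustrative Gaussian bumps: one local minimum is shared between the tasks, and each task additionally has its own:

```python
import numpy as np

def bump(x, center, width=0.05):
    # Negative Gaussian bump: creates a local minimum at `center`.
    return -np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def target(x):
    # Shared minimum at 0.3, target-specific minimum at 0.8.
    return bump(x, 0.3) + bump(x, 0.8)

def source(x):
    # Shared minimum at 0.3, source-specific minimum at 0.6.
    return bump(x, 0.3) + bump(x, 0.6)

# The shared minimum is deep in both tasks; the task-specific minima
# are essentially invisible in the other task.
print(target(0.3), source(0.3), target(0.8), source(0.8))
```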
Missing target optima
While it would be hard to find the global source optimum, the source may still help to find local optima, since most of the parameter space has similar outcomes.
This could be regarded as a special case of different optima ranking.
Wider source optima
This may be problematic in the case of flat and wide source optima.
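A sketch of this case (with illustrative Gaussian-shaped basins): the source shares the optimum location with the target but has a much wider, flatter basin, so it suggests a larger promising region than the target actually has:

```python
import numpy as np

def target(x):
    # Narrow basin around x = 0.3.
    return 1.0 - np.exp(-((x - 0.3) ** 2) / (2.0 * 0.02 ** 2))

def wide_source(x):
    # Same optimum location, but a much wider and flatter basin.
    return 1.0 - np.exp(-((x - 0.3) ** 2) / (2.0 * 0.2 ** 2))

xs = np.linspace(0.0, 1.0, 1001)
# Same optimum location; very different basin widths at x = 0.4.
print(xs[np.argmin(target(xs))], xs[np.argmin(wide_source(xs))])
print(target(0.4), wide_source(0.4))
```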
Narrower source optima
This may be less interesting, as the target optimum can then be found easily.
However, it may become relevant in a multi-parameter setting with constraints, where wider optima may lead to different regions of the allowed space being selected. This can probably be simplified into the missing-target-optima case by regarding only the allowed region of the parameter space.
Different outcome scale
I would not expect this to have a negative effect on TL.
However, this may again become relevant in a multi-parameter setting where the optima of two parameters are mutually exclusive due to some negative interaction. Depending on the contribution of each parameter to the outcome, a different optimum may be selected. The effect could also be explained with other inconsistency types (such as too-wide optima, if the first function in the plots below is the source and the second is the target).
Equal contributions of X1 and X2:
Y = - (X1 + X2 - X1*X2)

Stronger contribution of X1, while X2 and the interaction term contribute equally:
Y = - (2*X1 + X2 - X1*X2)
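The two interaction examples above can be checked numerically (a sketch over a grid on [0, 1]^2; we minimize Y, following the convention above):

```python
import numpy as np

x1, x2 = np.meshgrid(np.linspace(0.0, 1.0, 101), np.linspace(0.0, 1.0, 101))

y_equal = -(x1 + x2 - x1 * x2)          # equal contributions of X1 and X2
y_strong = -(2.0 * x1 + x2 - x1 * x2)   # stronger contribution of X1

# With equal contributions, the best Y (-1) is attained anywhere on the
# X1 = 1 or X2 = 1 edge; with the stronger X1 term, the best Y (-2) is
# attained only on the X1 = 1 edge, so a different region is selected.
print(y_equal.min(), y_strong.min())
```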

Parameters with useful information
For some of the inconsistencies, it may be relevant for them to be present only in part of the parameter range (e.g. noise only at the boundary).
In multi-feature (parameter) settings, it may happen that: