You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This notebook contains an example on how to use BayBE for the optimization of reaction conditions. It is inspired by the corresponding notebook developed by Pat Walters as part of his [Practical Cheminformatics Tutorial](https://github.com/PatWalters/practical_cheminformatics_tutorials). This notebook assumes basic familiarity with the core concepts of Bayesian Optimization. The intention of this notebook is *not* to introduce and explain all aspects of Bayesian Optimization, but to focus on the usage of BayBE.
21
+
This notebook contains an example on how to use BayBE for the optimization of reaction conditions. It is inspired by the corresponding notebook developed by Pat Walters as part of his [Practical Cheminformatics Tutorial](https://github.com/PatWalters/practical_cheminformatics_tutorials). This notebook assumes basic familiarity with the core concepts of Bayesian Optimization. The intention of this notebook is *not* to introduce and explain all aspects of Bayesian Optimization, but to focus on the usage of `BayBE`.
22
22
23
-
In drug discovery, we frequently encounter situations where we need to modify a set of reaction conditions to optimize the yield. This notebook shows how to use BayBE to model and optimize such a campaign.
23
+
In drug discovery, we frequently encounter situations where we need to modify a set of reaction conditions to optimize the yield. This notebook shows how to use `BayBE` to model and optimize such a campaign.
24
+
In particular, it demonstrates the power and usefulness of `BayBE`'s chemical encodings. If parameters in a process to be optimized are chemicals, this feature enables `BayBE` to automatically use meaningful chemical descriptors, automatically leveraging chamical knowledge for the optimization process.
24
25
25
-
# Chemical encodings
26
-
27
-
This notebook demonstrates the power and usefulness of BayBE's chemical encodings. If parameters in a process to be optimized are chemicals, this feature enables BayBE to automatically use meaningful chemical descriptors, automatically leveraging chamical knowledge for the optimization process.
28
-
29
-
This notebook assumes some basic familiarity with using BayBE, and that it does not explain all of the core concepts. If you are interested in those, we recommend to first check out the `Reation_Optimization` example.
30
26
31
27
/// caution
32
-
This notebook was developed for BayBE version 0.14.2. Although we do our best in keeping our breaking changes minimal and support outdated versions for a long time, this notebook might not be immediately applicable for other BayBE versions. If you install BayBE via the instructions in this repository, version 0.14.2 will thus be installed.
28
+
This notebook was developed for `BayBE` version 0.14.2. Although we do our best in keeping our breaking changes minimal and support outdated versions for a long time, this notebook might not be immediately applicable for other `BayBE` versions. If you install `BayBE` via the instructions in this repository, version 0.14.2 will thus be installed.
Setting up an experimentation campaign with `BayBE` requires us to set up the main components individually. In this notebook, we will set up the following components one after another.
97
100
98
-
1. [**Parameters**](https://emdgroup.github.io/baybe/0.14.2/userguide/parameters.html): In our setting, a _parameter_ is something that we can control directly. An example of this is which ligand to choose, or at which of the available temperatures to run the experiment. Each of the 5 parameters described earlier will correspond to exactly one of BayBE's `Parameter`s.
101
+
1. [**Parameters**](https://emdgroup.github.io/baybe/0.14.2/userguide/parameters.html): In our setting, a _parameter_ is something that we can control directly. An example of this is which ligand to choose, or at which of the available temperatures to run the experiment. Each of the 5 parameters described earlier will correspond to exactly one of `BayBE`'s `Parameter`s.
99
102
2. [**Search space**](https://emdgroup.github.io/baybe/0.14.2/userguide/searchspace.html): The search space defines the combination of parameters to be searched. It thus contains all possible experiments that we could conduct. The search space is typically defined using the function `Searchspace.from_product`, which creates a search space as the Cartesian product of the parameters.
100
103
3. [**Target**](https://emdgroup.github.io/baybe/0.14.2/userguide/targets.html): The target is the quantity we are optimizing. In the case of reaction optimization, this is typically the yield. `BayBE` can optimize a single parameter or multiple parameters at once. In this notebook, we'll focus on single parameter optimization, where we are only optimizing the yield, and we hence stick to single target optimization.
101
104
4. [**Recommender**](https://emdgroup.github.io/baybe/0.14.2/userguide/recommenders.html): The recommender selects the next set of experiments to be performed. In this case, we use the default [`TwoPhaseMetaRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.meta.sequential.TwoPhaseMetaRecommender.html). This recommender behaves differently depending on whether it has experimental data. At the beginning of an optimization process, we typically don't have experimental data and want to find a diverse set of conditions to gather some initial data. If the `TwoPhaseMetaRecommender` has no data available, it uses random sampling to select a set of initial experiments. If the recommender has data, it uses the [`BotorchRecommender`].(https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.pure.bayesian.botorch.BotorchRecommender.html), a Bayesian optimizer that balances exploration and exploitation when selecting sets of reaction conditions.
@@ -109,30 +112,35 @@ def _(mo):
109
112
mo.md("""
110
113
## Defining the [`Parameters`](https://emdgroup.github.io/baybe/0.14.2/userguide/parameters.html)
111
114
112
-
In this section, we introduce two different parameter types: The [`CategoricalParameter`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.parameters.categorical.CategoricalParameter.html) and the [`NumericalDiscreteParameter`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.parameters.numerical.NumericalDiscreteParameter.html).
113
-
114
-
115
-
The `CategoricalParameter` has a `name` field as well as a `values` field. The `name` is used to describe the parameter, while the `values` are the collection of values that the parameter can take. In addition, one can choose a specific `encoding`. For the sake of this tutorial, we use the `One-Hot-Encoding`, `BayBE`'s default choice for `CategoricalParameter`s.
115
+
In this section, we introduce two different parameter types: The [`SubstanceParameter`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.parameters.substance.SubstanceParameter.html) and the [`NumericalDiscreteParameter`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.parameters.numerical.NumericalDiscreteParameter.html).
116
116
117
-
In this tutorial, we model the three different chemical parameters, that is, the solvent, the ligand, and the base as `CategoricalParameters`. Since we have access to the data, we extract the values for the parameters from there, and create the corresponding `CategoricalParameters`.
117
+
The `SubstanceParameter` is specifically designed for chemical substances and can automatically use meaningful chemical descriptors. It takes a `name` field and a `data` dictionary mapping substance names to their SMILES representations. One can also choose a specific chemical `encoding` such as MORDRED, ECFP, or RDKIT2DDESCRIPTORS.
118
118
119
-
/// admonition | Note
120
-
As ligand, solvent and base are chemical substances, they should preferably be modeled using the [`SubstanceParameter`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.parameters.substance.SubstanceParameter.html). This is not done in this example for simplicity. We refer to the `Chemical_Encodings` example for a tutorial on using `SubstanceParameter`s and a demonstration of its effect.
121
-
///
119
+
In this tutorial, we model all three chemical parameters (ligand, solvent, and base) as `SubstanceParameter`s to leverage chemical knowledge in the optimization process. Since we have access to the SMILES data, we extract the mappings from the data and create the corresponding `SubstanceParameter`s.
## Define the [`Recommender`](https://emdgroup.github.io/baybe/0.14.2/userguide/recommenders.html)
202
210
203
211
The [`Recommender`](https://emdgroup.github.io/baybe/0.14.2/userguide/recommenders.html) selects the next set of experiments to try.
204
-
There are many different recommenders offered by `BayBE`, and a lot of ways of combining them. For this example, we use the default initial recommender, the [`RandomRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.pure.nonpredictive.sampling.RandomRecommender.html). This recommender samples initial points from the search space randomly. Once it has data available, BayBE will automatically switch to the [`BotorchRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.pure.bayesian.botorch.BotorchRecommender.html).
205
-
206
-
/// admonition | Task
207
-
Instead of using the default recommender, use the [`FPSRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.pure.nonpredictive.sampling.FPSRecommender.html). Also, think about which of the two recommenders should be used in this example, and under which circumstances which recommender might be more favourable.
208
-
///
212
+
There are many different recommenders offered by `BayBE`, and a lot of ways of combining them. For this example, we use a [`TwoPhaseMetaRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.meta.sequential.TwoPhaseMetaRecommender.html) equipped with a [`BotorchRecommender`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.recommenders.pure.bayesian.botorch.BotorchRecommender.html) that uses the [`EDBOKernel`](https://emdgroup.github.io/baybe/0.14.2/_autosummary/baybe.surrogates.gaussian_process.presets.edbo.EDBOKernelFactory.html). The EDBO kernel is particularly well-suited for chemical optimization problems as it was specifically designed to handle chemical descriptors effectively.
As we can see, we found a very good candidate, and only needed to evaluate a fraction of the search space! This insight concludes this basic BayBE tutorial.
354
+
As we can see, we found a very good candidate, and only needed to evaluate a fraction of the search space!
355
+
""")
356
+
return
357
+
358
+
359
+
@app.cell(hide_code=True)
360
+
def_(mo):
361
+
mo.md(r"""
362
+
## Investigating Different Chemical Encodings
363
+
364
+
Now that we've seen how to use `BayBE` for reaction optimization with chemical encodings, let's investigate how different chemical encodings affect the optimization performance. BayBE supports multiple chemical encodings for `SubstanceParameter`s, each capturing different aspects of molecular structure.
365
+
366
+
We'll compare three different chemical encodings as well as the One-Hot-Encoding by running simulated optimization campaigns and visualizing their performance.
367
+
""")
368
+
return
369
+
370
+
371
+
@app.cell(hide_code=True)
372
+
def_(mo):
373
+
mo.md("""
374
+
We create multiple campaigns, one for each encoding we want to compare. Each campaign will use `SubstanceParameter`s for all three chemical parameters (solvent, ligand, and base) with the specified encoding:
We also want to compare the campaigns using chemical encodings to a baseline campaign that uses One-Hot Encoding (OHE) with `CategoricalParameter`s for all chemical parameters:
### Using BayBE's [simulation capabilities](https://emdgroup.github.io/baybe/0.14.2/userguide/simulation.html)
474
+
475
+
BayBE offers powerful simulation capabilities that allow us to compare different optimization strategies without running actual experiments. The simulation uses a lookup mechanism to retrieve target values from our dataset, effectively simulating multiple optimization campaigns with different random seeds (Monte Carlo iterations).
476
+
477
+
Let's run the simulation with multiple Monte Carlo iterations to get statistically meaningful results:
478
+
""")
479
+
return
480
+
481
+
482
+
@app.cell
483
+
def_(df, scenarios):
484
+
frombaybe.simulationimportsimulate_scenarios
485
+
486
+
BATCH_SIZE=2
487
+
N_DOE_ITERATIONS=5# Change to ~20 for better plots
488
+
N_MC_ITERATIONS=10# Change to ~30 for better plots
489
+
490
+
results=simulate_scenarios(
491
+
scenarios,
492
+
df,
493
+
batch_size=BATCH_SIZE,
494
+
n_doe_iterations=N_DOE_ITERATIONS,
495
+
n_mc_iterations=N_MC_ITERATIONS,
496
+
)
497
+
498
+
results.rename(
499
+
columns={
500
+
"Scenario": "Substance encoding",
501
+
"Num_Experiments": "Number of experiments",
502
+
"yield_CumBest": "Running best yield",
503
+
},
504
+
inplace=True,
505
+
)
506
+
return (results,)
507
+
508
+
509
+
@app.cell(hide_code=True)
510
+
def_(mo):
511
+
mo.md("""
512
+
### Visualizing the Results
513
+
514
+
Now let's visualize the results using the `backtest_plot` utility. This plot shows:
515
+
- The mean performance across Monte Carlo iterations (solid line)
516
+
- Confidence intervals (shaded regions)
517
+
- A horizontal guideline at 90% yield (our target threshold)
518
+
- Vertical guidelines showing when the `MORDRED` and `OHE` encoding reach the target.
519
+
520
+
The plot shows that using the `SubstanceParameter` and hence equipping `BayBE` with chemical knowledge significantly improves the performance.
0 commit comments