Tutorial: Self-Driving Labs with Bayesian Optimization

Level: Advanced
Prerequisites: Python, NumPy, Basic Statistics
Estimated Time: 30 Minutes

Introduction

In "Self-Driving Labs", AI doesn't just analyze data—it decides what experiment to run next.

Experiments are expensive. We can't test every drug combination. We need to find the best candidate with the fewest tests. This is where Bayesian Optimization shines.

It uses a statistical model (Gaussian Process) to balance:

Exploitation: Testing areas we think are good (high predicted yield).
Exploration: Testing areas we know nothing about (high uncertainty).

The Code Structure

The BayesianOptimizer class in bayesian_optimization.py has three key parts:

fit(X, Y): Trains the Gaussian Process on experiments we've already done.
predict(X_new): Guesses the outcome (mean) and uncertainty (variance) for new parameters.
propose_next_location(): Uses the Upper Confidence Bound (UCB) acquisition function to pick the most promising next experiment.
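To make the three parts concrete, here is a minimal NumPy sketch of how such a class could work. It is not the code in bayesian_optimization.py: TinyGPOptimizer, rbf_kernel, LENGTH_SCALE, the random-candidate search, and the default kappa are all assumptions chosen for illustration, and UCB is assumed to take its usual form, mean + kappa * standard deviation.

```python
import numpy as np

LENGTH_SCALE = 0.3  # assumed kernel width, in units of the rescaled [0, 1] inputs


def rbf_kernel(A, B, length_scale=LENGTH_SCALE):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


class TinyGPOptimizer:
    """Hypothetical stand-in for BayesianOptimizer, for illustration only."""

    def __init__(self, bounds, kappa=2.0, noise=1e-6, seed=0):
        self.bounds = np.asarray(bounds, dtype=float)  # shape (n_dims, 2)
        self.kappa = kappa
        self.noise = noise
        self.rng = np.random.default_rng(seed)
        self.X = None  # no experiments seen yet

    def _scale(self, X):
        # Map every parameter to [0, 1] so one length scale suits all dimensions.
        low, high = self.bounds[:, 0], self.bounds[:, 1]
        return (np.atleast_2d(np.asarray(X, dtype=float)) - low) / (high - low)

    def fit(self, X, Y):
        """Condition the Gaussian Process on the experiments run so far."""
        self.X = self._scale(X)
        y = np.asarray(Y, dtype=float)
        self.y_mean = y.mean()
        self.y_std = y.std() if y.std() > 0 else 1.0
        self.y = (y - self.y_mean) / self.y_std  # standardize the outcomes
        K = rbf_kernel(self.X, self.X) + self.noise * np.eye(len(self.X))
        self.K_inv = np.linalg.inv(K)

    def predict(self, X_new):
        """Return the posterior mean and variance at each row of X_new."""
        K_s = rbf_kernel(self._scale(X_new), self.X)
        mean = K_s @ self.K_inv @ self.y
        var = np.maximum(1.0 - np.sum((K_s @ self.K_inv) * K_s, axis=1), 0.0)
        return mean * self.y_std + self.y_mean, var * self.y_std**2

    def propose_next_location(self, n_candidates=2000):
        """Score random candidates with UCB and return the highest-scoring one."""
        low, high = self.bounds[:, 0], self.bounds[:, 1]
        candidates = self.rng.uniform(low, high, size=(n_candidates, len(low)))
        if self.X is None:
            return candidates[0]  # nothing to model yet: start at a random point
        mean, var = self.predict(candidates)
        ucb = mean + self.kappa * np.sqrt(var)
        return candidates[np.argmax(ucb)]


# Three hand-picked (temperature, pH) points and the simulated yield at each
X_seen = [[25.0, 3.0], [37.0, 7.0], [45.0, 11.0]]
Y_seen = [-163.36, -0.16, -76.96]

opt_sketch = TinyGPOptimizer(bounds=[(20.0, 50.0), (0.0, 14.0)])
opt_sketch.fit(X_seen, Y_seen)
mean, var = opt_sketch.predict(X_seen)
next_point = opt_sketch.propose_next_location()
print("Predicted mean at seen points:", mean)
print("Next suggested (temperature, pH):", next_point)
```

At points that have already been measured, the predicted mean should sit close to the observed yield and the variance should be close to zero. Away from them the variance grows, and that is what the UCB score rewards.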

Step-by-Step Implementation

Step 1: Define your "Lab Experiment"

In a real lab, this is a robot. Here, we simulate it with a function.

def protein_yield(params):
    """Simulated experiment: yield as a function of (temperature, pH)."""
    temperature, ph = params

    # Inverted paraboloid: the maximum yield (0) is at Temp=37, pH=7.4
    return -((temperature - 37)**2 + (ph - 7.4)**2)

Step 2: Initialize the Optimizer

Define the search bounds for each parameter, in the order protein_yield reads them (temperature first, then pH).

Temperature: 20 to 50
pH: 0 to 14

from bayesian_optimization import BayesianOptimizer

opt = BayesianOptimizer(bounds=[(20.0, 50.0), (0.0, 14.0)])
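Before any experiment has been run, there is no data to model, so a common way to start is with a few random points inside the bounds. Whether BayesianOptimizer does this internally is not shown in this tutorial. The helper below, sample_within_bounds, is a hypothetical illustration of the idea, not part of bayesian_optimization.py.

```python
import numpy as np

bounds = [(20.0, 50.0), (0.0, 14.0)]  # (temperature, pH), as in Step 2


def sample_within_bounds(bounds, n_points, seed=0):
    """Draw n_points uniformly at random inside the box defined by bounds."""
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    rng = np.random.default_rng(seed)
    # One row per point, one column per parameter
    return rng.uniform(low, high, size=(n_points, len(bounds)))


seed_points = sample_within_bounds(bounds, n_points=5)
print(seed_points)
```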

Step 3: The "Closed Loop"

Run the loop: AI Suggests -> Robot Executes -> AI Learns.

known_params = []
known_results = []

for i in range(10):
    # 1. AI Suggests
    # (on the first pass no data has been fit yet, so the optimizer must be
    # able to propose a starting point on its own)
    next_conditions = opt.propose_next_location()

    # 2. Robot Executes (Simulation)
    result = protein_yield(next_conditions)

    # 3. AI Learns: refit on every experiment run so far
    known_params.append(next_conditions)
    known_results.append(result)
    opt.fit(known_params, known_results)

    print(f"Iteration {i}: Tried {next_conditions} -> Yield {result}")
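Once the loop finishes, the two lists hold the full experiment history, and the best conditions found can be read straight from them. The sketch below uses three stand-in entries (the simulated yield at three hand-picked points) in place of a real run, so the values are illustrative only.

```python
import numpy as np

# Stand-in history in the same shape the loop produces
known_params = [np.array([25.0, 3.0]), np.array([36.0, 7.0]), np.array([45.0, 11.0])]
known_results = [-163.36, -1.16, -76.96]

# Index of the highest yield seen so far
best_index = int(np.argmax(known_results))
best_conditions = known_params[best_index]
best_yield = known_results[best_index]
print(f"Best so far: {best_conditions} -> Yield {best_yield}")

# Running maximum: a flat tail suggests the search has stopped improving
best_so_far = np.maximum.accumulate(known_results)
print(best_so_far)
```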

Why does this matter?

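A standard "Grid Search" (trying every combination on a fixed grid) would take thousands of experiments. On a smooth, low-dimensional problem like this one, Bayesian Optimization can often get close to the optimum in 10-20 steps, which can cut the number of experiments, and the R&D cost tied to them, by 90% or more.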
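The size of that gap is simple arithmetic. The grid resolution below (1-degree temperature steps and 0.1 pH steps over the Step 2 bounds) and the 20-experiment budget are assumptions picked for illustration. A coarser or finer grid changes the numbers, but not the overall picture.

```python
import numpy as np

# Hypothetical grid: 1-degree temperature steps, 0.1 pH steps
temperatures = np.linspace(20.0, 50.0, 31)
ph_values = np.linspace(0.0, 14.0, 141)

grid_size = len(temperatures) * len(ph_values)
bo_budget = 20  # assumed number of Bayesian Optimization experiments

reduction = 1.0 - bo_budget / grid_size
print(f"Grid search: {grid_size} experiments")
print(f"Bayesian Optimization: {bo_budget} experiments")
print(f"Reduction in experiments: {reduction:.1%}")
```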

Assignments

Modify synthetic_experiment in bayesian_optimization.py to be a 2D function (like the protein yield example above).
Tune the kappa parameter. What happens if you make it 10.0? (Hint: The AI becomes more curious/exploratory).
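As a starting point for the kappa assignment, the effect can be seen on paper before touching the optimizer. The sketch assumes UCB has its usual form, mean + kappa * standard deviation. The two candidates and their numbers are made up for illustration.

```python
import numpy as np


def ucb(mean, std, kappa):
    """Upper Confidence Bound score: predicted value plus an uncertainty bonus."""
    return mean + kappa * std


# Candidate 0: well explored, decent predicted yield, low uncertainty
# Candidate 1: barely explored, poor predicted yield, high uncertainty
mean = np.array([-2.0, -10.0])
std = np.array([0.5, 3.0])

picks = {}
for kappa in (1.0, 10.0):
    scores = ucb(mean, std, kappa)
    picks[kappa] = int(np.argmax(scores))
    print(f"kappa={kappa}: scores={scores}, picks candidate {picks[kappa]}")
```

With kappa=1.0 the safe candidate wins. With kappa=10.0 the uncertainty bonus dominates and the unexplored candidate is chosen instead.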
