AI-driven meta-optimization of Optuna samplers on the BBOB benchmark suite.
An AI agent iteratively evolves train.py to find the best-performing sampler configuration, evaluated on all 24 BBOB (Black-Box Optimization Benchmarking) functions in 5 dimensions across 10 random seeds.
Result: 0.1501 mean normalized regret, 85% better than random sampling and 25% better than the CMA-ES baseline (0.2004), achieved over 97 experiments of systematic and creative search.
| Phase | Trials | Strategy |
|---|---|---|
| 1. Initialization | 0–7 | Sobol QMC (quasi-random space-filling) |
| 2. Optimization | 8–139 | CMA-ES (popsize=6, sigma0=0.2) |
| 3a. Medium refinement | 140–169 | Gaussian perturbation, sigma=1% of range |
| 3b. Tight refinement | 170–199 | Gaussian perturbation, sigma=0.2% of range |
CMA-ES converges by ~trial 140 on most functions. The remaining trial budget is better spent on targeted local search around the best point than continuing CMA-ES with diminishing returns. Since study.best_value tracks the global best, refinement improvements are kept while failed perturbations don't hurt.
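The trial-indexed phase schedule above can be sketched as a simple dispatch plus a bounded Gaussian perturbation. This is a minimal sketch, not the actual train.py: the helper names `phase_for_trial` and `perturb_best` are hypothetical, while the phase boundaries and sigma values come from the table above.

```python
import random

def phase_for_trial(t: int):
    """Map a trial index to its sampling phase, per the schedule above."""
    if t < 8:
        return ("sobol", None)    # quasi-random space-filling warmup
    if t < 140:
        return ("cmaes", None)    # CMA-ES with popsize=6, sigma0=0.2
    if t < 170:
        return ("gauss", 0.01)    # medium refinement: sigma = 1% of range
    return ("gauss", 0.002)       # tight refinement: sigma = 0.2% of range

def perturb_best(best_x, low, high, rel_sigma, rng=random):
    """Gaussian perturbation around the incumbent, clipped to the bounds.

    Because the study tracks the global best, a bad perturbation is simply
    discarded while an improvement becomes the new incumbent.
    """
    sigma = rel_sigma * (high - low)
    return [min(high, max(low, x + rng.gauss(0.0, sigma))) for x in best_x]
```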
| Category | Regret | Functions |
|---|---|---|
| Separable | 0.1161 | f1–f5 |
| Low conditioning | 0.0311 | f6–f9 |
| High conditioning | 0.0511 | f10–f14 |
| Multimodal (global) | 0.1663 | f15–f19 |
| Multimodal (weak) | 0.3623 | f20–f24 |
| Mean | 0.1501 | All 24 |
The BBOB (Black-Box Optimization Benchmarking) suite is a widely used standard for evaluating continuous black-box optimizers and underpins the long-running GECCO BBOB workshops. It consists of 24 noiseless functions across 5 difficulty categories:
| Category | Functions | Description |
|---|---|---|
| Separable | f1–f5 | Independent dimensions |
| Low conditioning | f6–f9 | Well-conditioned, moderate difficulty |
| High conditioning | f10–f14 | Ill-conditioned, requires covariance adaptation |
| Multimodal (global) | f15–f19 | Many local optima with adequate global structure |
| Multimodal (weak) | f20–f24 | Many local optima with weak global structure (hardest) |
Evaluation protocol:
- Dimension: 5
- Trials per run: 200
- Seeds: 10 (42–51) for statistical robustness
- Metric: Normalized regret = (sampler_best - f_opt) / (random_best - f_opt), where 0.0 = optimal and 1.0 = random-level
- Final score: Mean normalized regret across all 24 functions and 10 seeds
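In code, the metric and final score reduce to a few lines. A minimal sketch (the function names are hypothetical, the formula is the one defined above):

```python
def normalized_regret(sampler_best: float, random_best: float, f_opt: float) -> float:
    """0.0 means the sampler hit the optimum; 1.0 means it only matched random search."""
    return (sampler_best - f_opt) / (random_best - f_opt)

def final_score(regrets):
    """Mean normalized regret over all (function, seed) pairs."""
    return sum(regrets) / len(regrets)
```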
Optimal values (f_opt) were computed via scipy.differential_evolution with 5 restarts. Random baselines were computed from 10 seeds of 200 random trials each. All results are deterministic and reproducible — full experiment logs in results.jsonl.
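A restart loop of the kind described can be sketched as follows; the exact bounds, seeds, and solver options used by prepare.py are assumptions, and `estimate_f_opt` is a hypothetical helper name:

```python
import numpy as np
from scipy.optimize import differential_evolution

def estimate_f_opt(func, bounds, n_restarts=5):
    """Best value found by differential evolution across independent restarts."""
    best = np.inf
    for seed in range(n_restarts):
        result = differential_evolution(func, bounds, seed=seed)
        best = min(best, result.fun)
    return best

# Example on a 5-D sphere function (true optimum: 0.0 at the origin).
sphere = lambda x: float(np.sum(np.asarray(x) ** 2))
f_opt = estimate_f_opt(sphere, [(-5.0, 5.0)] * 5)
```

Taking the minimum over seeded restarts guards against differential evolution stalling in a local basin on the harder multimodal functions.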
- train.py — sampler configuration (exports create_sampler())
- prepare.py — evaluation harness (24 BBOB functions × 10 seeds × 200 trials)
- results.jsonl — experiment log (97 experiments)
- progress.png — regret over time
python prepare.py

This project is directly inspired by Andrej Karpathy's autoresearch concept — using AI agents in an autonomous loop to conduct research by iteratively modifying code and evaluating results. The core idea of train.py (agent-modifiable) + prepare.py (immutable evaluator) comes from his framework.
Built with Optuna, cmaes, and OptunaHub BBOB benchmark.
MIT