Autonomous experiment loop for Claude Code. Give it a codebase and a metric — it experiments, evaluates, keeps improvements, discards regressions, and repeats. Forever. You sleep, it researches.
Inspired by Karpathy's autoresearch, but generalized: labloop works on any optimization problem with a measurable metric, not just LLM training.
1. `init` — define what to optimize, what to measure, and which files the agent can touch
2. `go` — run the baseline, then enter the infinite experiment loop:
```
┌─→ Analyze history
│   Form hypothesis
│   Modify code
│   git commit
│   Run experiment
│   Extract metric
│   Better? → keep (branch advances)
│   Worse?  → discard (git reset)
└─→ Repeat forever
```
The agent never stops, never asks. It runs until you interrupt it.
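The keep-or-discard step is plain git. Below is a minimal sketch of one iteration, assuming a training run judged by a `val_loss` line in the log; the run command, metric name, budget, and comparison here are placeholders, not the skill's actual implementation:

```bash
#!/usr/bin/env bash
# One illustrative iteration of the loop (placeholders throughout).

best=0.4213    # best metric seen so far; lower_is_better assumed

git commit -am "experiment: smaller learning rate"    # snapshot the change

timeout 1800 python train.py > labloop-run.log 2>&1   # fixed budget per run

# Judge the experiment by a single number pulled from the log.
metric=$(grep -oE 'val_loss=[0-9.]+' labloop-run.log | tail -1 | cut -d= -f2)

if [ -n "$metric" ] && awk -v m="$metric" -v b="$best" 'BEGIN { exit !(m+0 < b+0) }'; then
  echo "keep: $metric improves on $best"   # branch advances, commit stays
else
  git reset --hard HEAD~1                  # discard: rewind the experiment commit
fi
```

Because losing experiments are reset away, the branch only ever moves forward, one improvement per kept commit.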
| Domain | What changes | What's measured |
|---|---|---|
| ML training | Architecture, hyperparams, optimizer | val_loss, accuracy |
| Algorithm optimization | Implementation code | Latency, throughput |
| Prompt engineering | Prompt text, few-shot examples | Accuracy, score |
| Frontend performance | Components, CSS, config | Lighthouse score |
| Compiler tuning | Build flags, config | Binary size, bench time |
| Any optimization | Any editable file | Any single numeric metric |
See references/examples.md for full configuration examples across 5 domains.
Clone the repo, then symlink or copy the `labloop` directory into your Claude Code skills folder:

```bash
# Clone
git clone https://github.com/d-wwei/labloop.git

# Option 1: Symlink into Claude Code skills
ln -s "$(pwd)/labloop" ~/.claude/skills/labloop

# Option 2: Copy directly
cp -r labloop ~/.claude/skills/labloop
```

Claude Code will automatically detect the skill on next launch.
```
/labloop init      # Interactive setup — generates labloop.md config
/labloop go        # Run baseline + start infinite experiment loop
/labloop status    # Check progress: experiments run, best metric, recent results
/labloop history   # Full experiment log as a table
/labloop rewind    # Reset to the best-performing commit
```
`/labloop init` creates a `labloop.md` in your project root; a sample is sketched after the list below. It defines:
- Research goal — what you're optimizing (one sentence)
- Editable files — what the agent can modify (glob patterns supported)
- Read-only files — evaluation logic, data, etc. (agent won't touch these)
- Run command — how to execute one experiment
- Metric — name, direction (`lower_is_better`/`higher_is_better`), extraction command
- Timeout — max seconds per experiment
- Constraints — rules the agent must follow
- Research hints — optional domain knowledge to guide the agent
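For concreteness, here is a hedged sketch of what a generated config might look like for an ML-training project. The real template lives in assets/labloop-template.md; every field name, path, and command below is illustrative, shaped only by the list above:

```markdown
# labloop.md (illustrative sketch; generate the real file with /labloop init)

## Research goal
Reduce validation loss of the image classifier without growing the model.

## Editable files
- model.py
- train.py
- configs/*.yaml

## Read-only files
- evaluate.py
- data/

## Run command
python train.py --epochs 10

## Metric
- name: val_loss
- direction: lower_is_better
- extract: grep -oE 'val_loss=[0-9.]+' labloop-run.log | tail -1 | cut -d= -f2

## Timeout
1800

## Constraints
- One change per experiment.
- Keep total parameter count under 10M.

## Research hints
- Try learning-rate schedules before architecture changes.
```

Whatever the exact layout, the extraction command presumably prints a single number on stdout; that one number is what every experiment is judged by.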
Carried over from autoresearch and generalized:
- One metric rules all — every experiment is judged by a single number
- Fixed budget per experiment — fair comparison across all changes
- Keep or discard — binary; the branch only moves forward
- Never stop — the agent runs indefinitely until a human interrupts
- Simplicity wins — equal results with less complexity = improvement
- One variable at a time — isolate effects, learn from history
```
labloop/
├── SKILL.md                   # Skill definition (Claude Code reads this)
├── assets/
│   └── labloop-template.md    # Config file template
├── references/
│   └── examples.md            # 5 domain-specific config examples
├── scripts/
│   ├── run-with-timeout.sh    # Experiment runner with timeout
│   └── extract-metric.sh      # Metric extraction from logs
└── evals/                     # Test cases (planned)
```
In your project (generated by `init`):

```
your-project/
├── labloop.md             # Experiment config (committed to git)
├── labloop-run.log        # Latest run output (gitignored)
└── labloop-results.tsv    # Full experiment history (gitignored)
```
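The log and TSV are working files, not history you want in version control. If `init` does not write the ignore rules itself (an assumption; check after running it), add them with something like:

```bash
# Keep labloop's working files out of version control
printf '%s\n' labloop-run.log labloop-results.tsv >> .gitignore
```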
- Claude Code CLI
- Git (experiments are managed via branches and commits)
- Whatever runtime your project needs (Python, Node, Rust, etc.)
MIT