Labloop

Chinese documentation →

Autonomous experiment loop for Claude Code. Give it a codebase and a metric — it experiments, evaluates, keeps improvements, discards regressions, and repeats. Forever. You sleep, it researches.

Inspired by Karpathy's autoresearch, but generalized: labloop works on any optimization problem with a measurable metric, not just LLM training.

How It Works

init  →  Define what to optimize, what to measure, what files the agent can touch
go    →  Run baseline → enter infinite experiment loop
          ┌─→ Analyze history
          │   Form hypothesis
          │   Modify code
          │   git commit
          │   Run experiment
          │   Extract metric
          │   Better? → keep (branch advances)
          │   Worse?  → discard (git reset)
          └─→ Repeat forever

The agent never stops, never asks. It runs until you interrupt it.
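
The keep-or-discard decision at the heart of the loop can be sketched in shell. Note that `is_better` and the sample values below are illustrative, not part of labloop itself; the actual comparison is driven by the agent:

```shell
#!/bin/sh
# is_better NEW BEST DIRECTION -> exit 0 if NEW improves on BEST.
# Hypothetical helper; shown only to make the control flow concrete.
is_better() {
  if [ "$3" = "lower_is_better" ]; then
    awk -v n="$1" -v b="$2" 'BEGIN { exit !(n < b) }'
  else
    awk -v n="$1" -v b="$2" 'BEGIN { exit !(n > b) }'
  fi
}

# Example: val_loss fell from 0.95 to 0.90, so the commit is kept.
if is_better 0.90 0.95 lower_is_better; then
  echo "keep"      # branch advances; 0.90 becomes the new best
else
  echo "discard"   # a real loop would run: git reset --hard HEAD~1
fi
```

Only strict improvements advance the branch; ties and regressions are discarded, which is what keeps the history monotonically non-worsening.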

Use Cases

| Domain | What changes | What's measured |
|---|---|---|
| ML training | Architecture, hyperparams, optimizer | val_loss, accuracy |
| Algorithm optimization | Implementation code | Latency, throughput |
| Prompt engineering | Prompt text, few-shot examples | Accuracy, score |
| Frontend performance | Components, CSS, config | Lighthouse score |
| Compiler tuning | Build flags, config | Binary size, benchmark time |
| Any optimization | Any editable file | Any single numeric metric |

See references/examples.md for full configuration examples across 5 domains.

Install

Clone the repository, then symlink or copy the labloop directory into your Claude Code skills folder:

# Clone
git clone https://github.com/d-wwei/labloop.git

# Option 1: Symlink into Claude Code skills
ln -s "$(pwd)/labloop" ~/.claude/skills/labloop

# Option 2: Copy directly
cp -r labloop ~/.claude/skills/labloop

Claude Code will automatically detect the skill on next launch.

Quick Start

/labloop init      # Interactive setup — generates labloop.md config
/labloop go        # Run baseline + start infinite experiment loop
/labloop status    # Check progress: experiments run, best metric, recent results
/labloop history   # Full experiment log as a table
/labloop rewind    # Reset to the best-performing commit

Configuration

/labloop init creates a labloop.md in your project root. It defines:

  • Research goal — what you're optimizing (one sentence)
  • Editable files — what the agent can modify (glob patterns supported)
  • Read-only files — evaluation logic, data, etc. (agent won't touch these)
  • Run command — how to execute one experiment
  • Metric — name, direction (lower_is_better / higher_is_better), extraction command
  • Timeout — max seconds per experiment
  • Constraints — rules the agent must follow
  • Research hints — optional domain knowledge to guide the agent
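
As a concrete illustration, a labloop.md for the ML-training use case might look roughly like this. The heading layout and field names here are a guess based on the bullet list above; the authoritative template is assets/labloop-template.md:

```markdown
# labloop.md (illustrative sketch)

## Research goal
Reduce validation loss of the CIFAR-10 training script.

## Editable files
- model.py
- train.py

## Read-only files
- eval.py
- data/**

## Run command
python train.py --epochs 5

## Metric
- name: val_loss
- direction: lower_is_better
- extraction: grep -o 'val_loss=[0-9.]*' labloop-run.log | cut -d= -f2

## Timeout
1800

## Constraints
- Do not change the evaluation code or the number of epochs.
```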

Design Principles

Carried over from autoresearch and generalized:

  1. One metric rules all — every experiment is judged by a single number
  2. Fixed budget per experiment — fair comparison across all changes
  3. Keep or discard — binary; the branch only moves forward
  4. Never stop — the agent runs indefinitely until a human interrupts it
  5. Simplicity wins — equal results with less complexity = improvement
  6. One variable at a time — isolate effects, learn from history

File Structure

labloop/
├── SKILL.md                   # Skill definition (Claude Code reads this)
├── assets/
│   └── labloop-template.md    # Config file template
├── references/
│   └── examples.md            # 5 domain-specific config examples
├── scripts/
│   ├── run-with-timeout.sh    # Experiment runner with timeout
│   └── extract-metric.sh      # Metric extraction from logs
└── evals/                     # Test cases (planned)
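
To illustrate what the runner and extractor scripts might do together, here is a hedged sketch; the real scripts' interfaces and log format are not documented here, so treat the variable names and the `val_loss=` pattern as assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of run-with-timeout.sh plus extract-metric.sh.
RUN_CMD="echo val_loss=0.91"   # stand-in for the run command in labloop.md
MAX_SECONDS=300                # experiment budget from labloop.md

# Run the experiment under a hard time limit, logging all output.
timeout "$MAX_SECONDS" sh -c "$RUN_CMD" > labloop-run.log 2>&1 \
  || echo "experiment failed or timed out"

# Pull the single numeric metric out of the log; the extraction command
# is user-supplied in labloop.md, this grep/cut pair is just an example.
METRIC=$(grep -o 'val_loss=[0-9.]*' labloop-run.log | cut -d= -f2)
echo "metric: $METRIC"
```

Capturing stdout and stderr into one log keeps metric extraction uniform regardless of where the run command prints its results.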

In your project (generated by init):

your-project/
├── labloop.md              # Experiment config (committed to git)
├── labloop-run.log         # Latest run output (gitignored)
└── labloop-results.tsv     # Full experiment history (gitignored)

Requirements

  • Claude Code CLI
  • Git (experiments are managed via branches and commits)
  • Whatever runtime your project needs (Python, Node, Rust, etc.)

License

MIT
