47 changes: 42 additions & 5 deletions .claude/skills/autoresearch/SKILL.md
@@ -54,16 +54,35 @@ Then:

1. Create branch: `git checkout -b autoresearch/<tag>`
2. Read all in-scope files for full context
3. Add `results.tsv` and `run.log` to `.gitignore` (create if needed):
3. Add `results.tsv`, `run.log`, and `.autoresearch_wandb.json` to `.gitignore` (create if needed):
```bash
echo -e "results.tsv\nrun.log" >> .gitignore
echo -e "results.tsv\nrun.log\n.autoresearch_wandb.json" >> .gitignore
```
4. Create `results.tsv` with header row:
```
commit metric status description
```
5. Run the baseline (no changes) and record it as the first row
6. Confirm setup with the human, then begin the loop
5. **Launch dashboard** (local web UI — zero deps, stdlib only):
```bash
python dashboard/server.py --port 8420 --results results.tsv &
echo "Dashboard: http://localhost:8420"
```
The dashboard auto-refreshes every 3s showing metric trends, keep/discard/crash
stats, and a full experiment table. Works in any browser. No install needed.

6. **Initialize wandb** (optional remote logging — requires `pip install wandb`):
```bash
# Only if the user wants remote logging
python dashboard/wandb_logger.py --init \
--project autoresearch-<tag> \
--name "<objective>" \
--config '{"objective":"<objective>","direction":"<direction>","scope":"<edit-scope>"}'
```
If wandb is not installed or not wanted, skip this step. All data is always
available locally via `results.tsv` and the dashboard regardless.

7. Run the baseline (no changes) and record it as the first row
8. Confirm setup with the human, then begin the loop
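
The `--config` value in step 6 is a JSON string passed through the shell, so quoting matters once an objective contains spaces or quotes. A minimal Python sketch of assembling that command safely follows. `build_init_command` is a hypothetical helper (not part of the skill) and the example argument values are placeholders; only the flags shown in step 6 are used.

```python
import json
import shlex


def build_init_command(tag, objective, direction, scope):
    """Return the wandb init command from step 6 as a shell-safe string.

    Hypothetical helper for illustration; the flag names come from step 6.
    """
    config = {"objective": objective, "direction": direction, "scope": scope}
    argv = [
        "python", "dashboard/wandb_logger.py", "--init",
        "--project", f"autoresearch-{tag}",
        "--name", objective,
        # json.dumps handles escaping inside the JSON document itself.
        "--config", json.dumps(config),
    ]
    # shlex.join quotes every argument for a POSIX shell.
    return shlex.join(argv)


# Placeholder values, chosen only for the example.
print(build_init_command("demo", "minimize val_loss", "minimize", "train.py"))
```

Building the argument list first and quoting it in one place avoids hand-escaping the JSON inside single quotes.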

**Once the human confirms, you are autonomous. Do not ask again.**

@@ -130,7 +149,15 @@ Append to `results.tsv` (tab-separated):
<commit-hash-7char> <metric-value-or-ERR> <keep|discard|crash> <description>
```
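
A sketch of appending such a row from Python, assuming the four-column header created during setup. `append_result` is a hypothetical helper, not something the skill ships; it collapses tabs and newlines in the description so each experiment stays on one line.

```python
from pathlib import Path

HEADER = ["commit", "metric", "status", "description"]
STATUSES = {"keep", "discard", "crash"}


def append_result(path, commit, metric, status, description):
    """Append one tab-separated row, writing the header if the file is new or empty."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    # A metric of None stands for a run that produced no number.
    metric_text = "ERR" if metric is None else str(metric)
    # Tabs or newlines inside the description would break the row layout.
    clean = " ".join(str(description).split())
    with path.open("a", encoding="utf-8") as fh:
        if needs_header:
            fh.write("\t".join(HEADER) + "\n")
        fh.write("\t".join([commit[:7], metric_text, status, clean]) + "\n")
```

For example, `append_result("results.tsv", "abcdef1234", 0.91, "keep", "baseline")` writes the header (if needed) followed by one row with the commit hash truncated to seven characters.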

**Wandb sync** (if initialized in Phase 0):
```bash
python dashboard/wandb_logger.py --log \
--metric <metric-value-or-nan> --status <keep|discard|crash> \
--desc "<description>"
```

**Do NOT commit results.tsv** — it's in `.gitignore` so reverts don't lose the log.
The dashboard picks up changes automatically (3s poll). No manual refresh needed.
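
The dashboard's own polling code is not part of the lines shown here. As an illustration of the idea, this sketch watches a file's modification time at a fixed interval; `wait_for_change` and its parameters are assumptions made for the example.

```python
import os
import time


def wait_for_change(path, last_mtime, interval=3.0, timeout=None):
    """Poll path until its mtime differs from last_mtime.

    Returns the new mtime, or None if timeout (in seconds) expires first.
    A missing file is treated as mtime 0.0.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            mtime = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = 0.0
        if mtime != last_mtime:
            return mtime
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Comparing modification times keeps each poll cheap: the file is only re-read when something was actually appended.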

### Step 7 — Repeat
Go back to Step 1. **NEVER STOP.**
@@ -210,7 +237,17 @@ after that is happening in parallel.

## Phase 2: Summary (when human returns)

When the human interrupts or you detect they're back, produce a summary:
When the human interrupts or you detect they're back:

1. **Finish wandb run** (if initialized):
```bash
python dashboard/wandb_logger.py --finish
```
2. **Stop the dashboard** (if still running):
```bash
pkill -f "dashboard/server.py" 2>/dev/null
```
3. Produce a summary:

```
=== Autoresearch Summary ===
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
__pycache__/
44 changes: 43 additions & 1 deletion README.md
@@ -34,18 +34,53 @@ A Claude Code skill that runs autonomous improvement loops on any codebase. Insp
└─────────────────────────────────────────────────────┘
```

## Dashboard & Observability

### Local Web Dashboard

A built-in web dashboard (zero Python dependencies, stdlib only) shows live experiment
progress — metric trends, keep/discard/crash stats, and a full experiment table.

```bash
python dashboard/server.py --port 8420 --results results.tsv
```

Open `http://localhost:8420` in any browser. Auto-refreshes every 3 seconds.

The skill automatically launches this during setup. No manual action needed.
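
Besides the HTML page, `dashboard/server.py` in this change also answers `/api/status` and `/api/results` with JSON. A small client sketch, assuming the server runs on the default port: `fetch_json` and `format_status` are hypothetical helper names, and the keys read by `format_status` are the ones the status route returns.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, endpoint, timeout=5.0):
    """GET base_url + endpoint and decode the body as JSON."""
    with urlopen(f"{base_url}{endpoint}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def format_status(status):
    """Render the /api/status payload as a one-line summary."""
    best = status.get("best_metric")
    best_text = "n/a" if best is None else f"{best:g}"
    return (
        f"{status['total']} runs: {status['kept']} kept, "
        f"{status['discarded']} discarded, {status['crashed']} crashed, "
        f"best={best_text}"
    )
```

With the dashboard running, `print(format_status(fetch_json("http://localhost:8420", "/api/status")))` prints a one-line summary suitable for a terminal or a log file.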

### Remote Logging with Wandb

Optional integration with [Weights & Biases](https://wandb.ai) for remote experiment tracking:

```bash
# Install wandb (optional)
pip install wandb

# During autoresearch, the skill can log to wandb
python dashboard/wandb_logger.py --init --project my-experiment --config '{"objective":"minimize loss"}'

# Replay all results.tsv data to wandb after the fact
python dashboard/wandb_logger.py --replay --project my-experiment
```

Wandb is fully optional. All data is always available locally via `results.tsv` and the dashboard.

## Install

Copy the skill to your Claude Code skills directory:

```bash
# User-level (all projects)
# User-level (all projects) — skill only
mkdir -p ~/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md ~/.claude/skills/autoresearch/SKILL.md

# Or project-level (current project only, after cloning this repo)
mkdir -p /path/to/your/project/.claude/skills/autoresearch
cp .claude/skills/autoresearch/SKILL.md /path/to/your/project/.claude/skills/autoresearch/

# For dashboard + wandb support, copy the dashboard/ directory into your project:
cp -r dashboard /path/to/your/project/
```

Or one-liner from GitHub:
@@ -56,6 +91,12 @@ curl -sL https://raw.githubusercontent.com/labclaw/autoresearch-skill/main/.clau
-o ~/.claude/skills/autoresearch/SKILL.md
```

Or use the install script:

```bash
./install.sh
```

## Usage

### Interactive setup
@@ -156,6 +197,7 @@ Directly from [Karpathy's autoresearch](https://github.com/karpathy/autoresearch
| ML architecture only | Any code changes |
| Single GPU required | Any compute environment |
| Claude Code / Cursor required | Claude Code only |
| No UI | Local web dashboard + wandb integration |

The loop structure is identical: **propose -> commit -> run -> evaluate -> keep/discard -> repeat**.
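
The keep/discard step can be sketched as a pure function. This is an illustration under assumptions, not the skill's actual rule: the `direction` strings `"maximize"` and `"minimize"`, the use of `None` for a crashed run, and treating ties as discards are all choices made for the example.

```python
def decide(new_metric, best_metric, direction):
    """Return (status, best_so_far) after one experiment.

    new_metric is None when the run crashed; best_metric is None before
    the baseline has been recorded.
    """
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction!r}")
    if new_metric is None:
        return "crash", best_metric
    if best_metric is None:
        return "keep", new_metric
    improved = (
        new_metric > best_metric
        if direction == "maximize"
        else new_metric < best_metric
    )
    if improved:
        return "keep", new_metric
    # Ties count as "discard" in this sketch.
    return "discard", best_metric
```

The returned status matches the values recorded in `results.tsv`, and the second element is the metric to compare the next experiment against.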

6 changes: 6 additions & 0 deletions dashboard/__init__.py
@@ -0,0 +1,6 @@
# Autoresearch Dashboard & Logging

from dashboard.server import DashboardHandler, parse_results_tsv
from dashboard.wandb_logger import WandbLogger

__all__ = ["DashboardHandler", "parse_results_tsv", "WandbLogger"]
162 changes: 162 additions & 0 deletions dashboard/server.py
@@ -0,0 +1,162 @@
#!/usr/bin/env python3
"""Lightweight web dashboard for autoresearch experiment tracking.

Serves a live-updating HTML page that reads results.tsv and displays
metric trends, experiment status, and run details. Uses only Python
stdlib — no external dependencies required.

Usage:
    python dashboard/server.py [--port 8420] [--results results.tsv]
"""

import argparse
import json
import os
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from typing import Any

# Resolve to an absolute path so it stays valid after main() calls os.chdir().
HERE = Path(__file__).resolve().parent
TEMPLATES = HERE / "templates"
STATIC = HERE / "static"


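def parse_results_tsv(path: Path) -> list[dict[str, Any]]:
    """Parse results.tsv into a list of dicts keyed by the header columns."""
    rows: list[dict[str, Any]] = []
    if not path.exists():
        return rows
    try:
        # splitlines() also handles \r\n endings; blank lines are skipped.
        text = path.read_text(encoding="utf-8")
        lines = [line for line in text.splitlines() if line.strip()]
    except (OSError, UnicodeDecodeError):
        # The file may be unreadable or mid-write; report no rows rather than crash.
        return rows
    if not lines:
        return rows
    header = lines[0].split("\t")
    for line in lines[1:]:
        parts = line.split("\t")
        # Pad short rows with "" so every header column is present.
        rows.append(
            {col: parts[i] if i < len(parts) else "" for i, col in enumerate(header)}
        )
    return rows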


class DashboardHandler(SimpleHTTPRequestHandler):
    """HTTP handler that serves the autoresearch dashboard."""

    results_path: Path = Path("results.tsv")

    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(STATIC), **kwargs)

    def do_GET(self):
        if self.path == "/" or self.path == "/index.html":
            self._serve_template()
        elif self.path == "/api/results":
            self._serve_results()
        elif self.path == "/api/status":
            self._serve_status()
        else:
            # Everything else is served as a static file from STATIC.
            super().do_GET()

    def _serve_template(self):
        """Serve the main dashboard HTML."""
        template = TEMPLATES / "index.html"
        if template.exists():
            content = template.read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(content)))
            self.end_headers()
            self.wfile.write(content)
        else:
            self.send_error(404, "Template not found")

    def _serve_results(self):
        """Serve parsed results.tsv as JSON."""
        rows = parse_results_tsv(self.results_path)
        payload = json.dumps({"experiments": rows}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def _serve_status(self):
        """Serve a lightweight status endpoint."""
        rows = parse_results_tsv(self.results_path)
        kept = sum(1 for r in rows if r.get("status") == "keep")
        discarded = sum(1 for r in rows if r.get("status") == "discard")
        crashed = sum(1 for r in rows if r.get("status") == "crash")
        # Metrics of kept runs only; "ERR", blanks, and other non-numeric
        # values are skipped instead of raising inside the request handler.
        metrics = []
        for r in rows:
            if r.get("status") != "keep":
                continue
            try:
                metrics.append(float(r.get("metric", "")))
            except ValueError:
                continue
        payload = json.dumps(
            {
                "total": len(rows),
                "kept": kept,
                "discarded": discarded,
                "crashed": crashed,
                # NOTE: max/min assume higher is better; for a minimize
                # objective the two values are swapped.
                "best_metric": max(metrics) if metrics else None,
                "worst_metric": min(metrics) if metrics else None,
                "results_modified": os.path.getmtime(self.results_path)
                if self.results_path.exists()
                else 0,
            }
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_request(self, code="-", size="-"):
        """Quiet logging: skip per-request access lines.

        Errors still reach stderr, because send_error() reports them
        through log_error() rather than log_request().
        """


def main():
    parser = argparse.ArgumentParser(description="Autoresearch Dashboard")
    parser.add_argument(
        "--port",
        "-p",
        type=int,
        default=8420,
        help="Port to serve on (default: 8420)",
    )
    parser.add_argument(
        "--results",
        "-r",
        type=str,
        default="results.tsv",
        help="Path to results.tsv (default: ./results.tsv)",
    )
    args = parser.parse_args()

    results_path = Path(args.results).resolve()
    DashboardHandler.results_path = results_path

    # Serve from the autoresearch working directory for results.tsv access
    os.chdir(results_path.parent)

    # "0.0.0.0" listens on every network interface, so the dashboard is
    # reachable from other machines, not just via localhost.
    server = HTTPServer(("0.0.0.0", args.port), DashboardHandler)
    url = f"http://localhost:{args.port}"
    print(f"Autoresearch Dashboard: {url}")
    print(f"Watching: {results_path}")
    print("Press Ctrl+C to stop")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nDashboard stopped.")
        server.server_close()


if __name__ == "__main__":
    main()
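
`parse_results_tsv` above splits lines by hand. An alternative sketch of the same parse using the stdlib `csv` module follows; `read_results` is a hypothetical name, and `QUOTE_NONE` is chosen on the assumption that rows are plain tab-separated text with no CSV-style quoting.

```python
import csv
from pathlib import Path


def read_results(path):
    """Read a results.tsv file into a list of dicts keyed by the header row."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(
            fh,
            delimiter="\t",
            quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
            restval="",              # pad rows that have fewer columns than the header
        )
        # Fields beyond the header are collected under the key None; drop them.
        return [{k: v for k, v in row.items() if k is not None} for row in reader]
```

`DictReader` skips fully blank lines and handles short rows through `restval`, which covers the same padding behaviour as the hand-written loop.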