Autonomous product builder for Claude Code. Takes a seed idea and builds a production-ready product through iterative agent teams with three-layer verification.
Also handles improving, hardening, and shipping existing codebases.
```bash
# Project-level (recommended)
mkdir -p /path/to/your/project/.claude/skills/autopilot
cp skill.md /path/to/your/project/.claude/skills/autopilot/skill.md

# Global (available in all projects)
mkdir -p ~/.claude/skills/autopilot
cp skill.md ~/.claude/skills/autopilot/skill.md
```

Requires Claude Code CLI or desktop app.
```bash
# Build a new product from a seed idea
/autopilot a SaaS that helps restaurants manage reservations

# Improve/harden an existing project
/autopilot fix the 12 issues from the security audit

# Resume a multi-session build
/autopilot
```
Autopilot auto-detects the mode:
| Mode | Trigger | What happens |
|---|---|---|
| BUILD | New idea, no existing code | Research, vision, architecture, then iterative sprints until deployed |
| IMPROVE | Existing project + task list | Triage, verify, build, test across parallel agent teams |
| RESUME | PROGRESS.md exists | Picks up from the exact sprint where the last session stopped |
BUILD MODE:

```
Research (10 parallel agents) → Product Vision → Architecture Plan
→ CONVERGENCE ENGINE (10-30+ sprints):
    Assess → Plan → Verify every item (Phase 4) → Build (agent teams)
    → Layer 1: builder self-check
    → Layer 2: independent agent review per builder
    → Layer 3: cross-cutting + dimensional audit
    → Test → Fix → Commit + update docs → Re-assess
→ Quality Sweep → Polish (10-30 landing page iterations) → Deploy
```
IMPROVE MODE:

```
Read CLAUDE.md → Triage items → Verify every item (Phase 4)
→ Sprint loop with same three-layer verification → Done
```
Every item gets an independent agent review before implementation -- not just bugs from audits, but features, config changes, infrastructure, everything. The reviewing agent's only job is to determine whether the item is real, a false positive, or in need of a different approach.
In practice, ~50% of items get adjusted or dropped, even items from a carefully curated human task list. The cost is ~$0.50 per sprint; the savings from not implementing the wrong things are 10-100x that.
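A minimal sketch of what such a verification verdict could look like, assuming one markdown record per item -- the item, field names, and wording below are invented for illustration, not the skill's actual schema:

```markdown
<!-- illustrative only; item, fields, and wording are invented -->
### Item 7: "Add rate limiting to the login endpoint"
- Verdict: needs a different approach
  (the other possible outcomes: confirmed as-is, false positive)
- Reasoning: the endpoint already sits behind a per-IP throttle at the
  gateway; the real gap is per-account lockout after repeated failures.
- Revised task: add per-account lockout, leave the gateway throttle alone.
```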
Builders self-check (Layer 1), but they always report "all clean." So a separate agent deep-reviews each builder's changes (Layer 2). Then cross-cutting agents check all changes together through specific lenses: security, reliability, integration, UX (Layer 3).
This is where bugs like "the Stripe Charge API has no subscription field" get caught -- the builder wrote the code confidently, but an independent reviewer traced the actual API shape.
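As a hedged illustration of a Layer 2 report -- the file path, format, and wording below are hypothetical; only the bug class comes from the example above:

```markdown
<!-- hypothetical Layer 2 review record; format invented for illustration -->
## Layer 2 review -- builder: payments
- File: src/billing/charge.ts (hypothetical path)
- Finding: the code reads a `subscription` field off the Charge response,
  but tracing the actual API shape shows no such field on that object.
- Severity: blocker -- would ship silently broken subscription linkage.
- Suggested fix: source the subscription from where the API actually
  returns it rather than assuming the field exists on the charge.
```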
Products aren't built in one pass. The skill loops through assess-build-verify-fix cycles, climbing a feature depth ladder:
```
Level 1:  Happy path works
Level 3:  Error handling + edge cases
Level 5:  Mobile responsive + security
Level 7:  Tests
Level 9:  Performance + caching
Level 10: Monitoring + observability
```
Anti-premature-completion rules prevent declaring victory early. Every 3-5 sprints, an innovation round brainstorms new features from competitor research and adjacent-product analysis.
Real products take multiple sessions. PROGRESS.md tracks the exact sprint, feature depth per module, and innovation backlog. Any new session resumes exactly where the last one stopped. No work is lost to context limits.
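A minimal sketch of what PROGRESS.md might contain for the reservation-SaaS example -- the headings and values are illustrative, not the exact format the skill writes:

```markdown
# PROGRESS
Sprint: 14 of the convergence engine (BUILD mode)

## Feature depth per module
| Module        | Depth |
|---------------|-------|
| auth          | 7     |
| reservations  | 5     |
| notifications | 3     |

## Innovation backlog
- Waitlist auto-promotion when a table frees up (competitor research, sprint 10)

## Next up
- Close the two open Layer 3 security findings from sprint 13
```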
Per sprint, Autopilot typically coordinates:
- 2-4 builder agents with exclusive file ownership (no merge conflicts)
- 1 verification agent per builder (Layer 2)
- 3-5 dimensional audit agents (Layer 3)
- Assessment and synthesis agents between sprints
- Innovation brainstorming agents every 3-5 sprints
- Documentation agents after every commit
All agents use Opus. The orchestrator delegates all code changes -- it makes decisions, not edits.
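For illustration, a sprint plan with exclusive file ownership might look like this -- builder names, paths, and tasks are hypothetical:

```markdown
## Sprint 14 -- builder assignments
| Builder   | Owns (exclusive)      | Task                          |
|-----------|-----------------------|-------------------------------|
| builder-1 | src/reservations/**   | Waitlist + double-booking fix |
| builder-2 | src/notifications/**  | Reminder emails               |
| builder-3 | src/admin/**          | Staff scheduling view         |
```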
The skill includes opinionated defaults for tech stack, sprint size, and verification depth. All are overridable. The things you should not change:
- The universal verification gate (Phase 4 on every item)
- Three-layer verification architecture
- Documentation gates after every sprint
- The rule that the orchestrator never writes code directly
Known limitations:
- UI/visual design still needs human taste -- the skill builds functional UIs but not award-winning ones
- Complex architectural decisions benefit from human input at the vision stage
- Full product builds take 3-10 sessions across multiple days
- Context window limits mean very large codebases need careful state management via PROGRESS.md
Built through iterative use on real production systems. Each process innovation (the verification gate, the three-layer architecture, the anti-premature-completion rules) was added after observing a specific failure mode. The skill is itself a product of the convergence methodology it describes.
MIT