Autonomous task orchestration for Claude Code and Cowork. Decompose any objective into a verified, observable pipeline — no API key required.
autonomous-autonomy is a framework made of skills, commands, hooks, and structured files that teaches Claude how to break large objectives into small tasks, execute them through isolated subagents, verify outputs against specifications, and report progress. Everything runs inside the Claude Code/Cowork ecosystem on your Pro or MAX subscription.
It draws on patterns from Stripe's Minions (blueprint pattern, context curation, 2-retry cap), the Ralph Loop (files-on-disk as shared state), and GSD/Spec Kit (Given/When/Then as first-class artifacts).
There's no TypeScript library, no npm install, no API key. Just markdown, JSON, and shell scripts that plug into Claude Code's native primitives.
The framework follows a 6-phase pipeline:
/aa:init   →  /aa:plan   →  /aa:run    →  /aa:verify →  /aa:status →  /aa:report
    │             │             │             │             │             │
Create        Decompose     Execute       Check         Monitor       Generate
state dir     into DAG      next stage    against spec  progress      report
All orchestration state lives in an .aa/ directory as structured JSON and markdown. Claude reads the state, decides what to do, executes, and writes updated state. Every action is logged. You can stop and restart at any point.
Install permanently (recommended):
/plugin marketplace add MaxwellCalkin/autonomous-autonomy
/plugin install aa@autonomous-autonomy
Try it in a single session:
claude --plugin-dir https://github.com/MaxwellCalkin/autonomous-autonomy.git
Once installed, all commands are available as /aa:init, /aa:plan, /aa:run, /aa:verify, /aa:status, /aa:report. The skill loads automatically when Claude detects a task that would benefit from orchestration.
/aa:init
Creates the .aa/ state directory, scans your project structure, and sets up logging.
/aa:plan Build a REST API with authentication and rate limiting
A planner subagent decomposes your objective into 5-15 tasks, each with a Given/When/Then specification, explicit dependencies, and estimated effort. Tasks are organized into parallel stages via topological sort. You review the plan before execution begins.
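The grouping of tasks into parallel stages can be pictured with a short Python sketch. This is an illustration of the idea only, not the plugin's code (the framework itself is markdown, JSON, and shell). The function name `build_stages`, the task IDs, and the shape of the dependency map are all assumptions made for the example. It uses a layered form of Kahn's algorithm: every task whose prerequisites are already done joins the next stage.

```python
from collections import defaultdict


def build_stages(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into stages; each task depends only on tasks in earlier stages."""
    # Count unmet prerequisites per task and index the reverse edges.
    remaining = {task: len(set(deps)) for task, deps in dependencies.items()}
    dependents = defaultdict(list)
    for task, deps in dependencies.items():
        for dep in set(deps):
            if dep not in dependencies:
                raise ValueError(f"{task} depends on unknown task {dep}")
            dependents[dep].append(task)

    stages = []
    ready = sorted(task for task, count in remaining.items() if count == 0)
    while ready:
        stages.append(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)  # sorted so the result is stable

    # Any task never reached must sit on a dependency cycle.
    if sum(len(stage) for stage in stages) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return stages


# Hypothetical task IDs for the REST API objective above.
example = {
    "schema": [],
    "auth": ["schema"],
    "rate-limit": ["schema"],
    "routes": ["auth", "rate-limit"],
}
print(build_stages(example))  # [['schema'], ['auth', 'rate-limit'], ['routes']]
```

Here `auth` and `rate-limit` land in the same stage because neither depends on the other, which is what makes them candidates for parallel execution.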
/aa:run
Executes the next ready stage. For each task, Claude curates context (only the files and prior results relevant to that task), spawns an executor subagent with a clean context window, and collects the output. Independent tasks within a stage run in parallel.
To run everything autonomously:
/aa:run all
/aa:verify
Checks completed tasks against their Given/When/Then specs. Runs the acceptance test for each task (the primary verification mechanism), plus automatic checks (files exist, code compiles, tests pass) and content verification (does the output actually satisfy the spec?). Failed tasks are retried up to twice, then flagged for human review.
/aa:status
Shows a dashboard with task states, stage progress, critical path, and estimated time remaining.
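For readers unfamiliar with the term, the critical path is the longest effort-weighted chain of dependent tasks, and it sets a floor on total run time no matter how many tasks run in parallel. The Python sketch below shows one way to compute it. The function name, task IDs, and effort values (read them as hours) are hypothetical, and the sketch assumes the dependency graph has no cycles. How the plugin itself derives the critical path is not specified here.

```python
def critical_path(dependencies: dict[str, list[str]], effort: dict[str, int]):
    """Return (total effort, path) for the longest chain through an acyclic graph."""
    best = {}  # task -> (total effort up to and including task, path to task)

    def visit(task):
        if task not in best:
            # Pick the most expensive prerequisite chain, or start fresh.
            prior = max((visit(dep) for dep in dependencies[task]), default=(0, []))
            best[task] = (prior[0] + effort[task], prior[1] + [task])
        return best[task]

    return max((visit(task) for task in dependencies), default=(0, []))


# Hypothetical tasks and effort estimates.
path_deps = {
    "schema": [],
    "auth": ["schema"],
    "rate-limit": ["schema"],
    "routes": ["auth", "rate-limit"],
}
path_effort = {"schema": 2, "auth": 5, "rate-limit": 3, "routes": 4}
print(critical_path(path_deps, path_effort))  # (11, ['schema', 'auth', 'routes'])
```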
/aa:report
Generates a comprehensive markdown report with timeline, metrics, issues, and a complete artifact inventory.
All orchestration state lives in .aa/:
.aa/
├── objective.md              # What you asked for
├── decomposition.json        # Task DAG (tasks + dependencies)
├── execution-plan.json       # Parallel stages, critical path
├── state.json                # Task states, progress, metadata
├── tasks/{taskId}/
│   ├── spec.json             # Given/When/Then specification
│   ├── acceptance-test.md    # Acceptance test contract
│   ├── context.json          # Curated context for the executor
│   ├── result.json           # What the executor produced
│   └── verification.json     # Verification decision
└── logs/
    └── events.jsonl          # Structured event stream
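An append-only JSON Lines file such as logs/events.jsonl holds one JSON object per line, so it can be tailed, grepped, or replayed. The Python sketch below shows the general shape of writing and reading such a stream. In the plugin the logging is done by a shell hook, and the field names used here (`timestamp`, `type`, `taskId`) are assumptions for illustration, not the plugin's actual event schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_event(log_path: Path, event_type: str, task_id: str, **details) -> dict:
    """Append one event as a single JSON line and return it."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": event_type,  # hypothetical field name
        "taskId": task_id,   # hypothetical field name
        **details,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(log_path: Path) -> list[dict]:
    """Parse every non-empty line of the log back into a dict."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo against a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / ".aa" / "logs" / "events.jsonl"
    log_event(log_file, "task_started", "task-001")
    log_event(log_file, "task_completed", "task-001", retries=0)
    demo_events = read_events(log_file)

print([event["type"] for event in demo_events])  # ['task_started', 'task_completed']
```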
┌←←← (retry: retries available) ←←←┐
│ │
│ ┌← (crash, retries available) ←┐ │
↓ ↓ │ │
pending ──→ queued ──→ running ──→ verifying ──→ completed
│ │ │
↓ ↓ ↓ (retries exhausted)
blocked failed ←←←←←←←←←┘
(when resolved → queued)
Tasks move forward through the pipeline. When verification fails, the task is requeued for retry (up to twice — Stripe's empirical finding: a third attempt almost never helps). If the executor crashes, the task is also requeued with a synthetic verification record so the next attempt has failure context. Blocked tasks wait for their dependencies.
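The transitions described above can be written down as a small table plus one retry rule. The Python sketch below is a reading of that description, not the plugin's implementation. The event names are invented for the example, and it assumes that a task becomes blocked from the pending state, which the text does not state explicitly.

```python
from dataclasses import dataclass

MAX_RETRIES = 2  # the retry cap described above


@dataclass(frozen=True)
class Task:
    state: str = "pending"
    retries: int = 0


# Transitions that never touch the retry counter (event names are hypothetical).
SIMPLE = {
    ("pending", "dependencies_met"): "queued",
    ("pending", "dependency_unmet"): "blocked",  # assumed source state
    ("blocked", "resolved"): "queued",
    ("queued", "started"): "running",
    ("running", "finished"): "verifying",
    ("verifying", "passed"): "completed",
}
# Failures that requeue the task while retries remain.
FAILURES = {("running", "crashed"), ("verifying", "rejected")}


def advance(task: Task, event: str) -> Task:
    """Return the task after applying one event, enforcing the retry cap."""
    key = (task.state, event)
    if key in FAILURES:
        if task.retries < MAX_RETRIES:
            return Task("queued", task.retries + 1)
        return Task("failed", task.retries)
    if key in SIMPLE:
        return Task(SIMPLE[key], task.retries)
    raise ValueError(f"no transition from {task.state!r} on {event!r}")


# A task that fails verification on every attempt ends up failed after two retries.
demo_task = Task()
for event in ["dependencies_met"] + ["started", "finished", "rejected"] * 3:
    demo_task = advance(demo_task, event)
print(demo_task)  # Task(state='failed', retries=2)
```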
Each task runs in its own subagent via Claude's Agent tool. The subagent gets a fresh context window loaded with only the curated context for that task — no cross-task state leakage. This is the same isolation pattern Stripe uses with their devboxes.
Before a task executes, context is assembled deterministically (Stripe pattern):
- Task specification (the Given/When/Then contract)
- Project context (architecture, conventions)
- Source files (only those listed in the task's contextItems)
- Dependency results (outputs from prerequisite tasks)
- Recent history (last few git commits)
The same task always gets the same context, which makes every run reproducible and auditable.
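One way to picture deterministic assembly is to build the bundle in a fixed order and fingerprint it, so two runs can be compared byte for byte. The Python sketch below does that. The key names, the helper functions, and the SHA-256 fingerprint are assumptions made for illustration. The plugin's context.json may be structured differently.

```python
import hashlib
import json


def assemble_context(spec, project_context, source_files, dependency_results, recent_commits):
    """Build the context bundle in a fixed order from the five inputs listed above."""
    return {
        "spec": spec,
        "projectContext": project_context,
        # Sort by path and task ID so input ordering never changes the result.
        "sourceFiles": {path: source_files[path] for path in sorted(source_files)},
        "dependencyResults": {
            task_id: dependency_results[task_id] for task_id in sorted(dependency_results)
        },
        "recentHistory": list(recent_commits),
    }


def context_fingerprint(context: dict) -> str:
    """Hash a canonical JSON encoding of the bundle."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical inputs; the two calls differ only in the order files are supplied.
spec = {"given": "an empty database", "when": "migrations run", "then": "tables exist"}
first = assemble_context(
    spec, "Flask app", {"b.py": "B", "a.py": "A"}, {"schema": "ok"}, ["abc123 init"]
)
second = assemble_context(
    spec, "Flask app", {"a.py": "A", "b.py": "B"}, {"schema": "ok"}, ["abc123 init"]
)
print(context_fingerprint(first) == context_fingerprint(second))  # True
```

Storing a fingerprint like this next to each result would let an audit confirm that a retried task saw the same inputs as the original attempt.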
autonomous-autonomy/
├── .claude-plugin/
│   ├── plugin.json             # Plugin manifest
│   └── marketplace.json        # Self-hosted marketplace metadata
├── commands/
│   ├── init.md                 # Initialize state
│   ├── plan.md                 # Decompose objective
│   ├── run.md                  # Execute tasks
│   ├── verify.md               # Verify outputs
│   ├── status.md               # Progress dashboard
│   └── report.md               # Generate report
├── skills/
│   └── autonomous-autonomy/
│       └── SKILL.md            # Master orchestration skill
├── agents/
│   ├── planner.md              # Decomposition agent
│   ├── executor.md             # Task execution agent
│   ├── verifier.md             # Verification agent
│   ├── researcher.md           # Context gathering agent
│   └── plan-scorer.md          # Plan quality scoring agent
├── hooks/
│   ├── hooks.json              # Hook configuration
│   ├── log-event.sh            # Event logging hook
│   └── validate-state.sh       # State integrity hook
├── references/
│   ├── given-when-then.md      # Spec format guide
│   ├── state-machine.md        # State transition reference
│   ├── context-curation.md     # Context assembly rules
│   └── error-recovery.md       # Retry & failure patterns
├── templates/
│   ├── task-spec.json          # Task template
│   ├── state.json              # State template
│   └── execution-plan.json     # Plan template
├── benchmark/
│   ├── validate.sh             # Structural validation (88 checks)
│   ├── README.md               # Benchmark documentation
│   └── test-objectives/        # Test objectives for plan scoring
├── LICENSE
├── CHANGELOG.md
└── README.md
Specs are contracts. Given/When/Then specifications are the source of truth. Never skip writing them.
Deterministic orchestration. The execution plan is computed once from the DAG. No LLM reasoning about what to do next — just read the plan.
Isolation per task. Each task gets its own subagent with curated context. No cross-task contamination.
Fail fast, retry bounded. 2 retries max. If it doesn't work, escalate to a human.
Observable everything. Every state change is logged to events.jsonl. Status is always available.
Resumable always. State on disk means you can stop and restart at any point without losing progress.
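Resumability follows from keeping state on disk: a new session needs only the plan and the task states to work out where to pick up. The Python sketch below illustrates that with a throwaway directory. The JSON shapes (`stages`, `tasks`) and the function name are hypothetical and do not reflect the plugin's actual file schema.

```python
import json
import tempfile
from pathlib import Path


def next_stage(plan: dict, state: dict) -> list[str]:
    """Return the unfinished tasks of the first incomplete stage (empty when done)."""
    for stage in plan["stages"]:
        unfinished = [task for task in stage if state["tasks"].get(task) != "completed"]
        if unfinished:
            return unfinished
    return []


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp) / ".aa"
    state_dir.mkdir()

    # Simulate a session that stopped part-way through the second stage.
    plan = {"stages": [["schema"], ["auth", "rate-limit"], ["routes"]]}
    state = {"tasks": {"schema": "completed", "auth": "completed", "rate-limit": "running"}}
    (state_dir / "execution-plan.json").write_text(json.dumps(plan), encoding="utf-8")
    (state_dir / "state.json").write_text(json.dumps(state), encoding="utf-8")

    # A later session starts from nothing but the files on disk.
    reloaded_plan = json.loads((state_dir / "execution-plan.json").read_text(encoding="utf-8"))
    reloaded_state = json.loads((state_dir / "state.json").read_text(encoding="utf-8"))
    resume_tasks = next_stage(reloaded_plan, reloaded_state)

print(resume_tasks)  # ['rate-limit']
```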
- Claude Code (CLI) with Pro or MAX subscription
- Claude Cowork (desktop) with Pro or MAX subscription
- No API key required
- No external dependencies
- Any project, any language, any domain