autonomous-autonomy

Autonomous task orchestration for Claude Code and Cowork. Decompose any objective into a verified, observable pipeline — no API key required.

What It Is

autonomous-autonomy is a framework of skills, commands, hooks, and structured files that teaches Claude how to break large objectives into small tasks, execute them through isolated subagents, verify outputs against specifications, and report progress. Everything runs inside the Claude Code/Cowork ecosystem on your existing Pro or MAX subscription.

It draws on patterns from Stripe's Minions (blueprint pattern, context curation, 2-retry cap), the Ralph Loop (files-on-disk as shared state), and GSD/Spec Kit (Given/When/Then as first-class artifacts).

There's no TypeScript library, no npm install, no API key. Just markdown, JSON, and shell scripts that plug into Claude Code's native primitives.

How It Works

The framework follows a 6-phase pipeline:

/aa:init  →  /aa:plan  →  /aa:run  →  /aa:verify  →  /aa:status  →  /aa:report
   │            │            │             │               │              │
Create       Decompose    Execute       Check          Monitor        Generate
state dir    into DAG     next stage    against spec   progress       report

All orchestration state lives in an .aa/ directory as structured JSON and markdown. Claude reads the state, decides what to do, executes, and writes updated state. Every action is logged. You can stop and restart at any point.
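The read, update, write, and log cycle described above can be sketched in Python. This is only an illustration: the framework itself is markdown, JSON, and shell scripts, and the function names, the `tasks` layout inside state.json, and the event fields (`ts`, `type`, `task`, `from`, `to`) are assumptions made for this example, not the plugin's actual schema.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(aa_dir: Path, event_type: str, **fields) -> dict:
    """Append one JSON object per line to logs/events.jsonl (hypothetical fields)."""
    event = {"ts": datetime.now(timezone.utc).isoformat(), "type": event_type, **fields}
    log_path = aa_dir / "logs" / "events.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_state(aa_dir: Path) -> dict:
    """Load state.json, or return an empty state if nothing has been written yet."""
    path = aa_dir / "state.json"
    if not path.exists():
        return {"tasks": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def write_state(aa_dir: Path, state: dict) -> None:
    """Write to a temp file, then rename it over state.json.

    The rename means a reader sees either the old file or the new one,
    never a half-written one, which is what makes stop-and-restart safe.
    """
    aa_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=aa_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(state, tmp, indent=2, sort_keys=True)
    os.replace(tmp_name, aa_dir / "state.json")


def set_task_state(aa_dir: Path, task_id: str, new_state: str) -> dict:
    """Read state, change one task, write it back, and log the change."""
    state = read_state(aa_dir)
    old_state = state["tasks"].get(task_id, {}).get("state")
    state["tasks"].setdefault(task_id, {})["state"] = new_state
    write_state(aa_dir, state)
    append_event(aa_dir, "task_state_changed", task=task_id,
                 **{"from": old_state, "to": new_state})
    return state
```

Calling `set_task_state(Path(".aa"), "t1", "running")` would update state.json and append one line to logs/events.jsonl. Because each call starts by reading what is on disk, a new session can pick up where the previous one stopped.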

Installation

Install permanently (recommended):

/plugin marketplace add MaxwellCalkin/autonomous-autonomy
/plugin install aa@autonomous-autonomy

Try it in a single session:

claude --plugin-dir https://github.com/MaxwellCalkin/autonomous-autonomy.git

Once installed, all commands are available as /aa:init, /aa:plan, /aa:run, /aa:verify, /aa:status, /aa:report. The skill loads automatically when Claude detects a task that would benefit from orchestration.

Getting Started

1. Initialize

/aa:init

Creates the .aa/ state directory, scans your project structure, and sets up logging.

2. Plan

/aa:plan Build a REST API with authentication and rate limiting

A planner subagent decomposes your objective into 5-15 tasks, each with a Given/When/Then specification, explicit dependencies, and estimated effort. Tasks are organized into parallel stages via topological sort. You review the plan before execution begins.
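To make "parallel stages via topological sort" concrete, here is a minimal Python sketch of a layered topological sort. It assumes the DAG is given as a mapping from task id to the set of task ids it depends on; the function name and the task ids are hypothetical, and the planner's real output format in decomposition.json may differ.

```python
def build_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group task ids into stages; each stage depends only on earlier stages."""
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    unknown = {d for deps in remaining.values() for d in deps} - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    stages: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # A task is ready once every one of its dependencies has been scheduled.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


# Hypothetical decomposition of the REST API objective above.
example = {
    "schema": set(),
    "auth": {"schema"},
    "rate-limit": {"schema"},
    "api": {"auth", "rate-limit"},
}
print(build_stages(example))
# [['schema'], ['auth', 'rate-limit'], ['api']]
```

Tasks that land in the same stage have no dependencies on each other, so they are the ones that can run in parallel. Sorting each stage keeps the plan stable from one run to the next.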

3. Execute

/aa:run

Executes the next ready stage. For each task, Claude curates context (only the files and prior results relevant to that task), spawns an executor subagent with a clean context window, and collects the output. Independent tasks within a stage run in parallel.

To run everything autonomously:

/aa:run all

4. Verify

/aa:verify

Checks completed tasks against their Given/When/Then specs. Runs the acceptance test for each task (the primary verification mechanism), plus automatic checks (files exist, code compiles, tests pass) and content verification (does the output actually satisfy the spec?). Failed tasks are retried up to twice, then flagged for human review.

5. Monitor

/aa:status

Shows a dashboard with task states, stage progress, critical path, and estimated time remaining.
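One common way to derive a critical path is to take the longest dependency chain by estimated effort. The Python sketch below shows that calculation under assumed inputs: a mapping of task id to effort and a mapping of task id to dependencies. The function name, the units, and the task ids are hypothetical, and the plugin may compute its critical path differently.

```python
from functools import lru_cache


def critical_path(effort: dict[str, float],
                  dependencies: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Return (total effort, task ids) of the longest chain in an acyclic graph.

    Assumes at least one task and no dependency cycles.
    """

    @lru_cache(maxsize=None)
    def longest_to(task: str) -> tuple[float, tuple[str, ...]]:
        # Longest chain that ends at `task`, including the task's own effort.
        best: tuple[float, tuple[str, ...]] = (0.0, ())
        for dep in sorted(dependencies.get(task, ())):
            best = max(best, longest_to(dep))
        return best[0] + effort[task], best[1] + (task,)

    total, path = max(longest_to(task) for task in effort)
    return total, list(path)


# Hypothetical effort estimates, in hours.
effort = {"schema": 1, "auth": 3, "rate-limit": 2, "api": 2}
deps = {"auth": {"schema"}, "rate-limit": {"schema"}, "api": {"auth", "rate-limit"}}
print(critical_path(effort, deps))
# (6.0, ['schema', 'auth', 'api'])
```

The total along this chain is a lower bound on the time remaining even with unlimited parallelism, which is why a dashboard would surface it next to stage progress.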

6. Report

/aa:report

Generates a comprehensive markdown report with timeline, metrics, issues, and a complete artifact inventory.

Architecture

Files-as-State

All orchestration state lives in .aa/:

.aa/
├── objective.md              # What you asked for
├── decomposition.json        # Task DAG (tasks + dependencies)
├── execution-plan.json       # Parallel stages, critical path
├── state.json                # Task states, progress, metadata
├── tasks/{taskId}/
│   ├── spec.json             # Given/When/Then specification
│   ├── acceptance-test.md    # Acceptance test contract
│   ├── context.json          # Curated context for the executor
│   ├── result.json           # What the executor produced
│   └── verification.json     # Verification decision
└── logs/
    └── events.jsonl          # Structured event stream

Task State Machine

                    ┌←←← (retry: retries available) ←←←┐
                    │                                   │
                    │  ┌← (crash, retries available) ←┐ │
                    ↓  ↓                              │ │
pending ──→ queued ──→ running ──→ verifying ──→ completed
  │                       │            │
  ↓                       ↓            ↓ (retries exhausted)
blocked               failed ←←←←←←←←←┘
  (when resolved → queued)

Tasks move forward through the pipeline. When verification fails, the task is requeued for retry, up to twice (Stripe's empirical finding: a third attempt almost never helps). If the executor crashes and retries remain, the task is also requeued, with a synthetic verification record so the next attempt has failure context. Once retries are exhausted, the task moves to failed and is flagged for human review. Blocked tasks wait for their dependencies and move to queued once those resolve.
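The diagram can be read as a transition table plus a bounded retry counter. The Python sketch below is one interpretation of it, not the plugin's implementation: the state names come from the diagram, while `TRANSITIONS`, `advance`, `after_failure`, and the `retries` field are names assumed for this example.

```python
MAX_RETRIES = 2  # "retried up to twice"

# Allowed moves, as read from the diagram above.
TRANSITIONS = {
    "pending": {"queued", "blocked"},
    "blocked": {"queued"},                        # dependencies resolved
    "queued": {"running"},
    "running": {"verifying", "queued", "failed"},   # queued/failed only after a crash
    "verifying": {"completed", "queued", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(task: dict, target: str) -> dict:
    """Return a copy of the task in the target state, rejecting illegal moves."""
    current = task["state"]
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return {**task, "state": target}


def after_failure(task: dict) -> dict:
    """Handle a failed verification or an executor crash.

    Requeue while retries remain; otherwise mark the task failed so it can
    be flagged for human review.
    """
    if task["state"] not in ("running", "verifying"):
        raise ValueError(f"cannot fail from state {task['state']}")
    retries = task.get("retries", 0)
    if retries < MAX_RETRIES:
        return {**advance(task, "queued"), "retries": retries + 1}
    return advance(task, "failed")
```

With this table, a task gets one initial attempt and at most two retries: the first two failures send it back to queued, and the third moves it to failed.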

Subagent Isolation

Each task runs in its own subagent via Claude's Agent tool. The subagent gets a fresh context window loaded with only the curated context for that task — no cross-task state leakage. This is the same isolation pattern Stripe uses with their devboxes.

Context Curation

Before a task executes, context is assembled deterministically (Stripe pattern):

Task specification (the Given/When/Then contract)
Project context (architecture, conventions)
Source files (only those listed in the task's contextItems)
Dependency results (outputs from prerequisite tasks)
Recent history (last few git commits)

The same task always gets the same context, which makes runs reproducible and auditable.
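A minimal Python sketch of deterministic assembly, assuming the five inputs listed above arrive as plain Python values. Only `contextItems` comes from this README; the `dependsOn` field, the output keys, and the `digest` are assumptions added for illustration and are not the actual layout of context.json.

```python
import hashlib
import json


def curate_context(spec: dict,
                   project_context: str,
                   files: dict[str, str],
                   dependency_results: dict[str, dict],
                   recent_commits: list[str]) -> dict:
    """Assemble a task's context in a fixed order and fingerprint it.

    Only files named in spec["contextItems"] are included; a listed file
    that is missing from `files` raises KeyError instead of being skipped.
    """
    context = {
        "spec": spec,
        "projectContext": project_context,
        "sourceFiles": {path: files[path]
                        for path in sorted(spec.get("contextItems", []))},
        "dependencyResults": {task_id: dependency_results[task_id]
                              for task_id in sorted(spec.get("dependsOn", []))},
        "recentHistory": list(recent_commits),
    }
    # Canonical JSON (sorted keys, fixed separators) so equal inputs hash equally.
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    context["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return context
```

Recording the digest next to a task's result would let a later audit confirm that a retry saw exactly the same inputs as the first attempt, or pinpoint that something changed.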

Project Structure

autonomous-autonomy/
├── .claude-plugin/
│   ├── plugin.json                        # Plugin manifest
│   └── marketplace.json                   # Self-hosted marketplace metadata
├── commands/
│   ├── init.md                            # Initialize state
│   ├── plan.md                            # Decompose objective
│   ├── run.md                             # Execute tasks
│   ├── verify.md                          # Verify outputs
│   ├── status.md                          # Progress dashboard
│   └── report.md                          # Generate report
├── skills/
│   └── autonomous-autonomy/
│       └── SKILL.md                       # Master orchestration skill
├── agents/
│   ├── planner.md                         # Decomposition agent
│   ├── executor.md                        # Task execution agent
│   ├── verifier.md                        # Verification agent
│   ├── researcher.md                      # Context gathering agent
│   └── plan-scorer.md                     # Plan quality scoring agent
├── hooks/
│   ├── hooks.json                         # Hook configuration
│   ├── log-event.sh                       # Event logging hook
│   └── validate-state.sh                  # State integrity hook
├── references/
│   ├── given-when-then.md                 # Spec format guide
│   ├── state-machine.md                   # State transition reference
│   ├── context-curation.md                # Context assembly rules
│   └── error-recovery.md                  # Retry & failure patterns
├── templates/
│   ├── task-spec.json                     # Task template
│   ├── state.json                         # State template
│   └── execution-plan.json                # Plan template
├── benchmark/
│   ├── validate.sh                        # Structural validation (88 checks)
│   ├── README.md                          # Benchmark documentation
│   └── test-objectives/                   # Test objectives for plan scoring
├── LICENSE
├── CHANGELOG.md
└── README.md

Design Principles

Specs are contracts. Given/When/Then specifications are the source of truth. Never skip writing them.

Deterministic orchestration. The execution plan is computed once from the DAG. No LLM reasoning about what to do next — just read the plan.

Isolation per task. Each task gets its own subagent with curated context. No cross-task contamination.

Fail fast, retry bounded. Two retries at most. If the task still fails, escalate to a human.

Observable everything. Every state change is logged to events.jsonl. Status is always available.

Resumable always. State on disk means you can stop and restart at any point without losing progress.

Works With

Claude Code (CLI) with Pro or MAX subscription
Claude Cowork (desktop) with Pro or MAX subscription
No API key required
No external dependencies
Any project, any language, any domain
