MaxwellCalkin/autonomous-autonomy

autonomous-autonomy

Autonomous task orchestration for Claude Code and Cowork. Decompose any objective into a verified, observable pipeline — no API key required.

What It Is

autonomous-autonomy is a framework of skills, commands, hooks, and structured files that teaches Claude to break large objectives into small tasks, execute them through isolated subagents, verify outputs against specifications, and report progress, all within the Claude Code/Cowork ecosystem using your Pro or MAX subscription.

It draws on patterns from Stripe's Minions (blueprint pattern, context curation, 2-retry cap), the Ralph Loop (files-on-disk as shared state), and GSD/Spec Kit (Given/When/Then as first-class artifacts).

There's no TypeScript library, no npm install, no API key. Just markdown, JSON, and shell scripts that plug into Claude Code's native primitives.

How It Works

The framework follows a 6-phase pipeline:

/aa:init  →  /aa:plan  →  /aa:run  →  /aa:verify  →  /aa:status  →  /aa:report
   │            │            │             │               │              │
Create       Decompose    Execute       Check          Monitor        Generate
state dir    into DAG     next stage    against spec   progress       report

All orchestration state lives in an .aa/ directory as structured JSON and markdown. Claude reads the state, decides what to do, executes, and writes updated state. Every action is logged. You can stop and restart at any point.

Installation

Install permanently (recommended):

/plugin marketplace add MaxwellCalkin/autonomous-autonomy
/plugin install aa@autonomous-autonomy

Try it in a single session:

claude --plugin-dir https://github.com/MaxwellCalkin/autonomous-autonomy.git

Once installed, all commands are available as /aa:init, /aa:plan, /aa:run, /aa:verify, /aa:status, /aa:report. The skill loads automatically when Claude detects a task that would benefit from orchestration.

Getting Started

1. Initialize

/aa:init

Creates the .aa/ state directory, scans your project structure, and sets up logging.

2. Plan

/aa:plan Build a REST API with authentication and rate limiting

A planner subagent decomposes your objective into 5-15 tasks, each with a Given/When/Then specification, explicit dependencies, and estimated effort. Tasks are organized into parallel stages via topological sort. You review the plan before execution begins.
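
The staging step can be pictured as a layered topological sort. The Python below is only an illustrative sketch, not the plugin's implementation (the plugin itself ships markdown, JSON, and shell scripts). The function name `plan_stages` and the input shape (a mapping from task id to the ids it depends on) are assumptions made for the example.

```python
def plan_stages(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group task ids into stages so each stage only depends on earlier ones.

    `dependencies` maps a task id to the task ids it depends on
    (a hypothetical shape, not the real decomposition.json schema).
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    known = set(remaining)
    for task, deps in remaining.items():
        unknown = deps - known
        if unknown:
            raise ValueError(f"{task} depends on unknown tasks: {sorted(unknown)}")

    stages: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # A task is ready once every one of its dependencies is done.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


if __name__ == "__main__":
    example = {
        "schema": [],
        "auth": ["schema"],
        "rate-limit": ["schema"],
        "api": ["auth", "rate-limit"],
    }
    for number, stage in enumerate(plan_stages(example), start=1):
        print(f"stage {number}: {stage}")
```

Tasks that land in the same stage have no dependencies on each other, which is what allows them to run in parallel.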

3. Execute

/aa:run

Executes the next ready stage. For each task, Claude curates context (only the files and prior results relevant to that task), spawns an executor subagent with a clean context window, and collects the output. Independent tasks within a stage run in parallel.

To run everything autonomously:

/aa:run all

4. Verify

/aa:verify

Checks completed tasks against their Given/When/Then specs. Runs the acceptance test for each task (the primary verification mechanism), plus automatic checks (files exist, code compiles, tests pass) and content verification (does the output actually satisfy the spec?). Failed tasks are retried up to twice, then flagged for human review.

5. Monitor

/aa:status

Shows a dashboard with task states, stage progress, critical path, and estimated time remaining.
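
For readers unfamiliar with the term, the critical path is the longest chain of dependent tasks when each task is weighted by its estimated effort. The sketch below shows one way to compute it in Python. The function name, the input shapes, and the effort units are assumptions for illustration and are not taken from the plugin. It assumes a non-empty, acyclic task graph.

```python
from typing import Optional


def critical_path(
    dependencies: dict[str, list[str]],
    effort: dict[str, float],
) -> tuple[float, list[str]]:
    """Return (total effort, task ids) for the longest dependency chain.

    Assumes `dependencies` is a non-empty DAG (no cycle detection here).
    """
    finish: dict[str, float] = {}
    predecessor: dict[str, Optional[str]] = {}

    def earliest_finish(task: str) -> float:
        if task not in finish:
            # The dependency that finishes last determines when we can start.
            slowest = max(dependencies[task], key=earliest_finish, default=None)
            start = finish[slowest] if slowest is not None else 0.0
            finish[task] = start + effort[task]
            predecessor[task] = slowest
        return finish[task]

    last = max(dependencies, key=earliest_finish)

    # Walk backwards from the last-finishing task to recover the chain.
    path: list[str] = []
    node: Optional[str] = last
    while node is not None:
        path.append(node)
        node = predecessor[node]
    return finish[last], path[::-1]


if __name__ == "__main__":
    deps = {"schema": [], "auth": ["schema"], "rate-limit": ["schema"],
            "api": ["auth", "rate-limit"]}
    hours = {"schema": 1.0, "auth": 3.0, "rate-limit": 2.0, "api": 2.0}
    print(critical_path(deps, hours))
```

Because tasks off the critical path can run in parallel with it, the total along this chain is a lower bound on the remaining wall-clock time.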

6. Report

/aa:report

Generates a comprehensive markdown report with timeline, metrics, issues, and a complete artifact inventory.

Architecture

Files-as-State

All orchestration state lives in .aa/:

.aa/
├── objective.md              # What you asked for
├── decomposition.json        # Task DAG (tasks + dependencies)
├── execution-plan.json       # Parallel stages, critical path
├── state.json                # Task states, progress, metadata
├── tasks/{taskId}/
│   ├── spec.json             # Given/When/Then specification
│   ├── acceptance-test.md    # Acceptance test contract
│   ├── context.json          # Curated context for the executor
│   ├── result.json           # What the executor produced
│   └── verification.json     # Verification decision
└── logs/
    └── events.jsonl          # Structured event stream
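
To make the event stream concrete, here is a minimal Python sketch of appending to and reading back a JSON Lines log under `.aa/logs/events.jsonl`. The plugin does its logging through the `log-event.sh` hook. The field names used here (`timestamp`, `type`, `taskId`) and the helper names are assumptions, not the plugin's actual schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_event(aa_dir: Path, event_type: str,
              task_id: Optional[str] = None, **details) -> dict:
    """Append one event as a single JSON line (hypothetical field names)."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "taskId": task_id,
        **details,
    }
    log_path = aa_dir / "logs" / "events.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(aa_dir: Path) -> list[dict]:
    """Return all logged events in the order they were written."""
    log_path = aa_dir / "logs" / "events.jsonl"
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aa_dir = Path(tmp) / ".aa"
        log_event(aa_dir, "task_started", task_id="task-1")
        log_event(aa_dir, "task_completed", task_id="task-1", attempt=1)
        for event in read_events(aa_dir):
            print(event["type"], event["taskId"])
```

One line per event keeps the log append-only, so it can be tailed or filtered with ordinary text tools while a run is in progress.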

Task State Machine

                    ┌←←← (retry: retries available) ←←←┐
                    │                                   │
                    │  ┌← (crash, retries available) ←┐ │
                    ↓  ↓                              │ │
pending ──→ queued ──→ running ──→ verifying ──→ completed
  │                       │            │
  ↓                       ↓            ↓ (retries exhausted)
blocked               failed ←←←←←←←←←┘
  (when resolved → queued)

Tasks move forward through the pipeline. When verification fails, the task is requeued for retry (up to twice — Stripe's empirical finding: a third attempt almost never helps). If the executor crashes, the task is also requeued with a synthetic verification record so the next attempt has failure context. Blocked tasks wait for their dependencies.
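
The transitions in the diagram can be written down as a small table plus a bounded retry rule. The following Python is a sketch for illustration only. The state names come from the diagram above, while the `Task` class, `transition`, `on_failure`, and the `MAX_RETRIES` constant are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

MAX_RETRIES = 2  # "up to twice" in the text above

# Allowed moves, read off the state diagram.
ALLOWED = {
    "pending": {"queued", "blocked"},
    "blocked": {"queued"},
    "queued": {"running"},
    "running": {"verifying", "queued", "failed"},
    "verifying": {"completed", "queued", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class Task:
    task_id: str
    state: str = "pending"
    retries: int = 0


def transition(task: Task, new_state: str) -> None:
    """Move a task to `new_state`, rejecting moves the diagram does not allow."""
    if new_state not in ALLOWED[task.state]:
        raise ValueError(f"{task.task_id}: {task.state} -> {new_state} not allowed")
    task.state = new_state


def on_failure(task: Task) -> str:
    """Handle a crash (from running) or a failed verification (from verifying)."""
    if task.state not in ("running", "verifying"):
        raise ValueError(f"{task.task_id}: cannot fail from state {task.state}")
    if task.retries < MAX_RETRIES:
        task.retries += 1
        transition(task, "queued")   # requeue for another attempt
    else:
        transition(task, "failed")   # retries exhausted: needs human review
    return task.state


if __name__ == "__main__":
    task = Task("task-1")
    transition(task, "queued")
    for _ in range(3):
        transition(task, "running")
        transition(task, "verifying")
        print(on_failure(task), task.retries)
```

With a cap of two retries, a task gets at most three attempts in total before it is marked failed.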

Subagent Isolation

Each task runs in its own subagent via Claude's Agent tool. The subagent gets a fresh context window loaded with only the curated context for that task — no cross-task state leakage. This is the same isolation pattern Stripe uses with their devboxes.

Context Curation

Before a task executes, context is assembled deterministically (Stripe pattern):

  1. Task specification (the Given/When/Then contract)
  2. Project context (architecture, conventions)
  3. Source files (only those listed in the task's contextItems)
  4. Dependency results (outputs from prerequisite tasks)
  5. Recent history (last few git commits)

The same task always gets the same context. Reproducible and auditable.
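
One way to picture deterministic assembly is to build the context in a fixed order and fingerprint the result. The Python below is a sketch under assumptions: `contextItems` is the field named above, but every other key, the function names, and the idea of hashing the context are illustrative choices rather than documented plugin behavior.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def assemble_context(spec: dict, project_context: str, project_root: Path,
                     dependency_results: dict, recent_commits: list[str]) -> dict:
    """Build the context in the five-part order listed above."""
    source_files = {
        # Sorted so the result does not depend on how the spec lists files.
        rel_path: (project_root / rel_path).read_text(encoding="utf-8")
        for rel_path in sorted(spec.get("contextItems", []))
    }
    return {
        "spec": spec,
        "projectContext": project_context,
        "sourceFiles": source_files,
        "dependencyResults": dependency_results,
        "recentHistory": list(recent_commits),
    }


def context_digest(context: dict) -> str:
    """Fingerprint a context: identical inputs give an identical digest."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text("print('hello')\n", encoding="utf-8")
        spec = {"id": "task-1", "contextItems": ["app.py"]}
        context = assemble_context(spec, "Flask service", root, {}, ["initial commit"])
        print(context_digest(context))
```

Storing such a digest next to the curated context would let an audit confirm that two attempts at a task saw the same inputs.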

Project Structure

autonomous-autonomy/
├── .claude-plugin/
│   ├── plugin.json                        # Plugin manifest
│   └── marketplace.json                   # Self-hosted marketplace metadata
├── commands/
│   ├── init.md                            # Initialize state
│   ├── plan.md                            # Decompose objective
│   ├── run.md                             # Execute tasks
│   ├── verify.md                          # Verify outputs
│   ├── status.md                          # Progress dashboard
│   └── report.md                          # Generate report
├── skills/
│   └── autonomous-autonomy/
│       └── SKILL.md                       # Master orchestration skill
├── agents/
│   ├── planner.md                         # Decomposition agent
│   ├── executor.md                        # Task execution agent
│   ├── verifier.md                        # Verification agent
│   ├── researcher.md                      # Context gathering agent
│   └── plan-scorer.md                     # Plan quality scoring agent
├── hooks/
│   ├── hooks.json                         # Hook configuration
│   ├── log-event.sh                       # Event logging hook
│   └── validate-state.sh                  # State integrity hook
├── references/
│   ├── given-when-then.md                 # Spec format guide
│   ├── state-machine.md                   # State transition reference
│   ├── context-curation.md                # Context assembly rules
│   └── error-recovery.md                  # Retry & failure patterns
├── templates/
│   ├── task-spec.json                     # Task template
│   ├── state.json                         # State template
│   └── execution-plan.json                # Plan template
├── benchmark/
│   ├── validate.sh                        # Structural validation (88 checks)
│   ├── README.md                          # Benchmark documentation
│   └── test-objectives/                   # Test objectives for plan scoring
├── LICENSE
├── CHANGELOG.md
└── README.md

Design Principles

Specs are contracts. Given/When/Then specifications are the source of truth. Never skip writing them.
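
As a rough illustration of treating a spec as a contract, the Python sketch below models a Given/When/Then spec and rejects incomplete ones before any work starts. The field names and the `problems` check are assumptions for the example. The real `spec.json` layout is defined by the plugin's templates and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskSpec:
    """A hypothetical in-memory view of a Given/When/Then task spec."""
    task_id: str
    given: list[str]
    when: list[str]
    then: list[str]
    depends_on: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the spec is usable."""
        issues = []
        for clause in ("given", "when", "then"):
            entries = getattr(self, clause)
            if not entries or any(not entry.strip() for entry in entries):
                issues.append(
                    f"{self.task_id}: '{clause}' needs at least one non-empty statement"
                )
        if self.task_id in self.depends_on:
            issues.append(f"{self.task_id}: a task cannot depend on itself")
        return issues


if __name__ == "__main__":
    spec = TaskSpec(
        task_id="rate-limit",
        given=["an authenticated client"],
        when=["the client sends more requests than the limit allows"],
        then=["the API responds with HTTP 429"],
        depends_on=["auth"],
    )
    print(spec.problems())
```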

Deterministic orchestration. The execution plan is computed once from the DAG. No LLM reasoning about what to do next — just read the plan.

Isolation per task. Each task gets its own subagent with curated context. No cross-task contamination.

Fail fast, retry bounded. 2 retries max. If it doesn't work, escalate to a human.

Observable everything. Every state change is logged to events.jsonl. Status is always available.

Resumable always. State on disk means you can stop and restart at any point without losing progress.
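
Resumability depends on the state file never being left half-written. A common technique, shown here as a Python sketch and not as the plugin's actual mechanism, is to write to a temporary file and atomically rename it over `state.json`. The `tasks` layout inside the state and all function names are assumptions for the example.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_state(aa_dir: Path, state: dict) -> None:
    """Write state.json via a temp file so readers never see a partial file."""
    aa_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=aa_dir, prefix="state.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, aa_dir / "state.json")  # atomic rename
    finally:
        # Only true if something failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)


def load_state(aa_dir: Path) -> Optional[dict]:
    """Return the saved state, or None if nothing has been saved yet."""
    path = aa_dir / "state.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def unfinished_tasks(state: dict) -> list[str]:
    """Task ids that are neither completed nor failed (hypothetical layout)."""
    return sorted(
        task_id
        for task_id, task in state.get("tasks", {}).items()
        if task.get("state") not in ("completed", "failed")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aa_dir = Path(tmp) / ".aa"
        save_state(aa_dir, {"tasks": {"task-1": {"state": "completed"},
                                      "task-2": {"state": "queued"}}})
        print(unfinished_tasks(load_state(aa_dir)))
```

After an interruption, a new session only needs to load the file and pick up the tasks that are still unfinished.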

Works With

  • Claude Code (CLI) with Pro or MAX subscription
  • Claude Cowork (desktop) with Pro or MAX subscription
  • No API key required
  • No external dependencies
  • Any project, any language, any domain
