Production system for coordinating multiple AI agents on shared codebase using stigmergy-based coordination. Built with Claude API.

KeepALifeUS/multi-agent-orchestration

Multi-Agent Autonomous Development System

A production system where 4 AI agents collaborate on software development without direct communication, coordinating through stigmergy (indirect communication via shared environment).

Built with Claude API | License: MIT

Overview

This system enables multiple AI agents to work autonomously on a shared codebase. Instead of complex message-passing protocols, agents coordinate through stigmergy, the same principle that allows ant colonies to build complex structures without central coordination.

Key Achievement: 80% reduction in token usage through context optimization and knowledge caching.

                    ┌─────────────────────────────────────┐
                    │         SHARED ENVIRONMENT          │
                    │            (Git Repo)               │
                    └─────────────────────────────────────┘
                                     │
        ┌────────────────────────────┼────────────────────────────┐
        │                            │                            │
        ▼                            ▼                            ▼
  ┌──────────┐               ┌──────────────┐              ┌───────────┐
  │ THINKER  │               │   BUILDERS   │              │ GUARDIAN  │
  │ Architect│               │  UI & DDD    │              │ Reviewer  │
  └──────────┘               └──────────────┘              └───────────┘
       │                            │                            │
       │ Creates                    │ Claims &                   │ Reviews &
       │ Tasks                      │ Implements                 │ Approves
       ▼                            ▼                            ▼
  ┌──────────┐               ┌──────────────┐              ┌───────────┐
  │ queue/   │ ──────────▶   │  active/     │ ──────────▶  │ pending/  │
  │ tasks    │               │  work        │              │ reviews   │
  └──────────┘               └──────────────┘              └───────────┘

Agent Types

| Agent | Role | Responsibility |
|-------|------|----------------|
| THINKER | Architect | Creates tasks, analyzes stuck items, runs self-improvement cycles |
| BUILDER-UI | Frontend | React components, TypeScript, styling, browser testing |
| BUILDER-DDD | Backend | Domain logic, API routes, database services |
| GUARDIAN | Reviewer | Code review, quality gates, security checks |

Core Mechanisms

1. Stigmergy-Based Coordination

No direct agent-to-agent communication. All coordination happens through file changes:

  • Tasks are created in tasks/queue.json and claimed by moving them to tasks/active.json
  • Reviews are submitted to pending/, approved by moving to approved/
  • Knowledge accumulates in shared patterns.jsonl and lessons.jsonl
  • Events propagate through timestamped log files
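
As an illustration of the last bullet, here is a minimal sketch of a timestamped, append-only event log in JSON Lines form. The field names (`ts`, `agent`, `type`, `data`) and both helper functions are assumptions made for this example; the repository's actual event schema is in examples/event.jsonl and may differ.

```python
import json
from datetime import datetime, timezone


def format_event(agent: str, event_type: str, data: dict, ts: datetime) -> str:
    """Serialize one event as a single JSON line (hypothetical schema)."""
    record = {
        "ts": ts.astimezone(timezone.utc).isoformat(),
        "agent": agent,
        "type": event_type,
        "data": data,
    }
    return json.dumps(record, sort_keys=True)


def events_since(lines: list[str], cutoff: datetime) -> list[dict]:
    """Return events strictly newer than `cutoff`, skipping blank lines."""
    result = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if datetime.fromisoformat(event["ts"]) > cutoff:
            result.append(event)
    return result


# Usage: two agents write events; a third reads only what is new to it.
log = [
    format_event("THINKER", "task_created", {"task": "T-1"},
                 datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
    format_event("BUILDER-UI", "task_claimed", {"task": "T-1"},
                 datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)),
]
recent = events_since(log, datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc))
```

Because every line is an independent JSON object, appends from different agents tend to merge cleanly, which suits a file-based coordination scheme.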

2. Distributed Task Claiming

1. Agent reads tasks/queue.json
2. Finds an unclaimed task matching its skills
3. Moves the task to tasks/active.json with its agent ID
4. Commits and pushes immediately
5. If the push fails (conflict), another agent claimed the task first; retry

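Git rejects a push that is not a fast-forward of the remote branch, so only one agent's claim commit can land first. This built-in conflict detection acts as a distributed mutex.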
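
The five steps above amount to optimistic concurrency. The sketch below simulates them in memory so that it runs without a repository: `FakeRemote` is a hypothetical stand-in for the Git remote that rejects a push based on a stale revision, much as Git rejects a non-fast-forward push. The task fields (`id`, `skill`, `claimed_by`) are assumed for illustration and are not taken from examples/task.json.

```python
import copy


class PushRejected(Exception):
    """Raised when a push is based on a stale revision (simulated conflict)."""


class FakeRemote:
    """In-memory stand-in for the shared Git remote (illustration only)."""

    def __init__(self, queue: list[dict]):
        self.revision = 0
        self.state = {"queue": queue, "active": []}

    def pull(self) -> tuple[int, dict]:
        return self.revision, copy.deepcopy(self.state)

    def push(self, base_revision: int, new_state: dict) -> None:
        if base_revision != self.revision:
            raise PushRejected("remote has moved on; pull and retry")
        self.state = copy.deepcopy(new_state)
        self.revision += 1


def claim_task(remote: FakeRemote, agent_id: str, skill: str, max_attempts: int = 3):
    """Try to claim one queued task matching `skill`; return it or None."""
    for _ in range(max_attempts):
        revision, state = remote.pull()                  # step 1: read queue
        task = next((t for t in state["queue"] if t["skill"] == skill), None)
        if task is None:                                 # step 2: nothing to claim
            return None
        state["queue"].remove(task)                      # step 3: move to active
        task["claimed_by"] = agent_id
        state["active"].append(task)
        try:
            remote.push(revision, state)                 # step 4: publish claim
            return task
        except PushRejected:                             # step 5: lost the race
            continue
    return None


# Usage: the first agent wins T-1; a second frontend agent finds nothing left.
remote = FakeRemote([{"id": "T-1", "skill": "frontend"},
                     {"id": "T-2", "skill": "backend"}])
first = claim_task(remote, "BUILDER-UI", "frontend")
second = claim_task(remote, "BUILDER-UI-2", "frontend")
```

In a real deployment the pull and push calls would be Git operations on the shared repository; the revision counter here only models the property that a push built on outdated history is refused.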

3. Self-Healing Mechanisms

| Scenario | Recovery |
|----------|----------|
| Agent crash | 4-hour task timeout, auto-release |
| Stale lock | 2-hour expiry, can override |
| Stuck task (3+ rejections) | THINKER analyzes, decomposes or escalates |
| Review overflow (>10 pending) | Alert triggered |
| Git conflict | Auto-rebase with exponential backoff |
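
Two of these recovery rules lend themselves to a short sketch: releasing tasks whose 4-hour timeout has elapsed, and computing exponential backoff delays for rebase retries. The `claimed_at` field, the 2-second base delay, and the 60-second cap are assumptions for illustration; the actual timings are defined in config/system.json.

```python
from datetime import datetime, timedelta, timezone

TASK_TIMEOUT = timedelta(hours=4)  # the agent-crash timeout stated above


def release_stale_tasks(active: list[dict], queue: list[dict], now: datetime) -> list[str]:
    """Move tasks claimed more than TASK_TIMEOUT ago back to the queue.

    Mutates both lists in place and returns the IDs of released tasks.
    """
    released = []
    for task in list(active):  # iterate over a copy while mutating `active`
        claimed_at = datetime.fromisoformat(task["claimed_at"])
        if now - claimed_at > TASK_TIMEOUT:
            active.remove(task)
            task.pop("claimed_by", None)
            task.pop("claimed_at", None)
            queue.append(task)
            released.append(task["id"])
    return released


def backoff_delays(attempts: int, base_seconds: float = 2.0,
                   cap_seconds: float = 60.0) -> list[float]:
    """Delay before each retry: base * 2**n, capped (values are assumed)."""
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(attempts)]


# Usage: T-1 was claimed 5 hours ago and is released; T-2 (2 hours) stays.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
active = [
    {"id": "T-1", "claimed_by": "BUILDER-UI", "claimed_at": "2024-01-01T07:00:00+00:00"},
    {"id": "T-2", "claimed_by": "BUILDER-DDD", "claimed_at": "2024-01-01T10:00:00+00:00"},
]
queue = []
released = release_stale_tasks(active, queue, now)
delays = backoff_delays(6)
```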

4. Self-Improvement Cycle

Every 24 hours:

  1. Collect all rejections
  2. Group by pattern
  3. If pattern occurs 3+ times → draft prompt improvement
  4. Apply to agent, track metrics
  5. Evaluate after 24h
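
Steps 1 to 3 reduce to counting rejections per pattern and applying a threshold. A minimal sketch, assuming each rejection record carries `agent` and `pattern` fields (hypothetical names; the rejection format is not shown here):

```python
from collections import Counter

PATTERN_THRESHOLD = 3  # "pattern occurs 3+ times"


def recurring_patterns(rejections: list[dict],
                       threshold: int = PATTERN_THRESHOLD) -> dict:
    """Map (agent, pattern) -> count for patterns seen at least `threshold` times."""
    counts = Counter((r["agent"], r["pattern"]) for r in rejections)
    return {key: n for key, n in counts.items() if n >= threshold}


# Usage: only the pattern that recurs three times becomes a candidate
# for a drafted prompt improvement.
rejections = [
    {"agent": "BUILDER-DDD", "pattern": "missing tenant_id"},
    {"agent": "BUILDER-DDD", "pattern": "missing tenant_id"},
    {"agent": "BUILDER-UI", "pattern": "untyped props"},
    {"agent": "BUILDER-DDD", "pattern": "missing tenant_id"},
]
candidates = recurring_patterns(rejections)
```

Grouping by agent as well as pattern is a design choice of this sketch: it keeps each drafted improvement scoped to the prompt of the agent that made the mistake.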

Project Structure

├── config/
│   └── system.json         # Agent definitions, safety rules, timing
├── docs/
│   ├── COORDINATION.md     # Full protocol specification
│   └── SELF-HEALING.md     # Recovery mechanisms
├── examples/
│   ├── task.json           # Task structure
│   ├── lock.json           # Resource locking
│   └── event.jsonl         # Event propagation
├── knowledge/
│   ├── patterns.jsonl      # Validated code patterns
│   └── lessons.jsonl       # Recorded mistakes and fixes
└── prompts/
    └── thinker.md          # Agent work instructions

Knowledge Base

The system maintains a self-improving knowledge base:

Patterns - Reusable code solutions:

{
  "id": "P-005",
  "name": "Tenant Isolation Pattern",
  "description": "Always include tenant_id in WHERE clauses",
  "code": "SELECT * FROM jobs WHERE id = $1 AND tenant_id = $2",
  "confidence": 1.0,
  "uses": 25
}

Lessons - Recorded mistakes:

{
  "id": "L-002",
  "trigger": "database query, multi-tenant lookup",
  "mistake": "Missing tenant_id in WHERE clause",
  "fix": "Always add AND tenant_id = $N to every query",
  "success_rate": 1.0
}
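
One way an agent might consult these files before starting work is to match a task description against each lesson's `trigger` phrases. This is a sketch under stated assumptions: the comma-separated trigger matching and the `relevant_lessons` helper are illustrative, and how lookups are actually performed is not described here. Note that in a .jsonl file each record sits on a single line, unlike the pretty-printed examples above.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def relevant_lessons(lessons: list[dict], task_description: str) -> list[dict]:
    """Return lessons with at least one trigger phrase found in the description."""
    description = task_description.lower()
    matches = []
    for lesson in lessons:
        phrases = [p.strip().lower() for p in lesson["trigger"].split(",")]
        if any(p and p in description for p in phrases):
            matches.append(lesson)
    return matches


# Usage: the lesson shown above, stored as one line of lessons.jsonl.
raw = (
    '{"id": "L-002", "trigger": "database query, multi-tenant lookup", '
    '"mistake": "Missing tenant_id in WHERE clause", '
    '"fix": "Always add AND tenant_id = $N to every query", '
    '"success_rate": 1.0}\n'
)
lessons = parse_jsonl(raw)
hits = relevant_lessons(lessons, "Add a database query for the jobs list")
```

Plain substring matching is the simplest option that works for this example; a production system would likely need something more selective to avoid loading irrelevant lessons into an agent's context.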

Results

  • 80% token reduction through incremental context loading and knowledge caching
  • Zero conflicts with proper locking protocol
  • Autonomous operation for extended periods
  • Self-improving through pattern recognition

Architecture Decisions

Why Stigmergy?

  1. Scalability - Adding agents doesn't require protocol changes
  2. Resilience - Agent crashes don't break coordination
  3. Simplicity - No complex message routing
  4. Auditability - All state in files, git history tracks everything

Why Git as State Store?

  1. Built-in conflict detection via push failures
  2. Full history of all state changes
  3. Distributed - agents can work offline
  4. Human-readable state files

Tech Stack

  • Coordination: Git + JSON files
  • Agent Runtime: Claude API (Anthropic)
  • Target Project: TypeScript, React, Node.js, PostgreSQL

Related Work

This approach draws from:

  • Swarm intelligence (ant colony optimization)
  • Distributed systems consensus algorithms
  • Event sourcing patterns
  • GitOps principles

About

Designed and implemented by Vladyslav Shapovalov as part of building an AI-powered field service platform. The multi-agent approach enables continuous development with minimal human oversight.

License

MIT License - see LICENSE for details.

Support

For questions and support, please open an issue on GitHub.
