This system is built on four core principles:

**KISS (Keep It Simple, Stupid)**
- Simple solutions over complex ones
- Direct file-based configuration
- Minimal abstractions
- Clear code paths

**DRY (Don't Repeat Yourself)**
- Prompts defined once in markdown files
- Configuration centralized in JSON
- Reusable components
- Shared utilities

**Token Safety**
- Track every token used
- Budget management and alerts
- Caching to prevent redundant API calls
- Cost optimization suggestions

**Lean**
- Minimal dependencies
- Fast startup and execution
- No unnecessary features
- Production-ready from day one
```
┌─────────────────────────────────────────────────────────────┐
│ CONFIGURATION LAYER │
│ ┌────────────────┬───────────────┬──────────────────────┐ │
│ │ config/prompts/│ config/*.json │ config/knowledge/ │ │
│ │ - manager.md │ - agents.json │ - agent_guidelines.md│ │
│ │ - analyst.md │ - tools.json │ - best_practices.md │ │
│ │ - ... │ - ... │ - ... │ │
│ └────────────────┴───────────────┴──────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ CORE LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ PromptLoader: Load & cache prompts from markdown │ │
│ │ TokenManager: Track usage & enforce budgets │ │
│ │ AgentBase: Base class with prompt + token integration│ │
│ │ ToolRegistry: Centralized tool management │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ AGENT LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Manager Agent ──┬─→ Analyst Agent │ │
│ │ ├─→ Growth Hacker Agent │ │
│ │ ├─→ Sales Machine Agent │ │
│ │ ├─→ System Builder Agent │ │
│ │ └─→ Brand Builder Agent │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ API LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ FastAPI Application │ │
│ │ - POST /api/v1/tasks │ │
│ │ - GET /api/v1/agents │ │
│ │ - GET /api/v1/usage (token tracking) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
```
autonomous-ai-team/
├── config/                  # 📁 Configuration (KISS)
│   ├── prompts/             # Agent prompts (markdown)
│   │   ├── manager.md       # Manager agent system prompt
│   │   ├── analyst.md       # Analyst agent system prompt
│   │   ├── growth_hacker.md # Growth Hacker prompt
│   │   ├── sales_machine.md # Sales Machine prompt
│   │   ├── system_builder.md# System Builder prompt
│   │   └── brand_builder.md # Brand Builder prompt
│   ├── knowledge/           # Knowledge base (guidelines)
│   │   ├── agent_guidelines.md # Best practices for all agents
│   │   └── ...              # Additional knowledge files
│   ├── evaluation/          # Test cases and benchmarks
│   │   └── test_cases.json  # Automated test scenarios
│   ├── schemas/             # JSON schemas for validation
│   ├── agents.json          # Agent metadata & configuration
│   └── tools.json           # Tool definitions & metadata
│
├── src/
│   ├── core/                # 🎯 Core Framework (DRY)
│   │   ├── agent_base.py    # Base agent class
│   │   ├── prompt_loader.py # Loads prompts from markdown
│   │   ├── token_manager.py # Token counting & budgets
│   │   ├── tools.py         # Tool implementations
│   │   ├── config.py        # Settings management
│   │   └── logger.py        # Structured logging
│   │
│   ├── agents/              # 🤖 Specialized Agents
│   │   ├── manager.py       # Manager (orchestrator)
│   │   ├── analyst.py       # Analyst specialist
│   │   ├── growth_hacker.py # Growth Hacker specialist
│   │   ├── sales_machine.py # Sales Machine specialist
│   │   ├── system_builder.py# System Builder specialist
│   │   └── brand_builder.py # Brand Builder specialist
│   │
│   ├── api/                 # 🌐 REST API
│   │   ├── routes.py        # API endpoints
│   │   └── models.py        # Request/response models
│   │
│   ├── db/                  # 💾 Data Layer
│   │   ├── models.py        # Database models
│   │   └── crud.py          # CRUD operations
│   │
│   └── evaluation/          # 🧪 Testing & Evaluation
│       ├── metrics.py       # Custom evaluation metrics
│       └── runner.py        # Test runner
│
├── docker/                  # 🐳 Deployment
│   ├── Dockerfile           # Container image
│   └── docker-compose.yml   # Multi-container setup
│
├── scripts/                 # 🛠️ Utilities
│   ├── setup.sh             # Quick setup script
│   ├── example_usage.py     # Usage examples
│   └── evaluate.sh          # Run evaluation tests
│
├── tests/                   # ✅ Tests
│   └── integration/         # Integration tests
│
├── .env.example             # Environment template
├── requirements.txt         # Python dependencies
├── main.py                  # Application entry point
├── README.md                # User documentation
├── QUICKSTART.md            # Quick start guide
└── ARCHITECTURE.md          # This file
```
- Why Markdown: Human-readable, version-controllable, easy to edit
- Caching: LRU cache prevents re-loading (saves tokens)
- Structure: Each prompt has clear sections (Identity, Methodology, Output Format)
- Validation: PromptLoader validates completeness
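As a concrete illustration of the caching decision, loading can be as simple as wrapping a file read in `functools.lru_cache`. The sketch below assumes prompts live under `config/prompts/`; it mirrors, but is not, the actual `PromptLoader` implementation:

```python
from functools import lru_cache
from pathlib import Path

PROMPT_DIR = Path("config/prompts")  # assumed location of the prompt files

@lru_cache(maxsize=32)
def load_prompt(agent_id: str) -> str:
    """Read an agent's system prompt from disk once; later calls hit the cache."""
    path = PROMPT_DIR / f"{agent_id}.md"
    return path.read_text(encoding="utf-8")
```

Because the cache key is just the agent id, every agent instantiation after the first reuses the in-memory prompt, and `load_prompt.cache_info()` exposes hit/miss statistics for monitoring.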
- agents.json: Agent metadata (model, temperature, costs, tools)
- tools.json: Tool definitions, rate limits, costs
- Separation: Configuration separate from code (12-factor app)
- Guidelines: Best practices, decision frameworks
- Shared: Available to all agents
- Extensible: Easy to add new knowledge
**PromptLoader**

Purpose: Load prompts from markdown files with caching
Key Features:
- LRU cache (maxsize=32) - prevents redundant loads
- Validation of prompt completeness
- Loads agent configs and tool configs
- Cache statistics for monitoring
Usage:

```python
from src.core.prompt_loader import load_prompt

prompt = load_prompt("analyst")  # Cached automatically
```

**TokenManager**

Purpose: Enforce Token Safety principle
Key Features:
- Estimate tokens before API calls (using tiktoken)
- Track actual usage (per agent, per day)
- Budget enforcement (daily limits)
- Cost optimization suggestions
- Multi-model pricing support
Usage:

```python
from src.core.token_manager import get_token_manager

tm = get_token_manager()

# Before API call
allowed, reason = tm.should_allow_call(input_text)

# After API call
tm.record_usage("analyst", input_tokens=1500, output_tokens=2000)

# Get stats
summary = tm.get_summary()
```

**AgentBase**

Purpose: Base class for all agents with built-in token management
Key Features:
- Loads prompt from PromptLoader (DRY)
- Tracks tokens automatically (Token Safety)
- Tool use via ToolRegistry (KISS)
- Conversation history management
- Error handling and retries
Lifecycle:
1. Initialize → Load prompt from file (cached)
2. Run task → Check budget before API call
3. Claude API → Execute with prompt + tools
4. Tool use → Handle tool calls
5. Record usage → Track tokens and cost
6. Return result → With metadata
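The lifecycle above can be sketched roughly as follows. All names besides `AgentBase` are illustrative stand-ins; the real class calls the Claude API and runs a tool-use loop, both stubbed out here:

```python
class AgentBase:
    """Illustrative skeleton of the six-step lifecycle (names are assumptions)."""

    def __init__(self, agent_id, prompt_loader, token_manager, client):
        self.agent_id = agent_id
        self.system_prompt = prompt_loader(agent_id)  # 1. load prompt (cached)
        self.tm = token_manager
        self.client = client                          # stand-in for the Claude API
        self.history = []                             # conversation history

    def run(self, task, max_iterations=10):
        allowed, reason = self.tm.should_allow_call(task)   # 2. budget check
        if not allowed:
            return {"status": "rejected", "reason": reason}
        self.history.append({"role": "user", "content": task})
        response = self.client(self.system_prompt, self.history)  # 3. API call
        # 4. a tool-use loop, bounded by max_iterations, would go here
        self.tm.record_usage(self.agent_id,                 # 5. record usage
                             response["input_tokens"],
                             response["output_tokens"])
        return {"status": "ok",                             # 6. result + metadata
                "result": response["text"],
                "tokens": response["input_tokens"] + response["output_tokens"]}
```

Subclasses only supply their agent id and tools; prompt loading, budget checks, and usage recording come for free from the base class.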
**ToolRegistry**

Purpose: Centralized tool management (DRY)
Key Features:
- Register tools once, use everywhere
- Tool metadata from config/tools.json
- Rate limiting per tool
- Cost tracking per tool
- Enable/disable tools dynamically
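A minimal sketch of the registry idea (illustrative, not the actual `ToolRegistry` API): register once, look up everywhere, and flip tools on or off at runtime:

```python
class ToolRegistry:
    """Register tools once; agents look them up by name. Names are assumptions."""

    def __init__(self):
        self._tools = {}
        self._enabled = {}

    def register(self, name, tool, enabled=True):
        """Register a tool under a name, optionally starting disabled."""
        self._tools[name] = tool
        self._enabled[name] = enabled

    def set_enabled(self, name, enabled):
        """Enable or disable a tool without touching agent code."""
        self._enabled[name] = enabled

    def get(self, name):
        """Return the tool, or fail loudly if it is unknown or disabled."""
        if not self._enabled.get(name, False):
            raise LookupError(f"tool {name!r} is disabled or unknown")
        return self._tools[name]
```

Rate limiting and per-tool cost tracking would hang off the same lookup path, driven by the metadata in `config/tools.json`.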
Each agent:
- Inherits from `AgentBase` or `SpecialistAgent`
- Loads its prompt from `config/prompts/{agent_id}.md`
- Uses tools from the `ToolRegistry`
- Tracks tokens via the `TokenManager`
- Returns structured results
Agent Configuration (from agents.json):

```json
{
  "id": "analyst",
  "model": "claude-sonnet-4-5",
  "temperature": 0.5,
  "max_tokens": 4096,
  "max_iterations": 10,
  "capabilities": [...],
  "available_tools": [...],
  "quality_metrics": {...}
}
```

Endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Health check |
| `/api/v1/tasks` | POST | Execute task (route to agents) |
| `/api/v1/agents` | GET | List agents and capabilities |
| `/api/v1/usage` | GET | Token usage and costs |
| `/api/v1/config` | GET | System configuration |
Task Execution Flow:

```
User Request
  ↓
POST /api/v1/tasks
  ↓
Determine Agent (manager or specific)
  ↓
Check Budget (TokenManager)
  ↓
Execute Agent.run()
  ↓
Track Tokens
  ↓
Return Response
```
```
# config/.env
MAX_COST_PER_DAY=50.0
```

Flow:
- Every API call checks budget: `tm.check_budget_available()`
- If 80% used → warning logged
- If 100% used → calls rejected
- Per-agent usage tracked for analysis
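The 80%/100% thresholds reduce to a simple ratio check. This is a hedged sketch, not the actual `TokenManager` logic:

```python
def check_budget(spent_usd, daily_limit_usd):
    """Return (allowed, message): warn at 80% of the daily budget, reject at 100%."""
    used = spent_usd / daily_limit_usd
    if used >= 1.0:
        return False, "daily budget exhausted - call rejected"
    if used >= 0.8:
        return True, f"warning: {used:.0%} of daily budget used"
    return True, None
```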
Before API call:

```python
estimated_tokens = tm.estimate_tokens(input_text)
estimated_cost = tm.estimate_call_cost(input_text, expected_output=1000)
if estimated_cost > remaining_budget:
    # Reject call or summarize context
    ...
```

After API call:

```python
tm.record_usage(
    agent_id="analyst",
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
)
```

Automatic Suggestions:
- High tokens per call → Break into smaller tasks
- Repeated searches → Implement caching
- Large context → Summarize history
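The automatic suggestions can be driven by simple heuristics like the sketch below; the threshold values are illustrative assumptions, not the system's actual cutoffs:

```python
def optimization_suggestions(avg_tokens_per_call, repeated_search_ratio, avg_context_tokens):
    """Map usage statistics to the three suggestion types listed above."""
    tips = []
    if avg_tokens_per_call > 6000:        # assumed threshold
        tips.append("High tokens per call - break work into smaller tasks")
    if repeated_search_ratio > 0.3:       # assumed threshold
        tips.append("Repeated searches detected - implement caching")
    if avg_context_tokens > 4000:         # assumed threshold
        tips.append("Large context - summarize conversation history")
    return tips
```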
Manual Optimization:
- Use prompt caching (LRU cache)
- Batch similar requests
- Use cheaper models for simple tasks (future: Haiku)
```
User: "Analyze the SaaS market"
  ↓
[API] POST /api/v1/tasks { task: "...", agent: "analyst" }
  ↓
[PromptLoader] Load analyst.md (cached if exists)
  ↓
[TokenManager] Check budget → OK
  ↓
[AnalystAgent] Claude API call with:
  - System prompt (from analyst.md)
  - Tools: web_search, extract_data, store_context
  - User message
  ↓
[Claude] Uses web_search tool → Results returned
  ↓
[AnalystAgent] Continues until complete (max 10 iterations)
  ↓
[TokenManager] Record usage: 2,500 input + 1,800 output = $0.035
  ↓
[API] Return result with metadata
```
Cost: ~$0.03-0.05 per analysis
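The recorded cost is consistent with straightforward per-token arithmetic. Assuming illustrative Sonnet-class pricing of $3 per million input tokens and $15 per million output tokens (the actual rates live in the TokenManager's pricing tables):

```python
INPUT_PRICE_PER_MTOK = 3.0    # assumed USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.0  # assumed USD per million output tokens

def call_cost(input_tokens, output_tokens):
    """USD cost of one call under the assumed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# The trace above: 2,500 * $3/M + 1,800 * $15/M = $0.0075 + $0.0270 = $0.0345 ≈ $0.035
cost = call_cost(2_500, 1_800)
```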
```
User: "Create a complete go-to-market strategy"
  ↓
[API] Routes to Manager Agent
  ↓
[Manager] Analyzes request → Needs Analyst + Growth Hacker + Sales Machine
  ↓
[Manager] Calls Analyst: "Find market opportunities"
  ↓
[Analyst] Executes, returns opportunities (cost: $0.04)
  ↓
[Manager] Calls Growth Hacker: "Design strategy for [opportunities]"
  ↓
[GrowthHacker] Executes, returns strategy (cost: $0.06)
  ↓
[Manager] Calls Sales Machine: "Create landing page for [product]"
  ↓
[SalesMachine] Executes, returns copy (cost: $0.05)
  ↓
[Manager] Synthesizes all outputs into cohesive plan
  ↓
[API] Returns integrated strategy
```
Total Cost: ~$0.20-0.30 for complex orchestration
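The fan-out above can be sketched as a loop over the specialists the Manager selects, accumulating per-call costs. Everything here is an illustrative stand-in; in the real system, synthesis is another LLM call:

```python
def run_manager(task, specialists):
    """Call each selected specialist in turn, sum costs, and synthesize outputs."""
    plan = ["analyst", "growth_hacker", "sales_machine"]  # chosen by the Manager
    outputs, total_cost = {}, 0.0
    for name in plan:
        result = specialists[name](task)   # each returns {"text": ..., "cost": ...}
        outputs[name] = result["text"]
        total_cost += result["cost"]
    synthesis = " | ".join(outputs[n] for n in plan)  # stand-in for LLM synthesis
    return {"plan": synthesis, "cost_usd": round(total_cost, 2)}
```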
**agents.json**

Purpose: Centralize agent metadata
Benefits:
- Change model per agent without code changes
- Adjust temperature for creativity
- Define cost per agent
- Enable/disable capabilities
Example:

```json
{
  "agents": {
    "analyst": {
      "model": "claude-sonnet-4-5",
      "temperature": 0.5,
      "quality_metrics": {
        "min_data_sources": 3,
        "require_urls": true
      }
    },
    "sales_machine": {
      "model": "claude-sonnet-4-5",
      "temperature": 0.9,
      "quality_metrics": {
        "require_cta": true,
        "require_ab_variants": true
      }
    }
  }
}
```

Lower temperature (0.5) yields more factual output; higher temperature (0.9) yields more creative output.

**tools.json**

Purpose: Define tool behavior and limits
Benefits:
- Enable/disable tools without code
- Set rate limits per tool
- Track costs per tool
- Configure API keys
Example:

```json
{
  "tools": {
    "web_search": {
      "enabled": true,
      "rate_limit_per_minute": 10,
      "cost_per_call": 0.001,
      "requires_api_key": true,
      "timeout_seconds": 30
    }
  }
}
```

**test_cases.json**

Purpose: Automated quality assurance
Structure:

```json
{
  "test_cases": [
    {
      "id": "analyst_001",
      "agent": "analyst",
      "task": "...",
      "expected_outputs": {
        "opportunities_count": 3,
        "has_data_sources": true,
        "min_data_sources": 3
      },
      "quality_checks": [...],
      "max_cost_usd": 0.50,
      "max_duration_seconds": 120
    }
  ]
}
```

Run Tests:

```bash
python -m src.evaluation.runner
```

Benefits:
- Catch regressions
- Validate quality standards
- Monitor performance
- Track costs
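Checking a result against a test case's `expected_outputs` could look like the sketch below. It uses plain equality only; a real runner would need threshold comparators for checks like `min_data_sources`, and the function name is a hypothetical:

```python
def check_case(case, result):
    """Return a list of failure messages for one test case (empty list = pass)."""
    failures = []
    for key, expected in case["expected_outputs"].items():
        if result.get(key) != expected:
            failures.append(f"{key}: expected {expected!r}, got {result.get(key)!r}")
    if result.get("cost_usd", 0.0) > case["max_cost_usd"]:
        failures.append(f"cost {result.get('cost_usd')} exceeds max_cost_usd")
    return failures
```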
| Agent | Response Time | Iterations | Token Budget | Cost |
|---|---|---|---|---|
| Manager | <30s | 3-5 | 6,000 | $0.05-0.15 |
| Analyst | <60s | 5-8 | 8,000 | $0.03-0.08 |
| Growth Hacker | <45s | 4-6 | 7,000 | $0.04-0.10 |
| Sales Machine | <30s | 2-4 | 5,000 | $0.03-0.06 |
| System Builder | <60s | 5-8 | 8,000 | $0.04-0.10 |
| Brand Builder | <45s | 3-5 | 6,000 | $0.03-0.08 |
Recommended ($50/day budget):
- Manager: $15/day (30% - orchestration heavy)
- Analyst: $10/day (20% - data-intensive)
- Growth Hacker: $8/day (16%)
- Sales Machine: $6/day (12%)
- System Builder: $6/day (12%)
- Brand Builder: $5/day (10%)
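The allocation above can be captured as a simple configuration mapping with a startup sanity check that the shares sum to the daily cap (variable names are illustrative):

```python
DAILY_BUDGET_USD = 50.0

AGENT_BUDGETS = {            # the recommended split above
    "manager": 15.0,         # 30% - orchestration heavy
    "analyst": 10.0,         # 20% - data-intensive
    "growth_hacker": 8.0,    # 16%
    "sales_machine": 6.0,    # 12%
    "system_builder": 6.0,   # 12%
    "brand_builder": 5.0,    # 10%
}

# Sanity check at startup: the per-agent shares must cover exactly the daily cap.
assert sum(AGENT_BUDGETS.values()) == DAILY_BUDGET_USD
```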
Run locally:

```bash
python main.py
```

Or with Docker:

```bash
cd docker
docker-compose up -d
```

See `.env.example` for required configuration.
All operations are logged with structlog:

```json
{
  "event": "token_usage_recorded",
  "agent_id": "analyst",
  "input_tokens": 1500,
  "output_tokens": 2000,
  "cost_usd": 0.0345,
  "timestamp": "2025-01-15T10:30:00Z"
}
```

`GET /api/v1/usage` returns:
```json
{
  "daily_usage": {
    "total_cost_usd": 12.45,
    "budget_remaining_usd": 37.55,
    "budget_used_percentage": 24.9
  },
  "agent_usage": {...},
  "optimization_suggestions": [...]
}
```

Adding a new agent:

- Create prompt file: `config/prompts/my_agent.md`
- Add to `agents.json`: Define metadata
- Create agent class: `src/agents/my_agent.py`
- Register in Manager: Add a tool for calling the agent
- Test: Add a test case in `test_cases.json`
Adding a new tool:

- Define in `tools.json`: Metadata, rate limits, cost
- Implement in `tools.py`: Create a Tool class
- Register it: `tool_registry.register(MyTool())`
- Test: Verify the tool works with agents
API keys:

- Keys live in `.env` (never committed)
- Accessed via `settings` (type-safe)
- Validated at startup
- Daily spending limits
- Per-agent tracking
- Automatic alerts at 80% budget
- Stop execution at 95% budget
- No PII stored in logs
- Context storage (Redis) is ephemeral
- All communications over HTTPS in production
Get optimization suggestions:

```python
tm = get_token_manager()
suggestions = tm.get_cost_optimization_suggestions()
```

Check available prompts:

```python
from src.core.prompt_loader import get_prompt_loader

loader = get_prompt_loader()
prompts = loader.list_available_prompts()
```

Check usage and adjust the budget:

```
# Check usage
GET /api/v1/usage

# Adjust budget in .env
MAX_COST_PER_DAY=100.0
```

- Add Claude Haiku support for simple tasks (5x cheaper)
- Implement prompt caching API (Anthropic native)
- Add PostgreSQL for persistent storage
- Build web dashboard for monitoring
- Multi-model support (Gemini for specific tasks)
- Advanced RAG for knowledge base
- A/B testing framework for prompts
- Automated prompt optimization
This architecture implements a bulletproof backbone for autonomous multi-agent AI:
- ✅ KISS: Simple file-based configuration, clear code structure
- ✅ DRY: Prompts and configs defined once, reused everywhere
- ✅ Token Safety: Comprehensive tracking, budgets, optimization
- ✅ Lean: Production-ready, minimal overhead, fast execution
Result: A maintainable, cost-effective, production-ready system that scales from MVP to serving 10,000 customers.