This system is built on four core principles:

**KISS (Keep It Simple, Stupid)**
- Simple solutions over complex ones
- Direct file-based configuration
- Minimal abstractions
- Clear code paths

**DRY (Don't Repeat Yourself)**
- Prompts defined once in markdown files
- Configuration centralized in JSON
- Reusable components
- Shared utilities

**Token Safety**
- Track every token used
- Budget management and alerts
- Caching to prevent redundant API calls
- Cost optimization suggestions

**Lean**
- Minimal dependencies
- Fast startup and execution
- No unnecessary features
- Production-ready from day one
```
┌─────────────────────────────────────────────────────────────┐
│ CONFIGURATION LAYER │
│ ┌────────────────┬───────────────┬──────────────────────┐ │
│ │ config/prompts/│ config/*.json │ config/knowledge/ │ │
│ │ - manager.md │ - agents.json │ - agent_guidelines.md│ │
│ │ - analyst.md │ - tools.json │ - best_practices.md │ │
│ │ - ... │ - ... │ - ... │ │
│ └────────────────┴───────────────┴──────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ CORE LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ PromptLoader: Load & cache prompts from markdown │ │
│ │ TokenManager: Track usage & enforce budgets │ │
│ │ AgentBase: Base class with prompt + token integration│ │
│ │ ToolRegistry: Centralized tool management │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ AGENT LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Manager Agent ──┬─→ Analyst Agent │ │
│ │ ├─→ Growth Hacker Agent │ │
│ │ ├─→ Sales Machine Agent │ │
│ │ ├─→ System Builder Agent │ │
│ │ └─→ Brand Builder Agent │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│ API LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ FastAPI Application │ │
│ │ - POST /api/v1/tasks │ │
│ │ - GET /api/v1/agents │ │
│ │ - GET /api/v1/usage (token tracking) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
```
autonomous-ai-team/
├── config/                  # 📁 Configuration (KISS)
│   ├── prompts/             # Agent prompts (markdown)
│   │   ├── manager.md       # Manager agent system prompt
│   │   ├── analyst.md       # Analyst agent system prompt
│   │   ├── growth_hacker.md # Growth Hacker prompt
│   │   ├── sales_machine.md # Sales Machine prompt
│   │   ├── system_builder.md# System Builder prompt
│   │   └── brand_builder.md # Brand Builder prompt
│   ├── knowledge/           # Knowledge base (guidelines)
│   │   ├── agent_guidelines.md # Best practices for all agents
│   │   └── ...              # Additional knowledge files
│   ├── evaluation/          # Test cases and benchmarks
│   │   └── test_cases.json  # Automated test scenarios
│   ├── schemas/             # JSON schemas for validation
│   ├── agents.json          # Agent metadata & configuration
│   └── tools.json           # Tool definitions & metadata
│
├── src/
│   ├── core/                # 🎯 Core Framework (DRY)
│   │   ├── agent_base.py    # Base agent class
│   │   ├── prompt_loader.py # Loads prompts from markdown
│   │   ├── token_manager.py # Token counting & budgets
│   │   ├── tools.py         # Tool implementations
│   │   ├── config.py        # Settings management
│   │   └── logger.py        # Structured logging
│   │
│   ├── agents/              # 🤖 Specialized Agents
│   │   ├── manager.py       # Manager (orchestrator)
│   │   ├── analyst.py       # Analyst specialist
│   │   ├── growth_hacker.py # Growth Hacker specialist
│   │   ├── sales_machine.py # Sales Machine specialist
│   │   ├── system_builder.py# System Builder specialist
│   │   └── brand_builder.py # Brand Builder specialist
│   │
│   ├── api/                 # 🌐 REST API
│   │   ├── routes.py        # API endpoints
│   │   └── models.py        # Request/response models
│   │
│   ├── db/                  # 💾 Data Layer
│   │   ├── models.py        # Database models
│   │   └── crud.py          # CRUD operations
│   │
│   └── evaluation/          # 🧪 Testing & Evaluation
│       ├── metrics.py       # Custom evaluation metrics
│       └── runner.py        # Test runner
│
├── docker/                  # 🐳 Deployment
│   ├── Dockerfile           # Container image
│   └── docker-compose.yml   # Multi-container setup
│
├── scripts/                 # 🛠️ Utilities
│   ├── setup.sh             # Quick setup script
│   ├── example_usage.py     # Usage examples
│   └── evaluate.sh          # Run evaluation tests
│
├── tests/                   # ✅ Tests
│   └── integration/         # Integration tests
│
├── .env.example             # Environment template
├── requirements.txt         # Python dependencies
├── main.py                  # Application entry point
├── README.md                # User documentation
├── QUICKSTART.md            # Quick start guide
└── ARCHITECTURE.md          # This file
```
- Why Markdown: Human-readable, version-controllable, easy to edit
- Caching: LRU cache prevents re-loading (saves tokens)
- Structure: Each prompt has clear sections (Identity, Methodology, Output Format)
- Validation: PromptLoader validates completeness
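As a concrete illustration of the caching decision, loading can be as simple as wrapping a file read in `functools.lru_cache`. The sketch below assumes prompts live under `config/prompts/`; it mirrors, but is not, the actual `PromptLoader` implementation:

```python
from functools import lru_cache
from pathlib import Path

PROMPT_DIR = Path("config/prompts")  # assumed location of the prompt files

@lru_cache(maxsize=32)
def load_prompt(agent_id: str) -> str:
    """Read an agent's system prompt from disk once; later calls hit the cache."""
    path = PROMPT_DIR / f"{agent_id}.md"
    return path.read_text(encoding="utf-8")
```

Because the cache key is just the agent id, every agent instantiation after the first reuses the in-memory prompt, and `load_prompt.cache_info()` exposes hit/miss statistics for monitoring.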
- agents.json: Agent metadata (model, temperature, costs, tools)
- tools.json: Tool definitions, rate limits, costs
- Separation: Configuration separate from code (12-factor app)
- Guidelines: Best practices, decision frameworks
- Shared: Available to all agents
- Extensible: Easy to add new knowledge
**PromptLoader**

Purpose: Load prompts from markdown files with caching
Key Features:
- LRU cache (maxsize=32) - prevents redundant loads
- Validation of prompt completeness
- Loads agent configs and tool configs
- Cache statistics for monitoring
Usage:

```python
from src.core.prompt_loader import load_prompt

prompt = load_prompt("analyst")  # Cached automatically
```

**TokenManager**

Purpose: Enforce Token Safety principle
Key Features:
- Estimate tokens before API calls (using tiktoken)
- Track actual usage (per agent, per day)
- Budget enforcement (daily limits)
- Cost optimization suggestions
- Multi-model pricing support
Usage:

```python
from src.core.token_manager import get_token_manager

tm = get_token_manager()

# Before API call
allowed, reason = tm.should_allow_call(input_text)

# After API call
tm.record_usage("analyst", input_tokens=1500, output_tokens=2000)

# Get stats
summary = tm.get_summary()
```

**AgentBase**

Purpose: Base class for all agents with built-in token management
Key Features:
- Loads prompt from PromptLoader (DRY)
- Tracks tokens automatically (Token Safety)
- Tool use via ToolRegistry (KISS)
- Conversation history management
- Error handling and retries
Lifecycle:
1. Initialize → Load prompt from file (cached)
2. Run task → Check budget before API call
3. Claude API → Execute with prompt + tools
4. Tool use → Handle tool calls
5. Record usage → Track tokens and cost
6. Return result → With metadata
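The lifecycle above can be sketched roughly as follows. All names besides `AgentBase` are illustrative stand-ins; the real class calls the Claude API and runs a tool-use loop, both stubbed out here:

```python
class AgentBase:
    """Illustrative skeleton of the six-step lifecycle (names are assumptions)."""

    def __init__(self, agent_id, prompt_loader, token_manager, client):
        self.agent_id = agent_id
        self.system_prompt = prompt_loader(agent_id)  # 1. load prompt (cached)
        self.tm = token_manager
        self.client = client                          # stand-in for the Claude API
        self.history = []                             # conversation history

    def run(self, task, max_iterations=10):
        allowed, reason = self.tm.should_allow_call(task)   # 2. budget check
        if not allowed:
            return {"status": "rejected", "reason": reason}
        self.history.append({"role": "user", "content": task})
        response = self.client(self.system_prompt, self.history)  # 3. API call
        # 4. a tool-use loop, bounded by max_iterations, would go here
        self.tm.record_usage(self.agent_id,                 # 5. record usage
                             response["input_tokens"],
                             response["output_tokens"])
        return {"status": "ok",                             # 6. result + metadata
                "result": response["text"],
                "tokens": response["input_tokens"] + response["output_tokens"]}
```

Subclasses only supply their agent id and tools; prompt loading, budget checks, and usage recording come for free from the base class.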
**ToolRegistry**

Purpose: Centralized tool management (DRY)
Key Features:
- Register tools once, use everywhere
- Tool metadata from config/tools.json
- Rate limiting per tool
- Cost tracking per tool
- Enable/disable tools dynamically
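A minimal sketch of the registry idea (illustrative, not the actual `ToolRegistry` API): register once, look up everywhere, and flip tools on or off at runtime:

```python
class ToolRegistry:
    """Register tools once; agents look them up by name. Names are assumptions."""

    def __init__(self):
        self._tools = {}
        self._enabled = {}

    def register(self, name, tool, enabled=True):
        """Register a tool under a name, optionally starting disabled."""
        self._tools[name] = tool
        self._enabled[name] = enabled

    def set_enabled(self, name, enabled):
        """Enable or disable a tool without touching agent code."""
        self._enabled[name] = enabled

    def get(self, name):
        """Return the tool, or fail loudly if it is unknown or disabled."""
        if not self._enabled.get(name, False):
            raise LookupError(f"tool {name!r} is disabled or unknown")
        return self._tools[name]
```

Rate limiting and per-tool cost tracking would hang off the same lookup path, driven by the metadata in `config/tools.json`.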
Each agent:
- Inherits from `AgentBase` or `SpecialistAgent`
- Loads its prompt from `config/prompts/{agent_id}.md`
- Uses tools from the `ToolRegistry`
- Tracks tokens via the `TokenManager`
- Returns structured results
Agent Configuration (from agents.json):

```json
{
  "id": "analyst",
  "model": "claude-sonnet-4-5",
  "temperature": 0.5,
  "max_tokens": 4096,
  "max_iterations": 10,
  "capabilities": [...],
  "available_tools": [...],
  "quality_metrics": {...}
}
```

Endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Health check |
| `/api/v1/tasks` | POST | Execute task (route to agents) |
| `/api/v1/agents` | GET | List agents and capabilities |
| `/api/v1/usage` | GET | Token usage and costs |
| `/api/v1/config` | GET | System configuration |
Task Execution Flow:

```
User Request
  ↓
POST /api/v1/tasks
  ↓
Determine Agent (manager or specific)
  ↓
Check Budget (TokenManager)
  ↓
Execute Agent.run()
  ↓
Track Tokens
  ↓
Return Response
```
```
# config/.env
MAX_COST_PER_DAY=50.0
```

Flow:
- Every API call checks budget: `tm.check_budget_available()`
- If 80% used → warning logged
- If 100% used → calls rejected
- Per-agent usage tracked for analysis
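The 80%/100% thresholds reduce to a simple ratio check. This is a hedged sketch, not the actual `TokenManager` logic:

```python
def check_budget(spent_usd, daily_limit_usd):
    """Return (allowed, message): warn at 80% of the daily budget, reject at 100%."""
    used = spent_usd / daily_limit_usd
    if used >= 1.0:
        return False, "daily budget exhausted - call rejected"
    if used >= 0.8:
        return True, f"warning: {used:.0%} of daily budget used"
    return True, None
```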
Before API call:

```python
estimated_tokens = tm.estimate_tokens(input_text)
estimated_cost = tm.estimate_call_cost(input_text, expected_output=1000)
if estimated_cost > remaining_budget:
    # Reject call or summarize context
    ...
```

After API call:

```python
tm.record_usage(
    agent_id="analyst",
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
)
```

Automatic Suggestions:
- High tokens per call → Break into smaller tasks
- Repeated searches → Implement caching
- Large context → Summarize history
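The automatic suggestions can be driven by simple heuristics like the sketch below; the threshold values are illustrative assumptions, not the system's actual cutoffs:

```python
def optimization_suggestions(avg_tokens_per_call, repeated_search_ratio, avg_context_tokens):
    """Map usage statistics to the three suggestion types listed above."""
    tips = []
    if avg_tokens_per_call > 6000:        # assumed threshold
        tips.append("High tokens per call - break work into smaller tasks")
    if repeated_search_ratio > 0.3:       # assumed threshold
        tips.append("Repeated searches detected - implement caching")
    if avg_context_tokens > 4000:         # assumed threshold
        tips.append("Large context - summarize conversation history")
    return tips
```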
Manual Optimization:
- Use prompt caching (LRU cache)
- Batch similar requests
- Use cheaper models for simple tasks (future: Haiku)
```
User: "Analyze the SaaS market"
  ↓
[API] POST /api/v1/tasks { task: "...", agent: "analyst" }
  ↓
[PromptLoader] Load analyst.md (cached if exists)
  ↓
[TokenManager] Check budget → OK
  ↓
[AnalystAgent] Claude API call with:
  - System prompt (from analyst.md)
  - Tools: web_search, extract_data, store_context
  - User message
  ↓
[Claude] Uses web_search tool → Results returned
  ↓
[AnalystAgent] Continues until complete (max 10 iterations)
  ↓
[TokenManager] Record usage: 2,500 input + 1,800 output = $0.035
  ↓
[API] Return result with metadata
```
Cost: ~$0.03-0.05 per analysis
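The recorded cost is consistent with straightforward per-token arithmetic. Assuming illustrative Sonnet-class pricing of $3 per million input tokens and $15 per million output tokens (the actual rates live in the TokenManager's pricing tables):

```python
INPUT_PRICE_PER_MTOK = 3.0    # assumed USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.0  # assumed USD per million output tokens

def call_cost(input_tokens, output_tokens):
    """USD cost of one call under the assumed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# The trace above: 2,500 * $3/M + 1,800 * $15/M = $0.0075 + $0.0270 = $0.0345 ≈ $0.035
cost = call_cost(2_500, 1_800)
```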
```
User: "Create a complete go-to-market strategy"
  ↓
[API] Routes to Manager Agent
  ↓
[Manager] Analyzes request → Needs Analyst + Growth Hacker + Sales Machine
  ↓
[Manager] Calls Analyst: "Find market opportunities"
  ↓
[Analyst] Executes, returns opportunities (cost: $0.04)
  ↓
[Manager] Calls Growth Hacker: "Design strategy for [opportunities]"
  ↓
[GrowthHacker] Executes, returns strategy (cost: $0.06)
  ↓
[Manager] Calls Sales Machine: "Create landing page for [product]"
  ↓
[SalesMachine] Executes, returns copy (cost: $0.05)
  ↓
[Manager] Synthesizes all outputs into cohesive plan
  ↓
[API] Returns integrated strategy
```
Total Cost: ~$0.20-0.30 for complex orchestration
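The fan-out above can be sketched as a loop over the specialists the Manager selects, accumulating per-call costs. Everything here is an illustrative stand-in; in the real system, synthesis is another LLM call:

```python
def run_manager(task, specialists):
    """Call each selected specialist in turn, sum costs, and synthesize outputs."""
    plan = ["analyst", "growth_hacker", "sales_machine"]  # chosen by the Manager
    outputs, total_cost = {}, 0.0
    for name in plan:
        result = specialists[name](task)   # each returns {"text": ..., "cost": ...}
        outputs[name] = result["text"]
        total_cost += result["cost"]
    synthesis = " | ".join(outputs[n] for n in plan)  # stand-in for LLM synthesis
    return {"plan": synthesis, "cost_usd": round(total_cost, 2)}
```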
**agents.json**

Purpose: Centralize agent metadata
Benefits:
- Change model per agent without code changes
- Adjust temperature for creativity
- Define cost per agent
- Enable/disable capabilities
Example:

```json
{
  "agents": {
    "analyst": {
      "model": "claude-sonnet-4-5",
      "temperature": 0.5,
      "quality_metrics": {
        "min_data_sources": 3,
        "require_urls": true
      }
    },
    "sales_machine": {
      "model": "claude-sonnet-4-5",
      "temperature": 0.9,
      "quality_metrics": {
        "require_cta": true,
        "require_ab_variants": true
      }
    }
  }
}
```

Lower temperature (0.5) yields more factual output; higher temperature (0.9) yields more creative output.

**tools.json**

Purpose: Define tool behavior and limits
Benefits:
- Enable/disable tools without code
- Set rate limits per tool
- Track costs per tool
- Configure API keys
Example:

```json
{
  "tools": {
    "web_search": {
      "enabled": true,
      "rate_limit_per_minute": 10,
      "cost_per_call": 0.001,
      "requires_api_key": true,
      "timeout_seconds": 30
    }
  }
}
```

**test_cases.json**

Purpose: Automated quality assurance
Structure:

```json
{
  "test_cases": [
    {
      "id": "analyst_001",
      "agent": "analyst",
      "task": "...",
      "expected_outputs": {
        "opportunities_count": 3,
        "has_data_sources": true,
        "min_data_sources": 3
      },
      "quality_checks": [...],
      "max_cost_usd": 0.50,
      "max_duration_seconds": 120
    }
  ]
}
```

Run Tests:

```bash
python -m src.evaluation.runner
```

Benefits:
- Catch regressions
- Validate quality standards
- Monitor performance
- Track costs
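Checking a result against a test case's `expected_outputs` could look like the sketch below. It uses plain equality only; a real runner would need threshold comparators for checks like `min_data_sources`, and the function name is a hypothetical:

```python
def check_case(case, result):
    """Return a list of failure messages for one test case (empty list = pass)."""
    failures = []
    for key, expected in case["expected_outputs"].items():
        if result.get(key) != expected:
            failures.append(f"{key}: expected {expected!r}, got {result.get(key)!r}")
    if result.get("cost_usd", 0.0) > case["max_cost_usd"]:
        failures.append(f"cost {result.get('cost_usd')} exceeds max_cost_usd")
    return failures
```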
| Agent | Response Time | Iterations | Token Budget | Cost |
|---|---|---|---|---|
| Manager | <30s | 3-5 | 6,000 | $0.05-0.15 |
| Analyst | <60s | 5-8 | 8,000 | $0.03-0.08 |
| Growth Hacker | <45s | 4-6 | 7,000 | $0.04-0.10 |
| Sales Machine | <30s | 2-4 | 5,000 | $0.03-0.06 |
| System Builder | <60s | 5-8 | 8,000 | $0.04-0.10 |
| Brand Builder | <45s | 3-5 | 6,000 | $0.03-0.08 |
Recommended ($50/day budget):
- Manager: $15/day (30% - orchestration heavy)
- Analyst: $10/day (20% - data-intensive)
- Growth Hacker: $8/day (16%)
- Sales Machine: $6/day (12%)
- System Builder: $6/day (12%)
- Brand Builder: $5/day (10%)
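The allocation above can be captured as a simple configuration mapping with a startup sanity check that the shares sum to the daily cap (variable names are illustrative):

```python
DAILY_BUDGET_USD = 50.0

AGENT_BUDGETS = {            # the recommended split above
    "manager": 15.0,         # 30% - orchestration heavy
    "analyst": 10.0,         # 20% - data-intensive
    "growth_hacker": 8.0,    # 16%
    "sales_machine": 6.0,    # 12%
    "system_builder": 6.0,   # 12%
    "brand_builder": 5.0,    # 10%
}

# Sanity check at startup: the per-agent shares must cover exactly the daily cap.
assert sum(AGENT_BUDGETS.values()) == DAILY_BUDGET_USD
```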
Run locally:

```bash
python main.py
```

Or with Docker:

```bash
cd docker
docker-compose up -d
```

See `.env.example` for required configuration.
All operations are logged with structlog:

```json
{
  "event": "token_usage_recorded",
  "agent_id": "analyst",
  "input_tokens": 1500,
  "output_tokens": 2000,
  "cost_usd": 0.0345,
  "timestamp": "2025-01-15T10:30:00Z"
}
```

`GET /api/v1/usage` returns:
```json
{
  "daily_usage": {
    "total_cost_usd": 12.45,
    "budget_remaining_usd": 37.55,
    "budget_used_percentage": 24.9
  },
  "agent_usage": {...},
  "optimization_suggestions": [...]
}
```

Adding a new agent:

- Create prompt file: `config/prompts/my_agent.md`
- Add to `agents.json`: Define metadata
- Create agent class: `src/agents/my_agent.py`
- Register in Manager: Add a tool for calling the agent
- Test: Add a test case in `test_cases.json`
Adding a new tool:

- Define in `tools.json`: Metadata, rate limits, cost
- Implement in `tools.py`: Create a Tool class
- Register it: `tool_registry.register(MyTool())`
- Test: Verify the tool works with agents
API keys:

- Keys live in `.env` (never committed)
- Accessed via `settings` (type-safe)
- Validated at startup
- Daily spending limits
- Per-agent tracking
- Automatic alerts at 80% budget
- Stop execution at 95% budget
- No PII stored in logs
- Context storage (Redis) is ephemeral
- All communications over HTTPS in production
Get optimization suggestions:

```python
tm = get_token_manager()
suggestions = tm.get_cost_optimization_suggestions()
```

Check available prompts:

```python
from src.core.prompt_loader import get_prompt_loader

loader = get_prompt_loader()
prompts = loader.list_available_prompts()
```

Check usage and adjust the budget:

```
# Check usage
GET /api/v1/usage

# Adjust budget in .env
MAX_COST_PER_DAY=100.0
```

- Add Claude Haiku support for simple tasks (5x cheaper)
- Implement prompt caching API (Anthropic native)
- Add PostgreSQL for persistent storage
- Build web dashboard for monitoring
- Multi-model support (Gemini for specific tasks)
- Advanced RAG for knowledge base
- A/B testing framework for prompts
- Automated prompt optimization
This architecture implements a bulletproof backbone for autonomous multi-agent AI:
- ✅ KISS: Simple file-based configuration, clear code structure
- ✅ DRY: Prompts and configs defined once, reused everywhere
- ✅ Token Safety: Comprehensive tracking, budgets, optimization
- ✅ Lean: Production-ready, minimal overhead, fast execution
Result: A maintainable, cost-effective, production-ready system that scales from MVP to serving 10,000 customers.