Releases: robertmnyborg/claude-oak-agents

v2.0.0 - Anthropic Agent Skills Parity + Enhanced Features

17 Oct 05:34

Release Notes: OaK Agents v2.0.0

Release Date: October 16, 2025
Codename: "Skills Parity"
Status: Major Release


🎯 Overview

OaK Agents v2.0.0 achieves complete feature parity with Anthropic's Agent Skills while maintaining all of OaK's superior self-learning capabilities. This release introduces multi-file agent packages, bundled executable scripts, dynamic agent discovery, and Model Context Protocol integration.

TL;DR: All the power of Anthropic's Agent Skills + OaK's self-learning intelligence = Best of both worlds!


🚀 Major Features

1. Multi-File Agent Packages

What: Agents can now be sophisticated packages with bundled resources

Structure:

agents/security-auditor/
├── agent.md                # Main definition
├── metadata.yaml           # Discovery metadata
├── scripts/                # Bundled executables
│   ├── dependency_scan.py
│   ├── secrets_detector.py
│   └── threat_modeler.py
├── reference/              # Documentation
│   ├── owasp_top_10.md
│   └── compliance_checklists.md
└── templates/              # Code templates
    └── security_test.py.template

Benefits:

  • Better organization for complex agents
  • Pre-tested, reliable utility scripts
  • Rich documentation without clutter
  • Reusable templates

Backward Compatible: Single-file agents work unchanged

Documentation: docs/MULTI_FILE_AGENTS.md


2. Bundled Executable Scripts

What: Agents can run pre-built scripts instead of generating the equivalent logic token by token, which is 15-100x faster in the benchmarks below

Example: CVE Scanner

python3 agents/security-auditor/scripts/dependency_scan.py \
  --directory=. \
  --output-format=markdown

# Result: Markdown report with CVE findings in 2s
# vs 30s token generation approach

Performance Gains:

  • Sort 10K items: 100x faster, 100% token savings
  • Parse 1MB JSON: 100x faster, 100% token savings
  • CVE scan (100 deps): 15x faster, 100% token savings
  • Secret detection: 30x faster, 100% token savings

Supported Runtimes: Python, Bash, Node.js, Go

Documentation: docs/MULTI_FILE_AGENTS.md
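
These notes do not show how an agent runtime invokes a bundled script. The following is a minimal sketch, assuming the script is a standalone executable that writes its report to stdout. `run_bundled_script` and its timeout are hypothetical names, not part of OaK's API.

```python
import subprocess
import sys


def run_bundled_script(argv, timeout_s=60):
    """Run a script as a child process and return (exit_code, stdout).

    Hypothetical helper: the real loader may invoke scripts differently.
    """
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout_s,    # raises subprocess.TimeoutExpired on overrun
        check=False,          # report the exit code rather than raising
    )
    return result.returncode, result.stdout


# Stand-in command so the sketch runs anywhere; a real call would pass the
# dependency_scan.py path and flags from the CVE Scanner example.
code, report = run_bundled_script([sys.executable, "-c", "print('# CVE report')"])
```

Because the script's output is captured as plain text, the agent only spends tokens reading the report, not producing the scan logic.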


3. Dynamic Agent Discovery (Metadata-Only Prompts)

What: 93% smaller system prompts with on-demand loading

How It Works:

  • Level 1 (Startup): Load lightweight metadata (6KB vs 87KB)
  • Level 2 (Invocation): Load full definition only when agent is used
  • Level 3 (Execution): Load scripts/docs as needed
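
The three levels amount to lazy loading with caching. The sketch below illustrates that idea only; `LazyAgent` and the `fetch` callbacks are hypothetical and do not reflect how `core/agent_loader.py` is actually written.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LazyAgent:
    """Agent whose heavier parts load only on first use (hypothetical)."""
    name: str
    summary: str                                    # Level 1: always in memory
    _definition: Optional[str] = None               # Level 2: loaded on invocation
    _resources: dict = field(default_factory=dict)  # Level 3: loaded on demand
    loads: list = field(default_factory=list)       # records what was fetched

    def definition(self, fetch):
        # Fetch the full definition once, then serve it from memory.
        if self._definition is None:
            self._definition = fetch(self.name)
            self.loads.append("definition")
        return self._definition

    def resource(self, path, fetch):
        # Fetch a script or reference doc the first time it is requested.
        if path not in self._resources:
            self._resources[path] = fetch(self.name, path)
            self.loads.append(path)
        return self._resources[path]


agent = LazyAgent("security-auditor", "Security audits and CVE scans")
agent.definition(lambda name: f"full definition of {name}")
agent.definition(lambda name: "not fetched again")  # served from cache
agent.resource("scripts/dependency_scan.py", lambda name, path: "script source")
```

Only `name` and `summary` need to be in the startup prompt; everything else is paid for when an agent is actually used.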

Performance:

  • 93% smaller prompts: 87KB → 6KB
  • 4x faster classification: 2s → 0.5s
  • 3x+ scalability: 30 agents → 100+ agents
  • Token savings: 81K tokens per conversation

Cost Savings:

Monthly (1000 conversations):
- Before: 87M tokens ≈ $174/month (GPT-4)
- After: 6M tokens ≈ $12/month (GPT-4)
- Savings: $162/month
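
The arithmetic behind these figures can be reproduced in a few lines. Two assumptions are labeled in the code: that 1KB of prompt is counted as roughly 1K tokens (as the figures above appear to do), and a rate of $2 per million tokens, which is what $174 for 87M tokens implies.

```python
TOKENS_BEFORE = 87_000   # per conversation; assumes 1KB of prompt ~ 1K tokens
TOKENS_AFTER = 6_000
CONVERSATIONS = 1_000    # per month
USD_PER_MILLION = 2.0    # assumption: implied by $174 / 87M tokens


def monthly_cost(tokens_per_conversation):
    """Monthly prompt cost in USD for the given per-conversation token count."""
    total_tokens = tokens_per_conversation * CONVERSATIONS
    return total_tokens / 1_000_000 * USD_PER_MILLION


before = monthly_cost(TOKENS_BEFORE)
after = monthly_cost(TOKENS_AFTER)
savings = before - after
reduction_pct = round(100 * (TOKENS_BEFORE - TOKENS_AFTER) / TOKENS_BEFORE)
```

Swapping in a different per-token rate changes the dollar amounts but not the 93% reduction.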

Status: Built and ready, opt-in via ./scripts/enable_metadata_prompts.sh

Documentation: docs/METADATA_ONLY_PROMPTS.md, docs/ENABLE_METADATA_PROMPTS.md


4. Model Context Protocol (MCP) Integration

What: Standardized telemetry and agent coordination via Anthropic's MCP

Components:

  • oak-telemetry server: Telemetry logging and data access
  • oak-agents server: Agent discovery, metadata, script execution

MCP Resources:

  • oak://telemetry/invocations - Recent agent invocations
  • oak://telemetry/metrics - Performance metrics
  • oak://telemetry/gaps - Capability gaps
  • oak://agents/metadata - All agent metadata
  • oak://agents/{name}/definition - Full agent definition
  • oak://agents/{name}/scripts - Bundled scripts

MCP Tools:

  • log_agent_invocation - Log agent execution
  • update_invocation - Update with completion data
  • query_telemetry - Query historical data
  • find_agents - Discover agents by keywords/domains
  • execute_agent_script - Run bundled scripts
  • get_agent_recommendations - ML-powered suggestions
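
MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a request for `find_agents`. The argument name `keywords` is an assumption for illustration; the actual input schema is defined by the oak-agents server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are illustrative; check the server's tool schema.
request = build_tool_call(1, "find_agents", {"keywords": ["security", "cve"]})
payload = json.dumps(request)
```

In practice an MCP client library sends this for you; the raw shape is mainly useful when debugging server traffic.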

Benefits:

  • Industry-standard protocol
  • Better ecosystem integration
  • Cleaner than custom hooks
  • Built-in error handling

Setup:

cd mcp
npm install
npm run build

# Configure in ~/.config/claude/mcp_servers.json

Documentation: mcp/README.md


📊 Feature Comparison

Feature | Anthropic Skills | OaK v1.x | OaK v2.0

Core Functionality

  • Multi-file packages
  • Bundled scripts
  • Dynamic discovery
  • MCP integration

OaK-Exclusive

  • Comprehensive telemetry
  • Learning from experience
  • A/B testing
  • Auto gap detection
  • Agent-auditor (HR)
  • ML optimization: 🚧 ✅ (Phase 6)
  • Auto agent creation
  • Portfolio management

Result: Full Anthropic parity + 8 exclusive OaK features


🆕 New Files & Components

Core Implementation

  • core/agent_loader.py - Multi-format agent loader (single-file + multi-file)
  • core/generate_agent_metadata.py - Metadata listing generator
  • scripts/enable_metadata_prompts.sh - One-command enablement
  • mcp/src/telemetry-server.ts - MCP telemetry server
  • mcp/src/agents-server.ts - MCP agent coordination server
  • mcp/package.json - MCP dependencies

Example Implementations

  • agents/security-auditor-multifile/ - Complete multi-file reference
    • metadata.yaml - Discovery metadata
    • agent.md - Full definition
    • scripts/dependency_scan.py - CVE scanner (working!)
    • reference/ - OWASP, compliance docs
    • templates/ - Security test templates

Documentation

  • docs/MULTI_FILE_AGENTS.md - Multi-file architecture guide
  • docs/MIGRATION_GUIDE.md - Single-file to multi-file migration
  • docs/METADATA_ONLY_PROMPTS.md - Progressive disclosure deep dive
  • docs/ENABLE_METADATA_PROMPTS.md - Enablement guide
  • mcp/README.md - MCP setup and usage
  • ANTHROPIC_SKILLS_PARITY.md - Implementation summary
  • FINAL_IMPLEMENTATION_SUMMARY.md - Complete status
  • RELEASE_NOTES_v2.0.0.md - This document
  • USER_GUIDE.md - Non-technical user guide (new!)

Updated Files

  • README.md - Added Anthropic Skills comparison, new features
  • All documentation updated with references to new features

🔄 Migration & Backward Compatibility

100% Backward Compatible

No breaking changes:

  • ✅ Single-file agents work unchanged
  • ✅ Existing workflows continue
  • ✅ No migration required
  • ✅ Agent loader auto-detects format
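
Format auto-detection can be as simple as checking whether an agent path is a Markdown file or a directory containing `agent.md`. This is a sketch of that idea under those assumptions, not the logic in `core/agent_loader.py`; `detect_agent_format` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def detect_agent_format(path):
    """Classify an agent path as 'multi-file', 'single-file', or None."""
    p = Path(path)
    if p.is_dir() and (p / "agent.md").is_file():
        return "multi-file"
    if p.is_file() and p.suffix == ".md":
        return "single-file"
    return None


# Build one agent of each layout in a scratch directory and classify both.
with tempfile.TemporaryDirectory() as root:
    single = Path(root) / "code-reviewer.md"
    single.write_text("# code-reviewer\n")
    multi = Path(root) / "security-auditor"
    multi.mkdir()
    (multi / "agent.md").write_text("# security-auditor\n")
    formats = (detect_agent_format(single), detect_agent_format(multi))
    unknown = detect_agent_format(Path(root) / "missing")
```

Keeping the check purely structural is what lets existing single-file agents load without any changes.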

Optional migrations:

  • Single-file → Multi-file (for advanced features)
  • Full definitions → Metadata-only (for performance)
  • Hooks → MCP (for standardization)

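Migration guides: docs/MIGRATION_GUIDE.md (single-file to multi-file), docs/ENABLE_METADATA_PROMPTS.md (metadata-only prompts), mcp/README.md (MCP)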


📈 Performance Improvements

System Prompt Size (with metadata-only)

  • Before: 87KB full agent definitions
  • After: 6KB metadata only
  • Improvement: 93% reduction

Classification Speed

  • Before: ~2s with full definitions
  • After: ~0.5s with metadata
  • Improvement: 4x faster

Script Execution Speed

| Task | Token Gen | Script | Speedup |
|---|---|---|---|
| Sort 10K items | 5s, 50K tokens | 0.05s, 0 tokens | 100x |
| Parse 1MB JSON | 10s, 100K tokens | 0.1s, 0 tokens | 100x |
| CVE scan (100) | 30s, 200K tokens | 2s, 0 tokens | 15x |
| Secret detect | 15s, 80K tokens | 0.5s, 0 tokens | 30x |
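
Each speedup is the token-generation time divided by the script time. A quick check using only the timings listed above:

```python
# (task, token-generation seconds, script seconds), copied from the figures above
TIMINGS = [
    ("Sort 10K items", 5.0, 0.05),
    ("Parse 1MB JSON", 10.0, 0.1),
    ("CVE scan (100)", 30.0, 2.0),
    ("Secret detect", 15.0, 0.5),
]

# Round to the nearest whole multiple, as the notes report them.
speedups = {task: round(slow / fast) for task, slow, fast in TIMINGS}
```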

Token Cost Savings (with metadata-only)

Per conversation:
- Savings: 81K tokens

Monthly (1000 conversations):
- Savings: 81M tokens ≈ $162/month (GPT-4)

Scalability

  • Before: ~30 agents (practical limit)
  • After: 100+ agents supported
  • Improvement: 3x+ capacity

🛠️ Installation & Upgrade

New Installation

# Clone repository
git clone https://github.com/robertmnyborg/claude-oak-agents.git ~/Projects/claude-oak-agents
cd ~/Projects/claude-oak-agents

# Install agents
mkdir -p ~/.claude/agents
ln -s ~/Projects/claude-oak-agents/agents/* ~/.claude/agents/

# Install automation (optional)
./automation/install_automation.sh

# Enable metadata-only prompts (optional, recommended)
./scripts/enable_metadata_prompts.sh

# Install MCP servers (optional)
cd mcp
npm install
npm run build

Upgrading from v1.x

cd ~/Projects/claude-oak-agents

# Pull latest changes
git pull origin main

# No migration required - everything backward compatible!

# Optional: Enable metadata-only prompts
./scripts/enable_metadata_prompts.sh

# Optional: Install MCP servers
cd mcp
npm install
npm run build

🧪 Testing

Automated Tests

All new components have been tested:

# Test agent loader
python3 core/agent_loader.py --command=metadata
# ✅ Loads all 26 agents

# Test metadata generator
python3 core/generate_agent_metadata.py --format=compact
# ✅ Generates 6KB listing

# Test bundled script
python3 agents/security-auditor-multifile/scripts/dependency_scan.py
# ✅ Finds vulnerabilities in 2s

# Test multi-file loading
python3 core/agent_loader.py --command=load --agent=security-auditor-multifile
# ✅ Loads multi-file agent with 3 scripts, 4 reference docs

Manual Testing

Recommended after upgrade:

  1. Agent Invocation: Test agent delegation works normally
  2. Script Execution: Test bundled scripts if using multi-file agents
  3. Metadata-Only: If enabled, verify agent discovery works
  4. MCP: If using, test MCP tool invocations

📚 Documentation

New Documentation

v1.0.0 - Phases 1-5 Complete: Self-Improving Agent System

16 Oct 22:06

Claude OaK Agents v1.0.0 🎉

First stable release of the self-improving agent system for Claude Code. Phases 1-5 complete with 29+ specialized agents, automatic capability gap detection, and 80-95% automation.


🎯 What's Included

✅ Phase 1-3: Telemetry Infrastructure (Complete)

Automatic performance tracking and state analysis

  • Telemetry System: Automatic logging of every agent invocation with state features, outcomes, and performance metrics
  • Hooks: Pre/post agent execution hooks for fail-safe telemetry capture (never blocks agents)
  • State Analysis: Automated feature extraction and ranking for systematic task decomposition
  • Data Storage: JSONL-based telemetry storage with comprehensive schemas
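
The Hooks bullet above says telemetry capture never blocks agents. One common way to get that property is to wrap each hook so failures are swallowed. The decorator below is a sketch of that pattern; it is an assumption about the approach, not the code in `hooks/pre_agent_hook.py`.

```python
import functools


def fail_safe(hook):
    """Wrap a hook so any exception is swallowed and reported as False."""
    @functools.wraps(hook)
    def wrapper(*args, **kwargs):
        try:
            hook(*args, **kwargs)
            return True
        except Exception:
            # Telemetry is best-effort: never let it break the agent run.
            return False
    return wrapper


@fail_safe
def broken_hook(event):
    raise OSError("telemetry disk full")


@fail_safe
def working_hook(event):
    event["logged"] = True


event = {}
outcomes = (broken_hook(event), working_hook(event))
```

The trade-off is that telemetry errors become silent, so a real implementation would probably also write them to a separate error log.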

Key Files:

  • telemetry/logger.py - Automatic invocation logging
  • telemetry/analyzer.py - Performance statistics and analysis
  • hooks/pre_agent_hook.py & hooks/post_agent_hook.py - Automatic telemetry capture
  • agents/state-analyzer.md - State feature extraction agent

✅ Phase 4: Transition Models & Utility Tracking (Complete)

Performance dashboards and feedback collection

  • Transition Models: YAML documentation of expected agent behavior patterns
  • Utility Tracking: Success metrics and quality ratings collection
  • Performance Dashboards: HTML dashboards with visualizations
  • Batch Feedback: Interactive feedback collection UI

Key Files:

  • scripts/phase4/generate_transition_models.py - Auto-generate behavior docs
  • scripts/phase4/batch_feedback.py - Feedback collection
  • scripts/phase4/generate_dashboard.py - Performance visualizations

✅ Phase 5: Adaptive Curation & Human-in-the-Loop (Complete)

Strategic portfolio management with human oversight

  • Agent-Auditor (Agentic HR): Strategic portfolio manager that evaluates agent performance, identifies gaps, detects redundancy, and recommends lifecycle actions
  • Capability Gap Detection: Automatic detection when no suitable agent exists (3+ failures → create new agent)
  • Human Review Workflow: All auto-created agents require human approval before first deployment
  • A/B Testing Framework: Structured testing of improved agent versions
  • Automation System: Shell prompts, scheduled tasks, and macOS notifications

Key Features:

  • Automatic Agent Creation: System detects capability gaps and creates new agents automatically
  • Review Commands: oak-list-pending-agents, oak-review-agent, oak-approve-agent, oak-modify-agent, oak-reject-agent
  • Portfolio Management: Monthly audits identify underperforming agents, gaps, and redundancy
  • Intelligent Prompting: Weekly/monthly review prompts only when actionable data exists

Key Files:

  • agents/agent-auditor.md - Strategic HR agent for portfolio management
  • scripts/agent_review.py - Review and approval workflow
  • scripts/phase5/run_agent_audit.py - Portfolio audit automation
  • automation/oak_prompts.sh - Shell integration with review commands
  • automation/oak_notify.sh - Notification system
  • automation/install_automation.sh - One-command setup

🤖 29+ Specialized Agents

Core Development (7 agents)

  • frontend-developer - React/Vue/Angular, UI/UX, browser compatibility
  • backend-architect - APIs, databases, microservices, system design
  • infrastructure-specialist - AWS CDK, Terraform, cloud deployment
  • mobile-developer - React Native, iOS, Android
  • blockchain-developer - Solidity, Web3, DeFi protocols
  • ml-engineer - TensorFlow/PyTorch, ML pipelines, MLOps
  • legacy-maintainer - Java, C#, enterprise systems

Quality & Security (5 agents)

  • security-auditor - Penetration testing, compliance, threat modeling
  • code-reviewer - Quality gates, standards enforcement
  • unit-test-expert - Comprehensive testing, edge cases
  • dependency-scanner - Supply chain security, vulnerabilities
  • qa-specialist - Integration testing, E2E validation

Infrastructure & Operations (4 agents)

  • systems-architect - High-level design, technical specs
  • performance-optimizer - Bottleneck identification, optimization
  • debug-specialist - Critical error resolution (HIGHEST PRIORITY)
  • git-workflow-manager - Git operations, PRs, branch management

Analysis & Planning (5 agents)

  • state-analyzer - State feature extraction and ranking
  • business-analyst - Requirements analysis, stakeholder communication
  • data-scientist - Data analysis, statistical processing
  • project-manager - Multi-step coordination, timeline management
  • agent-auditor - NEW: Strategic HR for agent portfolio

Documentation & Content (3 agents)

  • technical-documentation-writer - API docs, technical specifications
  • content-writer - Marketing content, user-facing docs
  • changelog-recorder - Automatic changelog generation

Special Purpose (3+ agents)

  • design-simplicity-advisor - KISS enforcement (mandatory)
  • agent-creator - Meta-agent for creating new specialists
  • general-purpose - Fallback for basic tasks

Plus: System automatically creates new agents when gaps are detected!


📊 Key Capabilities

1. Automatic Telemetry

  • Zero-effort telemetry capture via hooks
  • Comprehensive state features (languages, frameworks, file counts)
  • Performance metrics (duration, success rate, quality ratings)
  • All data stored locally in telemetry/ directory
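
A JSONL log stores one JSON object per line, so it can be appended to cheaply and scanned without a database. The sketch below assumes a record shape loosely based on the bullets above. The field names are hypothetical and may not match the schema used by `telemetry/logger.py`.

```python
import json
import tempfile
from pathlib import Path


def append_record(log_path, record):
    """Append one invocation record as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def success_rate(log_path, agent):
    """Fraction of successful invocations for an agent, or None if unseen."""
    with open(log_path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    outcomes = [r["success"] for r in rows if r["agent"] == agent]
    return sum(outcomes) / len(outcomes) if outcomes else None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "invocations.jsonl"
    # Hypothetical fields: state features plus outcome and duration.
    append_record(path, {"agent": "security-auditor", "languages": ["python"],
                         "file_count": 42, "duration_s": 2.1, "success": True})
    append_record(path, {"agent": "security-auditor", "languages": ["go"],
                         "file_count": 7, "duration_s": 3.4, "success": False})
    rate = success_rate(path, "security-auditor")
    missing = success_rate(path, "code-reviewer")
```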

2. Capability Gap Detection

  • Detects when no suitable agent exists
  • Automatic agent creation after 3+ routing failures
  • Human review required before first deployment
  • After approval, agents can auto-update based on learning
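
The "3+ routing failures" rule can be pictured as a per-category counter with a threshold. This sketch assumes failures are grouped by some task category key, which the notes do not specify; `GapDetector` is a hypothetical name.

```python
from collections import Counter

GAP_THRESHOLD = 3  # "3+ routing failures" per the notes


class GapDetector:
    """Count routing failures per task category (hypothetical grouping key)."""

    def __init__(self, threshold=GAP_THRESHOLD):
        self.threshold = threshold
        self.failures = Counter()

    def record_failure(self, category):
        """Record a failure; return True once the category counts as a gap."""
        self.failures[category] += 1
        return self.failures[category] >= self.threshold

    def gaps(self):
        """Categories that have reached the threshold, in sorted order."""
        return sorted(c for c, n in self.failures.items() if n >= self.threshold)


detector = GapDetector()
results = [detector.record_failure("terraform-migration") for _ in range(3)]
detector.record_failure("pdf-parsing")  # one failure: not yet a gap
```

A category that crosses the threshold would then be handed to agent creation, with the new agent landing in the human review queue.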

3. Agent-Auditor (Agentic HR)

  • Monthly portfolio audits
  • Performance evaluation (success rates, quality, utilization)
  • Capability gap identification from patterns
  • Redundancy detection and consolidation recommendations
  • Lifecycle management (create/refactor/consolidate/deprecate)

4. Human-in-the-Loop Quality Control

  • All auto-created agents saved to agents/pending_review/
  • Review workflow: list → review → approve/modify/reject
  • Notification system (shell prompts + macOS notifications)
  • After first approval, system can auto-update

5. Intelligent Automation

  • Weekly reviews: 15 minutes (5 min automated)
  • Monthly audits: 1 hour (30 min automated)
  • Health checks: Every 3 days (fully automated)
  • Daily checks: 9am for actionable items
  • 80-95% automation with intelligent prompting

6. A/B Testing Framework

  • Structured testing of improved agent versions
  • Statistical significance validation
  • Performance metrics tracking
  • Best version deployment
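
The notes do not say which statistical test the framework uses. A two-proportion z-test is one common choice for comparing success rates between two agent versions, sketched here with only the standard library.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both versions all-success or all-failure
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Version A: 70/100 successes; version B: 85/100 successes.
z, p = two_proportion_z_test(70, 100, 85, 100)
```

With small invocation counts the normal approximation is weak, so an exact test may be a better fit for rarely used agents.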

🚀 Quick Start

Installation (5 Minutes)

```bash
# 1. Clone repository
git clone https://github.com/robertmnyborg/claude-oak-agents.git ~/Projects/claude-oak-agents
cd ~/Projects/claude-oak-agents

# 2. Install agents (creates symlinks)
mkdir -p ~/.claude/agents
ln -s ~/Projects/claude-oak-agents/agents/* ~/.claude/agents/

# 3. Install automation (optional but recommended)
./automation/install_automation.sh

# 4. Start using Claude Code normally!
```

Daily Usage

Use agents normally - System handles everything automatically:

  • Classifies requests
  • Selects best agents
  • Logs telemetry
  • Detects gaps
  • Creates new agents when needed

Weekly rhythm (15 minutes):
```bash
oak-weekly-review # View performance summary
```

Monthly rhythm (1 hour):
```bash
oak-monthly-review # Strategic portfolio audit
```

Agent review (as needed, 5-10 minutes):
```bash
oak-list-pending-agents # See pending agents
oak-review-agent # Read specification
oak-approve-agent # Deploy immediately
```


📈 The Learning Flywheel

```
Use Agents
    ↓
Telemetry Captures Performance
    ↓
Weekly/Monthly Analysis
    ↓
Insights & Recommendations
    ↓
A/B Testing (Phase 5)
    ↓
Improvements Deployed
    ↓
ML Learning (Phase 6 - coming soon)
    ↓
Better Agent Selection
    ↓
(Back to Use Agents - but smarter)
```

Each iteration makes the system better at serving YOUR needs.


🗓️ What's Next: Phase 6 (Coming Soon)

ML Pipeline & Continuous Learning (Month 5-6):

  • Conservative Q-Learning (CQL) for offline RL
  • Policy learning from telemetry data
  • Automated agent selection recommendations
  • Continuous model retraining
  • Policy advisor agent for optimization

Timeline: Q1 2026


📚 Documentation


🙏 Credits

  • Original System: claude-squad by jamsajones
  • OaK Architecture: Inspired by hierarchical reinforcement learning research
  • Built with: Claude Code and lots of telemetry data

📝 License

MIT License - See LICENSE for details


Status: ✅ Phases 1-5 Complete | 🚧 Phase 6 In Progress | 29+ Agents | Self-Learning Active | Automation Ready

Get Started: Installation Guide