Automated Circuit Breaker System

Overview

The Automated Circuit Breaker System addresses a core mismatch: probabilistic AI systems change faster than static, manual experiments can measure them. By the time a traditional A/B test reaches significance, the model or user behavior may already have shifted. The system replaces those experiments with automated, metric-driven rollouts that include built-in safety mechanisms.

The Problem

"This is 'Old World' thinking applied to 'New World' speed. In the time it takes to run a valid A/B test, the model might have changed, or the user behavior might have shifted. We cannot manage probabilistic systems with static, manual experiments."

Traditional A/B testing for AI systems has fundamental limitations:

Too Slow: Takes weeks to gather statistically significant data
Static: Can't adapt to changing model behavior or user patterns
Manual: Requires human intervention to roll back or advance
Risky: All-or-nothing deployments can impact many users

The Solution: Automated Circuit Breakers

The circuit breaker system provides real-time, automated rollout management with three key components:

1. The Probe

Gradual rollout that starts conservatively and scales automatically:

1% → Initial probe with minimal user impact
5% → Small rollout after metrics validation
20% → Medium rollout with broader coverage
100% → Full deployment once stability is proven
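The phase ladder above can be sketched as an enum whose values are the share of traffic (in percent) routed to the new version. This is a minimal sketch: the configuration section refers to `RolloutPhase.PROBE`, but storing percentages as member values and the `next_phase` helper are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class RolloutPhase(Enum):
    """Rollout phases; values are the % of traffic sent to the new version (assumed)."""
    OFF = 0       # rolled back: circuit breaker OPEN
    PROBE = 1
    SMALL = 5
    MEDIUM = 20
    FULL = 100


# Order in which a healthy rollout advances.
LADDER = [RolloutPhase.PROBE, RolloutPhase.SMALL, RolloutPhase.MEDIUM, RolloutPhase.FULL]


def next_phase(phase: RolloutPhase) -> Optional[RolloutPhase]:
    """Hypothetical helper: the phase after `phase`, or None if it cannot advance."""
    if phase not in LADDER:
        return None  # OFF is left through recovery, not through advancement
    idx = LADDER.index(phase)
    return LADDER[idx + 1] if idx + 1 < len(LADDER) else None
```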

2. The Watchdog

Real-time monitoring of deterministic metrics:

Task Completion Rate: Must stay above 85% (configurable)
Latency: Must stay below 2000ms (configurable)
Sample Size: Requires a minimum number of samples before making any decision
Time Windows: Calculates metrics over rolling windows
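A minimal sketch of the Watchdog's rolling-window calculation, assuming samples arrive in time order and the window is expressed in seconds (the default `monitoring_window_minutes=5` corresponds to 300). The `RollingMetrics` class and its methods are hypothetical, not the project's `CircuitBreakerMetrics` API.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingMetrics:
    """Hypothetical tracker: completion rate and mean latency over a time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        # Each entry: (timestamp, success, latency_ms), kept in arrival order.
        self._samples: Deque[Tuple[float, bool, float]] = deque()

    def record(self, success: bool, latency_ms: float, now: Optional[float] = None) -> None:
        ts = time.monotonic() if now is None else now
        self._samples.append((ts, success, latency_ms))
        self._prune(ts)

    def _prune(self, now: float) -> None:
        # Drop samples that have fallen out of the window (oldest are on the left).
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def snapshot(self, now: Optional[float] = None) -> Tuple[int, float, float]:
        """Return (sample_count, completion_rate, mean_latency_ms) for the window."""
        self._prune(time.monotonic() if now is None else now)
        n = len(self._samples)
        if n == 0:
            return 0, 0.0, 0.0
        successes = sum(1 for _, ok, _ in self._samples if ok)
        mean_latency = sum(lat for _, _, lat in self._samples) / n
        return n, successes / n, mean_latency
```

The sample count is returned alongside the rates so the caller can enforce the minimum sample size before acting on them.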

3. Auto-Scale & Auto-Rollback

Automated decision-making based on metrics:

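Advance: Automatically scale up to the next phase when metrics clear the advancement threshold
Maintain: Hold the current phase when metrics are within limits but not yet good enough to advance
Rollback: Immediately revert to the old version when a metric breaches its threshold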
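The three outcomes can be expressed as a single decision function. This sketch assumes the rules stated elsewhere in this document: hold until `min_samples_per_phase` is reached, roll back when completion drops below `min_task_completion_rate` or average latency exceeds `max_latency_ms`, and advance only above `advancement_threshold` with latency under 90% of the maximum (one reading of the PROBE criteria). How `rollback_threshold` combines with `min_task_completion_rate` is not specified, so it is left out. `Thresholds` and `decide` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical subset of the configuration, using the documented defaults."""
    min_task_completion_rate: float = 0.85
    max_latency_ms: float = 2000.0
    min_samples_per_phase: int = 10
    advancement_threshold: float = 0.95


def decide(samples: int, completion_rate: float, mean_latency_ms: float,
           t: Thresholds = Thresholds()) -> str:
    """Return "advance", "maintain" or "rollback" for one evaluation cycle."""
    if samples < t.min_samples_per_phase:
        return "maintain"  # not enough data yet to judge either way
    if completion_rate < t.min_task_completion_rate or mean_latency_ms > t.max_latency_ms:
        return "rollback"  # a hard threshold is breached: trip the breaker
    if completion_rate > t.advancement_threshold and mean_latency_ms < 0.9 * t.max_latency_ms:
        return "advance"   # comfortably inside both limits
    return "maintain"      # acceptable, but not yet good enough to advance
```

With the numbers from the example scenarios, `decide(10, 0.98, 1300)` returns `"advance"` and `decide(10, 0.80, 2500)` returns `"rollback"`. Holding below the sample minimum avoids reacting to noise, at the cost of leaving a failing version on its small traffic share until enough data arrives.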

Architecture

Components

┌─────────────────────────────────────────────────────────┐
│                CircuitBreakerController                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│  │   Metrics    │   │   Watchdog   │   │    State     │ │
│  │   Tracker    │   │   Monitor    │   │   Manager    │ │
│  └──────────────┘   └──────────────┘   └──────────────┘ │
│         │                  │                  │         │
│         └──────────────────┴──────────────────┘         │
│                            │                            │
│                     Decision Engine                     │
│             (Advance / Maintain / Rollback)             │
└─────────────────────────────────────────────────────────┘

Key Classes

CircuitBreakerConfig
- Defines metric thresholds and rollout parameters
- Configurable for different use cases
- Persists configuration for consistency
CircuitBreakerMetrics
- Tracks task completion rate and latency
- Calculates metrics over time windows
- Validates against thresholds
CircuitBreakerWatchdog
- Real-time monitoring and decision engine
- Determines when to advance or rollback
- Manages traffic splits per phase
CircuitBreakerController
- Main orchestrator for the system
- Handles state persistence
- Provides simple API for integration

Usage

Basic Example

from circuit_breaker import CircuitBreakerController, CircuitBreakerConfig

# Create configuration
config = CircuitBreakerConfig(
    min_task_completion_rate=0.85,  # 85%
    max_latency_ms=2000.0,           # 2000ms
    min_samples_per_phase=10,        # Minimum samples before advancing
    monitoring_window_minutes=5      # 5-minute rolling window
)

# Initialize controller
controller = CircuitBreakerController(config=config)

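# For each request (`requests`, `execute_request`, `should_evaluate` and
# `alert_team` are placeholders for your own application code)
for request in requests: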
    # Determine version based on traffic split
    version = "new" if controller.should_use_new_version(request.id) else "old"
    
    # Execute with selected version
    success, latency_ms = execute_request(request, version)
    
    # Record metrics
    controller.record_execution(version, success, latency_ms)
    
    # Periodically evaluate (or run in separate process)
    if should_evaluate():
        decision = controller.evaluate_and_decide()
        
        if decision["action"] == "rollback":
            alert_team(decision["reason"])
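How `controller.should_use_new_version(request.id)` chooses a version is not described here. One common approach, shown as a hedged sketch rather than the controller's actual logic, is to hash the request (or user) id into a fixed bucket so the same id always gets the same version. The function below is hypothetical and takes the current phase's percentage explicitly.

```python
import hashlib


def should_use_new_version(request_id: str, traffic_percent: float) -> bool:
    """Hypothetical: deterministically route ~traffic_percent% of ids to the new version."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # Compare in basis points so fractional percentages (e.g. 0.5%) also work.
    return bucket < traffic_percent * 100
```

Because a bucket below 100 is also below 500 and 2000, every id routed to the new version at 1% stays on it at 5% and 20%, so advancing a phase only adds users rather than reshuffling them.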

Integration with DoerAgent

from agent import DoerAgent

# Create agent with circuit breaker enabled
agent = DoerAgent(
    enable_circuit_breaker=True,
    circuit_breaker_config_file="cb_config.json"
)

# The agent automatically handles version selection and metrics tracking
result = agent.run(query="What is 10 + 20?", user_id="user123")

# Check which version was used
print(f"Version: {result['version_used']}")
print(f"Latency: {result['latency_ms']:.0f}ms")
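The Basic Example leaves `execute_request` to the caller, while `DoerAgent` reports `latency_ms` itself. A minimal sketch of the measurement either one needs, assuming "success" simply means the call returned without raising (the agent may judge task completion by other criteria); `timed_call` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[[], Any]) -> Tuple[bool, float, Any]:
    """Run fn and return (success, latency_ms, result); an exception counts as failure."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    try:
        result = fn()
        success = True
    except Exception:  # deliberately broad: any error is a failed task
        result = None
        success = False
    latency_ms = (time.perf_counter() - start) * 1000.0
    return success, latency_ms, result
```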

Rollout Phases

Phase 1: PROBE (1%)

Purpose: Validate new version with minimal risk
Traffic: 1% to new version, 99% to old version
Duration: Until minimum samples collected and metrics validated
Advancement: Requires excellent metrics (above 95% task completion and latency below 90% of max_latency_ms, i.e. under 1800ms at the default)

Phase 2: SMALL (5%)

Purpose: Broader validation with manageable risk
Traffic: 5% to new version, 95% to old version
Duration: Until metrics have held steady
Advancement: Consistent good performance

Phase 3: MEDIUM (20%)

Purpose: Significant user coverage before full rollout
Traffic: 20% to new version, 80% to old version
Duration: Final validation before full deployment
Advancement: Sustained excellent metrics

Phase 4: FULL (100%)

Purpose: Complete deployment
Traffic: 100% to new version
State: Circuit breaker CLOSED (normal operation)
Monitoring: Continues to detect degradation

Rollback: OFF (0%)

Trigger: Metrics fall below thresholds
Traffic: 0% to new version, 100% to old version
State: Circuit breaker OPEN (tripped)
Recovery: Manual intervention after a fix (automatic retry is listed under Future Enhancements)
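Once tripped, the breaker should stay OPEN until it is deliberately recovered, so nothing can advance the rollout by accident. A minimal sketch of that rule, covering only the manual path; the `Breaker` class, the choice to restart at the PROBE share after `reset()`, and treating earlier phases as CLOSED are assumptions.

```python
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"  # normal operation
    OPEN = "open"      # tripped: the new version receives no traffic


class Breaker:
    """Hypothetical holder for the new version's traffic share and breaker state."""

    PROBE_PERCENT = 1

    def __init__(self) -> None:
        self.traffic_percent = self.PROBE_PERCENT
        self.state = BreakerState.CLOSED

    def advance_to(self, percent: int) -> None:
        # An OPEN breaker must be reset explicitly before the rollout can resume.
        if self.state is BreakerState.OPEN:
            raise RuntimeError("breaker is OPEN; call reset() before advancing")
        self.traffic_percent = percent

    def trip(self) -> None:
        # Rollback: 0% to the new version, 100% to the old one.
        self.traffic_percent = 0
        self.state = BreakerState.OPEN

    def reset(self) -> None:
        # Manual recovery after a fix: restart cautiously from the PROBE share.
        self.traffic_percent = self.PROBE_PERCENT
        self.state = BreakerState.CLOSED
```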

Metrics

Task Completion Rate

Definition: Percentage of tasks that complete successfully
Default Threshold: 85%
Purpose: Ensures quality of responses
Example: If 90 out of 100 tasks succeed, rate is 90%

Latency

Definition: Average response time in milliseconds
Default Threshold: 2000ms
Purpose: Ensures responsiveness
Example: If 10 requests take [1200, 1500, 1800, ...]ms, their mean is compared against the threshold

Sample Size

Definition: Minimum number of executions before decision
Default: 10 samples per phase
Purpose: Prevents decisions on insufficient data
Adjustment: Increase for higher statistical confidence
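To see why 10 samples is a floor rather than a guarantee, one option (a standard statistical technique, not something this system is stated to use) is the lower bound of the Wilson score interval on the completion rate. With 10 successes out of 10, the 95% lower bound is only about 0.72, below the 85% threshold; with 100 out of 100 it rises to about 0.96. `wilson_lower_bound` is a hypothetical helper.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success rate (z=1.96 ~ 95%)."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom
```

Comparing this bound, rather than the raw rate, against `min_task_completion_rate` would make small samples advance more cautiously.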

Configuration Options

from circuit_breaker import CircuitBreakerConfig, RolloutPhase  # RolloutPhase location assumed

CircuitBreakerConfig(
    # Metric thresholds
    min_task_completion_rate=0.85,    # Must stay above 85%
    max_latency_ms=2000.0,            # Must stay below 2000ms
    
    # Rollout parameters
    initial_phase=RolloutPhase.PROBE, # Start at PROBE (1%)
    min_samples_per_phase=10,         # Min samples before advancing
    monitoring_window_minutes=5,      # Time window for calculations
    
    # Decision thresholds
    advancement_threshold=0.95,       # Must be 95% good to advance
    rollback_threshold=0.80           # Trip if below 80%
)
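The options above can be validated and persisted together. This sketch uses a hypothetical `CBConfigSketch` dataclass with the same field names; the JSON layout is an assumption (the format of `cb_config.json` is not documented here), and `initial_phase` is omitted because an enum needs custom serialization. The ordering check between `rollback_threshold` and `advancement_threshold` is likewise an assumed sanity rule.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CBConfigSketch:
    """Hypothetical stand-in for CircuitBreakerConfig, using the option names above."""
    min_task_completion_rate: float = 0.85
    max_latency_ms: float = 2000.0
    min_samples_per_phase: int = 10
    monitoring_window_minutes: int = 5
    advancement_threshold: float = 0.95
    rollback_threshold: float = 0.80

    def __post_init__(self) -> None:
        # Reject settings that would make the decision rules contradictory.
        for name in ("min_task_completion_rate", "advancement_threshold", "rollback_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.rollback_threshold >= self.advancement_threshold:
            raise ValueError("rollback_threshold must be below advancement_threshold")
        if self.max_latency_ms <= 0 or self.min_samples_per_phase < 1:
            raise ValueError("max_latency_ms must be positive and min_samples_per_phase >= 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "CBConfigSketch":
        # Loading goes through __init__, so __post_init__ validates the file too.
        return cls(**json.loads(text))
```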

Example Scenarios

Scenario 1: Successful Rollout

Phase: PROBE (1%)
  ✓ Metrics: 100% completion, 1200ms latency
  → Action: ADVANCE to SMALL

Phase: SMALL (5%)
  ✓ Metrics: 98% completion, 1300ms latency
  → Action: ADVANCE to MEDIUM

Phase: MEDIUM (20%)
  ✓ Metrics: 97% completion, 1400ms latency
  → Action: ADVANCE to FULL

Phase: FULL (100%)
  ✓ Circuit breaker CLOSED
  ✓ All traffic on new version

Scenario 2: Automatic Rollback

Phase: PROBE (1%)
  ✓ Metrics: 98% completion, 1200ms latency
  → Action: ADVANCE to SMALL

Phase: SMALL (5%)
  ✗ Metrics: 80% completion, 2500ms latency
  → Action: ROLLBACK to OFF

Phase: OFF (0%)
  🚨 Circuit breaker OPEN
  ✓ All traffic reverted to old version
  📧 Team alerted to degradation

Benefits

Automated Risk Management
- No manual intervention required to advance or roll back
- Fast rollback prevents widespread impact
- Gradual rollout minimizes exposure
Data-Driven Decisions
- Based on actual performance metrics
- Rather than subjective human judgment
- Continuous monitoring vs. one-time tests
Speed
- Deploy faster than traditional A/B tests
- Adapt to changing conditions in real-time
- Scale from 1% to 100% in hours rather than weeks, given enough traffic to meet each phase's sample minimum
Safety
- Built-in circuit breaker for automatic rollback
- Minimal user impact during problems
- Deterministic thresholds stop degradation from reaching more users
Scalability
- Works with any traffic volume
- Configurable for different risk tolerances
- State persistence for reliability

Testing

Run comprehensive tests:

python test_circuit_breaker.py

See example scenarios:

python example_circuit_breaker.py

Integration Checklist

Define metric thresholds for your use case
Configure rollout phases and sample sizes
Integrate with your agent/service
Set up monitoring and alerting
Test rollback scenarios
Configure state persistence
Document team procedures for manual intervention

Future Enhancements

Multi-Metric Support: Beyond completion rate and latency
Cost Tracking: Monitor resource usage during rollout
User Segmentation: Different rollout rates per user segment
Canary Regions: Geographic-based gradual rollout
Automated Recovery: Retry after fixes without manual intervention
ML-Based Predictions: Predict failures before they occur

Conclusion

The Automated Circuit Breaker System replaces "Old World" manual A/B testing with "New World" automated, metric-driven rollouts. By continuously monitoring deterministic metrics and making real-time decisions, it enables safe, fast deployment of new agent versions while automatically protecting users from degraded performance.

Key Principle: Let the system drive based on real metrics, not manual experiments that can't keep pace with the speed of AI evolution.
