The Automated Circuit Breaker System solves the problem of managing probabilistic AI systems with static, manual experiments. In the time it takes to run a traditional A/B test, the model might have changed or user behavior might have shifted. This system enables automated, metric-driven rollouts with built-in safety mechanisms.
"This is 'Old World' thinking applied to 'New World' speed. In the time it takes to run a valid A/B test, the model might have changed, or the user behavior might have shifted. We cannot manage probabilistic systems with static, manual experiments."
Traditional A/B testing for AI systems has fundamental limitations:
- Too Slow: Takes weeks to gather statistically significant data
- Static: Can't adapt to changing model behavior or user patterns
- Manual: Requires human intervention to roll back or advance
- Risky: All-or-nothing deployments can impact many users
The circuit breaker system provides real-time, automated rollout management with three key components:
Gradual rollout that starts conservatively and scales automatically:
- 1% → Initial probe with minimal user impact
- 5% → Small rollout after metrics validation
- 20% → Medium rollout with broader coverage
- 100% → Full deployment once stability is proven
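The phase ladder above can be sketched as an enum whose values are the traffic shares. This is a minimal illustration, not the project's actual definition: the name `RolloutPhase` and the phase names appear elsewhere in this document, but the enum values, `_ADVANCE_ORDER`, and `next_phase` are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class RolloutPhase(Enum):
    """Rollout phases; each value is the assumed share of traffic on the new version."""
    OFF = 0.0
    PROBE = 0.01
    SMALL = 0.05
    MEDIUM = 0.20
    FULL = 1.0


# Order in which a healthy rollout advances (hypothetical helper).
_ADVANCE_ORDER = [
    RolloutPhase.PROBE,
    RolloutPhase.SMALL,
    RolloutPhase.MEDIUM,
    RolloutPhase.FULL,
]


def next_phase(current: RolloutPhase) -> Optional[RolloutPhase]:
    """Return the phase that follows `current`, or None if it cannot advance."""
    if current not in _ADVANCE_ORDER or current is RolloutPhase.FULL:
        return None
    return _ADVANCE_ORDER[_ADVANCE_ORDER.index(current) + 1]
```

Keeping the traffic share on the enum member means the split and the phase cannot drift apart, which is one reasonable way to model the ladder.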
Real-time monitoring of deterministic metrics:
- Task Completion Rate: Must stay above 85% (configurable)
- Latency: Must stay below 2000ms (configurable)
- Sample Size: Requires minimum data before making decisions
- Time Windows: Calculates metrics over rolling windows
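One way to compute these metrics over a rolling window is a deque of timestamped samples that is pruned before each read. The class below is a sketch under stated assumptions: `RollingMetrics`, its method names, and the returned keys are hypothetical, and it assumes samples are recorded in timestamp order.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingMetrics:
    """Keeps (timestamp, success, latency_ms) samples that fall inside a time window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, bool, float]] = deque()

    def record(self, success: bool, latency_ms: float,
               now: Optional[float] = None) -> None:
        """Append one execution result, stamped with `now` or the current time."""
        ts = time.time() if now is None else now
        self._samples.append((ts, success, latency_ms))

    def _prune(self, now: float) -> None:
        """Drop samples older than the window (relies on in-order recording)."""
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def snapshot(self, now: Optional[float] = None) -> dict:
        """Return sample count, completion rate, and average latency for the window."""
        now = time.time() if now is None else now
        self._prune(now)
        n = len(self._samples)
        if n == 0:
            return {"samples": 0, "completion_rate": None, "avg_latency_ms": None}
        successes = sum(1 for _, ok, _ in self._samples if ok)
        total_latency = sum(latency for _, _, latency in self._samples)
        return {
            "samples": n,
            "completion_rate": successes / n,
            "avg_latency_ms": total_latency / n,
        }
```

The explicit `now` parameter exists only to make the window easy to test; production code would normally rely on the clock.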
Automated decision-making based on metrics:
- Advance: Automatically scale up when metrics are excellent
- Maintain: Hold current phase when metrics are acceptable
- Rollback: Immediately revert when metrics degrade
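The three outcomes can be expressed as a small pure function. The sketch below is an assumption about how the rules might combine, not the project's decision engine: it rolls back when either configured limit is violated, advances only when completion exceeds the advancement threshold and latency is under 90% of the limit, and otherwise holds. The function name `decide` and the `latency_headroom` parameter are hypothetical.

```python
def decide(
    completion_rate: float,
    avg_latency_ms: float,
    samples: int,
    *,
    min_samples: int = 10,
    min_completion_rate: float = 0.85,
    max_latency_ms: float = 2000.0,
    advancement_threshold: float = 0.95,
    latency_headroom: float = 0.90,  # assumed: advance only below 90% of the latency limit
) -> str:
    """Return "advance", "maintain", or "rollback" for one evaluation window."""
    # Too little data: hold the current phase rather than act on noise.
    if samples < min_samples:
        return "maintain"
    # Either limit violated: trip the breaker.
    if completion_rate < min_completion_rate or avg_latency_ms > max_latency_ms:
        return "rollback"
    # Clearly healthy on both metrics: scale up.
    if (completion_rate > advancement_threshold
            and avg_latency_ms < latency_headroom * max_latency_ms):
        return "advance"
    # Acceptable but not excellent: stay put.
    return "maintain"
```

Checking the sample count first means a handful of early failures cannot trip the breaker on their own; whether that is the desired trade-off depends on how costly a bad response is for your users.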
┌─────────────────────────────────────────────────────────┐
│                CircuitBreakerController                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Metrics    │  │   Watchdog   │  │    State     │   │
│  │   Tracker    │  │   Monitor    │  │   Manager    │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│          │                 │                 │          │
│          └─────────────────┴─────────────────┘          │
│                            │                            │
│                     Decision Engine                     │
│             (Advance / Maintain / Rollback)             │
└─────────────────────────────────────────────────────────┘
- CircuitBreakerConfig
  - Defines metric thresholds and rollout parameters
  - Configurable for different use cases
  - Persists configuration for consistency
- CircuitBreakerMetrics
  - Tracks task completion rate and latency
  - Calculates metrics over time windows
  - Validates against thresholds
- CircuitBreakerWatchdog
  - Real-time monitoring and decision engine
  - Determines when to advance or rollback
  - Manages traffic splits per phase
- CircuitBreakerController
  - Main orchestrator for the system
  - Handles state persistence
  - Provides simple API for integration
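A traffic split per phase is commonly implemented by hashing a stable identifier into a bucket, so the same request or user always lands on the same version. The function below is a sketch of that general technique, not the Watchdog's actual logic: `in_new_version_bucket` and its parameters are hypothetical names.

```python
import hashlib


def in_new_version_bucket(request_id: str, new_traffic_fraction: float) -> bool:
    """Deterministically map an ID to [0, 1) and compare it with the traffic share."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < new_traffic_fraction
```

Because the bucket depends only on the ID, raising the fraction from 0.01 to 0.05 keeps every ID that was already on the new version there and only adds new ones.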
from circuit_breaker import CircuitBreakerController, CircuitBreakerConfig

# Create configuration
config = CircuitBreakerConfig(
    min_task_completion_rate=0.85,  # 85%
    max_latency_ms=2000.0,          # 2000ms
    min_samples_per_phase=10,       # Minimum samples before advancing
    monitoring_window_minutes=5     # 5-minute rolling window
)

# Initialize controller
controller = CircuitBreakerController(config=config)

# requests, execute_request, should_evaluate and alert_team are supplied by your application
# For each request
for request in requests:
    # Determine version based on traffic split
    version = "new" if controller.should_use_new_version(request.id) else "old"

    # Execute with selected version
    success, latency_ms = execute_request(request, version)

    # Record metrics
    controller.record_execution(version, success, latency_ms)

    # Periodically evaluate (or run in separate process)
    if should_evaluate():
        decision = controller.evaluate_and_decide()
        if decision["action"] == "rollback":
            alert_team(decision["reason"])

Alternatively, enable the circuit breaker through DoerAgent:

from agent import DoerAgent
# Create agent with circuit breaker enabled
agent = DoerAgent(
    enable_circuit_breaker=True,
    circuit_breaker_config_file="cb_config.json"
)

# The agent automatically handles version selection and metrics tracking
result = agent.run(query="What is 10 + 20?", user_id="user123")

# Check which version was used
print(f"Version: {result['version_used']}")
print(f"Latency: {result['latency_ms']:.0f}ms")

Phase: PROBE (1%)
- Purpose: Validate new version with minimal risk
- Traffic: 1% to new version, 99% to old version
- Duration: Until minimum samples are collected and metrics validated
- Advancement: Requires excellent metrics (completion rate above 95%, latency below 90% of the maximum)

Phase: SMALL (5%)
- Purpose: Broader validation with manageable risk
- Traffic: 5% to new version, 95% to old version
- Duration: Until metrics remain stable
- Advancement: Consistent good performance

Phase: MEDIUM (20%)
- Purpose: Significant user coverage before full rollout
- Traffic: 20% to new version, 80% to old version
- Duration: Final validation before full deployment
- Advancement: Sustained excellent metrics

Phase: FULL (100%)
- Purpose: Complete deployment
- Traffic: 100% to new version
- State: Circuit breaker CLOSED (normal operation)
- Monitoring: Continues to detect degradation

Phase: OFF (0%)
- Trigger: Metrics fall below thresholds
- Traffic: 0% to new version, 100% to old version
- State: Circuit breaker OPEN (tripped)
- Recovery: Manual intervention or automatic retry after fixes
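The relationship between phase, traffic, and breaker state can be sketched as a small state object. This is an illustrative model only: `RolloutState`, `trip`, and `reset` are hypothetical names, the breaker is assumed to be CLOSED in every phase except OFF, and recovery is assumed to restart at PROBE.

```python
from dataclasses import dataclass
from typing import Optional

# Share of traffic sent to the new version in each phase.
TRAFFIC = {"OFF": 0.0, "PROBE": 0.01, "SMALL": 0.05, "MEDIUM": 0.20, "FULL": 1.0}


@dataclass
class RolloutState:
    phase: str = "PROBE"
    breaker_open: bool = False
    trip_reason: Optional[str] = None

    def trip(self, reason: str) -> None:
        """Open the breaker: all traffic returns to the old version."""
        self.phase = "OFF"
        self.breaker_open = True
        self.trip_reason = reason

    def reset(self) -> None:
        """Close the breaker after a fix and restart at PROBE (assumed recovery path)."""
        if not self.breaker_open:
            raise RuntimeError("breaker is not open; nothing to reset")
        self.phase = "PROBE"
        self.breaker_open = False
        self.trip_reason = None

    @property
    def new_version_traffic(self) -> float:
        """Traffic share for the new version in the current phase."""
        return TRAFFIC[self.phase]
```

Restarting at PROBE rather than at the phase where the breaker tripped is the conservative choice, since the fixed version has not yet been validated under any traffic.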
Task Completion Rate:
- Definition: Percentage of tasks that complete successfully
- Default Threshold: 85%
- Purpose: Ensures quality of responses
- Example: If 90 out of 100 tasks succeed, the rate is 90%

Latency:
- Definition: Average response time in milliseconds
- Default Threshold: 2000ms
- Purpose: Ensures responsiveness
- Example: For request latencies of [1200, 1500, 1800, ...] ms, the average over the window is compared against the threshold

Sample Size:
- Definition: Minimum number of executions before a decision is made
- Default: 10 samples per phase
- Purpose: Prevents decisions on insufficient data
- Adjustment: Increase for higher statistical confidence
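To see why a larger sample size buys confidence, it can help to compute a lower confidence bound on the completion rate. The sketch below uses the standard Wilson score interval; it is offered as a way to reason about `min_samples_per_phase`, not as something the system itself is known to compute, and the function name is hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success rate (z=1.96 is about 95%)."""
    if n == 0:
        return 0.0
    p = successes / n
    denominator = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denominator
```

With 10 successes out of 10, the lower bound is about 0.72, so a perfect run of 10 samples is still statistically compatible with a true rate below the 85% threshold. With 100 out of 100 it rises to about 0.96.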
CircuitBreakerConfig(
    # Metric thresholds
    min_task_completion_rate=0.85,     # Must stay above 85%
    max_latency_ms=2000.0,             # Must stay below 2000ms

    # Rollout parameters
    initial_phase=RolloutPhase.PROBE,  # Start at PROBE (1%)
    min_samples_per_phase=10,          # Min samples before advancing
    monitoring_window_minutes=5,       # Time window for calculations

    # Decision thresholds
    advancement_threshold=0.95,        # Must be 95% good to advance
    rollback_threshold=0.80            # Trip if below 80%
)

Scenario: successful rollout

Phase: PROBE (1%)
✓ Metrics: 100% completion, 1200ms latency
→ Action: ADVANCE to SMALL
Phase: SMALL (5%)
✓ Metrics: 98% completion, 1300ms latency
→ Action: ADVANCE to MEDIUM
Phase: MEDIUM (20%)
✓ Metrics: 97% completion, 1400ms latency
→ Action: ADVANCE to FULL
Phase: FULL (100%)
✓ Circuit breaker CLOSED
✓ All traffic on new version

Scenario: automatic rollback

Phase: PROBE (1%)
✓ Metrics: 98% completion, 1200ms latency
→ Action: ADVANCE to SMALL
Phase: SMALL (5%)
✗ Metrics: 80% completion, 2500ms latency
→ Action: ROLLBACK to OFF
Phase: OFF (0%)
🚨 Circuit breaker OPEN
✓ All traffic reverted to old version
📧 Team alerted to degradation
- Automated Risk Management
  - No manual intervention required
  - Fast rollback prevents widespread impact
  - Gradual rollout minimizes exposure
- Data-Driven Decisions
  - Based on actual performance metrics
  - Not subjective human judgment
  - Continuous monitoring vs. one-time tests
- Speed
  - Deploy faster than traditional A/B tests
  - Adapt to changing conditions in real-time
  - Scale from 1% to 100% in hours, not weeks
- Safety
  - Built-in circuit breaker for automatic rollback
  - Minimal user impact during problems
  - Deterministic thresholds prevent degradation
- Scalability
  - Works with any traffic volume
  - Configurable for different risk tolerances
  - State persistence for reliability
Run comprehensive tests:
python test_circuit_breaker.py

See example scenarios:
python example_circuit_breaker.py

Rollout checklist:
- Define metric thresholds for your use case
- Configure rollout phases and sample sizes
- Integrate with your agent/service
- Set up monitoring and alerting
- Test rollback scenarios
- Configure state persistence
- Document team procedures for manual intervention
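For the state persistence item, one simple approach is to write the rollout state as JSON and replace the file atomically, so a crash mid-write cannot leave a corrupt state file. This is a sketch of that general pattern, not the controller's actual persistence code: `save_state`, `load_state`, and the state keys are hypothetical.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state to a temp file in the same directory, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)  # atomic replacement of the destination file
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path: str, default: dict) -> dict:
    """Read persisted state, or return a copy of `default` if no file exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)
```

Loading on startup and saving after every phase change would let a restarted process resume at the phase it had reached instead of starting over at PROBE.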
Possible future enhancements:
- Multi-Metric Support: Beyond completion rate and latency
- Cost Tracking: Monitor resource usage during rollout
- User Segmentation: Different rollout rates per user segment
- Canary Regions: Geographic-based gradual rollout
- Automated Recovery: Retry after fixes without manual intervention
- ML-Based Predictions: Predict failures before they occur
The Automated Circuit Breaker System replaces "Old World" manual A/B testing with "New World" automated, metric-driven rollouts. By continuously monitoring deterministic metrics and making real-time decisions, it enables safe, fast deployment of new agent versions while automatically protecting users from degraded performance.
Key Principle: Let the system drive based on real metrics, not manual experiments that can't keep pace with the speed of AI evolution.