A cost management library for the Strands Agents SDK with budget enforcement, adaptive model routing, and OpenTelemetry-compatible metrics.
- Budget Enforcement: Define budgets at tenant, strand, workflow, and run levels with configurable limits and actions
- Adaptive Model Routing: Automatically route to fallback models based on budget utilization and other conditions
- Cost Tracking: Track and attribute costs by tenant, strand, workflow, run, model, and tool
- OpenTelemetry Metrics: Emit cost metrics compatible with OTel collectors for long-term storage and analysis
- Flexible Policies: Configure via YAML files or environment variables
- Persistent Budget State: Optional Valkey/Redis persistence for budget state across restarts
- Python 3.10+
- Strands Agents SDK 0.1.0+
pip install strands-costguardFor persistence support:
pip install strands-costguard[valkey]# Install the package in development mode
pip install -e .
# Run the basic usage example
python examples/basic_usage.pyfrom strands_costguard import (
CostGuard,
CostGuardConfig,
FilePolicySource,
ModelUsage,
)
# Initialize Cost Guard
config = CostGuardConfig(
policy_source=FilePolicySource(path="./policies"),
enable_budget_enforcement=True,
enable_routing=True,
enable_metrics=True,
)
guard = CostGuard(config=config)
# Start a run
decision = guard.on_run_start(
tenant_id="prod-tenant",
strand_id="analytics_assistant",
workflow_id="data_analysis",
run_id="run-123",
)
if not decision.allowed:
print(f"Run rejected: {decision.reason}")
else:
# Execute your agent loop...
# Before model calls
model_decision = guard.before_model_call(
run_id="run-123",
model_name="gpt-4o",
stage="planning",
prompt_tokens_estimate=500,
)
# Use the effective model (may be downgraded)
effective_model = model_decision.effective_model
# After model calls
guard.after_model_call(
run_id="run-123",
usage=ModelUsage.from_response(
model_name=effective_model,
prompt_tokens=500,
completion_tokens=200,
),
)
# End the run
guard.on_run_end("run-123", "completed")
# Shutdown (flushes metrics)
guard.shutdown()budgets:
- id: "tenant-default"
scope: "tenant"
match:
tenant_id: "*"
period: "monthly"
max_cost: 1000.0
soft_thresholds: [0.7, 0.9, 1.0]
hard_limit: true
on_soft_threshold_exceeded: "DOWNGRADE_MODEL"
on_hard_limit_exceeded: "REJECT_NEW_RUNS"
- id: "analytics-strand"
scope: "strand"
match:
strand_id: "analytics_assistant"
period: "daily"
max_cost: 50.0
max_runs_per_period: 1000
max_concurrent_runs: 100
constraints:
max_iterations_per_run: 8
max_tool_calls_per_run: 20
max_model_tokens_per_run: 30000routing_policies:
- id: "default-routing"
match:
strand_id: "*"
stages:
- stage: "planning"
default_model: "gpt-4o-mini"
max_tokens: 2000
- stage: "synthesis"
default_model: "gpt-4o"
fallback_model: "gpt-4o-mini"
trigger_downgrade_on:
soft_threshold_exceeded: true
remaining_budget_below: 5.0pricing:
currency: "USD"
models:
"gpt-4o":
input_per_1k: 2.50
output_per_1k: 10.00
"gpt-4o-mini":
input_per_1k: 0.15
output_per_1k: 0.60
tools:
"web_search":
cost_per_call: 0.01Cost Guard integrates with your agent runtime via lifecycle hooks:
| Hook | When Called | Returns |
|---|---|---|
on_run_start() |
Before starting a new run | AdmissionDecision |
on_run_end() |
After a run completes | None |
before_iteration() |
Before each agent loop iteration | IterationDecision |
after_iteration() |
After each iteration completes | None |
before_model_call() |
Before each model call | ModelDecision |
after_model_call() |
After each model call | None |
before_tool_call() |
Before each tool call | ToolDecision |
after_tool_call() |
After each tool call | None |
To export metrics to an OpenTelemetry collector, configure StrandsTelemetry before initializing CostGuard:
from strands.telemetry.config import StrandsTelemetry
from strands_costguard import CostGuard, CostGuardConfig, FilePolicySource
# Configure telemetry with OTLP export
telemetry = StrandsTelemetry()
telemetry.setup_otlp_exporter(endpoint="http://localhost:4317")
telemetry.setup_meter(enable_otlp_exporter=True)
# Initialize CostGuard (will use the global MeterProvider)
config = CostGuardConfig(
policy_source=FilePolicySource(path="./policies"),
enable_metrics=True,
)
guard = CostGuard(config=config)Requirements:
- An OpenTelemetry collector running at the specified endpoint (default:
localhost:4317) - For local development, you can run a collector with Docker:
docker run -p 4317:4317 otel/opentelemetry-collector:latest
Disabling OTLP Export:
If you don't have a collector running, disable OTLP export to avoid connection errors:
telemetry.setup_meter(enable_otlp_exporter=False)Cost Guard emits the following metrics:
| Metric | Type | Description |
|---|---|---|
genai.cost.total |
Counter | Total cost in currency units |
genai.cost.model |
Counter | Cost per model |
genai.cost.tool |
Counter | Cost per tool |
genai.tokens.input |
Counter | Total input tokens |
genai.tokens.output |
Counter | Total output tokens |
genai.agent.iterations |
Counter | Agent loop iterations |
genai.agent.tool_calls |
Counter | Tool calls |
genai.cost.downgrade_events |
Counter | Model downgrade events |
genai.cost.rejection_events |
Counter | Run rejection events |
Metrics include resource attributes:
service.name,service.namespace,deployment.environmentstrands.tenant_id,strands.strand_id,strands.workflow_id
Budgets can be defined at multiple scopes, with higher priority scopes taking precedence:
- Global (lowest priority) - Default limits for all
- Tenant - Organization-level limits
- Strand - Agent definition limits
- Workflow (highest priority) - Specific workflow limits
When multiple budgets match, constraints are merged with more specific budgets taking priority.
When budget soft thresholds are exceeded:
| Action | Effect |
|---|---|
LOG_ONLY |
Log warning, continue normally |
DOWNGRADE_MODEL |
Switch to fallback models |
LIMIT_CAPABILITIES |
Reduce max tokens/iterations |
HALT_NEW_RUNS |
Reject new runs |
When hard limits are exceeded:
| Action | Effect |
|---|---|
HALT_RUN |
Stop the current run |
REJECT_NEW_RUNS |
Reject new runs only |
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy src/
# Linting
ruff check src/Apache-2.0