Agent-first monitoring, incident response, and continuous improvement for OpenClaw-like systems.
Claw Ops is a generic operational control plane for agentic systems. It is designed around an agent-first operating model where an agent is the first observer, first triager, and bounded first responder, while humans retain control over high-risk actions.
- structured signals over ad hoc logs
- incidents as machine-readable state
- policy-bounded autonomy
- executable runbooks
- cognitive/runtime observability
- continuous improvement from repeated pain
docs/— product, architecture, schema, and execution docsdeploy/— Compose files and deployment configpolicies/— machine-readable action policiesrunbooks/— machine-readable runbooksapp/— event processor and core application code
This repository is scaffold-first. The initial focus is a safe Phase 1 deployment profile plus the smallest working app spine:
/health- alert intake
- normalized events
- incident persistence
- policy/runbook loading
- private-by-default
- explicit resource limits
- conservative retention
- no broad autonomous remediation
- side-stack deployment rather than modifying the primary app container
Claw Ops distinguishes between:
- autonomous actions
- approval-gated actions
- human-only actions
High-risk system changes should never be executed automatically in the initial phases.
- scaffold the event processor app
- implement Alertmanager webhook intake
- persist normalized incidents in sqlite
- load policies and runbooks with validation
- connect the Phase 1 observability stack end-to-end