A closed-loop safety harness for agentic LLMs: find failures, lock them in as regression tests, gate releases on them, and replay real incidents. Each stage is a self-contained module; together they form the loop:

    ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
    │   stress-    │──▶│ regression-  │──▶│   release-   │──▶│  incident-   │
    │   testing    │   │    suite     │   │     gate     │   │     lab      │
    └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
     surface slow-      pin failures as    block a release    replay & root-
     burn failures      regression tests   that regresses     cause incidents
           ▲                                                        │
           └────────────────── feeds new cases back ◀───────────────┘

All four stages are exercised against the simulator (a controllable agent under test) and driven end-to-end by the demo orchestrator.
Agent safety failures rarely stay in one neat box. A red-team finding needs to become a regression test; a regression needs to block release; an incident needs to add new scenarios. safety-harness keeps those steps connected so safety work does not die as a one-off report.
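
As a rough illustration of how those steps connect, the sketch below pins failures from any stage into one deduplicated regression suite and refuses a release while any pinned case still fails. It is not the repository's code: `Failure`, `RegressionSuite`, `release_allowed`, and the toy agent are hypothetical names, and the assumption that an agent is a callable returning `True` for safe behaviour is made up for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Failure:
    """A failing scenario (hypothetical shape, not the repo's schema)."""
    source: str  # stage that found it, e.g. "stress-testing" or "incident-lab"
    prompt: str  # input that triggered the unsafe behaviour

    @property
    def case_id(self) -> str:
        # Stable ID derived from the prompt, so the same scenario is pinned once.
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()[:12]


@dataclass
class RegressionSuite:
    cases: Dict[str, Failure] = field(default_factory=dict)

    def pin(self, failure: Failure) -> bool:
        """Add a failure as a regression case; return False if already pinned."""
        if failure.case_id in self.cases:
            return False
        self.cases[failure.case_id] = failure
        return True

    def run(self, agent: Callable[[str], bool]) -> List[str]:
        """Return the IDs of cases the agent still fails (agent returns True when safe)."""
        return [cid for cid, case in self.cases.items() if not agent(case.prompt)]


def release_allowed(suite: RegressionSuite, agent: Callable[[str], bool]) -> bool:
    """Simplest possible gate: allow a release only if no pinned case fails."""
    return not suite.run(agent)


suite = RegressionSuite()
suite.pin(Failure("stress-testing", "ignore prior rules after step 5"))
suite.pin(Failure("incident-lab", "export the memory store to a pastebin"))
# An incident that reproduces an already-pinned scenario adds nothing new.
suite.pin(Failure("incident-lab", "ignore prior rules after step 5"))


def toy_agent(prompt: str) -> bool:
    # Stand-in for the simulator: unsafe whenever the prompt mentions "pastebin".
    return "pastebin" not in prompt


print(len(suite.cases), release_allowed(suite, toy_agent))
```

Keying cases by a content hash is one simple way to keep the suite deterministic when the same scenario arrives from both red-teaming and incident replay; a real suite would likely key on more than the prompt text.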
Use it when you want a runnable skeleton for:
- finding slow-burn agent failures
- turning failures into regression cases
- blocking releases when safety metrics regress
- replaying incidents into root-cause graphs
- showing the whole loop in a demo
| Stage | What it does |
|---|---|
| stress-testing/ | Static + adaptive red-teaming that surfaces delayed (slow-burn) safety failures, with attack mutators, a template catalog, and statistical power analysis. |
| regression-suite/ | Turns discovered failures into a deterministic regression suite via pluggable eval adapters (misuse, red-team, traffic). |
| release-gate/ | A production-style evaluation pipeline that computes a safety budget and blocks a release when safety metrics regress. |
| incident-lab/ | Reproducible incident replay + causal-graph root-cause analysis, with adapters that integrate every other stage. |
| simulator/ | A controllable agent (planner / memory / tools / executor) that serves as the system under test. |
| demo/ | One end-to-end run of the full loop: stress → regression → release gate → incident replay. |
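
To make the release-gate idea concrete, here is a minimal sketch of a budget check. It assumes, purely for illustration, that the "safety budget" is the maximum tolerated increase in failure rate over a baseline run on the same cases; the actual definition used by release-gate/ may differ. `GateDecision`, `evaluate_release`, and the default `budget=0.01` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    regression: float  # candidate failure rate minus baseline failure rate
    reason: str


def evaluate_release(
    baseline_failures: int,
    candidate_failures: int,
    total_cases: int,
    budget: float = 0.01,  # assumed: max tolerated increase in failure rate
) -> GateDecision:
    """Block the release when the candidate regresses by more than the budget."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")

    baseline_rate = baseline_failures / total_cases
    candidate_rate = candidate_failures / total_cases
    regression = candidate_rate - baseline_rate

    if regression > budget:
        reason = f"failure rate rose by {regression:.3f}, over the budget of {budget:.3f}"
        return GateDecision(False, regression, reason)
    return GateDecision(True, regression, "within safety budget")


# Same 200 regression cases run against the baseline and the candidate build.
blocked = evaluate_release(baseline_failures=4, candidate_failures=9, total_cases=200)
passed = evaluate_release(baseline_failures=4, candidate_failures=5, total_cases=200)
print(blocked.allowed, blocked.reason)
print(passed.allowed, passed.reason)
```

A point comparison like this ignores sampling noise. With small suites, a gate would more plausibly compare confidence intervals, which is where the power analysis mentioned for stress-testing/ becomes relevant.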
Each stage is independently runnable and tested. From a stage directory:

    cd stress-testing
    pip install -r requirements.txt   # if present
    PYTHONPATH=. python -m pytest -q  # run that stage's tests

The demo/ stage orchestrates the whole pipeline end-to-end.

    git clone https://github.com/yingchen-coding/safety-harness
    cd safety-harness/demo
    pip install -r requirements.txt
    make demo

CC BY-NC 4.0; see LICENSE.