safety-harness

A closed-loop safety harness for agentic LLMs — find failures, lock them in as regressions, gate releases on them, and replay real incidents. Each stage is a self-contained module; together they form the loop:

 ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
 │   stress-    │──▶│  regression- │──▶│   release-   │──▶│  incident-   │
 │   testing    │   │    suite     │   │    gate      │   │     lab      │
 └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
   surface slow-      pin failures as     block a release    replay & root-
   burn failures      regression tests    that regresses     cause incidents
        ▲                                                          │
        └──────────────────  feeds new cases back  ◀───────────────┘

Every stage is exercised against the simulator (a controllable agent under test), and the demo orchestrator drives the loop end to end.

Why It Matters

Agent safety failures rarely stay in one neat box. A red-team finding needs to become a regression test; a regression needs to block release; an incident needs to add new scenarios. safety-harness keeps those steps connected so safety work does not die as a one-off report.

Use it when you want a runnable skeleton for:

  • finding slow-burn agent failures
  • turning failures into regression cases
  • blocking releases when safety metrics regress
  • replaying incidents into root-cause graphs
  • showing the whole loop in a demo
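
As an illustration of the second item, here is a minimal sketch of turning a discovered failure into a regression case. It is not taken from the repository: the `Failure` fields, the `pin_as_regression` helper, and the `expect` value are all hypothetical. The point it shows is that deriving the case ID from the failure's content keeps the suite deterministic, so the same finding always maps to the same case.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Failure:
    """A stress-testing finding (hypothetical shape, not the repo's schema)."""
    scenario: str
    turns: tuple          # the prompts that led to the failure, in order
    violated_policy: str


def pin_as_regression(failure: Failure) -> dict:
    """Return a regression case whose ID is derived from the failure's content."""
    payload = asdict(failure)
    # sort_keys gives a stable serialization, so the hash does not depend on field order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(blob).hexdigest()
    # "expect" is an assumed field naming the behavior the agent should show.
    return {"id": f"reg-{digest[:12]}", "expect": "no_violation", **payload}


finding = Failure(
    scenario="gradual-escalation",
    turns=("benign request", "slightly riskier follow-up"),
    violated_policy="example-policy",
)
case = pin_as_regression(finding)
```

Pinning the same finding twice yields the same ID, which makes duplicate findings easy to detect before they enter the suite.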

Stages

| Stage | What it does |
| --- | --- |
| stress-testing/ | Static + adaptive red-teaming that surfaces delayed (slow-burn) safety failures, with attack mutators, a template catalog, and statistical power analysis. |
| regression-suite/ | Turns discovered failures into a deterministic regression suite via pluggable eval adapters (misuse, red-team, traffic). |
| release-gate/ | A production-style evaluation pipeline that computes a safety budget and blocks a release when safety metrics regress. |
| incident-lab/ | Reproducible incident replay + causal-graph root-cause analysis, with adapters that integrate every other stage. |
| simulator/ | A controllable agent (planner / memory / tools / executor) that serves as the system under test. |
| demo/ | One end-to-end run of the full loop: stress → regression → release gate → incident replay. |
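
To make the release-gate idea concrete, the sketch below shows one way a safety budget could work: each metric gets an allowed increase over the baseline, and the release is blocked if any metric exceeds it. This is an assumption about the general technique, not the repository's implementation. `evaluate_gate`, `GateDecision`, and the metric name are hypothetical, and metrics are assumed to be failure rates where lower is better.

```python
from dataclasses import dataclass, field


@dataclass
class GateDecision:
    allowed: bool
    reasons: list = field(default_factory=list)


def evaluate_gate(baseline: dict, candidate: dict, budget: dict) -> GateDecision:
    """Block when any budgeted metric rises more than its allowance over baseline."""
    reasons = []
    for metric, allowed_increase in budget.items():
        if metric not in candidate:
            # A metric that was not measured cannot be shown to be safe.
            reasons.append(f"{metric}: missing from candidate run")
            continue
        delta = candidate[metric] - baseline.get(metric, 0.0)
        if delta > allowed_increase:
            reasons.append(
                f"{metric}: +{delta:.3f} exceeds budget {allowed_increase:.3f}"
            )
    return GateDecision(allowed=not reasons, reasons=reasons)


# Hypothetical numbers: the violation rate rose by 0.03 against a budget of 0.01.
decision = evaluate_gate(
    baseline={"policy_violation_rate": 0.02},
    candidate={"policy_violation_rate": 0.05},
    budget={"policy_violation_rate": 0.01},
)
```

Returning the reasons alongside the verdict lets a CI job print why a release was blocked instead of only failing.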

Run it

Each stage is independently runnable and tested. From a stage directory:

cd stress-testing
[ -f requirements.txt ] && pip install -r requirements.txt   # only if present
PYTHONPATH=. python -m pytest -q  # run that stage's tests

The demo/ stage orchestrates the whole pipeline end-to-end.
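
For a rough picture of what such orchestration involves, the sketch below chains four toy stages through a shared state dictionary. It does not reflect the actual code in demo/: `run_loop`, `DEMO_STAGES`, and every key in the state are hypothetical, and each stage is a stand-in lambda rather than a real module.

```python
def run_loop(stages, state=None):
    """Run (name, stage) pairs in order; each stage returns updates to the state."""
    state = dict(state or {})
    trace = []
    for name, stage in stages:
        state = {**state, **stage(state)}
        trace.append(name)
    return state, trace


# Toy stand-ins for the four stages; real stages would call into each module.
DEMO_STAGES = [
    ("stress", lambda s: {"failures": ["slow-burn-001"]}),
    ("regression", lambda s: {"suite": list(s["failures"])}),
    ("release-gate", lambda s: {"blocked": len(s["suite"]) > 0}),
    ("incident-replay", lambda s: {"new_cases": [f"replay-of-{c}" for c in s["suite"]]}),
]

final_state, trace = run_loop(DEMO_STAGES)
```

In this sketch the `new_cases` produced by the last stage would be fed into the next stress-testing run, which is what closes the loop.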

Quick Start

git clone https://github.com/yingchen-coding/safety-harness
cd safety-harness/demo
pip install -r requirements.txt
make demo

License

CC BY-NC 4.0 — see LICENSE.
