Skip to content

Latest commit

 

History

History
4 lines (3 loc) · 342 Bytes

File metadata and controls

4 lines (3 loc) · 342 Bytes

sabotage

Heron AI Work Test | Project - Michael Chen Sabotage Threat Modeling.

An automated eval harness designed to measure Monitor Negligence. The script simulates a "Monitor" (the insider) reviewing dangerous code (e.g., model poisoning, data exfiltration) and an "Auditor" (gpt-5.2) that grades the monitor's reasoning and decisions.