sabotage

Heron AI Work Test | Project - Michael Chen Sabotage Threat Modeling.

An automated eval harness designed to measure Monitor Negligence. The script simulates a "Monitor" (the insider) reviewing dangerous code (e.g., model poisoning, data exfiltration) and an "Auditor" (gpt-5.2) that grades the monitor's reasoning and decisions.