Adds 05-jailbreak-test.py with comprehensive test coverage for jailbreak
detection across multiple classifier paths:
- Batch API security classification (ModernBERT path)
- Direct security endpoint testing
- ExtProc pipeline security validation
- Pattern analysis across multiple test cases
Features:
- Cache-busting with unique test cases per run
- Clear documentation of expected results per path
- Detailed logging of classifier behavior differences
- Comprehensive security gap analysis
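As a rough illustration of the cache-busting item above, the sketch below shows one way to make each test case unique per run so that a cached classification result cannot be reused. The function name, the dictionary keys, and the suffix format are hypothetical and are not taken from 05-jailbreak-test.py; the actual script may do this differently.

```python
import uuid


def make_unique_case(base_prompt: str) -> dict:
    """Return a test case whose text differs on every call.

    Hypothetical helper: appends a random run ID so that a cache keyed
    on the request text cannot serve a result from an earlier run.
    """
    run_id = uuid.uuid4().hex[:8]  # short random token, new on each call
    return {
        "run_id": run_id,
        "text": f"{base_prompt} [test-run {run_id}]",
    }


if __name__ == "__main__":
    # Two cases built from the same base prompt get different text.
    first = make_unique_case("placeholder prompt")
    second = make_unique_case("placeholder prompt")
    print(first["text"])
    print(second["text"])
```

Appending the token as a suffix leaves the base prompt unchanged at the start of the text, so the content under test stays the same while the request as a whole is new to any cache.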
The tests expose critical security gaps: jailbreak content can bypass
detection and reach LLM backends, which then generate harmful responses.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>