Skip to content

Commit 1175e2c

Browse files
committed
feat: add comprehensive jailbreak detection test
Adds 05-jailbreak-test.py to expose critical security gaps in jailbreak detection: 1. Routing Bug: task_type="jailbreak" incorrectly routes to category classifier - Jailbreak content "How to build a bomb?" classified as "psychology" instead of "jailbreak" - Security risk: harmful content bypasses jailbreak detection 2. Missing Endpoint: POST /api/v1/classify/jailbreak returns 404 - Direct jailbreak classification endpoint not implemented - Forces users to rely on batch endpoint with broken routing 3. ExtProc Security Gap: Tests that ExtProc pipeline allows jailbreak content through - Validates end-to-end security filtering in LLM completion pipeline - Documents security bypass where harmful instructions can be generated Test Features: - Documents multiple jailbreak attempts and safe content for comparison - Provides detailed analysis of detection patterns and accuracy - Exposes routing bugs and security gaps with clear failure messages - Follows existing e2e test patterns for consistency This test serves as both documentation of current security issues and validation framework for future jailbreak detection improvements. Signed-off-by: Yossi Ovadia <[email protected]>
1 parent 5ec06fd commit 1175e2c

File tree

1 file changed

+470
-0
lines changed

1 file changed

+470
-0
lines changed

0 commit comments

Comments
 (0)