Commit af2c7c5
feat: Add constitutional transforms based on Anthropic Constitutional Classifiers++ paper (#300)
* feat: Add constitutional transforms for AI red teaming
Add constitutional classifier probing transforms based on the Cunningham et al. 2025 paper:
- Reconstruction attacks: code_fragmentation, document_fragmentation, multi_turn_fragmentation
- Obfuscation attacks: metaphor_encoding, riddle_encoding, contextual_substitution, character_separation
- Supports static, LLM-powered, and hybrid transformation modes
- Add comprehensive example notebook demonstrating all transforms with TAP integration
- Strip notebook outputs for clean commit
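The three transformation modes listed above can be pictured with a small sketch. This is not the code added in this commit: `Mode`, `separate_characters`, `apply_transform`, and the `rewrite` callable are hypothetical names, and the hybrid behavior (static rule first, then the model-backed rewrite) is an assumption about what "hybrid" means here.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(str, Enum):
    """Hypothetical mode names mirroring the commit message."""

    STATIC = "static"
    LLM = "llm"
    HYBRID = "hybrid"


def separate_characters(text: str, separator: str = "-") -> str:
    """Static rule: put a separator between the characters of each word."""
    return " ".join(separator.join(word) for word in text.split())


def apply_transform(
    text: str,
    mode: Mode = Mode.STATIC,
    rewrite: Optional[Callable[[str], str]] = None,
) -> str:
    """Dispatch on mode; `rewrite` stands in for an LLM-backed rewriter."""
    mode = Mode(mode)  # also accepts the plain strings "static", "llm", "hybrid"
    if mode is Mode.STATIC:
        return separate_characters(text)
    if rewrite is None:
        raise ValueError(f"mode {mode.value!r} needs a rewrite callable")
    if mode is Mode.LLM:
        return rewrite(text)
    # Assumed hybrid behavior: apply the static rule, then the rewriter.
    return rewrite(separate_characters(text))
```

Passing the rewriter in as a plain callable keeps the sketch testable without a model; any function from `str` to `str` can stand in for it.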
* fix: Change noqa to nosec for bandit compatibility
Replace # noqa: S311 with # nosec B311 for bandit security scanner compatibility
* fix: Add noqa comments for both ruff and bandit
Add both # noqa: S311 (ruff) and # nosec B311 (bandit) to suppress security warnings for non-cryptographic random usage
1 parent fc0a946 commit af2c7c5
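For readers unfamiliar with the two suppression comments named in the last fix, a minimal sketch of how they might sit on one line follows. The diff body is not shown here, so the helper `pick_separator`, the `SEPARATORS` list, and the exact comment placement are assumptions for illustration, not the commit's actual code.

```python
import random

# Hypothetical values; the real transforms may draw from something else.
SEPARATORS = ["-", "_", "."]


def pick_separator() -> str:
    """Pick a separator at random; no cryptographic strength is needed here."""
    # S311 is ruff's rule for non-cryptographic random use; B311 is bandit's
    # equivalent check. Each tool reads only its own directive.
    return random.choice(SEPARATORS)  # noqa: S311  # nosec B311
```

Keeping both directives on the flagged line lets either scanner run in CI without a shared configuration file.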
File tree
3 files changed: +1619 -0 lines changed
- dreadnode/transforms
- examples/airt