Commit af2c7c5

feat: Add constitutional transforms based on Anthropic Constitutional Classifiers++ paper (#300)
* feat: Add constitutional transforms for AI red teaming

  Add constitutional classifiers probing transforms based on Cunningham et al. 2025 paper:
  - Reconstruction attacks: code_fragmentation, document_fragmentation, multi_turn_fragmentation
  - Obfuscation attacks: metaphor_encoding, riddle_encoding, contextual_substitution, character_separation
  - Supports static, LLM-powered, and hybrid transformation modes
  - Add comprehensive example notebook demonstrating all transforms with TAP integration
  - Strip notebook outputs for clean commit

* fix: Change noqa to nosec for bandit compatibility

  Replace # noqa: S311 with # nosec B311 for bandit security scanner compatibility

* fix: Add noqa comments for both ruff and bandit

  Add both # noqa: S311 (ruff) and # nosec B311 (bandit) to suppress security warnings for non-cryptographic random usage
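For illustration, the combined suppression comment described in the last fix might look like the sketch below. The function body is a hypothetical stand-in, not the code added in this commit: the name `character_separation` comes from the commit message, but its signature, the `separators` and `seed` parameters, and its behavior here are all assumptions. Only the comment format (`# noqa: S311` for ruff, `# nosec B311` for bandit) is taken from the message.

```python
import random


def character_separation(text, separators="-._", seed=0):
    """Hypothetical sketch: place a randomly chosen separator between characters."""
    # Non-cryptographic randomness is fine here, so both linters are silenced
    # on the line that constructs the generator.
    rng = random.Random(seed)  # noqa: S311  # nosec B311
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if i < len(text) - 1:
            # One separator per gap, none after the final character.
            out.append(rng.choice(separators))
    return "".join(out)


result = character_separation("hello", seed=3)
```

Seeding the generator keeps the sketch deterministic for a given input, which is convenient for tests; whether the real transform is seeded is not stated in the commit.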
1 parent fc0a946 commit af2c7c5

3 files changed: +1619 -0 lines changed

dreadnode/transforms/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,7 @@
 if t.TYPE_CHECKING:
     from dreadnode.transforms import (
         cipher,
+        constitutional,
         encoding,
         image,
         perturbation,
@@ -29,6 +30,7 @@
     "TransformWarning",
     "TransformsLike",
     "cipher",
+    "constitutional",
     "encoding",
     "image",
     "perturbation",
@@ -41,6 +43,7 @@

 __lazy_submodules__: list[str] = [
     "cipher",
+    "constitutional",
     "encoding",
     "image",
     "perturbation",
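
The diff registers `constitutional` in three places but does not show how `__lazy_submodules__` is consumed. One common pattern for such a list is a module-level `__getattr__` (PEP 562) that imports a submodule on first attribute access, while the `t.TYPE_CHECKING` import keeps the names visible to static type checkers. The sketch below illustrates that pattern only; `make_lazy_getattr` and the proxy module are hypothetical names, and the stdlib `xml` package stands in for `dreadnode.transforms` so the example runs on its own.

```python
import importlib
import types


def make_lazy_getattr(package_name, lazy_submodules):
    """Build a PEP 562 style __getattr__ that imports listed submodules on first access."""

    def __getattr__(name):
        if name in lazy_submodules:
            return importlib.import_module(f"{package_name}.{name}")
        raise AttributeError(f"module {package_name!r} has no attribute {name!r}")

    return __getattr__


# Stand-in for a package __init__.py: a proxy module that resolves
# stdlib `xml` submodules lazily (hypothetical demo, not dreadnode code).
proxy = types.ModuleType("lazy_xml_proxy")
proxy.__getattr__ = make_lazy_getattr("xml", ["dom", "sax"])

dom = proxy.dom  # triggers importlib.import_module("xml.dom")
print(dom.__name__)
```

Under this pattern, a submodule missing from the list would raise `AttributeError` on access, which would explain why a new submodule has to be added to the list as well as to `__all__`.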
