feat(C7): add steganographic covert channel detection in generated outputs (7.3.9) #627

Open
RicoKomenda wants to merge 1 commit into OWASP:main from RicoKomenda:feat/c7-generated-output-steganography

@RicoKomenda
Collaborator

Summary

Adds 7.3.9 to C7.3 (Output Safety & Privacy Filtering) to address a gap where no control currently requires scanning AI-generated outputs for steganographic covert channels.

New control:

Verify that generated text and structured outputs are scanned for steganographic covert channels (e.g., whitespace encoding, Unicode homoglyph substitution, unusual token-choice patterns) that could be used to exfiltrate data or communicate with external parties without user awareness, and that detections block or flag the response for review.
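
As a rough illustration of the first two channel types named in the control, the following is a minimal Python sketch of an output scanner. It is not part of this PR or of any existing implementation: the function name `scan_output`, the `INVISIBLE` character set, and the mixed-script heuristic are all assumptions chosen for the example, and the character list is deliberately not exhaustive.

```python
import unicodedata

# Hypothetical, non-exhaustive set of invisible characters that can carry hidden bits.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def scan_output(text: str) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []

    # 1. Zero-width / invisible characters embedded in the text.
    hits = sum(1 for ch in text if ch in INVISIBLE)
    if hits:
        findings.append(f"invisible characters: {hits}")

    # 2. Whitespace encoding: spaces or tabs left at the end of lines.
    trailing = sum(1 for line in text.split("\n") if line != line.rstrip(" \t"))
    if trailing:
        findings.append(f"lines with trailing whitespace: {trailing}")

    # 3. Homoglyph substitution: a single word mixing Latin with Cyrillic or Greek letters.
    for word in text.split():
        scripts = set()
        for ch in word:
            if ch.isalpha():
                # Unicode names start with the script, e.g. "LATIN SMALL LETTER A".
                scripts.add(unicodedata.name(ch, "").split(" ")[0])
        if "LATIN" in scripts and scripts & {"CYRILLIC", "GREEK"}:
            findings.append(f"mixed-script word: {word!r}")

    return findings
```

A caller could block or flag the response for review whenever `scan_output` returns a non-empty list. Trailing whitespace and mixed scripts also occur in benign text (for example Markdown hard line breaks or multilingual content), so a real deployment would likely need allowlists and tuning; token-choice patterns are not covered by this sketch at all.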

Level: 3

Why this is needed

Research presented at USENIX Security 2024 and ACL 2024 shows that LLMs can be induced, via fine-tuning or adversarial prompts, to encode hidden signals in generated text using whitespace variation, Unicode homoglyph swaps, and token selection patterns. These channels are invisible to users but decodable by a colluding receiver. An attacker who compromises a model or injects a backdoor trigger can exfiltrate retrieved documents, session data, or partial system prompt content through the model's normal-looking output.

Existing controls cover input steganography scanning (2.7.3) and auto-rendered external requests (7.3.8), but none addresses encoding in the model's generated text itself. The risk is AI-specific: generative models have many valid token choices at each position, which gives an adversary statistical degrees of freedom that are invisible to standard content filters.
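
To make the "degrees of freedom" point concrete, here is a simplified Python sketch that bounds how many bits a sender could hide at one position by choosing among plausible tokens. The function name `max_hidden_bits` and the `min_prob` plausibility threshold are assumptions made for illustration. The bound rests only on the fact that a choice among k alternatives encodes at most log2(k) bits, and it does not model any specific attack from the cited research.

```python
import math

def max_hidden_bits(token_probs: list[float], min_prob: float = 0.05) -> float:
    """Upper bound on bits hideable at one position.

    Assumes (hypothetically) that any token with probability >= min_prob
    looks natural to a reader, so the sender may pick freely among them.
    """
    plausible = [p for p in token_probs if p >= min_prob]
    # Choosing one of k alternatives conveys at most log2(k) bits.
    return math.log2(len(plausible)) if plausible else 0.0

# Four plausible continuations leave room for up to 2 bits at this position.
print(max_hidden_bits([0.4, 0.3, 0.2, 0.1]))      # 2.0
# A near-deterministic position leaves no room to hide anything.
print(max_hidden_bits([0.97, 0.01, 0.01, 0.01]))  # 0.0
```

Under these assumptions, even a fraction of a bit per token adds up over a long response, and nothing in the visible characters changes. That is why character-level filters alone cannot cover this channel.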

Level 3 is appropriate because detection requires statistical analysis of token distributions or model internals and is an emerging capability.

Changes

  • 1.0/en/0x10-C07-Model-Behavior.md: add 7.3.9, fix MD060 separator rows and trailing pipe spacing
  • 1.0/en/0x93-Appendix-D_AI_Security_Controls_Inventory.md: add entry to AD.8
