feat(C7): add steganographic covert channel detection in generated outputs (7.3.9)#627
Open
RicoKomenda wants to merge 1 commit intoOWASP:mainfrom
Open
Conversation
…erated outputs (7.3.9)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds 7.3.9 to C7.3 (Output Safety & Privacy Filtering) to address a gap where no control currently requires scanning AI-generated outputs for steganographic covert channels.
New control:
Level: 3
Why this is needed
Research demonstrated at USENIX Security 2024 and ACL 2024 shows that LLMs can be induced (via fine-tuning or adversarial prompts) to encode hidden signals in generated text using whitespace variation, Unicode homoglyph swaps, and token selection patterns -- channels invisible to users but decodable by a colluding receiver. An attacker who compromises a model or injects a backdoor trigger can exfiltrate retrieved documents, session data, or partial system prompt content through the model's normal-looking output.
Existing controls cover input steganography scanning (2.7.3) and auto-rendered external requests (7.3.8) but nothing addresses encoding in the model's generated text itself. This is AI-specific: the channel exists because generative models have many valid token choices at each position, giving an adversary statistical degrees of freedom invisible to standard content filters.
Level 3 is appropriate because detection requires statistical analysis of token distributions or model internals and is an emerging capability.
Changes
1.0/en/0x10-C07-Model-Behavior.md: add 7.3.9, fix MD060 separator rows and trailing pipe spacing1.0/en/0x93-Appendix-D_AI_Security_Controls_Inventory.md: add entry to AD.8