
feat(C7): add steganographic covert channel detection in generated outputs (7.3.9)#627

Merged
RicoKomenda merged 2 commits into OWASP:main from RicoKomenda:feat/c7-generated-output-steganography
Apr 1, 2026
Conversation

@RicoKomenda
Collaborator

Summary

Adds 7.3.9 to C7.3 (Output Safety & Privacy Filtering) to address a gap where no control currently requires scanning AI-generated outputs for steganographic covert channels.

New control:

Verify that generated text and structured outputs are scanned for steganographic covert channels (e.g., whitespace encoding, Unicode homoglyph substitution, unusual token-choice patterns) that could be used to exfiltrate data or communicate with external parties without user awareness, and that detections block or flag the response for review.

Level: 3
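As an illustrative sketch only (not part of the PR's changes), a character-level scan for the first two example channels might look like the following. The character sets are deliberately tiny illustrative subsets, and `scan_output` is a hypothetical helper name, not an existing API:

```python
# Hypothetical sketch of a character-level covert-channel scan of generated
# output, per the control's first two examples. Character lists are
# illustrative subsets, not exhaustive detection sets.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# A few Cyrillic homoglyphs of Latin letters (illustrative subset).
HOMOGLYPHS = {"\u0430", "\u0435", "\u043e", "\u0440", "\u0441"}

def scan_output(text: str) -> list[str]:
    """Return findings that should block or flag the response for review."""
    findings = []
    hidden = [c for c in text if c in ZERO_WIDTH]
    if hidden:
        findings.append(f"zero-width characters: {len(hidden)}")
    swapped = [c for c in text if c in HOMOGLYPHS]
    if swapped:
        findings.append(f"homoglyph substitutions: {len(swapped)}")
    # Crude whitespace-encoding heuristic: doubled or trailing spaces/tabs.
    if "  " in text or any(l != l.rstrip(" \t") for l in text.splitlines()):
        findings.append("irregular whitespace (possible whitespace encoding)")
    return findings
```

A real implementation would work from the full Unicode confusables data rather than a hand-picked set, but the shape of the check is the same: enumerate suspicious code points and whitespace irregularities, then route any hit to the review path.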

Why this is needed

Research demonstrated at USENIX Security 2024 and ACL 2024 shows that LLMs can be induced (via fine-tuning or adversarial prompts) to encode hidden signals in generated text using whitespace variation, Unicode homoglyph swaps, and token selection patterns -- channels invisible to users but decodable by a colluding receiver. An attacker who compromises a model or injects a backdoor trigger can exfiltrate retrieved documents, session data, or partial system prompt content through the model's normal-looking output.

Existing controls cover input steganography scanning (2.7.3) and auto-rendered external requests (7.3.8) but nothing addresses encoding in the model's generated text itself. This is AI-specific: the channel exists because generative models have many valid token choices at each position, giving an adversary statistical degrees of freedom invisible to standard content filters.

Level 3 is appropriate because detection requires statistical analysis of token distributions or model internals and is an emerging capability.

Changes

  • 1.0/en/0x10-C07-Model-Behavior.md: add 7.3.9, fix MD060 separator rows and trailing pipe spacing
  • 1.0/en/0x93-Appendix-D_AI_Security_Controls_Inventory.md: add entry to AD.8

| **7.3.6** | **Verify that** the system requires a human approval step or re-authentication if the model generates high-risk content. | 3 |
| **7.3.7** | **Verify that** output filters detect and block responses that reproduce verbatim segments of system prompt content. | 2 |
| **7.3.8** | **Verify that** LLM client applications prevent model-generated output from triggering automatic outbound requests (e.g., auto-rendered images, iframes, or link prefetching) to attacker-controlled endpoints, for example by disabling automatic external resource loading or restricting it to explicitly allowlisted origins as appropriate. | 2 |
| **7.3.9** | **Verify that** generated text and structured outputs are scanned for steganographic covert channels (e.g., whitespace encoding, Unicode homoglyph substitution, unusual token-choice patterns) that could be used to exfiltrate data or communicate with external parties without user awareness, and that detections block or flag the response for review. | 3 |
Collaborator

Overlap with 2.2.5: The first two examples (whitespace encoding, Unicode homoglyph substitution) are already covered by existing requirement 2.2.5 (L3), which explicitly says "encoding and representation smuggling in both inputs and outputs (e.g., invisible Unicode/control characters, homoglyph swaps)."

What is genuinely new here is the third item: unusual token-choice patterns - statistical steganography where the model selects among valid tokens to encode hidden data. That is a distinct, AI-specific threat that 2.2.5 does not cover.

Suggest narrowing to remove the overlap and focus on the new part:

"Verify that generated outputs are analyzed for statistical steganographic covert channels (e.g., biased token-choice patterns or distribution anomalies) that could encode hidden data across the model's valid output space, and that detections are flagged for review."

This keeps the control sharply differentiated from 2.2.5. L3 is the right level: no production tooling exists for token-distribution steganalysis afaik; detection requires logit-level analysis or custom statistical methods. Consistent with 13.2.10 (covert channel monitoring, also L3).
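To give a rough sense of what "custom statistical methods" could mean here, a toy sketch (not from the PR, and not production tooling), assuming the serving stack exposes the log-probability the model assigned to each emitted token; the surprisal statistic and the threshold are illustrative assumptions:

```python
# Toy token-distribution steganalysis sketch. Intuition: encoders that
# steer token choices to carry hidden bits tend to pick tokens the model
# itself considered less likely, pushing average surprisal up.

def mean_surprisal(token_logprobs: list[float]) -> float:
    """Average negative log-probability (in nats) of the chosen tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def flag_statistical_stego(token_logprobs: list[float],
                           threshold: float = 3.0) -> bool:
    """Flag an output for review when mean surprisal exceeds a threshold.
    The 3.0-nat threshold is made up; a real deployment would calibrate
    against benign-output baselines for the same model and prompt class."""
    return mean_surprisal(token_logprobs) > threshold
```

This only catches crude bias; distribution-matching encoders (e.g. arithmetic-coding-based schemes) keep surprisal near baseline and need stronger tests, which is part of why L3 fits.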

| Explicit / non-consensual content filters | 7.7.1 |
| Citation and attribution validation | 5.4.2 |
| MCP error response sanitization (no stack traces, tokens, internal paths) | 10.4.6 |
| Steganographic covert channel detection in generated text and structured outputs | 7.3.9 |
Collaborator

If the requirement text is narrowed per the comment on 7.3.9, update this entry to match, e.g.:

"Statistical steganographic covert channel detection in generated outputs"

…to's review; resolve merge conflict

- Drop whitespace encoding and homoglyph examples (already covered by 2.2.5)
- Focus on biased token-choice patterns and output distribution anomalies
- Update Appendix D entry to match narrowed scope
- Resolve C07 separator row conflicts (take upstream format)
- Resolve C07 7.8.1 conflict (take upstream shorter version; 7.8.3 handles metadata clause)
- Resolve Appendix D conflict (keep both 7.3.9 and 7.8.3 entries)
Collaborator Author

@RicoKomenda left a comment

Good catch Otto. Narrowed 7.3.9 to the genuinely new threat only:

Verify that generated outputs are analyzed for statistical steganographic covert channels (e.g., biased token-choice patterns or output distribution anomalies) that could encode hidden data across the model's valid output space, and that detections are flagged for review.

Whitespace and homoglyph examples removed since those are already in 2.2.5. Appendix D entry updated to "Statistical steganographic covert channel detection in generated outputs". Merge conflict resolved in the same commit.

@RicoKomenda merged commit 9e3f7b0 into OWASP:main Apr 1, 2026
2 checks passed
@RicoKomenda deleted the feat/c7-generated-output-steganography branch April 1, 2026 21:35