
feat(C7): add steganographic covert channel detection in generated outputs (7.3.9)#627

Merged
RicoKomenda merged 2 commits into OWASP:main from RicoKomenda:feat/c7-generated-output-steganography
Apr 1, 2026
Conversation

@RicoKomenda
Collaborator

Summary

Adds 7.3.9 to C7.3 (Output Safety & Privacy Filtering) to address a gap where no control currently requires scanning AI-generated outputs for steganographic covert channels.

New control:

Verify that generated text and structured outputs are scanned for steganographic covert channels (e.g., whitespace encoding, Unicode homoglyph substitution, unusual token-choice patterns) that could be used to exfiltrate data or communicate with external parties without user awareness, and that detections block or flag the response for review.

Level: 3
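As an illustrative sketch only (not part of the PR's changes), a character-level scan for the first two example channels might look like the following. The character sets are deliberately tiny illustrative subsets, and `scan_output` is a hypothetical helper name, not an existing API:

```python
# Hypothetical sketch of a character-level covert-channel scan of generated
# output, per the control's first two examples. Character lists are
# illustrative subsets, not exhaustive detection sets.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# A few Cyrillic homoglyphs of Latin letters (illustrative subset).
HOMOGLYPHS = {"\u0430", "\u0435", "\u043e", "\u0440", "\u0441"}

def scan_output(text: str) -> list[str]:
    """Return findings that should block or flag the response for review."""
    findings = []
    hidden = [c for c in text if c in ZERO_WIDTH]
    if hidden:
        findings.append(f"zero-width characters: {len(hidden)}")
    swapped = [c for c in text if c in HOMOGLYPHS]
    if swapped:
        findings.append(f"homoglyph substitutions: {len(swapped)}")
    # Crude whitespace-encoding heuristic: doubled or trailing spaces/tabs.
    if "  " in text or any(l != l.rstrip(" \t") for l in text.splitlines()):
        findings.append("irregular whitespace (possible whitespace encoding)")
    return findings
```

A real implementation would work from the full Unicode confusables data rather than a hand-picked set, but the shape of the check is the same: enumerate suspicious code points and whitespace irregularities, then route any hit to the review path.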

Why this is needed

Research demonstrated at USENIX Security 2024 and ACL 2024 shows that LLMs can be induced (via fine-tuning or adversarial prompts) to encode hidden signals in generated text using whitespace variation, Unicode homoglyph swaps, and token selection patterns -- channels invisible to users but decodable by a colluding receiver. An attacker who compromises a model or injects a backdoor trigger can exfiltrate retrieved documents, session data, or partial system prompt content through the model's normal-looking output.

Existing controls cover input steganography scanning (2.7.3) and auto-rendered external requests (7.3.8) but nothing addresses encoding in the model's generated text itself. This is AI-specific: the channel exists because generative models have many valid token choices at each position, giving an adversary statistical degrees of freedom invisible to standard content filters.

Level 3 is appropriate because detection requires statistical analysis of token distributions or model internals and is an emerging capability.

Changes

  • 1.0/en/0x10-C07-Model-Behavior.md: add 7.3.9, fix MD060 separator rows and trailing pipe spacing
  • 1.0/en/0x93-Appendix-D_AI_Security_Controls_Inventory.md: add entry to AD.8

| **7.3.6** | **Verify that** the system requires a human approval step or re-authentication if the model generates high-risk content. | 3 |
| **7.3.7** | **Verify that** output filters detect and block responses that reproduce verbatim segments of system prompt content. | 2 |
| **7.3.8** | **Verify that** LLM client applications prevent model-generated output from triggering automatic outbound requests (e.g., auto-rendered images, iframes, or link prefetching) to attacker-controlled endpoints, for example by disabling automatic external resource loading or restricting it to explicitly allowlisted origins as appropriate. | 2 |
| **7.3.9** | **Verify that** generated text and structured outputs are scanned for steganographic covert channels (e.g., whitespace encoding, Unicode homoglyph substitution, unusual token-choice patterns) that could be used to exfiltrate data or communicate with external parties without user awareness, and that detections block or flag the response for review. | 3 |
Collaborator

Overlap with 2.2.5: The first two examples (whitespace encoding, Unicode homoglyph substitution) are already covered by existing requirement 2.2.5 (L3), which explicitly says "encoding and representation smuggling in both inputs and outputs (e.g., invisible Unicode/control characters, homoglyph swaps)."

What is genuinely new here is the third item: unusual token-choice patterns - statistical steganography where the model selects among valid tokens to encode hidden data. That is a distinct, AI-specific threat that 2.2.5 does not cover.

Suggest narrowing to remove the overlap and focus on the new part:

"Verify that generated outputs are analyzed for statistical steganographic covert channels (e.g., biased token-choice patterns or distribution anomalies) that could encode hidden data across the model's valid output space, and that detections are flagged for review."

This keeps the control sharply differentiated from 2.2.5. L3 is the right level: no production tooling exists for token-distribution steganalysis afaik; detection requires logit-level analysis or custom statistical methods. Consistent with 13.2.10 (covert channel monitoring, also L3).
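To give a rough sense of what "custom statistical methods" could mean here, a toy sketch (not from the PR, and not production tooling), assuming the serving stack exposes the log-probability the model assigned to each emitted token; the surprisal statistic and the threshold are illustrative assumptions:

```python
# Toy token-distribution steganalysis sketch. Intuition: encoders that
# steer token choices to carry hidden bits tend to pick tokens the model
# itself considered less likely, pushing average surprisal up.

def mean_surprisal(token_logprobs: list[float]) -> float:
    """Average negative log-probability (in nats) of the chosen tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def flag_statistical_stego(token_logprobs: list[float],
                           threshold: float = 3.0) -> bool:
    """Flag an output for review when mean surprisal exceeds a threshold.
    The 3.0-nat threshold is made up; a real deployment would calibrate
    against benign-output baselines for the same model and prompt class."""
    return mean_surprisal(token_logprobs) > threshold
```

This only catches crude bias; distribution-matching encoders (e.g. arithmetic-coding-based schemes) keep surprisal near baseline and need stronger tests, which is part of why L3 fits.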

| Explicit / non-consensual content filters | 7.7.1 |
| Citation and attribution validation | 5.4.2 |
| MCP error response sanitization (no stack traces, tokens, internal paths) | 10.4.6 |
| Steganographic covert channel detection in generated text and structured outputs | 7.3.9 |
Collaborator

If the requirement text is narrowed per the comment on 7.3.9, update this entry to match, e.g.:

"Statistical steganographic covert channel detection in generated outputs"

…to's review; resolve merge conflict

- Drop whitespace encoding and homoglyph examples (already covered by 2.2.5)
- Focus on biased token-choice patterns and output distribution anomalies
- Update Appendix D entry to match narrowed scope
- Resolve C07 separator row conflicts (take upstream format)
- Resolve C07 7.8.1 conflict (take upstream shorter version; 7.8.3 handles metadata clause)
- Resolve Appendix D conflict (keep both 7.3.9 and 7.8.3 entries)
Collaborator Author

@RicoKomenda left a comment

Good catch Otto. Narrowed 7.3.9 to the genuinely new threat only:

Verify that generated outputs are analyzed for statistical steganographic covert channels (e.g., biased token-choice patterns or output distribution anomalies) that could encode hidden data across the model's valid output space, and that detections are flagged for review.

Whitespace and homoglyph examples removed since those are already in 2.2.5. Appendix D entry updated to "Statistical steganographic covert channel detection in generated outputs". Merge conflict resolved in the same commit.

@RicoKomenda merged commit 9e3f7b0 into OWASP:main Apr 1, 2026
2 checks passed
@RicoKomenda deleted the feat/c7-generated-output-steganography branch April 1, 2026 21:35