LiveReview already protects model input (diffs, comments, metadata) before an LLM call. The post-LLM phase closes the next security gap: model output can still contain sensitive data or unsafe text patterns by the time it is shown to users.
This control adds an output safety layer that is:
- Non-blocking for developer workflow.
- Focused on redaction of high-confidence sensitive patterns.
- Applied consistently across key response paths.
- Verifiable through focused tests and operational logs.
The core policy is redact-and-continue, not reject-and-fail.
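The redact-and-continue policy can be sketched as follows. This is a minimal illustration, not LiveReview's actual rule set: the function name and the AWS-style key pattern are assumptions chosen for demonstration; the point is that a match is replaced with a placeholder rather than causing the response to be rejected.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical high-confidence secret pattern (AWS-style access key ID).
var awsKeyPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// redactAndContinue replaces sensitive spans with a placeholder and
// returns the rest of the output unchanged -- no hard failure.
func redactAndContinue(output string) string {
	return awsKeyPattern.ReplaceAllString(output, "[REDACTED]")
}

func main() {
	fmt.Println(redactAndContinue("Found key AKIAABCDEFGHIJKLMNOP in diff."))
	// Found key [REDACTED] in diff.
}
```

The reviewer still receives a complete, readable response; only the matched span is lost.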
Even with strong input sanitization, model output can still introduce security risk:
- The model can echo or synthesize secrets in output text.
- The model can include PII-like values in summaries or comments.
- The model can produce instruction-like control phrases that are unsafe to relay directly.
- Output formatting/repair paths can produce content that bypasses earlier assumptions unless sanitized at the end.
From a governance perspective, this is an egress-control problem: sensitive content leaving the model boundary and reaching user-visible channels.
Without post-output controls, organizations face:
- Data exposure risk in review comments and summaries.
- Audit findings around incomplete AI control coverage.
- Higher incident response cost for accidental leakage.
- Loss of trust in AI-assisted review workflows.
Before model output is returned to users, sanitize user-visible text for high-confidence sensitive patterns while preserving normal review flow.
- Minimal workflow disruption: no new hard-fail gate in this phase.
- High ROI coverage: focus on secrets, PII markers, and known control-token artifacts.
- Shared policy surface: use one sanitizer policy rather than separate per-feature rules.
- Defense in depth: pair with existing pre-LLM controls.
- Redacts high-confidence secret patterns.
- Redacts PII-like markers in natural-language output.
- Sanitizes markdown output to neutralize raw HTML tags.
- Neutralizes unsafe markdown link schemes while preserving safe links.
- Applies in both plain-text and structured output paths.
- Preserves developer experience by returning sanitized output instead of hard failures.
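To make the secret and PII redaction items above concrete, here is a hedged sketch with two illustrative-only patterns (a bearer-token shape and an email-like marker); LiveReview's real pattern set and placeholder names may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns, not the production rule set.
var (
	bearerToken = regexp.MustCompile(`(?i)bearer\s+[a-z0-9._~+/-]{20,}`) // high-confidence secret shape
	emailLike   = regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.]+`)          // PII-like marker
)

// redactSensitive applies each pattern in turn, replacing matches with
// typed placeholders so the surrounding text stays readable.
func redactSensitive(s string) string {
	s = bearerToken.ReplaceAllString(s, "[REDACTED_TOKEN]")
	s = emailLike.ReplaceAllString(s, "[REDACTED_EMAIL]")
	return s
}

func main() {
	fmt.Println(redactSensitive("Contact dev@example.com with Bearer abcdefghij0123456789xy"))
	// Contact [REDACTED_EMAIL] with [REDACTED_TOKEN]
}
```

Typed placeholders ([REDACTED_TOKEN] vs [REDACTED_EMAIL]) also make the metadata-only logs more useful, since counts can be reported per category.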
- It does not enforce broad semantic moderation policy.
- It does not reject all malformed outputs.
- It does not replace upstream JSON repair logic.
LiveReview now has two complementary sanitization phases:
- Pre-LLM sanitization: neutralize and scrub untrusted input before model calls.
- Post-LLM sanitization: scrub user-visible output before return.
In practical terms:
- Input protections reduce prompt-injection and data leakage risk into the model.
- Output protections reduce data leakage risk out of the model.
This closes the key control loop: ingress and egress controls around model interaction.
The shared post-output entrypoint is implemented in:
Key function:
- SanitizationPostflight
This function applies the same core redaction policy used by existing sanitization capabilities, tuned for user-visible text. It also applies markdown/link safety controls so unsafe rendered content is neutralized before output leaves the backend.
Markdown and link normalization is implemented as a dedicated helper stage in the sanitizer module so behavior is consistent across output paths.
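The shape of such an entrypoint can be sketched as a small pipeline. The stage functions below are simplified stand-ins, not LiveReview's actual implementations; the sketch only illustrates the ordering: redact first, then neutralize rendering vectors.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in for the shared redaction policy.
func redactSecrets(s string) string {
	return strings.ReplaceAll(s, "SECRET_TOKEN", "[REDACTED]")
}

// Stand-in for the markdown/link safety helper stage: escape raw HTML
// so it renders inert.
func normalizeMarkdown(s string) string {
	s = strings.ReplaceAll(s, "<", "&lt;")
	return strings.ReplaceAll(s, ">", "&gt;")
}

// sanitizationPostflight (hypothetical name modeled on the entrypoint
// above) runs redaction before markdown normalization so that the
// redaction placeholders themselves are not altered by later stages.
func sanitizationPostflight(output string) string {
	return normalizeMarkdown(redactSecrets(output))
}

func main() {
	fmt.Println(sanitizationPostflight("Key SECRET_TOKEN in <script>alert(1)</script>"))
	// Key [REDACTED] in &lt;script&gt;alert(1)&lt;/script&gt;
}
```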
For plain-text LLM responses, sanitization is applied immediately after model output is received and before it is returned.
Implementation location:
Key behavior:
- Response is sanitized via applyPostOutputSanitization.
- A panic-safe wrapper prevents sanitizer panics from breaking provider response flow.
- Metadata-only logs record whether sanitization was applied and high-level counts.
- Internal sanitizer error signals (for example PII redaction internal failures) are logged for observability.
- No raw sensitive output is logged.
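A panic-safe wrapper of this kind can be sketched with Go's recover mechanism. Names here are hypothetical; the behavior shown matches the list above: recover from a sanitizer panic, log metadata only (never the raw output), and return a conservative fallback instead of breaking the response flow.

```go
package main

import (
	"fmt"
	"log"
)

// safeSanitize runs the given sanitizer and, if it panics, recovers and
// substitutes a safe placeholder so the provider response flow continues.
func safeSanitize(sanitize func(string) string, output string) (result string) {
	defer func() {
		if r := recover(); r != nil {
			// Metadata-only log: record the failure, never the raw output.
			log.Printf("post-output sanitizer panicked: %v (output withheld)", r)
			result = "[sanitization failed; content withheld]"
		}
	}()
	return sanitize(output)
}

func main() {
	fmt.Println(safeSanitize(func(s string) string { return s + " [sanitized]" }, "hello"))
	fmt.Println(safeSanitize(func(s string) string { panic("boom") }, "hello"))
	// hello [sanitized]
	// [sanitization failed; content withheld]
}
```

The named return value is what lets the deferred recover handler substitute the fallback result.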
For structured outputs that pass through parse/repair logic, sanitization is applied after repair and successful parse.
Implementation location:
Ordering:
- Parse attempt.
- Repair path if needed.
- Successful parse.
- Post-output sanitization on user-visible fields.
Scope sanitized in this path:
- Technical summary text.
- Comment content.
- Suggestion text.
Reliability and error-handling behavior in this path:
- Post-sanitization in parsed results is panic-safe.
- Internal sanitizer errors are counted and surfaced through logs.
- Sanitization failures do not crash parsing; processing continues with best-effort safe output.
Structural fields are preserved as-is (for example: file path and line mapping data).
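The split between sanitized user-visible fields and preserved structural fields can be sketched with a hypothetical parsed-review shape (the struct and field names are assumptions, not LiveReview's actual types):

```go
package main

import (
	"fmt"
	"strings"
)

type Comment struct {
	FilePath string // structural: preserved as-is
	Line     int    // structural: preserved as-is
	Body     string // user-visible: sanitized
}

type ParsedReview struct {
	Summary  string // user-visible: sanitized
	Comments []Comment
}

// Stand-in for the shared sanitizer policy.
func sanitizeText(s string) string {
	return strings.ReplaceAll(s, "SECRET", "[REDACTED]")
}

// sanitizeParsed touches only user-visible text fields, leaving file
// paths and line mapping data untouched so comments still anchor
// correctly in the diff.
func sanitizeParsed(r *ParsedReview) {
	r.Summary = sanitizeText(r.Summary)
	for i := range r.Comments {
		r.Comments[i].Body = sanitizeText(r.Comments[i].Body)
	}
}

func main() {
	r := ParsedReview{
		Summary:  "Found SECRET in config",
		Comments: []Comment{{FilePath: "cmd/SECRET/main.go", Line: 12, Body: "Remove SECRET here"}},
	}
	sanitizeParsed(&r)
	fmt.Println(r.Summary)
	fmt.Println(r.Comments[0].FilePath, r.Comments[0].Line, r.Comments[0].Body)
}
```

Note that the file path keeps its original value even when it happens to match a sensitive pattern: rewriting structural fields would break comment placement.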
In addition to central post-output sanitization, comment posting formatters/providers apply output sanitization before sending comment bodies to external Git provider markdown renderers.
Implemented in:
- internal/providers/github/github.go
- internal/providers/gitlab/gitlab.go
- internal/providers/gitea/gitea_provider.go
This ensures markdown/link safety is enforced even at the final outbound boundary. Provider formatters also emit explicit warnings when sanitizer internals report errors, improving operational visibility without blocking comment posting.
Output markdown is normalized with these rules:
- Raw HTML tags in model output are escaped (rendered inert).
- Markdown links with unsafe schemes are rewritten to inert targets.
- Safe schemes are preserved: http, https, and mailto.
- Relative/anchor links remain allowed.
- Link parsing supports nested parentheses in URLs and labels with whitespace.
This keeps useful markdown formatting while removing high-risk rendering vectors.
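The scheme allowlist described above can be sketched as follows. This is a simplified stand-in: the regex here does not handle nested parentheses in URLs, which the real parser does, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified markdown link matcher (no nested-parenthesis support).
var linkPattern = regexp.MustCompile(`\[([^\]]*)\]\(([^)]*)\)`)

// sanitizeLinkTarget preserves http/https/mailto and scheme-less
// (relative or anchor) targets; anything else is rewritten to an
// inert target.
func sanitizeLinkTarget(target string) string {
	t := strings.TrimSpace(target)
	lower := strings.ToLower(t)
	switch {
	case strings.HasPrefix(lower, "http://"),
		strings.HasPrefix(lower, "https://"),
		strings.HasPrefix(lower, "mailto:"):
		return t
	case !strings.Contains(t, ":"):
		return t // relative path or #anchor: allowed
	default:
		return "#" // unsafe scheme (javascript:, data:, ...): inert
	}
}

// sanitizeMarkdownLinks rewrites each link's target while keeping its label.
func sanitizeMarkdownLinks(md string) string {
	return linkPattern.ReplaceAllStringFunc(md, func(m string) string {
		parts := linkPattern.FindStringSubmatch(m)
		return "[" + parts[1] + "](" + sanitizeLinkTarget(parts[2]) + ")"
	})
}

func main() {
	fmt.Println(sanitizeMarkdownLinks("[ok](https://example.com) [bad](data:text/html,x)"))
	// [ok](https://example.com) [bad](#)
}
```

Rewriting to an inert target rather than deleting the link keeps the surrounding sentence readable, consistent with the redact-and-continue policy.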
The current web UI review/event views render content as plain text, not markdown HTML rendering. This reduces browser-side rendering risk and complements backend output sanitization.
Focused validation exists for each control point:
- internal/aisanitize/postflight_test.go
- internal/api/unified_processor_v2_post_sanitize_test.go
- internal/ai/langchain/json_repair_integration_test.go
- internal/aisanitize/markdown_test.go
- internal/providers/github/github_comment_format_test.go
- internal/providers/gitlab/gitlab_comment_format_test.go
- internal/providers/gitea/gitea_provider_test.go
Operational no-LLM verification mode is also available through:
The post-LLM mode executes focused checks and writes artifacts to:
- scripts/.sample_review_logs/sample_review.latest.log
- scripts/.sample_review_logs/sample_review.latest.json
Additional focused tests for markdown/link safety exist in provider and sanitizer packages.
Recent validation run coverage includes:
- go test ./internal/aisanitize -run Markdown -count=1
- go test ./internal/providers/github -run Comment -count=1
- go test ./internal/providers/gitlab -run Comment -count=1
- go test ./internal/providers/gitea -run Comment -count=1
- bash -lc 'go build livereview.go'
This phase introduces a practical egress control with low operational friction:
- It reduces accidental sensitive-data exposure in AI outputs.
- It maintains developer productivity by avoiding brittle reject behavior.
- It provides measurable evidence through tests and run artifacts.
- It creates a stable base for stricter policy gates in future phases if needed.