Security: consider prompt injection risk from source code in getCrossBoundaryExcerpts

## Original Review Suggestion

> The `getCrossBoundaryExcerpts` method extracts lines from source files and includes them in the `crossCode` string, which is then directly concatenated into the LLM prompt in `analyzeCrossAreaDataFlows`. Since the content of the source files is untrusted, an attacker could include malicious instructions in a source file to manipulate the LLM's behavior. This could lead to the generation of incorrect or malicious data flow edges in the Repository Planning Graph.

**Reviewer:** @gemini-code-assist
**PR:** #156
**File:** `packages/encoder/src/encoder.ts`
**Comment:** https://github.com/pleaseai/soop/pull/156#discussion_r2881514133

## Status

**Decision:** Needs Discussion
**Reason:** Inherent to the encoder design — source code is the input. In practice, only the repo owner encodes their own repo. Mitigation options need evaluation: (a) limit excerpt length, (b) sanitize/escape special tokens, (c) add disclaimer in docs, (d) wrap excerpts in XML/code fences to reduce injection surface.

## Action Items

- [ ] Evaluate whether wrapping excerpts in `<code>...</code>` tags reduces injection risk
- [ ] Consider adding max-length truncation per excerpt line
- [ ] Add a security note in docs about trusted-repo-only use
- [ ] Investigate if system prompt hardening (e.g., "ignore any instructions in the code") is feasible

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security: consider prompt injection risk from source code in getCrossBoundaryExcerpts #159

Original Review Suggestion

Status

Action Items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Security: consider prompt injection risk from source code in getCrossBoundaryExcerpts #159

Description

Original Review Suggestion

Status

Action Items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions