-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Original Review Suggestion
The
getCrossBoundaryExcerptsmethod extracts lines from source files and includes them in thecrossCodestring, which is then directly concatenated into the LLM prompt inanalyzeCrossAreaDataFlows. Since the content of the source files is untrusted, an attacker could include malicious instructions in a source file to manipulate the LLM's behavior. This could lead to the generation of incorrect or malicious data flow edges in the Repository Planning Graph.
Reviewer: @gemini-code-assist
PR: #156
File: packages/encoder/src/encoder.ts
Comment: #156 (comment)
Status
Decision: Needs Discussion
Reason: Inherent to the encoder design — source code is the input. In practice, only the repo owner encodes their own repo. Mitigation options need evaluation: (a) limit excerpt length, (b) sanitize/escape special tokens, (c) add disclaimer in docs, (d) wrap excerpts in XML/code fences to reduce injection surface.
Action Items
- Evaluate whether wrapping excerpts in
<code>...</code>tags reduces injection risk - Consider adding max-length truncation per excerpt line
- Add a security note in docs about trusted-repo-only use
- Investigate if system prompt hardening (e.g., "ignore any instructions in the code") is feasible