[RFC] Security sandboxing CI runs against prompt injection and secret leakage #38

@Zildj1an

When reviewing patches from untrusted contributors, the agent runs with filesystem read/write access without prompting (--permission-mode acceptEdits), Bash access in orc.md and review.md, the full inherited environment (Anthropic key, GitHub tokens, SSH, git credentials, etc.), and outbound network access (Claude API, lore.kernel.org).

The untrusted input (commit messages, code comments, lore email threads) can potentially instruct Claude to exfiltrate secrets via Bash and the network. Everything is also logged to review.json, which is another place secrets could surface. This is mostly an issue for future automated CI pipelines that review patches from external contributors.
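To make the environment exposure concrete, here is a minimal Python sketch that lists which variable names a child process would inherit that look like credentials. The name pattern and the function inherited_secrets are hypothetical illustrations, not part of the existing scripts, and a name-based heuristic will miss secrets stored under unusual names.

```python
import os
import re

# Hypothetical heuristic: variable names that commonly hold credentials.
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|SSH_AUTH_SOCK)", re.IGNORECASE)


def inherited_secrets(env):
    """Return the sorted names of variables that look like credentials."""
    return sorted(name for name in env if SECRET_NAME.search(name))


if __name__ == "__main__":
    # Print names only; values are never shown.
    for name in inherited_secrets(os.environ):
        print(name)
```

Running this in the shell that launches the review gives a rough idea of what an injected Bash command could read from the agent's environment.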

I would propose:

  • Add a minimal image at kernel/scripts/Dockerfile. This would also provide greater reproducibility.
  • Add kernel/scripts/run-container.sh to wrap review_one.sh/agent_one.sh, passing in only the Anthropic key and the Claude model.
  • Firewall rules limit egress to Anthropic's API and lore.kernel.org.
  • For extra security, the API key can be kept outside the container entirely, with a proxy reviewer forwarding requests to Anthropic's API.
  • Optionally, a dedicated agent reviews patches for potential injection attempts before the main analysis runs.
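The wrapper idea could look roughly like the following Python sketch, which builds a docker run command that forwards only an allowlist of environment variables. Everything named here is an assumption for illustration: the variable names ANTHROPIC_API_KEY and ANTHROPIC_MODEL, the image tag kernel-review:latest, the network name review-egress, and the /work mount point. The egress firewall itself is not shown; a Docker network does not restrict destinations on its own, so the allowlist for Anthropic's API and lore.kernel.org would still have to be enforced separately.

```python
import shlex

# Assumed variable names; only these are forwarded into the container.
ALLOWED_ENV = ("ANTHROPIC_API_KEY", "ANTHROPIC_MODEL")


def build_docker_cmd(patch_dir, script="review_one.sh",
                     image="kernel-review:latest", network="review-egress"):
    """Build a docker run argv that exposes only the allowlisted variables."""
    env_args = []
    for name in ALLOWED_ENV:
        # "-e NAME" without a value copies NAME from the caller's environment.
        env_args += ["-e", name]
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",               # drop Linux capabilities
        "--network", network,              # hypothetical egress-restricted network
        *env_args,
        "-v", f"{patch_dir}:/work:ro",     # patches mounted read-only
        image, script,
    ]


if __name__ == "__main__":
    print(shlex.join(build_docker_cmd("/tmp/patches")))
```

Because the container starts from an empty environment plus the allowlist, GitHub tokens, SSH agent sockets and git credentials on the host are never visible to the agent. A writable mount for review.json would need to be added separately.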

I understand CI is not a current concern, but I am leaving this issue open for when the time comes.
