[RFC] Security sandboxing CI runs against prompt injection and secret leakage

When reviewing patches from untrusted contributors, the agent has unprompted filesystem read/write (`--permission-mode acceptEdits`), access to Bash in `orc.md` and `review.md`, full environment inheritance (Anthropic key, GitHub tokens, SSH, git credentials, etc.), and outbound network access (Claude API, lore.kernel.org).

The untrusted input (commit messages, code comments, lore email threads) could instruct Claude to exfiltrate secrets via Bash and the network. Everything is also logged to `review.json`. This is mostly an issue for future automated CI pipelines reviewing patches from external contributors.

I would propose:

- Add a minimal image at `kernel/scripts/Dockerfile`. This would also provide greater reproducibility.
- `kernel/scripts/run-container.sh` wraps `review_one.sh`/`agent_one.sh`, passing only the Anthropic key and the Claude model.
- Firewall rules limit egress to Anthropic's API and lore.kernel.org.
- For extra security, the API key can be kept outside the container entirely, with a proxy reviewer forwarding requests to Anthropic's API.
- Optionally, a dedicated agent reviews patches for potential injection attempts before the main analysis runs.
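
To make the "pass only the key and the model" idea concrete, here is a minimal sketch of what the env allow-list in `run-container.sh` could look like. The variable names `ANTHROPIC_API_KEY` and `ANTHROPIC_MODEL`, the helper `container_env_args`, and the image tag `kernel-review:latest` are assumptions for illustration, not something the repo defines today.

```shell
#!/bin/sh
# Sketch of the allow-list part of a wrapper like kernel/scripts/run-container.sh.
# Assumptions: the variable names and the image tag below are illustrative.

# Only these variables are forwarded; everything else stays on the host.
ALLOWED_VARS="ANTHROPIC_API_KEY ANTHROPIC_MODEL"

# Print one "--env NAME" pair per allow-listed variable that is exported.
# "docker run --env NAME" (without "=value") copies the value from the host
# environment, so the secret never appears on the command line.
container_env_args() {
    for name in $ALLOWED_VARS; do
        if printenv "$name" >/dev/null 2>&1; then
            printf '%s %s ' --env "$name"
        fi
    done
}

# Example invocation (not executed here; needs Docker and a built image):
#   docker run --rm $(container_env_args) kernel-review:latest review_one.sh "$@"
```

Because the container starts from a clean environment, GitHub tokens, SSH agents and git credentials are never visible to the agent unless they are added to `ALLOWED_VARS` explicitly. Egress limits would still have to be enforced separately, for example by the firewall rules proposed above.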

I understand CI is not a current concern, but I am leaving this issue open for when the time comes.

[RFC] Security sandboxing CI runs against prompt injection and secret leakage #38
