Skip to content

Make agent policy approvals explainable with the Policy Prover #1532

@zredlined

Description

@zredlined

Problem Statement

OpenShell is moving toward agents that recover from sandbox denials by proposing policy changes. That creates a product challenge: how do we let agents move quickly without asking developers to inspect raw policy YAML or trust the agent's own explanation of what changed?

A reviewer needs to know what authority the proposal grants: a new destination, a new credentialed action, an opaque protocol path, or access to infrastructure metadata. The Policy Prover is a strong fit because the agent cannot argue around it or rewrite its own risk story. The prover compares the enforceable policy boundary before and after a proposal and explains what authority changed.

The prover should not assign universal severity by itself. Severity is contextual: a POST to GitHub may be expected in one workflow and suspicious in another. What the prover can reliably provide is a stable, tamper-resistant description of policy expansion. Human reviewers, reviewer agents, org policy, and workload context can layer risk judgment on top.

Parent: #1059
Related: #1097, #1062, #1038, #1435

Proposed Design

Use the Policy Prover as the deterministic explanation and gating layer for agent policy approvals.

When an agent submits a policy proposal, the gateway should compare the baseline policy to the merged proposed policy, run the prover over both states, compute only the new findings introduced by the proposal, and store a concise reviewer-readable explanation on the draft policy chunk.

Auto-approval should be possible only when both are true:

  • the prover finds no new tracked authority expansion
  • the sandbox explicitly opts into auto approval

Any new prover finding should route to human or reviewer-agent judgment.

The prover should report what changed, not a universal severity score. Initial expansion categories:

  • link_local_reach — reach to link-local metadata ranges such as 169.254.0.0/16 or fe80::/10. Cloud platforms commonly expose metadata services on these addresses, including credential-bearing endpoints such as AWS instance metadata. This access can bypass OpenShell's normal credential attachment model because credentials come from surrounding infrastructure rather than a sandbox provider.
  • l7_bypass_credentialed — an L7-opaque binary such as git, ssh, or nc gains access to a host where a credential is in scope.
  • credential_reach_expansion — a binary gains credentialed reach to a host/port it could not previously reach.
  • capability_expansion — a binary that already had credentialed reach gains a new HTTP method.

Example reviewer output:

prover: 1 new finding
  capability_expansion: PUT on api.github.com:443 via /usr/bin/curl

The product contract is: the agent may propose and explain a policy change, but the prover independently explains what authority the change actually grants. Contextual risk scoring can be layered on later, but it should consume prover findings rather than replace them.

Why This Matters

This turns formal verification into a practical developer experience. Without this layer, agent-driven policy management either asks developers to review low-level policy diffs by hand or asks them to trust the agent's natural-language explanation of the access it wants.

The prover gives OpenShell a better path: a potentially complicated policy change becomes an enforceable delta reviewers can understand.

Agent: I need to upload release metadata.
Prover: This adds PUT on api.github.com:443 for /usr/bin/curl.
Reviewer: That matches the task. Approve.

This is why categories are better than built-in severity grades at this layer. Severity is a policy decision. The prover's job is to produce stable facts about authority expansion.

Alternatives Considered

Use severity grades directly in the prover.

Severity labels are familiar, but they imply context the prover does not have. A write action may be normal for a release workflow and dangerous for a read-only analysis workflow.

Trust the agent's explanation.

The agent's explanation is useful for intent, but it cannot be the enforcement source of truth. A compromised or confused agent can understate the policy impact.

Require manual policy review for every proposal.

This is safe but defeats the agentic workflow. Developers should only be interrupted when a proposal expands tracked authority.

Let an LLM reviewer decide directly.

Reviewer agents are valuable for contextual judgment, but they need grounded inputs. The prover should provide the non-negotiable policy delta.

Agent Investigation

Definition of Done

  • Agent policy proposals are evaluated by comparing prover results for baseline and merged policy.
  • The approval surface shows only new authority introduced by the proposal.
  • Auto-approval is possible only when the prover delta is empty and the sandbox explicitly opts into auto approval.
  • Findings describe stable policy expansion categories rather than built-in contextual severity.
  • Reviewer-facing validation text is understandable without reading raw policy YAML.
  • Agent-provided intent is shown separately from prover-derived policy consequences.
  • Documentation explains why link-local reach is treated as a policy-boundary expansion.
  • Documentation explains that contextual severity and reviewer-agent recommendations can be layered on top later.

Metadata

Metadata

Assignees

Labels

area:policyPolicy engine and policy lifecycle workroadmap

Type

No type
No fields configured for issues without a type.

Projects

Status

Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions