feat(policy): add agentic approval loop #1528

Draft
zredlined wants to merge 5 commits into main from 1097-agentic-policy-approval-loop

zredlined commented May 22, 2026

Summary

Ships the agentic policy approval loop end-to-end. When the sandbox denies a network request, an agent inside the sandbox can propose a narrow policy refinement; the gateway runs a formal prover against the merged-policy delta; safe proposals (no new findings) auto-approve in ~1s; risky ones land in pending with structured evidence the reviewer can act on. The agent waits on a socket — zero LLM tokens burn during human review.

This is the loop the platform has been building toward: agents do the narrowing work, the prover catches changes the operator should know about, and the audit trail makes every approval reconstructable.

Closes #1097
Refs #1062
Refs #1532

What this PR ships

The loop. Sandbox denial → agent reads /etc/openshell/skills/policy_advisor.md → agent POSTs a narrow proposal to policy.local → gateway runs the prover → either auto-approve (empty delta) or pending (any finding) → on approval, sandbox hot-reloads → agent retries.

Prover wired in as the auto-approval referee. Every proposal (mechanistic and agent-authored alike) runs through openshell-prover. The prover answers four categorical questions about the proposed change — see What the prover decides. The gateway computes the delta vs the baseline policy and the auto-approval gate fires only when the delta is empty.

Providers-v2 in the loop. The prover validates against the effective policy — provider profiles composed in via providers-v2 are part of the model the prover reasons over. Agent-authored chunks for endpoints a provider profile covers land as their own rules (Fix A in merge.rs) instead of getting silently absorbed into the provider rule, so the prover sees the agent's narrow contribution honestly.

Default-deny posture preserved. Auto-approval is opt-in through the standard settings model: gateway-scoped proposal_approval_mode wins, sandbox-scoped settings apply otherwise, and the default is manual review. "auto" enables empty-delta auto-approval. CLI keeps openshell sandbox create --approval-mode <manual|auto> as shorthand by writing the sandbox setting after creation.
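
A minimal sketch of the resolution order described above, not the gateway's actual code. The type and function names (`ApprovalMode`, `ResolvedFrom`, `resolve_approval_mode`) are hypothetical, settings are modeled as optional strings, and it assumes that a gateway-scoped value still wins when it is unrecognized (resolving to manual) and that `"auto"` is matched exactly.

```rust
/// Scope the effective mode was read from (hypothetical mirror of `resolved_from`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Manual,
    Auto,
}

/// Only the exact string "auto" enables auto-approval; "manual" and any
/// unknown or future value fall back to manual review.
fn parse_mode(raw: &str) -> ApprovalMode {
    match raw {
        "auto" => ApprovalMode::Auto,
        _ => ApprovalMode::Manual,
    }
}

/// Gateway scope wins, then sandbox scope, then the manual default.
/// Assumption: a set-but-unrecognized gateway value is not skipped.
pub fn resolve_approval_mode(
    gateway: Option<&str>,
    sandbox: Option<&str>,
) -> (ApprovalMode, ResolvedFrom) {
    if let Some(raw) = gateway {
        return (parse_mode(raw), ResolvedFrom::Gateway);
    }
    if let Some(raw) = sandbox {
        return (parse_mode(raw), ResolvedFrom::Sandbox);
    }
    (ApprovalMode::Manual, ResolvedFrom::Default)
}
```

Under these assumptions, `resolve_approval_mode(Some("manual"), Some("auto"))` resolves to manual from the gateway scope, so a sandbox cannot opt itself into auto-approval against a gateway setting.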

Demo that walks the full loop. examples/agent-driven-policy-management/demo.sh runs a Codex agent through a two-path flow against a local gateway: one un-credentialed action auto-approves silently; one credentialed action escalates with a categorical finding, demo.sh approves on the reviewer's behalf, the agent retries, and the file lands in GitHub. The run completes end-to-end in ~50–110s with one human-visible escalation, exactly the kind the prover cannot decide unilaterally.

Reconstructable audit. Every auto-approval emits a CONFIG:APPROVED OCSF event with unmapped fields auto=true, source=<mechanistic|agent_authored>, prover_delta=empty, and resolved_from=<gateway|sandbox|default>. The chunk's persisted validation_result carries the categorical finding lines for human-reviewed approvals.
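
For illustration, the four unmapped fields could be assembled like this. It is a sketch only: the function and enum names are hypothetical, the real event is built by the gateway's OCSF layer, and a plain `BTreeMap` stands in for whatever structure that layer uses.

```rust
use std::collections::BTreeMap;

/// Who authored the proposal (hypothetical enum for the `source` field).
#[derive(Debug, Clone, Copy)]
pub enum ProposalSource {
    Mechanistic,
    AgentAuthored,
}

/// Which settings scope supplied the approval mode (`resolved_from` field).
#[derive(Debug, Clone, Copy)]
pub enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

/// Build the unmapped fields attached to a CONFIG:APPROVED event for an
/// auto-approval. Field names and values follow the PR description.
pub fn auto_approval_unmapped(
    source: ProposalSource,
    resolved_from: ResolvedFrom,
) -> BTreeMap<&'static str, String> {
    let source = match source {
        ProposalSource::Mechanistic => "mechanistic",
        ProposalSource::AgentAuthored => "agent_authored",
    };
    let resolved_from = match resolved_from {
        ResolvedFrom::Gateway => "gateway",
        ResolvedFrom::Sandbox => "sandbox",
        ResolvedFrom::Default => "default",
    };
    let mut fields = BTreeMap::new();
    fields.insert("auto", "true".to_string());
    fields.insert("source", source.to_string());
    // Auto-approval only fires on an empty delta, so this value is constant.
    fields.insert("prover_delta", "empty".to_string());
    fields.insert("resolved_from", resolved_from.to_string());
    fields
}
```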

Provider profile tightening. providers/github.yaml defaults api.github.com from read-write to read-only. Writes (gh / git via REST) now flow through the agentic loop — the loop becomes the on-ramp to write access, and the prover audits each capability change.

What the prover decides

The prover answers four formal questions about each proposed change. Each "yes" is its own categorical finding — no severity grade. Any finding blocks auto-approval; empty delta means the change is provably safe under the model.

| Category | The prover detects |
| --- | --- |
| link_local_reach | Reach to a host in 169.254.0.0/16 or fe80::/10 (cloud-metadata range, serves credentials). |
| l7_bypass_credentialed | A binary using a wire protocol the L7 proxy cannot inspect (git-remote-https, ssh, nc) reaches a host where a credential is in scope. |
| credential_reach_expansion | A binary gains credentialed reach to a (host, port) it could not reach before. |
| capability_expansion | On a (binary, host, port) that already had credentialed reach, the policy adds a new HTTP method. Finding cites the specific method. |

Detail in crates/openshell-prover/README.md.
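
The categorical model and the binary gate can be sketched as follows. This is an illustrative model, not the openshell-prover API: the `Category`, `Decision`, `is_link_local`, and `decide` names are hypothetical, and the link-local check operates on already-resolved IP addresses, which may not be how the prover models hosts.

```rust
use std::net::IpAddr;

/// The four finding categories; there is deliberately no severity field.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Category {
    LinkLocalReach,
    L7BypassCredentialed,
    CredentialReachExpansion,
    CapabilityExpansion,
}

impl Category {
    /// Name as it would appear in `validation_result`.
    pub fn as_str(self) -> &'static str {
        match self {
            Category::LinkLocalReach => "link_local_reach",
            Category::L7BypassCredentialed => "l7_bypass_credentialed",
            Category::CredentialReachExpansion => "credential_reach_expansion",
            Category::CapabilityExpansion => "capability_expansion",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    AutoApproved,
    Pending,
}

/// True for 169.254.0.0/16 (IPv4) or fe80::/10 (IPv6).
pub fn is_link_local(addr: IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            o[0] == 169 && o[1] == 254
        }
        // fe80::/10 fixes the top 10 bits of the first 16-bit segment.
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}

/// The gate is binary: auto mode plus an empty delta auto-approves;
/// anything else lands in pending for human review.
pub fn decide(auto_mode: bool, delta: &[Category]) -> Decision {
    if auto_mode && delta.is_empty() {
        Decision::AutoApproved
    } else {
        Decision::Pending
    }
}
```

Because the gate only asks whether the delta is empty, adding a fifth category later would not change `decide`; it would only widen the set of changes that escalate.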

What the demo shows

==> Step 1 — un-credentialed reach (auto-approves)
   curl GET raw.githubusercontent.com/.../api.github.com.json
   prover: no findings (no credential in scope for the host)
   gateway: auto-approved in ~1s
   audit: "auto-approved: no new prover findings (source=agent_authored)"

==> Step 2 — credentialed capability change (escalates)
   curl PUT api.github.com/.../specific.md
   prover: credential_reach_expansion (or capability_expansion) on api.github.com:443
   gateway: pending — human review required
   demo.sh approves on behalf → agent retries → file lands in github

Acceptance criteria (deterministic, in tests)

  1. Un-credentialed reach auto-approves under auto-mode (zero findings, terminal status approved).
  2. Credentialed reach expansion lands in pending with credential_reach_expansion in validation_result.
  3. Capability expansion on an already-reached credentialed host lands in pending with capability_expansion citing the new method.
  4. Link-local reach lands in pending unconditionally with link_local_reach.
  5. L7-bypass binary with credential lands in pending with l7_bypass_credentialed.
  6. Implicit supersede works in both directions on (host, port, binary) overlap.
  7. Default approval mode is manual — empty delta does NOT auto-approve when proposal_approval_mode is unset, "manual", or any unknown future value.
  8. Approval mode resolves through settings: gateway scope wins over sandbox scope, and CLI --approval-mode auto writes the sandbox-scoped setting after create.
  9. Auto-approval audit carries auto=true, source=<mode>, prover_delta=empty, and resolved_from=<gateway|sandbox|default> as unmapped OCSF fields.
  10. Agent-submitted rule names using the reserved _provider_ prefix are rejected at submit time.
  11. Categorical findings (no severity tiers) appear in validation_result.

All covered by unit and integration tests in crates/openshell-server/src/grpc/policy.rs::tests.
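
As a rough illustration of criterion 6, the sketch below models implicit supersede using the two rules stated in this PR's commit messages: a newer submission auto-rejects older pending chunks on the same (host, port, binary), and an incoming mechanistic chunk is auto-rejected when an approved agent_authored chunk already covers that endpoint. All types and the `submit` function are hypothetical, "overlap" is simplified to exact equality of the triple, and the actual logic in policy.rs may differ.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Endpoint {
    pub host: String,
    pub port: u16,
    pub binary: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Mechanistic,
    AgentAuthored,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone)]
pub struct Chunk {
    pub endpoint: Endpoint,
    pub source: Source,
    pub status: Status,
}

/// Record an incoming chunk, applying implicit supersede, and return the
/// status the incoming chunk ends up with.
pub fn submit(existing: &mut Vec<Chunk>, mut incoming: Chunk) -> Status {
    // An approved agent-authored chunk on the same endpoint wins over a
    // later mechanistic proposal.
    let already_covered = incoming.source == Source::Mechanistic
        && existing.iter().any(|c| {
            c.status == Status::Approved
                && c.source == Source::AgentAuthored
                && c.endpoint == incoming.endpoint
        });

    if already_covered {
        incoming.status = Status::Rejected;
    } else {
        // The newer submission supersedes older pending chunks.
        for older in existing
            .iter_mut()
            .filter(|c| c.status == Status::Pending && c.endpoint == incoming.endpoint)
        {
            older.status = Status::Rejected;
        }
        incoming.status = Status::Pending;
    }

    let status = incoming.status;
    existing.push(incoming);
    status
}
```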

Testing

  • cargo test --workspace --lib — 534 gateway tests, all 16 crates green.
  • cargo clippy -p openshell-server -p openshell-cli -p openshell-core --all-targets -- -D warnings — clean.
  • cargo fmt --check — clean.
  • ./examples/agent-driven-policy-management/demo.sh runs end-to-end against the local Docker gateway and writes the demo file to GitHub.

Explicitly deferred (follow-up PRs)

  • LLM-based contextual review layered on top of the deterministic gate.
  • Intent files / per-sandbox config of "which findings auto-reject vs. escalate."
  • Credential scope modeling (read-only vs write-scoped tokens).
  • MCP as a third L7 surface (REST + GraphQL + MCP).
  • Per-binary credential isolation (binaries see only the credentials their policy authorizes).
  • L7 watch mode for L4 grants (record HTTP requests through approved L4 tunnels for later L4→L7 conversion).
  • Trust tiers per sandbox class (production sandboxes get tighter defaults).
  • Dedicated CONFIG:AUTO_APPROVED OCSF event class (today reuses CONFIG:APPROVED with auto=true unmapped).
  • User-facing docs page under docs/ for the agentic loop.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

zredlined added 4 commits May 19, 2026 11:07
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…roval

Run the prover on every proposal regardless of analysis_mode. Auto-approve
proposals whose merged-policy delta is empty (proposer-agnostic, with the
global-policy gate respected). Calibrate prover findings to a single HIGH
severity emitted on link-local hosts, L4+credential-in-scope, and
bypass-L7-binary+credential-in-scope. Add implicit supersede on
(host, port, binary): newer submissions auto-reject older pending chunks,
and incoming mechanistic chunks auto-reject when an approved agent_authored
chunk already covers the same endpoint.

Audit auto-approvals via CONFIG:APPROVED OCSF events carrying auto=true,
source=<mode>, prover_delta=empty as unmapped fields, with message text
"auto-approved: no new prover findings". Build credential set from
sandbox-attached providers (presence only — no scope modeling in v1).
Signed-off-by: Alexander Watson <zredlined@gmail.com>
The prover now answers four formal questions about a proposed policy
change and emits one finding per "yes" answer:

  - link_local_reach
  - l7_bypass_credentialed
  - credential_reach_expansion
  - capability_expansion

There is no severity grade. The category name is the signal; the
per-path evidence carries the structured detail. The auto-approval
gate is binary — empty delta or not. This removes the previous
HIGH/MEDIUM/CRITICAL severity tiers and the narrowness classifier
that was inconsistent across the access-shorthand / explicit-rules
boundary.

Gateway-side finding_delta gains category suppression:
capability_expansion paths whose (binary, host, port) appears in the
credential_reach_expansion delta are suppressed, so a brand-new
credentialed reach surfaces as one finding rather than one reach plus
N method findings.
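
A minimal sketch of that suppression step, assuming a simplified finding shape: `PathKey`, `Finding`, and `suppress_redundant` are hypothetical names, and only the two categories involved are modeled.

```rust
use std::collections::HashSet;

/// The (binary, host, port) triple a finding is keyed on.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PathKey {
    pub binary: String,
    pub host: String,
    pub port: u16,
}

/// Simplified findings; the other two categories pass through unchanged
/// in the described design and are omitted here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Finding {
    CredentialReachExpansion { path: PathKey },
    CapabilityExpansion { path: PathKey, method: String },
}

/// Drop capability_expansion findings whose path already appears as a
/// credential_reach_expansion in the same delta.
pub fn suppress_redundant(delta: Vec<Finding>) -> Vec<Finding> {
    // Collect every path that is a brand-new credentialed reach.
    let new_reach: HashSet<PathKey> = delta
        .iter()
        .filter_map(|f| match f {
            Finding::CredentialReachExpansion { path } => Some(path.clone()),
            _ => None,
        })
        .collect();

    // Keep a method finding only if its path is not already a new reach.
    delta
        .into_iter()
        .filter(|f| match f {
            Finding::CapabilityExpansion { path, .. } => !new_reach.contains(path),
            _ => true,
        })
        .collect()
}
```

With this shape, a new credentialed reach that also introduces PUT and DELETE collapses to a single credential_reach_expansion finding, while a method added on an already-reached path still surfaces on its own.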

The github provider profile now defaults api.github.com to read-only
(was: read-write). Writes flow through the agentic loop — the prover
audits each capability change rather than treating broad write access
as the default.

Demo, sandbox skill, and architecture docs updated to describe the
four-category model. Prover gains a README.md documenting the formal
queries, evidence shape, and how to add a new category.

copy-pr-bot Bot commented May 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Development

Successfully merging this pull request may close these issues.

feat(gateway): persist and validate agent policy proposal operations