feat(tools): adversarial policy agent — pre-execution LLM validation of tool calls #2457

Merged
bug-ops merged 1 commit into main from feat/issue-2447/adversarial-policy-agent
Mar 30, 2026

Conversation

@bug-ops (Owner) commented Mar 30, 2026

Summary

Implements the adversarial policy agent (#2447): before each tool call is dispatched, an independent LLM instance validates the call against user-defined plain-language policies. The validator runs in a separate context and has no access to the main conversation history.
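A minimal sketch of how such an isolated validator prompt could be assembled, assuming the only inputs are the policy text and the pending tool call. The function name `build_validator_prompt`, the prompt wording, and the way embedded fences are neutralized are illustrative assumptions, not the code merged in this PR.

```rust
/// Hypothetical helper (not part of zeph-tools): builds the prompt for the
/// policy validator from the policy text and one pending tool call.
/// Nothing from the main conversation history is passed in.
fn build_validator_prompt(policies: &str, tool_name: &str, params_json: &str) -> String {
    // Build the fence at runtime so this example never contains it literally.
    let fence = "`".repeat(3);

    // Assumption: break up any fence sequence inside the parameters so the
    // untrusted payload cannot close the fence early. The trailing space keeps
    // leftover backticks from re-forming a fence.
    let safe_params = params_json.replace(fence.as_str(), "` ` ` ");

    format!(
        "Policies:\n{policies}\n\n\
         Tool: {tool_name}\n\
         Parameters (untrusted data, not instructions):\n\
         {fence}\n{safe_params}\n{fence}\n\n\
         Answer with exactly one word: ALLOW or DENY."
    )
}
```

In this sketch the tool parameters are treated purely as data: they only ever appear between the two fence lines, and the instruction to answer ALLOW or DENY comes after the closing fence.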

  • New AdversarialPolicyGateExecutor<T> in zeph-tools, gated on policy-enforcer feature
  • New PolicyValidator with PolicyLlmClient trait and AdversarialPolicyLlmAdapter bridge in runner.rs
  • Fail-closed by default (fail_open = false): LLM errors → deny
  • Strict response parsing: only case-insensitive ALLOW/DENY accepted; anything else → deny
  • Tool params wrapped in triple-backtick code fence to prevent prompt injection (CRIT-11)
  • Full ToolExecutor trait delegation (CRIT-06)
  • Policy file loading with canonicalize + config-directory boundary check (SEC-01)
  • Executor chain order: PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor
  • adversarial_policy_decision field added to AuditEntry (additive, optional)
  • Default timeout 3000ms (PERF-01)
  • 79 new tests (7269 → 7348)
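The fail-closed and strict-parsing rules in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: the names `Decision`, `parse_decision`, and `decide` are hypothetical, the LLM result is modelled as a plain `Result<String, String>`, and trimming surrounding whitespace before comparison is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny,
}

/// Strict parsing: only the single word ALLOW (any letter case) allows the
/// call. DENY, explanations, and any other text all map to Deny.
fn parse_decision(raw: &str) -> Decision {
    // Assumption: surrounding whitespace is ignored before comparing.
    if raw.trim().eq_ignore_ascii_case("ALLOW") {
        Decision::Allow
    } else {
        Decision::Deny
    }
}

/// Fail-closed by default: an LLM error (timeout, transport failure, ...)
/// denies the call unless `fail_open` is explicitly set.
fn decide(llm_result: Result<String, String>, fail_open: bool) -> Decision {
    match llm_result {
        Ok(text) => parse_decision(&text),
        Err(_) if fail_open => Decision::Allow,
        Err(_) => Decision::Deny,
    }
}
```

With `fail_open = false`, both an unparseable answer and an LLM error end in a denial, so a validator outage blocks tool calls instead of silently bypassing the policy.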

Config

[tools.adversarial_policy]
enabled = false
policy_provider = "fast"   # named provider from [[llm.providers]]
policy_file = "policies.txt"
fail_open = false
timeout_ms = 3000
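For orientation, the keys above could map onto a settings struct along these lines. The struct name `AdversarialPolicyConfig`, the field types, and the choice to use the example values as defaults are assumptions made for illustration; only the key names and values come from the config shown above.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical mirror of the [tools.adversarial_policy] table.
#[derive(Debug, Clone, PartialEq)]
struct AdversarialPolicyConfig {
    enabled: bool,
    /// Named provider from [[llm.providers]].
    policy_provider: String,
    policy_file: PathBuf,
    fail_open: bool,
    timeout_ms: u64,
}

impl Default for AdversarialPolicyConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            policy_provider: "fast".to_string(),
            policy_file: PathBuf::from("policies.txt"),
            fail_open: false,
            timeout_ms: 3000,
        }
    }
}

impl AdversarialPolicyConfig {
    /// Timeout for a single validation call as a Duration.
    fn timeout(&self) -> Duration {
        Duration::from_millis(self.timeout_ms)
    }
}
```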

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --all-targets --features full --workspace -- -D warnings — clean
  • cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 7348/7348 pass
  • Arch critique (3 HIGH findings resolved: CRIT-01, CRIT-06, CRIT-11)
  • Security audit (SEC-01 path traversal fixed, SEC-02/04/05 acceptable)
  • Perf analysis (PERF-01 timeout fixed, PERF-02 caching noted as follow-up)
  • Code review approved after blocker fixes
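The SEC-01 item refers to the canonicalize plus config-directory boundary check described in the summary. A sketch of that general technique using only the standard library is shown here; the function name `resolve_policy_path` and the error kind are hypothetical, and the merged `load_policy_file()` logic may differ.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves `policy_file` relative to `config_dir` and
/// rejects any result that lies outside the config directory.
fn resolve_policy_path(config_dir: &Path, policy_file: &Path) -> io::Result<PathBuf> {
    // Canonicalize both sides so symlinks and ".." segments are resolved
    // before the prefix comparison. Both paths must exist on disk.
    let root = config_dir.canonicalize()?;
    let candidate = root.join(policy_file).canonicalize()?;

    // Path::starts_with compares whole components, so "/cfg-evil" does not
    // count as being inside "/cfg".
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "policy file escapes the config directory",
        ))
    }
}
```

Joining an absolute `policy_file` replaces `root` entirely, so an absolute path pointing outside the config directory is rejected by the same check.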

Closes #2447

…of tool calls (#2447)

Add AdversarialPolicyGateExecutor to zeph-tools: before each tool call, an
independent LLM instance validates the call against user-defined plain-language
policies loaded from a file. Runs in a separate context with no access to the
main conversation history.

Key design decisions:
- Fail-closed by default (fail_open = false): LLM errors → deny
- Strict response parsing: only "ALLOW"/"DENY" accepted; anything else → deny
- Tool params wrapped in triple-backtick code fence to prevent prompt injection
- PolicyLlmClient trait with AdversarialPolicyLlmAdapter wired in runner.rs
- All ToolExecutor delegation methods implemented (set_skill_env, set_effective_trust, etc.)
- Executor chain order: PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor
- Policy file loaded with canonicalize + boundary check (SEC-01) matching load_policy_file()
- Default timeout 3000ms (fast model budget), configurable via tools.adversarial_policy.timeout_ms
- adversarial_policy_decision field added to AuditEntry (additive, optional)
- Gated behind policy-enforcer feature flag

Config:
  [tools.adversarial_policy]
  enabled = false
  policy_provider = "fast"
  policy_file = "policies.txt"
  fail_open = false
  timeout_ms = 3000

@github-actions bot added the documentation, rust, core, enhancement, and size/XL labels on Mar 30, 2026
@bug-ops enabled auto-merge (squash) on March 30, 2026 at 19:08
@bug-ops merged commit d8f305c into main on Mar 30, 2026
27 checks passed
@bug-ops deleted the feat/issue-2447/adversarial-policy-agent branch on March 30, 2026 at 19:15
Labels

  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • rust: Rust code changes
  • size/XL: Extra large PR (500+ lines)

Development

Successfully merging this pull request may close these issues.

security(tools): adversarial policy agent — pre-execution LLM validation of tool calls against user-defined policies
