feat(tools): adversarial policy agent — pre-execution LLM validation of tool calls by bug-ops · Pull Request #2457 · bug-ops/zeph

bug-ops · 2026-03-30T19:08:19Z

Summary

Implements the adversarial policy agent (#2447): before each tool call is dispatched, an independent LLM instance validates the call against user-defined plain-language policies. The validator runs in a separate context with no access to main conversation history.

New AdversarialPolicyGateExecutor<T> in zeph-tools, gated behind the policy-enforcer feature flag
New PolicyValidator with PolicyLlmClient trait and AdversarialPolicyLlmAdapter bridge in runner.rs
Fail-closed by default (fail_open = false): LLM errors → deny
Strict response parsing: only case-insensitive ALLOW/DENY accepted; anything else → deny
Tool params wrapped in a triple-backtick code fence to mitigate prompt injection (CRIT-11)
All ToolExecutor trait methods delegated to the wrapped executor, e.g. set_skill_env and set_effective_trust (CRIT-06)
Policy file loading with canonicalize + config-directory boundary check (SEC-01)
Executor chain order: PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor
adversarial_policy_decision field added to AuditEntry (additive, optional)
Default validation timeout of 3000 ms, configurable via timeout_ms (PERF-01)
79 new tests (7269 → 7348)
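The gating flow in the list above can be sketched roughly as follows. This is a minimal, synchronous sketch under stated assumptions, not the code in zeph-tools. ToolCall, Verdict, parse_verdict, build_prompt, and the simplified ToolExecutor and PolicyLlmClient signatures are all hypothetical, and the real traits likely differ in shape and carry more methods. The sketch shows three of the listed behaviors: strict ALLOW/DENY parsing, fenced tool parameters, and fail-closed handling of LLM errors.

```rust
/// Hypothetical, simplified tool call; the real zeph-tools types differ.
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub params: String, // raw JSON parameters proposed by the main agent
}

/// Hypothetical stand-in for ToolExecutor (the real trait has more methods).
pub trait ToolExecutor {
    fn execute(&self, call: &ToolCall) -> Result<String, String>;
}

/// Hypothetical stand-in for PolicyLlmClient: one stateless completion per call.
pub trait PolicyLlmClient {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    Deny,
}

/// Strict parsing: only a bare, case-insensitive ALLOW counts as Allow.
/// "DENY" and anything unexpected ("Allow.", "ALLOW, because...", "") deny.
pub fn parse_verdict(reply: &str) -> Verdict {
    if reply.trim().eq_ignore_ascii_case("ALLOW") {
        Verdict::Allow
    } else {
        Verdict::Deny
    }
}

/// Builds a fresh prompt from the policies and the call only: no conversation
/// history. Params sit inside a triple-backtick fence so they read as data.
pub fn build_prompt(policies: &str, call: &ToolCall) -> String {
    let fence = "`".repeat(3);
    format!(
        "You validate tool calls against these policies:\n{}\n\nTool: {}\nParameters:\n{}\n{}\n{}\nReply with exactly ALLOW or DENY.",
        policies, call.tool, fence, call.params, fence
    )
}

pub struct AdversarialPolicyGateExecutor<T> {
    inner: T,
    client: Box<dyn PolicyLlmClient>,
    policies: String,
    fail_open: bool,
}

impl<T: ToolExecutor> AdversarialPolicyGateExecutor<T> {
    /// Fail-closed by default: LLM errors deny the call.
    pub fn new(inner: T, client: Box<dyn PolicyLlmClient>, policies: String) -> Self {
        Self { inner, client, policies, fail_open: false }
    }

    pub fn with_fail_open(mut self, fail_open: bool) -> Self {
        self.fail_open = fail_open;
        self
    }

    fn decide(&self, call: &ToolCall) -> Verdict {
        match self.client.complete(&build_prompt(&self.policies, call)) {
            Ok(reply) => parse_verdict(&reply),
            Err(_) if self.fail_open => Verdict::Allow,
            Err(_) => Verdict::Deny,
        }
    }
}

impl<T: ToolExecutor> ToolExecutor for AdversarialPolicyGateExecutor<T> {
    fn execute(&self, call: &ToolCall) -> Result<String, String> {
        match self.decide(call) {
            // Allowed: delegate to the next executor in the chain.
            Verdict::Allow => self.inner.execute(call),
            Verdict::Deny => Err(format!("denied by adversarial policy: {}", call.tool)),
        }
    }
}
```

Because the wrapper itself implements ToolExecutor, it can sit anywhere in a chain of executors. A fence alone does not make injection impossible, which is why the strict parser treats every non-ALLOW reply as a denial.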

Config

[tools.adversarial_policy]
enabled = false
policy_provider = "fast"   # named provider from [[llm.providers]]
policy_file = "policies.txt"
fail_open = false
timeout_ms = 3000
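The SEC-01 path check on policy_file can be illustrated with the standard library alone. This is a minimal sketch that assumes the policy path is resolved relative to the config directory. resolve_policy_file and load_policies are hypothetical names and are not zeph's load_policy_file(). std::fs::canonicalize resolves `..` segments and symlinks. Path::starts_with compares whole path components, so a sibling directory such as config-evil/ does not pass as config/.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `policy_file` against the config directory,
/// canonicalize it, and reject any result outside that directory.
pub fn resolve_policy_file(config_dir: &Path, policy_file: &str) -> io::Result<PathBuf> {
    let base = fs::canonicalize(config_dir)?;
    // Path::join replaces the base if policy_file is absolute; the boundary
    // check below still rejects such paths when they point elsewhere.
    let resolved = fs::canonicalize(base.join(policy_file))?; // fails if missing
    if resolved.starts_with(&base) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("policy file {} is outside {}", resolved.display(), base.display()),
        ))
    }
}

/// Hypothetical loader: reads the plain-language policies from the checked path.
pub fn load_policies(config_dir: &Path, policy_file: &str) -> io::Result<String> {
    let path = resolve_policy_file(config_dir, policy_file)?;
    fs::read_to_string(path)
}
```

With policy_file = "policies.txt", the file is read from the config directory. A value such as "../secret.txt" fails the check even though the file exists.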

Test plan

cargo +nightly fmt --check — clean
cargo clippy --all-targets --features full --workspace -- -D warnings — clean
cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 7348/7348 pass
Architecture critique (3 HIGH findings resolved: CRIT-01, CRIT-06, CRIT-11)
Security audit (SEC-01 path traversal fixed, SEC-02/04/05 acceptable)
Perf analysis (PERF-01 timeout fixed, PERF-02 caching noted as follow-up)
Code review approved after blocker fixes

Closes #2447

…of tool calls (#2447)

Add AdversarialPolicyGateExecutor to zeph-tools: before each tool call, an independent LLM instance validates the call against user-defined plain-language policies loaded from a file. Runs in a separate context with no access to the main conversation history.

Key design decisions:
- Fail-closed by default (fail_open = false): LLM errors → deny
- Strict response parsing: only "ALLOW"/"DENY" accepted; anything else → deny
- Tool params wrapped in triple-backtick code fence to prevent prompt injection
- PolicyLlmClient trait with AdversarialPolicyLlmAdapter wired in runner.rs
- All ToolExecutor delegation methods implemented (set_skill_env, set_effective_trust, etc.)
- Executor chain order: PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor
- Policy file loaded with canonicalize + boundary check (SEC-01) matching load_policy_file()
- Default timeout 3000ms (fast model budget), configurable via tools.policy.timeout_ms
- adversarial_policy_decision field added to AuditEntry (additive, optional)
- Gated behind policy-enforcer feature flag

Config:
[tools.policy]
enabled = false
policy_provider = "fast"
policy_file = "policies.txt"
fail_open = false
timeout_ms = 3000
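The 3000 ms budget and the fail-closed default work together: a validator that times out is treated like one that errors. The following minimal std-only sketch assumes a blocking validator call. An async runtime would more naturally use its own timeout utility. call_with_timeout, Outcome, and decide are hypothetical names. mpsc::Receiver::recv_timeout only stops waiting. The worker thread keeps running until the call returns, and its late reply is discarded.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result of a bounded validator call.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Reply(String),
    Failed, // LLM error, panic, or timeout
}

/// Runs `call` on a worker thread and waits at most `timeout_ms` for it.
pub fn call_with_timeout<F>(timeout_ms: u64, call: F) -> Outcome
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the receiver has already given up.
        let _ = tx.send(call());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(Ok(reply)) => Outcome::Reply(reply),
        // Provider error, timeout, or a dropped sender (worker panicked).
        Ok(Err(_)) | Err(_) => Outcome::Failed,
    }
}

/// Applies fail_open: failures deny unless fail_open is true; replies still
/// go through strict parsing (only a bare ALLOW allows).
pub fn decide(outcome: &Outcome, fail_open: bool) -> bool {
    match outcome {
        Outcome::Reply(r) => r.trim().eq_ignore_ascii_case("ALLOW"),
        Outcome::Failed => fail_open,
    }
}
```

With the defaults (timeout_ms = 3000, fail_open = false), a slow or unavailable policy provider results in the tool call being denied rather than silently executed.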

github-actions bot added the documentation, rust, core, enhancement, and size/XL labels Mar 30, 2026

bug-ops enabled auto-merge (squash) March 30, 2026 19:08

bug-ops merged commit d8f305c into main Mar 30, 2026
27 checks passed

bug-ops deleted the feat/issue-2447/adversarial-policy-agent branch March 30, 2026 19:15

bug-ops mentioned this pull request Mar 31, 2026

research(security): Agent Audit — static analysis for LLM agent apps: dataflow + credential detection, 40/42 vulnerabilities found #2506 (open)

bug-ops merged 1 commit into main from feat/issue-2447/adversarial-policy-agent
