Commit d8f305c

feat(tools): adversarial policy agent — pre-execution LLM validation of tool calls (#2447) (#2457)
Add AdversarialPolicyGateExecutor to zeph-tools: before each tool call, an independent LLM instance validates the call against user-defined plain-language policies loaded from a file. Runs in a separate context with no access to the main conversation history.

Key design decisions:
- Fail-closed by default (fail_open = false): LLM errors → deny
- Strict response parsing: only "ALLOW"/"DENY" accepted; anything else → deny
- Tool params wrapped in triple-backtick code fence to prevent prompt injection
- PolicyLlmClient trait with AdversarialPolicyLlmAdapter wired in runner.rs
- All ToolExecutor delegation methods implemented (set_skill_env, set_effective_trust, etc.)
- Executor chain order: PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor
- Policy file loaded with canonicalize + boundary check (SEC-01) matching load_policy_file()
- Default timeout 3000ms (fast model budget), configurable via tools.policy.timeout_ms
- adversarial_policy_decision field added to AuditEntry (additive, optional)
- Gated behind policy-enforcer feature flag

Config:

[tools.policy]
enabled = false
policy_provider = "fast"
policy_file = "policies.txt"
fail_open = false
timeout_ms = 3000
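
The parsing and fail-closed rules above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `Verdict`, `PolicyLlmError`, `parse_verdict`, `decide`, and `build_prompt` are hypothetical names, and the whitespace trimming, the case-sensitive match, the prompt wording, and the choice to deny unparseable replies even when `fail_open = true` are assumptions.

```rust
/// Outcome of the policy check (hypothetical type, not from the commit).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
}

/// Hypothetical stand-in for whatever errors the policy LLM call can produce.
#[derive(Debug)]
enum PolicyLlmError {
    Timeout,
    Transport(String),
}

/// Strict parsing: only the exact words ALLOW or DENY are accepted.
/// Trimming surrounding whitespace is an assumption of this sketch.
fn parse_verdict(raw: &str) -> Option<Verdict> {
    match raw.trim() {
        "ALLOW" => Some(Verdict::Allow),
        "DENY" => Some(Verdict::Deny),
        _ => None,
    }
}

/// Combine the LLM result with the `fail_open` setting.
/// Unparseable replies are denied regardless of `fail_open` (assumption);
/// LLM errors are denied unless `fail_open` is true.
fn decide(llm_result: Result<String, PolicyLlmError>, fail_open: bool) -> Verdict {
    match llm_result {
        Ok(raw) => parse_verdict(&raw).unwrap_or(Verdict::Deny),
        Err(_) if fail_open => Verdict::Allow,
        Err(_) => Verdict::Deny,
    }
}

/// Build a prompt in which the tool parameters sit inside a code fence,
/// so they are presented as quoted data. The wording is hypothetical.
fn build_prompt(policies: &str, tool_name: &str, params_json: &str) -> String {
    // Three backticks, built at runtime to keep this listing readable.
    let fence = "`".repeat(3);
    format!(
        "Policies:\n{policies}\n\nTool: {tool_name}\nParameters:\n{fence}\n{params_json}\n{fence}\n\nAnswer with ALLOW or DENY only."
    )
}
```

With these definitions, `decide(Err(PolicyLlmError::Timeout), false)` yields `Verdict::Deny`, while a reply such as `"allow, probably"` is denied because it is not one of the two accepted words.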
1 parent 4b34cca commit d8f305c

12 files changed: +1203 -5 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ### Added
 
 - feat(acp): expose current model in `session/list` and emit `SessionInfoUpdate` on model change — each in-memory `SessionInfo` now carries `meta.currentModel`; after `session/set_config_option` with `configId=model` a `SessionInfoUpdate` notification with `meta.currentModel` is sent in addition to the existing `ConfigOptionUpdate`; same notification is sent after `session/set_session_model` (closes #2435)
+- feat(tools): adversarial policy agent — LLM-based pre-execution tool call validation against plain-language policies; configurable fail-closed/fail-open behavior (`fail_open = false` default); prompt injection hardening via code-fence param quoting; strict allow/deny response parsing; full `ToolExecutor` trait delegation; audit log `adversarial_policy_decision` field; executor chain order `PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor`; gated on `policy-enforcer` feature; config `[tools.adversarial_policy]` (closes #2447)
 - feat(memory): Memex tool output archive — before compaction, `ToolOutput` bodies in the compaction range are saved to `tool_overflow` with `archive_type = 'archive'`; archived UUIDs are appended as a postfix after LLM summarization so references survive compaction; controlled by `[memory.compression] archive_tool_outputs = false`; archives are excluded from the short-lived cleanup job via `archive_type` column (migration 054, closes #2432)
 - feat(memory): ACON per-category compression guidelines — `compression_failure_pairs` now stores a `category` column (`tool_output`, `assistant_reasoning`, `user_context`, `unknown`); the compression guidelines table gains a `category` column with `UNIQUE(version, category)` constraint; the `compression_guidelines` updater can now maintain per-category guideline documents when `categorized_guidelines = true`; failure category is classified from the compaction summary content before calling the LLM (migration 054, closes #2433)
 - feat(memory): RL-based admission control — new `AdmissionStrategy` enum with `heuristic` (default) and `rl` variants; `admission_training_data` table records all messages seen by A-MAC (admitted and rejected) to eliminate survivorship bias; `was_recalled` flag is set by `SemanticMemory::recall()` to provide positive training signal; lightweight logistic regression model in `admission_rl.rs` replaces the LLM `future_utility` factor when enough samples are available; weights persisted in `admission_rl_weights` table; controlled by `[memory.admission] admission_strategy`, `rl_min_samples = 500`, `rl_retrain_interval_secs = 3600` (migration 055, closes #2416)

crates/zeph-core/src/agent/tool_execution/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -434,6 +434,7 @@ impl<C: Channel> Agent<C> {
     injection_flagged: has_injection_flags,
     embedding_anomalous: false,
     cross_boundary_mcp_to_acp: true,
+    adversarial_policy_decision: None,
 };
 let logger = std::sync::Arc::clone(logger);
 tokio::spawn(async move { logger.log(&entry).await });

crates/zeph-core/src/agent/tool_execution/native.rs

Lines changed: 1 addition & 0 deletions
@@ -764,6 +764,7 @@ impl<C: Channel> Agent<C> {
     injection_flagged: false,
     embedding_anomalous: false,
     cross_boundary_mcp_to_acp: false,
+    adversarial_policy_decision: None,
 };
 let logger = std::sync::Arc::clone(logger);
 tokio::spawn(async move { logger.log(&entry).await });
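
Both hunks above set the new field to `None` at existing audit call sites. A minimal sketch of how such an additive, optional field might be read is shown here; the struct is a trimmed-down, hypothetical stand-in for `AuditEntry`, the `Option<String>` type is an assumption (the diff only shows that the field accepts `None`), and `describe` is an illustrative helper that does not appear in the commit.

```rust
/// Hypothetical, trimmed-down audit record; the real `AuditEntry` has more fields.
#[derive(Debug, Clone, Default)]
struct AuditEntry {
    injection_flagged: bool,
    embedding_anomalous: bool,
    cross_boundary_mcp_to_acp: bool,
    /// Assumed type. In this sketch, `None` means no adversarial policy
    /// decision was recorded for the call.
    adversarial_policy_decision: Option<String>,
}

/// Illustrative helper: render the optional decision for a log line.
fn describe(entry: &AuditEntry) -> String {
    match entry.adversarial_policy_decision.as_deref() {
        Some(decision) => format!("adversarial policy: {decision}"),
        None => "adversarial policy: not recorded".to_string(),
    }
}
```

Because the field is optional, call sites that never consult the adversarial gate only need the one-line `None` initializer seen in the hunks.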
