
security(mcp): tool poisoning detection and per-tool trust metadata (#2459, #2420)#2472

Merged
bug-ops merged 1 commit into main from mcp-tool-poisoning-threat-mode
Mar 30, 2026

bug-ops commented Mar 30, 2026

Summary

Test plan

  • 6833 tests pass (cargo nextest run --workspace --lib --bins)
  • cargo +nightly fmt --check clean
  • cargo clippy --workspace -- -D warnings clean
  • 59 new tests: injection patterns (16 individual, multi-tool, Unicode); apply_injection_penalties (8 direct: zero/1/cap-at-3/cap-at-10/no-store/demotion); infer_security_meta (all keyword categories plus false-positive guards); check_data_flow (all DataSensitivity × McpTrustLevel combinations); SanitizeResult population
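
For readers who want a feel for what the injection-pattern and Unicode tests exercise, here is a minimal sketch of stripping format characters before a pattern scan. It is not the code from this PR: the `PATTERNS` table, its two entries, and the helper names `is_format_char`, `strip_format_chars`, and `scan_description` are hypothetical. The Cf check covers only a hand-picked subset of format characters, and plain substring matching stands in for whatever matching the 16 real patterns use.

```rust
/// Hypothetical pattern table: (pattern name, lowercase needle).
/// The real INJECTION_PATTERNS has 16 entries that are not reproduced here.
const PATTERNS: &[(&str, &str)] = &[
    ("role_override", "ignore previous instructions"),
    ("delimiter_escape", "</system>"),
];

/// Returns true for a hand-picked subset of Unicode Cf (format) characters.
/// A complete check would consult the Unicode general-category tables.
fn is_format_char(c: char) -> bool {
    matches!(
        c,
        '\u{00AD}'                    // soft hyphen
            | '\u{200B}'..='\u{200F}' // zero-width space and joiners, LRM, RLM
            | '\u{202A}'..='\u{202E}' // bidi embedding and override controls
            | '\u{2060}'..='\u{2064}' // word joiner and invisible operators
            | '\u{FEFF}'              // zero-width no-break space (BOM)
    )
}

/// Removes the format characters above so they cannot split a keyword.
fn strip_format_chars(s: &str) -> String {
    s.chars().filter(|c| !is_format_char(*c)).collect()
}

/// Names of every pattern that matches the description after hardening.
fn scan_description(description: &str) -> Vec<&'static str> {
    let normalized = strip_format_chars(description).to_lowercase();
    PATTERNS
        .iter()
        .filter(|(_, needle)| normalized.contains(*needle))
        .map(|(name, _)| *name)
        .collect()
}
```

Stripping before matching means a description such as `"Ig\u{200B}nore previous instructions"` is still flagged, while an ordinary description like `"Lists files in a directory"` produces no matches.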

Closes #2459
Closes #2420

…ta (#2459, #2420)

Implement multi-layered MCP client-side defenses against tool poisoning
(arXiv:2603.22489) and per-tool capability/sensitivity metadata for
data-flow policy enforcement (arXiv:2601.08012).

- sanitize_tools() now returns SanitizeResult with injection_count,
  flagged_tools, and flagged_patterns (pattern name per matched field)
- 16 injection patterns in INJECTION_PATTERNS (role override, jailbreak,
  delimiter escape, base64 payload, exfil via image/link, etc.)
- Unicode hardening: strip Cf-category format chars before pattern scan
- apply_injection_penalties(): applies trust score penalties (capped at
  MAX_INJECTION_PENALTIES_PER_REGISTRATION=3) and auto-demotes server
  trust level when recommended level is more restrictive; never promotes
- ToolSecurityMeta on McpTool: DataSensitivity (None/Low/Medium/High)
  and CapabilityClass set (FilesystemRead/Write, Network, Shell,
  DatabaseRead, MemoryWrite, ExternalApi)
- infer_security_meta(): keyword-based heuristic classifier; explicit
  filesystem keywords only, generic verbs excluded; defaults to Low
- Operator config override via mcp.servers[].tool_metadata TOML section
- check_data_flow(): blocks High-sensitivity tools on Untrusted/Sandboxed
  servers at registration time; Medium on Sandboxed emits warning
- sanitize_string delegates to sanitize_string_tracked (DRY)
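
The penalty cap and the demote-only rule from the apply_injection_penalties() bullet can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `Trusted` variant, the restrictiveness ordering of `McpTrustLevel`, and the helper names `capped_penalties` and `demote_only` are hypothetical. Only the constant name and its value of 3 come from the description above.

```rust
/// Cap named in the PR description.
const MAX_INJECTION_PENALTIES_PER_REGISTRATION: usize = 3;

/// Trust levels ordered from least to most restrictive.
/// The variant set and this ordering are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum McpTrustLevel {
    Trusted,
    Untrusted,
    Sandboxed,
}

/// Number of penalties actually charged for one registration.
fn capped_penalties(injection_count: usize) -> usize {
    injection_count.min(MAX_INJECTION_PENALTIES_PER_REGISTRATION)
}

/// Demote-only update: adopt the recommended level only when it is more
/// restrictive than the current one, so a server is never promoted.
fn demote_only(current: McpTrustLevel, recommended: McpTrustLevel) -> McpTrustLevel {
    current.max(recommended)
}
```

Deriving `Ord` on the enum lets "more restrictive" reduce to a `max`, which makes the never-promote property easy to see: the result is never lower than `current`.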

Closes #2459, closes #2420
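
The registration-time policy described for check_data_flow() amounts to a small decision table. The sketch below is one possible reading of it, assuming a three-way outcome type. `FlowDecision`, the `Trusted` variant, and the function signature are hypothetical, and every combination the description does not mention is assumed to be allowed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataSensitivity {
    None,
    Low,
    Medium,
    High,
}

/// `Trusted` is an assumed variant; the description names only the other two.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum McpTrustLevel {
    Trusted,
    Untrusted,
    Sandboxed,
}

/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum FlowDecision {
    Allow,
    Warn,
    Block,
}

/// Decision table from the description: High is blocked on Untrusted and
/// Sandboxed servers, Medium on Sandboxed warns, everything else is allowed
/// (the "allowed" default is an assumption).
fn check_data_flow(sensitivity: DataSensitivity, trust: McpTrustLevel) -> FlowDecision {
    match (sensitivity, trust) {
        (DataSensitivity::High, McpTrustLevel::Untrusted | McpTrustLevel::Sandboxed) => {
            FlowDecision::Block
        }
        (DataSensitivity::Medium, McpTrustLevel::Sandboxed) => FlowDecision::Warn,
        _ => FlowDecision::Allow,
    }
}
```

Writing the policy as a single `match` over the pair keeps all twelve combinations in one place, which lines up with the test plan's coverage of every DataSensitivity and McpTrustLevel combination.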

Labels: core (zeph-core crate), documentation (improvements or additions to documentation), rust (Rust code changes), size/XL (extra large PR, 500+ lines)
