research(security): Agent Audit — static analysis for LLM agent apps: dataflow + credential detection, 40/42 vulnerabilities found #2506

@bug-ops

Source

arXiv:2603.22853 — "Agent Audit: A Security Analysis System for LLM Agent Applications" (March 24, 2026)

Finding

Agent Audit applies static program analysis to LLM agent codebases:

  • Dataflow taint tracking: traces user input → tool arguments → external calls
  • Credential detection: finds API keys/tokens that flow into tool calls without sanitization
  • Privilege-risk scoring: ranks tool combinations by blast radius (high-privilege tool + unsanitized input = high risk)

Evaluated on 20 open-source agent apps, it detected 40 of 42 annotated vulnerabilities (2 false negatives, roughly 95% recall) with a false positive rate of 8%.
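
To make the dataflow idea concrete, here is a minimal sketch of taint reachability over a hand-built flow graph. It is not the paper's implementation: `FlowGraph`, `NodeKind`, and the node names are hypothetical, and a real analysis would derive the graph from source code rather than take it as input. A sink is reported when it is reachable from a source along a path that never passes through a sanitizer.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Role of a node in the (hypothetical) dataflow graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NodeKind {
    Source,    // e.g. user input or an MCP server response
    Sanitizer, // e.g. an escaping or validation step
    Sink,      // e.g. a shell call or outbound request
    Other,     // any intermediate step
}

pub struct FlowGraph {
    kinds: HashMap<&'static str, NodeKind>,
    edges: HashMap<&'static str, Vec<&'static str>>,
}

impl FlowGraph {
    pub fn new() -> Self {
        FlowGraph { kinds: HashMap::new(), edges: HashMap::new() }
    }

    /// Assigns a role to a node; nodes without a role count as `Other`.
    pub fn node(mut self, name: &'static str, kind: NodeKind) -> Self {
        self.kinds.insert(name, kind);
        self
    }

    /// Adds a directed "data flows from -> to" edge.
    pub fn edge(mut self, from: &'static str, to: &'static str) -> Self {
        self.edges.entry(from).or_default().push(to);
        self
    }

    /// Breadth-first search from every source. Taint stops at sanitizers,
    /// so only sinks reached over an unsanitized path are returned (sorted).
    pub fn tainted_sinks(&self) -> Vec<&'static str> {
        let mut queue: VecDeque<&'static str> = self
            .kinds
            .iter()
            .filter(|(_, kind)| **kind == NodeKind::Source)
            .map(|(name, _)| *name)
            .collect();
        let mut seen: HashSet<&'static str> = queue.iter().copied().collect();
        let mut hits = Vec::new();

        while let Some(current) = queue.pop_front() {
            if let Some(nexts) = self.edges.get(current) {
                for &next in nexts {
                    if !seen.insert(next) {
                        continue; // already visited over a tainted path
                    }
                    match self.kinds.get(next).copied().unwrap_or(NodeKind::Other) {
                        NodeKind::Sanitizer => {} // taint does not propagate further
                        NodeKind::Sink => hits.push(next),
                        _ => queue.push_back(next),
                    }
                }
            }
        }
        hits.sort();
        hits
    }
}
```

With edges `user_input -> escape -> shell` and `user_input -> build_url -> http`, where `escape` is a sanitizer and both `shell` and `http` are sinks, only `http` is reported.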

Applicability to Zeph

zeph-tools has ToolAuditor for runtime logging but no static analysis. The taint-tracking approach could be applied at compile time via Rust's type system, or as a cargo subcommand, to verify that:

  • User input does not reach shell executor arguments without sanitization
  • Vault keys are not exposed in tool call arguments
  • MCP server responses are quarantined before reaching shell tools
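
A minimal sketch of the type-system variant, assuming a newtype pair rather than anything that exists in zeph-tools today. `Tainted`, `Sanitized`, `shell_argv`, and the allowlist are all hypothetical. The point is only that a function accepting `&[Sanitized]` cannot be called with raw input, so the first check in the list becomes a compile error instead of a runtime finding.

```rust
pub mod taint {
    /// Untrusted text (user input, MCP response). The field is private,
    /// so code outside this module cannot read it without sanitizing.
    pub struct Tainted(String);

    /// Text that passed the illustrative allowlist below.
    #[derive(Debug, PartialEq)]
    pub struct Sanitized(String);

    impl Tainted {
        pub fn new(raw: impl Into<String>) -> Self {
            Tainted(raw.into())
        }

        /// The only conversion from `Tainted` to `Sanitized`.
        /// Assumption: a strict allowlist of ASCII alphanumerics and
        /// `.`, `_`, `-`, `/`. A real policy would be more careful
        /// (for example, rejecting arguments that start with `-`).
        pub fn sanitize(self) -> Result<Sanitized, String> {
            let allowed = |c: char| {
                c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-' | '/')
            };
            if !self.0.is_empty() && self.0.chars().all(allowed) {
                Ok(Sanitized(self.0))
            } else {
                Err(format!("rejected argument: {:?}", self.0))
            }
        }
    }

    impl Sanitized {
        pub fn as_str(&self) -> &str {
            &self.0
        }
    }

    /// Hypothetical stand-in for the shell executor's input path.
    /// It only builds the argument vector and does not run anything.
    pub fn shell_argv(program: &str, args: &[Sanitized]) -> Vec<String> {
        let mut argv = vec![program.to_string()];
        argv.extend(args.iter().map(|a| a.as_str().to_string()));
        argv
    }
}
```

Passing a `Tainted` value (or a plain `String`) to `shell_argv` does not type-check; the caller has to go through `sanitize()` and handle the rejection case first.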

Implementation sketch:

  • Explore a static analysis pass for tool argument taint, run as a standalone cargo subcommand in the style of cargo audit
  • Annotate the ShellExecutor input path with a #[tainted] marker attribute (design only; no proc-macro implementation is needed initially)
  • Add ToolAuditConfig.static_analysis_report = true to output a per-tool risk summary on startup
  • Near-term: document the taint paths in zeph-tools architecture notes (.local/specs/)
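
As a rough sketch of the startup report, assuming a shape for the config and per-tool data that the issue does not specify: only `ToolAuditConfig.static_analysis_report` comes from the list above, while `ToolProfile`, `Risk`, `score`, and `startup_report` are hypothetical. The High rule follows the paper's "high-privilege tool + unsanitized input" heuristic; the Medium and Low cases are an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    Low,
    Medium,
    High,
}

/// Hypothetical per-tool facts a static pass might produce.
pub struct ToolProfile {
    pub name: &'static str,
    pub high_privilege: bool,
    pub unsanitized_input: bool,
}

/// Only the `static_analysis_report` flag is taken from the issue text;
/// the real config struct may look different.
pub struct ToolAuditConfig {
    pub static_analysis_report: bool,
}

/// High when both risk factors are present, Low when neither is,
/// Medium otherwise (the Medium/Low split is an assumption).
pub fn score(tool: &ToolProfile) -> Risk {
    match (tool.high_privilege, tool.unsanitized_input) {
        (true, true) => Risk::High,
        (false, false) => Risk::Low,
        _ => Risk::Medium,
    }
}

/// Returns one "name: Risk" line per tool, highest risk first and
/// alphabetical within a risk level, or `None` when the flag is off.
pub fn startup_report(cfg: &ToolAuditConfig, tools: &[ToolProfile]) -> Option<String> {
    if !cfg.static_analysis_report {
        return None;
    }
    let mut rows: Vec<(Risk, &str)> = tools.iter().map(|t| (score(t), t.name)).collect();
    rows.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
    let lines: Vec<String> = rows
        .iter()
        .map(|(risk, name)| format!("{name}: {risk:?}"))
        .collect();
    Some(lines.join("\n"))
}
```

Returning `Option<String>` keeps the report separate from logging, so the caller decides whether to print it or write it to the audit log.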

Priority rationale

P3: the ToolAuditor and adversarial policy already provide runtime defense (#2472, #2457). Static analysis adds defense-in-depth at development time. Low urgency but high value for the security hardening arc (#2496, #2497, #2504).

Metadata

Labels: P3 (Research, medium-high complexity); llm (zeph-llm crate: Ollama, Claude); research (Research-driven improvement)