[Paper Suggestion] AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems — fits Benchmarks & Evaluation + Security Agents #8

@yagobski

Description

Paper Suggestion: AgentLeak

Hi! I'd like to suggest adding AgentLeak to the Benchmarks & Evaluation section (and potentially Security Agents) — it's the first full-stack benchmark for privacy leakage across multi-agent LLM pipelines.

📄 Paper

"AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems"

🔍 Why it fits

AgentLeak benchmarks AI security risks (privacy leakage) in autonomous multi-agent systems — directly relevant to this list's focus on AI for security and security agents:

  • 7 leakage channels tracked simultaneously: tool calls, inter-agent messages, RAG queries, code execution, API calls, final outputs, reasoning traces
  • 68.8% of privacy leakage occurs in inter-agent messages — invisible to output-only auditing
  • 41.7% of leakage is missed by output-only evaluation (standard practice today)
  • Tests across AutoGen, LangGraph, CrewAI frameworks
  • Evaluates GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1 70B
  • Directly addresses security risks of autonomous agents communicating with each other
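The multi-channel idea behind the benchmark can be sketched with a toy scanner (all names, the trace structure, and the regex below are hypothetical illustrations, not AgentLeak's actual code): auditing every channel in an agent trace catches leaks that an output-only audit never sees.

```python
import re

# Toy PII detector (SSN-style pattern); AgentLeak's real detectors are richer.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical agent trace: events on several of the channels the paper tracks.
trace = [
    {"channel": "inter_agent_message", "content": "Patient SSN is 123-45-6789"},
    {"channel": "tool_call", "content": "lookup(record_id=42)"},
    {"channel": "final_output", "content": "The request was completed."},
]

def leaky_channels(events):
    """Return the set of channels whose content matches the PII pattern."""
    return {e["channel"] for e in events if PII_PATTERN.search(e["content"])}

full_stack = leaky_channels(trace)                                        # audits every channel
output_only = leaky_channels(e for e in trace if e["channel"] == "final_output")

print(full_stack)   # {'inter_agent_message'}
print(output_only)  # set() -- the leak is invisible to output-only auditing
```

Here the leak sits in an inter-agent message, so the output-only audit reports nothing, which is exactly the blind spot the paper's 41.7% figure quantifies.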

This bridges the Benchmarks & Evaluation section (security evaluation of LLM agents) and Security Agents section (understanding vulnerabilities in multi-agent architectures).

Thanks for maintaining this resource!
