feat: AI agent security panel — anomaly detection, audit trail, and behavior analysis

## Summary

Add a **Security & Anomaly Detection Panel** that monitors agent behavior for suspicious patterns — unusually large outputs, unexpected tool calls, potential prompt injection indicators, and statistical anomalies compared to baseline behavior.

## Motivation

AI agent security is one of the hottest topics in 2025. As teams run increasingly autonomous Claude Code agents on production codebases, the ability to detect anomalous behavior is critical for:

- Catching prompt injection attempts in agent task descriptions
- Detecting agents that have gone off-rails (infinite loops, runaway token usage)
- Providing an auditable trail for compliance-sensitive organizations
- Identifying agents that are unusually slow or producing unexpected output patterns

## Proposed Detection Signals

### Behavioral anomalies (statistical)

```js
// Flag messages significantly longer than agent's baseline
function detectLongOutput(agent, message, history) {
  const avgLength = history.reduce((s, m) => s + m.length, 0) / history.length;
  const stdDev = computeStdDev(history.map(m => m.length));
  return message.length > avgLength + 3 * stdDev; // 3-sigma threshold
}

// Detect message burst (agent flooding the inbox)
function detectMessageBurst(messages, windowMs = 60000, threshold = 20) {
  const now = Date.now();
  const recent = messages.filter(m => now - new Date(m.timestamp) < windowMs);
  return recent.length > threshold;
}
```

### Prompt injection heuristics

```js
const INJECTION_PATTERNS = [
  /ignore (all |previous |above )?instructions/i,
  /you are now a/i,
  /\[SYSTEM\]/i,
  /jailbreak/i,
  /DAN mode/i,
  /pretend you (are|have no)/i,
];

function checkPromptInjection(text) {
  return INJECTION_PATTERNS.filter(p => p.test(text));
}
```

### Audit trail

Every task state change and agent message is already stored in `~/.claude/`. The security panel surfaces this as a tamper-evident, chronological audit log.

## UI Layout

```
┌─────────────────────────────────────────────────────────┐
│  🛡️ Security Monitor           [Settings] [Export Audit] │
├─────────────────────────────────────────────────────────┤
│  Risk Level: LOW ████░░░░░░░░░░░░░░  2 alerts          │
├─────────────────────────────────────────────────────────┤
│  ⚠️  researcher-3  |  Unusually long output (4,200 tok) │
│     vs. baseline 800 tok avg  |  2 min ago  [Dismiss]   │
│                                                         │
│  ℹ️  coder-1  |  Message burst: 24 messages in 60s      │
│     Threshold: 20/min  |  8 min ago  [Dismiss]          │
├─────────────────────────────────────────────────────────┤
│  📋 Audit Trail (last 50 events)                        │
│  ──────────────────────────────                         │
│  17:23:45  researcher-3 → team-lead  [message]          │
│  17:23:12  team-lead assigned task to researcher-3      │
│  17:22:58  team run "my-research-team" started          │
└─────────────────────────────────────────────────────────┘
```

## Acceptance Criteria

- [ ] Statistical anomaly detection: long output, message burst, sudden silence
- [ ] Prompt injection pattern matching on incoming task descriptions
- [ ] Risk level indicator (Low / Medium / High) aggregating active alerts
- [ ] Alert list with dismiss button
- [ ] Chronological audit trail showing all agent actions
- [ ] Alerts fire in real-time via WebSocket
- [ ] Export audit trail as JSON/CSV
- [ ] All detection thresholds configurable in settings
- [ ] Detection is client-side (no data leaves local machine)
- [ ] `GET /api/security/alerts` returns current active alerts

## Note on Scope

This is intentionally **heuristic-based** (not ML-based) to keep it dependency-free and fast. The goal is to surface obvious anomalies, not to replace proper security tooling. Prompt injection detection is pattern-matching only — it will have false positives and false negatives.

## References

- OWASP Top 10 for LLM Applications (2025): https://owasp.org/www-project-top-10-for-large-language-model-applications/
- CISA AI security guidance: https://www.cisa.gov/ai
- Prompt injection examples: https://github.com/jthack/PIPE
- Anthropic's responsible scaling policy: https://www.anthropic.com/news/anthropics-responsible-scaling-policy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: AI agent security panel — anomaly detection, audit trail, and behavior analysis #33

Summary

Motivation

Proposed Detection Signals

Behavioral anomalies (statistical)

Prompt injection heuristics

Audit trail

UI Layout

Acceptance Criteria

Note on Scope

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat: AI agent security panel — anomaly detection, audit trail, and behavior analysis #33

Description

Summary

Motivation

Proposed Detection Signals

Behavioral anomalies (statistical)

Prompt injection heuristics

Audit trail

UI Layout

Acceptance Criteria

Note on Scope

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions