Skip to content

feat: AI agent security panel β€” anomaly detection, audit trail, and behavior analysisΒ #33

@mukul975

Description

@mukul975

Summary

Add a Security & Anomaly Detection Panel that monitors agent behavior for suspicious patterns β€” unusually large outputs, unexpected tool calls, potential prompt injection indicators, and statistical anomalies compared to baseline behavior.

Motivation

AI agent security is one of the hottest topics in 2025. As teams run increasingly autonomous Claude Code agents on production codebases, the ability to detect anomalous behavior is critical for:

  • Catching prompt injection attempts in agent task descriptions
  • Detecting agents that have gone off-rails (infinite loops, runaway token usage)
  • Providing an auditable trail for compliance-sensitive organizations
  • Identifying agents that are unusually slow or producing unexpected output patterns

Proposed Detection Signals

Behavioral anomalies (statistical)

// Flag messages significantly longer than agent's baseline
function detectLongOutput(agent, message, history) {
  const avgLength = history.reduce((s, m) => s + m.length, 0) / history.length;
  const stdDev = computeStdDev(history.map(m => m.length));
  return message.length > avgLength + 3 * stdDev; // 3-sigma threshold
}

// Detect message burst (agent flooding the inbox)
function detectMessageBurst(messages, windowMs = 60000, threshold = 20) {
  const now = Date.now();
  const recent = messages.filter(m => now - new Date(m.timestamp) < windowMs);
  return recent.length > threshold;
}

Prompt injection heuristics

const INJECTION_PATTERNS = [
  /ignore (all |previous |above )?instructions/i,
  /you are now a/i,
  /\[SYSTEM\]/i,
  /jailbreak/i,
  /DAN mode/i,
  /pretend you (are|have no)/i,
];

function checkPromptInjection(text) {
  return INJECTION_PATTERNS.filter(p => p.test(text));
}

Audit trail

Every task state change and agent message is already stored in ~/.claude/. The security panel surfaces this as a tamper-evident, chronological audit log.

UI Layout

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  πŸ›‘οΈ Security Monitor           [Settings] [Export Audit] β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Risk Level: LOW β–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘  2 alerts          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  ⚠️  researcher-3  |  Unusually long output (4,200 tok) β”‚
β”‚     vs. baseline 800 tok avg  |  2 min ago  [Dismiss]   β”‚
β”‚                                                         β”‚
β”‚  ℹ️  coder-1  |  Message burst: 24 messages in 60s      β”‚
β”‚     Threshold: 20/min  |  8 min ago  [Dismiss]          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  πŸ“‹ Audit Trail (last 50 events)                        β”‚
β”‚  ──────────────────────────────                         β”‚
β”‚  17:23:45  researcher-3 β†’ team-lead  [message]          β”‚
β”‚  17:23:12  team-lead assigned task to researcher-3      β”‚
β”‚  17:22:58  team run "my-research-team" started          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Acceptance Criteria

  • Statistical anomaly detection: long output, message burst, sudden silence
  • Prompt injection pattern matching on incoming task descriptions
  • Risk level indicator (Low / Medium / High) aggregating active alerts
  • Alert list with dismiss button
  • Chronological audit trail showing all agent actions
  • Alerts fire in real-time via WebSocket
  • Export audit trail as JSON/CSV
  • All detection thresholds configurable in settings
  • Detection is client-side (no data leaves local machine)
  • GET /api/security/alerts returns current active alerts

Note on Scope

This is intentionally heuristic-based (not ML-based) to keep it dependency-free and fast. The goal is to surface obvious anomalies, not to replace proper security tooling. Prompt injection detection is pattern-matching only β€” it will have false positives and false negatives.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesthelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions