-
Notifications
You must be signed in to change notification settings - Fork 5
Description
Summary
Add a Security & Anomaly Detection Panel that monitors agent behavior for suspicious patterns β unusually large outputs, unexpected tool calls, potential prompt injection indicators, and statistical anomalies compared to baseline behavior.
Motivation
AI agent security is one of the hottest topics in 2025. As teams run increasingly autonomous Claude Code agents on production codebases, the ability to detect anomalous behavior is critical for:
- Catching prompt injection attempts in agent task descriptions
- Detecting agents that have gone off-rails (infinite loops, runaway token usage)
- Providing an auditable trail for compliance-sensitive organizations
- Identifying agents that are unusually slow or producing unexpected output patterns
Proposed Detection Signals
Behavioral anomalies (statistical)
// Flag messages significantly longer than agent's baseline
function detectLongOutput(agent, message, history) {
const avgLength = history.reduce((s, m) => s + m.length, 0) / history.length;
const stdDev = computeStdDev(history.map(m => m.length));
return message.length > avgLength + 3 * stdDev; // 3-sigma threshold
}
// Detect message burst (agent flooding the inbox)
function detectMessageBurst(messages, windowMs = 60000, threshold = 20) {
const now = Date.now();
const recent = messages.filter(m => now - new Date(m.timestamp) < windowMs);
return recent.length > threshold;
}Prompt injection heuristics
const INJECTION_PATTERNS = [
/ignore (all |previous |above )?instructions/i,
/you are now a/i,
/\[SYSTEM\]/i,
/jailbreak/i,
/DAN mode/i,
/pretend you (are|have no)/i,
];
function checkPromptInjection(text) {
return INJECTION_PATTERNS.filter(p => p.test(text));
}Audit trail
Every task state change and agent message is already stored in ~/.claude/. The security panel surfaces this as a tamper-evident, chronological audit log.
UI Layout
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β π‘οΈ Security Monitor [Settings] [Export Audit] β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Risk Level: LOW ββββββββββββββββββ 2 alerts β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β οΈ researcher-3 | Unusually long output (4,200 tok) β
β vs. baseline 800 tok avg | 2 min ago [Dismiss] β
β β
β βΉοΈ coder-1 | Message burst: 24 messages in 60s β
β Threshold: 20/min | 8 min ago [Dismiss] β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π Audit Trail (last 50 events) β
β ββββββββββββββββββββββββββββββ β
β 17:23:45 researcher-3 β team-lead [message] β
β 17:23:12 team-lead assigned task to researcher-3 β
β 17:22:58 team run "my-research-team" started β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Acceptance Criteria
- Statistical anomaly detection: long output, message burst, sudden silence
- Prompt injection pattern matching on incoming task descriptions
- Risk level indicator (Low / Medium / High) aggregating active alerts
- Alert list with dismiss button
- Chronological audit trail showing all agent actions
- Alerts fire in real-time via WebSocket
- Export audit trail as JSON/CSV
- All detection thresholds configurable in settings
- Detection is client-side (no data leaves local machine)
-
GET /api/security/alertsreturns current active alerts
Note on Scope
This is intentionally heuristic-based (not ML-based) to keep it dependency-free and fast. The goal is to surface obvious anomalies, not to replace proper security tooling. Prompt injection detection is pattern-matching only β it will have false positives and false negatives.
References
- OWASP Top 10 for LLM Applications (2025): https://owasp.org/www-project-top-10-for-large-language-model-applications/
- CISA AI security guidance: https://www.cisa.gov/ai
- Prompt injection examples: https://github.com/jthack/PIPE
- Anthropic's responsible scaling policy: https://www.anthropic.com/news/anthropics-responsible-scaling-policy