Commit c745d8c

ANcpLua and claude authored
feat(claude-self-obs): self-observability plugin for Claude Code (#150)

* docs: add spec-0002 qyl Claude Code observability

  Comprehensive spec for building Claude Code session observability into qyl's AI telemetry dashboard. Zero-instrumentation approach using Claude Code's native OTLP telemetry export (4 env vars).

  Covers: OTLP data flow, DuckDB schema, 5 API endpoints, React hooks, 4 dashboard components, SSE live streaming, and 4-phase implementation plan. Correlation via prompt.id across all events.

  Also fixes CLAUDE.md skill count (6 → 4 after council/invoke and feature-dev/code-review skill deletions).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(qyl-instrumentation): rebuild as Teams API orchestration (v2.0.0)

  Restructure from 3 standalone agents to 1 Opus captain + 4 Sonnet specialists. Captain pre-reads otelwiki bundled semconv docs before spawning specialists, which eliminates runtime web search/fetch. New /observe command implements full TeamCreate → spawn → cross-pollinate → synthesize → TeamDelete lifecycle.

  New: opus-captain agent, qyl-platform-specialist agent, /observe command.
  Changed: removed WebSearch/WebFetch from all specialist tools, added Team Protocol sections with SendMessage coordination patterns.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(qyl-instrumentation): competition-ready polish (v2.1.0)

  Add hero scenario (proactive secretary), 8-layer trace example, attribute decision tree, multi-turn agent traces, GenAI failure modes, TypeSpec-to-dashboard flow, MCP/SSE patterns, SEMCONV_CONTEXT shape, verification checklists, and example run walkthrough across all 6 agent/command files. All stay under 250 lines.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(claude-self-obs): add self-observability plugin for Claude Code

  Every tool call (Read, Edit, Bash, Grep, WebSearch, Task, …) becomes an OTLP span posted to the configured collector. Agent lifecycle events (SubagentStart/Stop) emit trace boundary spans.

  Zero config: the hooks silently no-op when no collector is running, so they never block the agent. Enable by starting any OTLP HTTP collector on :5100 (or set QYL_COLLECTOR_URL). Disable by stopping the collector.

  Files:
  - hooks/emit-span.sh         PostToolUse → OTLP span
  - hooks/emit-agent-start.sh  SubagentStart → agent/start span
  - hooks/emit-agent-stop.sh   SubagentStop → agent/stop span (linked to start)
  - hooks/hooks.json           auto-loaded hook registration
  - commands/status.md         /claude-self-obs:status command

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f8fc0d2 commit c745d8c

8 files changed: +389 -0 lines changed

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -30,6 +30,24 @@ and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

 ### Changed

+- **`qyl-instrumentation` (2.0.0 → 2.1.0)**: Competition-ready polish for all 6 agent/command files. Added hero scenario (proactive secretary notification handler), 8-layer trace example, attribute decision tree, performance profile, multi-turn agent trace, GenAI failure modes with 15-second async window, TypeSpec-to-dashboard end-to-end flow, MCP tool pattern, SSE consumption pattern, SEMCONV_CONTEXT shape, spawn/synthesis verification checklists, and example run walkthrough. All files stay under 250 lines
+
+### Added
+
+- **`docs/specs/spec-0002-qyl-claude-code-observability.md`**: Comprehensive spec for building Claude Code session observability into qyl's AI telemetry dashboard. Covers OTLP data flow (native `claude_code.*` metrics + events), DuckDB schema, 5 API endpoints, React hooks, 4 dashboard components, SSE live streaming, and 4-phase implementation plan. Zero-instrumentation approach: uses Claude Code's built-in OTLP telemetry export via 4 env vars
+- **`qyl-instrumentation/commands/observe.md`**: Teams API orchestration command. Opus captain pre-reads otelwiki bundled semconv docs, assembles SEMCONV_CONTEXT + SHARED_AWARENESS, spawns 4 Sonnet specialists in parallel, coordinates cross-pollination via SendMessage, synthesizes. Zero runtime web search
+- **`qyl-instrumentation/agents/opus-captain.md`**: Opus captain agent. Orchestrates context assembly and team coordination, reads otelwiki docs before any specialist spawns
+- **`qyl-instrumentation/agents/qyl-platform-specialist.md`**: 4th Sonnet specialist covering MCP server, React dashboard, browser OTLP SDK, SSE streaming, and Copilot extensibility
+
+### Changed
+
+- **`qyl-instrumentation`**: Rebuilt from 3 standalone agents (v1.0.0) to Teams API orchestration (v2.0.0). 1 Opus captain + 4 Sonnet specialists. Captain pre-reads otelwiki bundled docs, so specialists receive pre-assembled semconv context in spawn prompts instead of web searching at runtime
+- **`qyl-instrumentation` agents**: Removed `WebSearch` and `WebFetch` from all 3 existing specialist tool lists. Added Team Protocol sections documenting SendMessage coordination patterns and SEMCONV_CONTEXT injection
+- **`qyl-instrumentation/agents/otel-genai-architect.md`**: Convention verification now references captain's SEMCONV_CONTEXT instead of WebSearch
+- **`marketplace.json`**: Updated qyl-instrumentation description and version (1.0.0 → 2.0.0), agent count 17 → 19, command count 23 → 24
+
+### Changed
+
 - **`exodia/skills/hades`**: Migrated from vague Teams references to explicit Teams API. SKILL.md now uses `TeamCreate`, `TeamDelete`, `SendMessage` (shutdown_request/shutdown_response), `TaskCreate`/`TaskList`/`TaskUpdate` with explicit parameters. Removed fallback subagent path and duplicate STEP -1 block. All 4 teammate templates (auditors, eliminators, verifiers, goggles) updated: vague `MESSAGE` → `SendMessage (recipient: "...")`, vague task list → `TaskCreate`/`TaskUpdate`, team context preamble and shutdown protocol added
 - **`exodia/eight-gates` Gate 7 EXECUTE**: Removed dual Mode A (Task subagents) / Mode B (Agent Teams) pattern. Teams API is now the single execution mode. Lane workers coordinate via `SendMessage` and claim work via `TaskCreate`/`TaskUpdate`. Collision avoidance uses teammate messaging
 - **`exodia/skills/hades` allowed-tools**: Added `TeamCreate`, `TeamDelete`, `TaskCreate`, `TaskList`, `TaskUpdate`, `SendMessage` to frontmatter

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
{
  "name": "claude-self-obs",
  "version": "1.0.0",
  "description": "Self-observability for Claude Code: every tool call becomes an OTLP span. Watch AI agents build software in real time. Zero config: silently no-ops when no collector is running.",
  "author": {
    "name": "ANcpLua",
    "url": "https://github.com/ANcpLua"
  },
  "repository": "https://github.com/ANcpLua/ancplua-claude-plugins",
  "license": "MIT",
  "keywords": [
    "opentelemetry",
    "otlp",
    "observability",
    "tracing",
    "hooks",
    "claude-code",
    "self-observability"
  ],
  "commands": "./commands"
}

plugins/claude-self-obs/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# claude-self-obs

**Watch AI agents build software in real time.**

Every Claude Code tool call (Read, Edit, Bash, Grep, WebSearch, …) becomes an OTLP span.
Agent lifecycle events (spawn, stop) become trace boundaries.
Everything flows to your OTLP collector with zero config and zero code changes.

## How it works

```
Claude Code tool call
  → PostToolUse hook fires
  → emit-span.sh wraps it as OTLP ExportTraceServiceRequest
  → POST to localhost:5100/v1/traces
  → Collector stores + streams to dashboard
```

**Enable:** start your OTLP collector (qyl, Jaeger, any OTLP HTTP endpoint).
**Disable:** stop the collector. The hook silently no-ops and never blocks the agent.

## Signals captured

| Hook | Span name | Key attributes |
|------|-----------|----------------|
| PostToolUse (Read) | `tool/Read` | `file.path` |
| PostToolUse (Edit) | `tool/Edit` | `file.path` |
| PostToolUse (Bash) | `tool/Bash` | `bash.command` |
| PostToolUse (Grep) | `tool/Grep` | `search.pattern` |
| PostToolUse (WebSearch) | `tool/WebSearch` | `search.query` |
| PostToolUse (Task) | `tool/Task` | `task.subagent_type`, `task.prompt` |
| SubagentStart | `agent/start:{name}` | `agent.name`, `agent.type` |
| SubagentStop | `agent/stop:{name}` | `agent.name`, `agent.type`, `agent.id` |

## Trace model

All spans in a session share one `traceId` (derived from `session_id`).
Each agent stop span names the matching agent start span as its parent.
Tool call spans are flat: they have no parent and no duration yet (`startTime == endTime`).

## Commands

| Command | What it does |
|---------|-------------|
| `/claude-self-obs:status` | Check if collector is reachable |

## Configuration

| Variable | Default | Purpose |
|----------|---------|---------|
| `QYL_COLLECTOR_URL` | `http://localhost:5100` | OTLP HTTP endpoint base URL |

## Dependencies

`curl` and `jq` are required, plus `md5` or `md5sum` for ID hashing. `python3` is optional: without it the hooks fall back to `date`, which gives one-second timestamp resolution. `curl` ships with macOS; `jq` and `python3` may need to be installed separately, depending on the macOS version.
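
The README's "silently no-ops" claim rests on one pattern: bound the request time and discard the exit status. Below is a minimal sketch of that pattern, not the plugin's actual code. The helper names `traces_endpoint` and `post_span` are hypothetical, and trimming a trailing slash from the base URL is an added assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: traces_endpoint and post_span are hypothetical helpers.

# Resolve the traces endpoint from QYL_COLLECTOR_URL, falling back to the
# documented default. A trailing slash on the base URL is trimmed (assumption).
traces_endpoint() {
  local base="${QYL_COLLECTOR_URL:-http://localhost:5100}"
  printf '%s/v1/traces' "${base%/}"
}

# POST a JSON payload and always return 0, so a missing or slow collector
# can never fail the calling hook. --max-time bounds the wait to 2 seconds.
post_span() {
  local payload="$1"
  curl -s -X POST "$(traces_endpoint)" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    --max-time 2 > /dev/null 2>&1 || true
}

# An export request with no spans is enough to exercise the code path.
post_span '{"resourceSpans":[]}'
echo "exit status: $?"
```

Because the status is always 0, the caller cannot distinguish a delivered span from a dropped one; that is the trade-off the plugin accepts in exchange for never blocking the agent.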

commands/status.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
description: Check whether the OTLP collector is reachable and self-observability is active.
---

# /claude-self-obs:status

Check if the claude-self-obs plugin is actively sending spans.

## What this does

1. Reads `$QYL_COLLECTOR_URL` (default: `http://localhost:5100`)
2. Attempts a health check against the collector
3. Reports: **active** (spans flowing) or **standby** (silently dropping, collector unreachable)
4. Shows recent sessions and their span counts if the collector has a sessions API

## Steps

Run this in a Bash tool:

```bash
COLLECTOR="${QYL_COLLECTOR_URL:-http://localhost:5100}"

echo "Checking collector at $COLLECTOR..."

# Try the health endpoint first, then the sessions API.
# --max-time keeps the check from hanging on an unresponsive host.
if curl -sf --max-time 2 "$COLLECTOR/health" > /dev/null 2>&1 \
  || curl -sf --max-time 2 "$COLLECTOR/api/v1/claude-code/sessions" > /dev/null 2>&1; then
  echo "✓ ACTIVE - spans are flowing to $COLLECTOR"
  echo ""
  echo "Recent sessions:"
  # Best effort: collectors without a sessions API simply print nothing.
  curl -sf --max-time 2 "$COLLECTOR/api/v1/claude-code/sessions" \
    | jq -r '.[] | "  \(.session_id[:8])... \(.tool_count) spans"' 2>/dev/null || true
else
  echo "◌ STANDBY - collector unreachable at $COLLECTOR"
  echo "  Spans are silently dropped. Start qyl to begin collecting."
  echo "  To override URL: export QYL_COLLECTOR_URL=http://your-collector:port"
fi
```

## Enable / Disable

| Action | How |
|--------|-----|
| **Enable** | Start qyl collector (`dotnet run` in the qyl project) |
| **Disable** | Stop the collector; the hook silently no-ops |
| **Change URL** | `export QYL_COLLECTOR_URL=http://other-host:5100` |
| **Uninstall** | Disable plugin in Claude Code settings |
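
The active/standby decision above reduces to mapping a probe's exit status to one of two labels. The sketch below separates the probe from the mapping so the mapping can be checked without a running collector. `collector_state` and `probe_health` are hypothetical names, and the 2-second timeout is an assumption borrowed from the hook scripts.

```shell
#!/usr/bin/env bash
# Sketch only: collector_state and probe_health are hypothetical helpers.

# Run the given probe command and map its exit status to the two states
# the status command reports.
collector_state() {
  if "$@" > /dev/null 2>&1; then
    echo "active"
  else
    echo "standby"
  fi
}

# Probe the /health endpoint, giving up after 2 seconds.
probe_health() {
  local base="${QYL_COLLECTOR_URL:-http://localhost:5100}"
  curl -sf --max-time 2 "$base/health"
}

echo "collector: $(collector_state probe_health)"
```

Passing the probe as an argument means the same function works for the sessions endpoint or any other reachability check.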

hooks/emit-agent-start.sh

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
#!/usr/bin/env bash
# emit-agent-start.sh: SubagentStart hook
# Creates an "agent/start" span when a subagent is spawned.

set -euo pipefail

COLLECTOR_URL="${QYL_COLLECTOR_URL:-http://localhost:5100}/v1/traces"

# The hook payload arrives as JSON on stdin.
INPUT=$(cat)

SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
AGENT_NAME=$(echo "$INPUT" | jq -r '.agent_name // "unknown"')
AGENT_TYPE=$(echo "$INPUT" | jq -r '.agent_type // ""')
CWD=$(echo "$INPUT" | jq -r '.cwd // ""')

# Trace ID: MD5 of the session ID (32 hex chars). `md5 -q` is the BSD/macOS
# tool; `md5sum` is the fallback elsewhere.
TRACE_ID=$(printf '%s' "$SESSION_ID" | md5 -q 2>/dev/null \
  || printf '%s' "$SESSION_ID" | md5sum | cut -c1-32)
TRACE_ID="${TRACE_ID:0:32}"

# Span ID from session+agent+start (deterministic). Two spawns of the same
# agent name within one session hash to the same span ID.
SPAN_KEY="${SESSION_ID}:agent_start:${AGENT_NAME}"
SPAN_ID=$(printf '%s' "$SPAN_KEY" | md5 -q 2>/dev/null \
  || printf '%s' "$SPAN_KEY" | md5sum | cut -c1-16)
SPAN_ID="${SPAN_ID:0:16}"

# Nanosecond timestamp; falls back to whole seconds if python3 is missing.
NOW_NS=$(python3 -c "import time; print(int(time.time() * 1e9))" 2>/dev/null \
  || date +%s000000000)

# kind: 1 is SPAN_KIND_INTERNAL; status code 1 is OK.
OTLP_PAYLOAD=$(jq -n \
  --arg trace_id "$TRACE_ID" --arg span_id "$SPAN_ID" \
  --arg agent "$AGENT_NAME" --arg type "$AGENT_TYPE" \
  --arg session "$SESSION_ID" --arg cwd "$CWD" --arg now_ns "$NOW_NS" \
  '{
    resourceSpans: [{
      resource: { attributes: [
        { key: "service.name", value: { stringValue: "claude-code" } },
        { key: "session.id", value: { stringValue: $session } },
        { key: "process.cwd", value: { stringValue: $cwd } }
      ]},
      scopeSpans: [{
        scope: { name: "claude-code.hooks", version: "1.0.0" },
        spans: [{
          traceId: $trace_id, spanId: $span_id,
          name: ("agent/start:" + $agent), kind: 1,
          startTimeUnixNano: $now_ns, endTimeUnixNano: $now_ns,
          attributes: [
            { key: "agent.name", value: { stringValue: $agent } },
            { key: "agent.type", value: { stringValue: $type } },
            { key: "event", value: { stringValue: "SubagentStart" } }
          ],
          status: { code: 1 }
        }]
      }]
    }]
  }')

# Fire and forget: a 2-second cap and `|| true` keep the hook from failing.
curl -s -X POST "$COLLECTOR_URL" \
  -H "Content-Type: application/json" \
  -d "$OTLP_PAYLOAD" \
  --max-time 2 > /dev/null 2>&1 || true
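
All three hooks stamp spans with a nanosecond timestamp and pass it to `jq` as a string, which matches the JSON encoding of 64-bit integers in OTLP. The sketch below isolates that step. `now_ns` is a hypothetical helper, and it uses `time.time_ns()` (Python 3.7+) rather than the float arithmetic in the script, which is a substitution on my part to avoid rounding.

```shell
#!/usr/bin/env bash
# Sketch only: now_ns is a hypothetical helper, not part of the plugin.

# Print the current Unix time in nanoseconds as a decimal string.
# time.time_ns() avoids float rounding; if python3 is missing or too old,
# fall back to whole seconds padded with nine zeros.
now_ns() {
  python3 -c "import time; print(time.time_ns())" 2>/dev/null \
    || date +%s000000000
}

ts=$(now_ns)
# A zero-duration span uses the same value for both timestamps.
printf '{"startTimeUnixNano":"%s","endTimeUnixNano":"%s"}\n' "$ts" "$ts"
```

With the `date` fallback every span emitted within the same second carries an identical timestamp, so ordering within that second is lost.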

hooks/emit-agent-stop.sh

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
#!/usr/bin/env bash
# emit-agent-stop.sh: SubagentStop hook
# Creates an "agent/stop" span when a subagent finishes.

set -euo pipefail

COLLECTOR_URL="${QYL_COLLECTOR_URL:-http://localhost:5100}/v1/traces"

# The hook payload arrives as JSON on stdin.
INPUT=$(cat)

SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
AGENT_NAME=$(echo "$INPUT" | jq -r '.agent_name // "unknown"')
AGENT_TYPE=$(echo "$INPUT" | jq -r '.agent_type // ""')
AGENT_ID=$(echo "$INPUT" | jq -r '.agent_id // ""')
CWD=$(echo "$INPUT" | jq -r '.cwd // ""')

# Trace ID: MD5 of the session ID, identical to the other hooks.
TRACE_ID=$(printf '%s' "$SESSION_ID" | md5 -q 2>/dev/null \
  || printf '%s' "$SESSION_ID" | md5sum | cut -c1-32)
TRACE_ID="${TRACE_ID:0:32}"

# Span ID for the stop event; includes agent_id, unlike the start span key.
SPAN_KEY="${SESSION_ID}:agent_stop:${AGENT_NAME}:${AGENT_ID}"
SPAN_ID=$(printf '%s' "$SPAN_KEY" | md5 -q 2>/dev/null \
  || printf '%s' "$SPAN_KEY" | md5sum | cut -c1-16)
SPAN_ID="${SPAN_ID:0:16}"

# Parent span = the start span for this agent, recomputed from the same key
# that emit-agent-start.sh hashes.
PARENT_KEY="${SESSION_ID}:agent_start:${AGENT_NAME}"
PARENT_ID=$(printf '%s' "$PARENT_KEY" | md5 -q 2>/dev/null \
  || printf '%s' "$PARENT_KEY" | md5sum | cut -c1-16)
PARENT_ID="${PARENT_ID:0:16}"

# Nanosecond timestamp; falls back to whole seconds if python3 is missing.
NOW_NS=$(python3 -c "import time; print(int(time.time() * 1e9))" 2>/dev/null \
  || date +%s000000000)

# kind: 1 is SPAN_KIND_INTERNAL; status code 1 is OK.
OTLP_PAYLOAD=$(jq -n \
  --arg trace_id "$TRACE_ID" --arg span_id "$SPAN_ID" --arg parent_id "$PARENT_ID" \
  --arg agent "$AGENT_NAME" --arg type "$AGENT_TYPE" --arg agent_id "$AGENT_ID" \
  --arg session "$SESSION_ID" --arg cwd "$CWD" --arg now_ns "$NOW_NS" \
  '{
    resourceSpans: [{
      resource: { attributes: [
        { key: "service.name", value: { stringValue: "claude-code" } },
        { key: "session.id", value: { stringValue: $session } },
        { key: "process.cwd", value: { stringValue: $cwd } }
      ]},
      scopeSpans: [{
        scope: { name: "claude-code.hooks", version: "1.0.0" },
        spans: [{
          traceId: $trace_id, spanId: $span_id, parentSpanId: $parent_id,
          name: ("agent/stop:" + $agent), kind: 1,
          startTimeUnixNano: $now_ns, endTimeUnixNano: $now_ns,
          attributes: [
            { key: "agent.name", value: { stringValue: $agent } },
            { key: "agent.type", value: { stringValue: $type } },
            { key: "agent.id", value: { stringValue: $agent_id } },
            { key: "event", value: { stringValue: "SubagentStop" } }
          ],
          status: { code: 1 }
        }]
      }]
    }]
  }')

# Fire and forget: a 2-second cap and `|| true` keep the hook from failing.
curl -s -X POST "$COLLECTOR_URL" \
  -H "Content-Type: application/json" \
  -d "$OTLP_PAYLOAD" \
  --max-time 2 > /dev/null 2>&1 || true
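
The start and stop hooks run as separate processes and share no state, so the parent link works only because both derive the start span's ID from the same string. The sketch below illustrates that property. `hex_id` is a hypothetical helper and the session, agent, and agent ID values are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: hex_id is a hypothetical helper; the IDs below are invented.

# hex_id KEY LEN: first LEN hex characters of the MD5 digest of KEY.
# Tries BSD/macOS `md5 -q` first, then `md5sum`.
hex_id() {
  local digest
  digest=$(printf '%s' "$1" | md5 -q 2>/dev/null) \
    || digest=$(printf '%s' "$1" | md5sum | cut -c1-32)
  printf '%s' "$digest" | cut -c1-"$2"
}

session="sess-1"
agent="code-reviewer"
agent_id="a42"

trace_id=$(hex_id "$session" 32)
# Computed by the start hook:
start_span=$(hex_id "${session}:agent_start:${agent}" 16)
# Recomputed independently by the stop hook:
stop_parent=$(hex_id "${session}:agent_start:${agent}" 16)
stop_span=$(hex_id "${session}:agent_stop:${agent}:${agent_id}" 16)

echo "trace=$trace_id"
echo "start=$start_span  stop=$stop_span  stop.parent=$stop_parent"
```

The start key omits the agent ID, so a second spawn of the same agent name in the same session would produce the same `start_span`. A collector may treat that as a duplicate span.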

hooks/emit-span.sh

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
#!/usr/bin/env bash
# emit-span.sh: PostToolUse hook
# Transforms Claude Code tool calls into OTLP spans and POSTs to the collector.
# Silently no-ops when collector is unreachable. Never blocks the agent.

set -euo pipefail

COLLECTOR_URL="${QYL_COLLECTOR_URL:-http://localhost:5100}/v1/traces"

# The hook payload arrives as JSON on stdin.
INPUT=$(cat)

SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // "unknown"')
TOOL_USE_ID=$(echo "$INPUT" | jq -r '.tool_use_id // "unknown"')
CWD=$(echo "$INPUT" | jq -r '.cwd // ""')
AGENT_NAME=$(echo "$INPUT" | jq -r '.agent_name // ""')
AGENT_TYPE=$(echo "$INPUT" | jq -r '.agent_type // ""')

# Derive traceId deterministically from session_id (one trace per session)
TRACE_ID=$(printf '%s' "$SESSION_ID" | md5 -q 2>/dev/null \
  || printf '%s' "$SESSION_ID" | md5sum | cut -c1-32)
TRACE_ID="${TRACE_ID:0:32}"

# Derive spanId from tool_use_id (unique per tool call)
SPAN_ID=$(printf '%s' "$TOOL_USE_ID" | md5 -q 2>/dev/null \
  || printf '%s' "$TOOL_USE_ID" | md5sum | cut -c1-16)
SPAN_ID="${SPAN_ID:0:16}"

# Nanosecond timestamp; falls back to whole seconds if python3 is missing.
NOW_NS=$(python3 -c "import time; print(int(time.time() * 1e9))" 2>/dev/null \
  || date +%s000000000)

# Tool-specific attributes: each entry is emitted only when its input field is
# present. Commands are capped at 500 characters and prompts at 200.
# file.size_bytes uses utf8bytelength for strings, since jq's `length` counts
# codepoints rather than bytes.
TOOL_ATTRS=$(echo "$INPUT" | jq -c '[
  if .tool_input.file_path then { key: "file.path", value: { stringValue: .tool_input.file_path } } else empty end,
  if .tool_input.command then { key: "bash.command", value: { stringValue: (.tool_input.command | .[0:500]) } } else empty end,
  if .tool_input.pattern then { key: "search.pattern", value: { stringValue: .tool_input.pattern } } else empty end,
  if .tool_input.query then { key: "search.query", value: { stringValue: .tool_input.query } } else empty end,
  if .tool_input.url then { key: "http.url", value: { stringValue: .tool_input.url } } else empty end,
  if .tool_input.content then { key: "file.size_bytes", value: { intValue: (.tool_input.content | if type == "string" then utf8bytelength else length end | tostring) } } else empty end,
  if .tool_input.prompt then { key: "task.prompt", value: { stringValue: (.tool_input.prompt | .[0:200]) } } else empty end,
  if .tool_input.subagent_type then { key: "task.subagent_type", value: { stringValue: .tool_input.subagent_type } } else empty end
]')

# Agent attributes are added only when the payload names an agent.
AGENT_ATTRS=$(jq -cn \
  --arg name "$AGENT_NAME" --arg type "$AGENT_TYPE" \
  '[
    if $name != "" then { key: "agent.name", value: { stringValue: $name } } else empty end,
    if $type != "" then { key: "agent.type", value: { stringValue: $type } } else empty end
  ]')

ALL_ATTRS=$(jq -cn \
  --arg tool "$TOOL_NAME" \
  --argjson tool_attrs "$TOOL_ATTRS" \
  --argjson agent_attrs "$AGENT_ATTRS" \
  '[{ key: "tool.name", value: { stringValue: $tool } }] + $tool_attrs + $agent_attrs')

# kind: 3 is SPAN_KIND_CLIENT; status code 1 is OK.
OTLP_PAYLOAD=$(jq -n \
  --arg trace_id "$TRACE_ID" --arg span_id "$SPAN_ID" \
  --arg tool "$TOOL_NAME" --arg session "$SESSION_ID" --arg cwd "$CWD" \
  --arg now_ns "$NOW_NS" --argjson attrs "$ALL_ATTRS" \
  '{
    resourceSpans: [{
      resource: { attributes: [
        { key: "service.name", value: { stringValue: "claude-code" } },
        { key: "session.id", value: { stringValue: $session } },
        { key: "process.cwd", value: { stringValue: $cwd } }
      ]},
      scopeSpans: [{
        scope: { name: "claude-code.hooks", version: "1.0.0" },
        spans: [{
          traceId: $trace_id, spanId: $span_id,
          name: ("tool/" + $tool), kind: 3,
          startTimeUnixNano: $now_ns, endTimeUnixNano: $now_ns,
          attributes: $attrs, status: { code: 1 }
        }]
      }]
    }]
  }')

# Fire and forget: a 2-second cap and `|| true` keep the hook from failing.
curl -s -X POST "$COLLECTOR_URL" \
  -H "Content-Type: application/json" \
  -d "$OTLP_PAYLOAD" \
  --max-time 2 > /dev/null 2>&1 || true
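
To see how a hook payload turns into span attributes, it can help to run the attribute-building step on its own. The sketch below is a reduced version covering only two fields and uses jq's `// empty` idiom instead of the script's `if … then … else empty end`. `tool_attrs` is a hypothetical helper, and the sample payload is invented and contains only the fields the filter reads.

```shell
#!/usr/bin/env bash
# Sketch only: tool_attrs is a hypothetical helper; the payload is invented.

# Read a hook payload on stdin and print OTLP attributes as compact JSON.
# `// empty` drops an entry when its input field is null or missing.
tool_attrs() {
  jq -c '[
    (.tool_input.file_path // empty | { key: "file.path",    value: { stringValue: . } }),
    (.tool_input.command   // empty | { key: "bash.command", value: { stringValue: .[0:500] } })
  ]'
}

# Demo, skipped when jq is not installed.
if command -v jq > /dev/null 2>&1; then
  payload='{"tool_name":"Bash","tool_input":{"command":"ls -la"}}'
  printf '%s' "$payload" | tool_attrs
fi
```

Piping a saved payload through a filter like this is a quick way to check attribute names before pointing the hook at a real collector.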
