|
| 1 | +# Case Study: Issue #1337 - False Positive Error Detection |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +**Issue:** [#1337 - False positive error detection](https://github.com/link-assistant/hive-mind/issues/1337) |
| 6 | + |
| 7 | +**Status:** Fixed |
| 8 | + |
| 9 | +**Root Cause:** The stderr error detection logic in `src/claude.lib.mjs` only recognizes emoji-prefixed warnings (`⚠️`). When Claude Code emits structured JSON log messages (format: `{"level":"warn","message":"..."}`) to stderr, the code fails to identify them as warnings. If the JSON message contains error-related keywords ("failed", "error", "not found"), the message is incorrectly added to `stderrErrors`, triggering a false positive failure. |
| 10 | + |
| 11 | +**Impact:** |
| 12 | + |
| 13 | +1. Legitimate solve sessions flagged as failures when Claude Code emits slow pre-flight warnings |
| 14 | +2. PR logs attach "failure" outcome when the actual execution succeeded |
| 15 | +3. Wasted retries and time due to phantom failures |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## The Failing Scenario |
| 20 | + |
| 21 | +The user observed: |
| 22 | + |
| 23 | +``` |
| 24 | +❌ Command failed: No messages processed and errors detected in stderr |
| 25 | +Stderr errors: |
| 26 | + {"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."} |
| 27 | +``` |
| 28 | + |
| 29 | +The stderr message is a **warning**, not an error. However, the system treated it as an error because: |
| 30 | + |
| 31 | +1. The message contains the word **"failed"** (in "check for **failed** or slow API requests") |
| 32 | +2. The message does NOT start with `⚠️` emoji (it's a JSON structured log) |
| 33 | +3. The detection logic only checks for `⚠️` prefix to identify warnings |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Timeline of Events |
| 38 | + |
| 39 | +| Step | What Happened | |
| 40 | +| ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | |
| 41 | +| 1 | `solve.mjs` spawned Claude Code CLI | |
| 42 | +| 2 | Claude Code CLI began BashTool pre-flight check (Haiku API call for bash prefix detection) | |
| 43 | +| 3 | Pre-flight check took > 10 seconds (slow API or VM networking) | |
| 44 | +| 4 | Claude Code SDK emitted a structured JSON warning to stderr: `{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}` | |
| 45 | +| 5 | `claude.lib.mjs` received the stderr chunk | |
| 46 | +| 6 | Trimmed the message → does NOT start with `⚠️` | |
| 47 | +| 7 | Checked for error keywords → found "**failed**" in the message text | |
| 48 | +| 8 | Added to `stderrErrors` array ← **BUG HERE** | |
| 49 | +| 9 | Claude Code produced no messages before completing (pre-flight delay meant no actual output) | |
| 50 | +| 10 | `messageCount === 0 && toolUseCount === 0 && stderrErrors.length > 0` → triggered "Command failed" | |
| 51 | +| 11 | False positive failure reported | |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +## Root Cause Analysis |
| 56 | + |
| 57 | +### Primary Root Cause: Missing JSON-structured warning detection |
| 58 | + |
| 59 | +**File:** `src/claude.lib.mjs`, lines 1069–1078 |
| 60 | + |
| 61 | +**Current Code:** |
| 62 | + |
| 63 | +```javascript |
| 64 | +const trimmed = errorOutput.trim(); |
| 65 | +// Exclude warnings (messages starting with ⚠️) from being treated as errors |
| 66 | +// Example: "⚠️ [BashTool] Pre-flight check is taking longer than expected..." |
| 67 | +const isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠'); |
| 68 | +// Issue #1165: Also detect "command not found" errors |
| 69 | +if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) { |
| 70 | + stderrErrors.push(trimmed); |
| 71 | +} |
| 72 | +``` |
| 73 | + |
| 74 | +**Problem:** The `isWarning` check ONLY recognizes `⚠️` emoji-prefixed messages. The Claude Code SDK and its Anthropic SDK dependency can emit JSON-structured log messages with format: |
| 75 | + |
| 76 | +```json |
| 77 | +{ "level": "warn", "message": "[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests." } |
| 78 | +``` |
| 79 | + |
| 80 | +When this message is processed: |
| 81 | + |
| 82 | +- `trimmed.startsWith('⚠️')` → `false` (starts with `{`) |
| 83 | +- `trimmed.includes('failed')` → `true` (appears in the message text) |
| 84 | +- Result: Added to `stderrErrors` → **false positive** |
| 85 | + |
| 86 | +### Secondary Issue: ANTHROPIC_LOG=debug not enabled in verbose mode |
| 87 | + |
| 88 | +The warning message itself suggests: |
| 89 | + |
| 90 | +> "Run with ANTHROPIC_LOG=debug to check for failed or slow API requests." |
| 91 | +
|
| 92 | +Currently, `--verbose` mode does NOT set `ANTHROPIC_LOG=debug`. This means users who run with `--verbose` still don't get the detailed API request logs that would help diagnose slow API calls. |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## Evidence |
| 97 | + |
| 98 | +### Confirmed with Experiment |
| 99 | + |
| 100 | +Running `experiments/test-issue-1337-json-warning.mjs`: |
| 101 | + |
| 102 | +``` |
| 103 | +❌ FALSE POSITIVE: JSON warn format - BashTool pre-flight warning (THE ISSUE #1337) |
| 104 | + Message: "{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}" |
| 105 | + Expected: NOT error, Detected: IS error |
| 106 | +
|
| 107 | +❌ FALSE POSITIVE: JSON warn format with "error" in message (should be warn, NOT error) |
| 108 | + Message: "{"level":"warn","message":"Possible error-like condition detected, but not critical"}" |
| 109 | + Expected: NOT error, Detected: IS error |
| 110 | +
|
| 111 | +❌ FALSE POSITIVE: JSON warn with "not found" in message (should be warn) |
| 112 | + Message: "{"level":"warn","message":"Some resource not found but non-critical"}" |
| 113 | + Expected: NOT error, Detected: IS error |
| 114 | +
|
| 115 | +Current behavior: 3 false positives, 0 false negatives |
| 116 | +
|
| 117 | +❌ ISSUE #1337 CONFIRMED: JSON-format warnings are incorrectly treated as errors! |
| 118 | +``` |
| 119 | + |
| 120 | +### Source of JSON-Format Warnings |
| 121 | + |
| 122 | +The JSON format originates from the Anthropic TypeScript SDK's structured logging when `ANTHROPIC_LOG` environment variable or `DEBUG` is set. The SDK uses a logger interface that can output structured JSON: |
| 123 | + |
| 124 | +- [Issue #157 in claude-agent-sdk-typescript](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol — debug logs should go to stderr |
| 125 | +- [Issue #4859 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr |
| 126 | +- [Issue #2294 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/2294): Excessive API debug logs appearing in terminal output |
| 127 | +- [Issue #25025 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/25025): [BUG] [BashTool] Pre-flight check console.warn corrupts JSON output |
| 128 | + |
| 129 | +The BashTool pre-flight check timeout is 10 seconds. When the Haiku API call for bash command prefix detection exceeds 10 seconds (e.g., slow network, VM/gvisor networking, API latency), Claude Code emits a warning. This warning appears in two formats depending on the Claude Code version and logging configuration: |
| 130 | + |
| 131 | +1. **Emoji format** (handled): `⚠️ [BashTool] Pre-flight check is taking longer than expected...` |
| 132 | +2. **JSON format** (NOT handled — this issue): `{"level":"warn","message":"[BashTool] Pre-flight check..."}` |
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +## Proposed Solutions |
| 137 | + |
| 138 | +### Solution 1 (Implemented): Parse JSON-structured stderr messages |
| 139 | + |
| 140 | +When a stderr message is valid JSON with a `"level"` field: |
| 141 | + |
| 142 | +- If level is `"warn"`, `"info"`, or `"debug"` → treat as non-error (isWarning = true) |
| 143 | +- If level is `"error"` or `"fatal"` → treat as error |
| 144 | +- If JSON parse fails → fall back to existing keyword matching |
| 145 | + |
| 146 | +**Code change at lines 1069–1078 of `src/claude.lib.mjs`:** |
| 147 | + |
| 148 | +```javascript |
| 149 | +const trimmed = errorOutput.trim(); |
| 150 | +// Exclude warnings from being treated as errors. |
| 151 | +// Detection 1: Emoji-prefixed warnings (existing) |
| 152 | +// Example: "⚠️ [BashTool] Pre-flight check is taking longer than expected..." |
| 153 | +// Detection 2: JSON-structured log messages (Issue #1337) |
| 154 | +// Example: {"level":"warn","message":"[BashTool] Pre-flight check..."} |
| 155 | +let isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠'); |
| 156 | +if (!isWarning && trimmed.startsWith('{')) { |
| 157 | + try { |
| 158 | + const parsed = JSON.parse(trimmed); |
| 159 | + if (parsed && typeof parsed.level === 'string') { |
| 160 | + const level = parsed.level.toLowerCase(); |
| 161 | + // Only "error" and "fatal" levels should be treated as errors |
| 162 | + // "warn", "warning", "info", "debug", "trace" are non-error levels |
| 163 | + if (level !== 'error' && level !== 'fatal') { |
| 164 | + isWarning = true; |
| 165 | + } |
| 166 | + } |
| 167 | + } catch { |
| 168 | + // Not valid JSON — fall through to keyword matching |
| 169 | + } |
| 170 | +} |
| 171 | +if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) { |
| 172 | + stderrErrors.push(trimmed); |
| 173 | +} |
| 174 | +``` |
| 175 | + |
| 176 | +### Solution 2 (Implemented): Enable ANTHROPIC_LOG=debug in --verbose mode |
| 177 | + |
| 178 | +In `getClaudeEnv()` in `src/config.lib.mjs`, add `ANTHROPIC_LOG: 'debug'` when the verbose flag is set. |
| 179 | + |
| 180 | +Or in `src/claude.lib.mjs`, conditionally add to `claudeEnv`: |
| 181 | + |
| 182 | +```javascript |
| 183 | +if (argv.verbose) { |
| 184 | + claudeEnv.ANTHROPIC_LOG = 'debug'; |
| 185 | +} |
| 186 | +``` |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## Related Issues |
| 191 | + |
| 192 | +- [Issue #477](https://github.com/link-assistant/hive-mind/issues/477): Original warning detection for emoji-prefixed BashTool warnings |
| 193 | +- [Issue #873](https://github.com/link-assistant/hive-mind/issues/873): Phantom error detection for `--tool agent` |
| 194 | +- [Issue #886](https://github.com/link-assistant/hive-mind/issues/886): False positive error detection for `--agent tool` |
| 195 | +- [Issue #1276](https://github.com/link-assistant/hive-mind/issues/1276): False positive error detection when agent recovers |
| 196 | + |
| 197 | +## External References |
| 198 | + |
| 199 | +- [anthropics/claude-code #25025](https://github.com/anthropics/claude-code/issues/25025): BashTool pre-flight check console.warn corrupts JSON output |
| 200 | +- [anthropics/claude-code #4859](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr |
| 201 | +- [anthropics/claude-agent-sdk-typescript #157](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol - debug logs should go to stderr |
0 commit comments