Skip to content

Commit 3929d0b

Browse files
authored
Merge pull request #1338 from link-assistant/issue-1337-5cf704f3fe8e
fix: prevent false positive error detection for JSON-format stderr warnings (Issue #1337)
2 parents a65ea95 + 139a73c commit 3929d0b

File tree

7 files changed

+587
-15
lines changed

7 files changed

+587
-15
lines changed
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
---
2+
'@link-assistant/hive-mind': patch
3+
---
4+
5+
fix: prevent false positive error detection for JSON-structured stderr warnings (Issue #1337)
6+
7+
Claude Code SDK can emit structured JSON log messages to stderr with format `{"level":"warn","message":"..."}`. When these messages contained error-related keywords like "failed", the detection logic incorrectly flagged them as errors.
8+
9+
Added JSON parsing for stderr messages starting with `{`. If the parsed JSON has a `level` field that is not `"error"` or `"fatal"`, the message is treated as a warning (non-error), preserving existing emoji-prefix detection as a fallback.
10+
11+
Also enables `ANTHROPIC_LOG=debug` when running with `--verbose` flag, allowing users to see detailed API request information as suggested by the BashTool pre-flight warning.

.github/workflows/release.yml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -504,6 +504,11 @@ jobs:
504504
echo "Running test suite for usage limit detection and parsing..."
505505
node tests/test-usage-limit.mjs
506506
507+
- name: Run stderr warning detection test (Issue #1337)
508+
run: |
509+
echo "Running test suite for stderr warning detection (isStderrError)..."
510+
node tests/test-issue-1337-stderr-warning-detection.mjs
511+
507512
# === EXECUTION TESTS ===
508513
test-execution:
509514
runs-on: ubuntu-latest
Lines changed: 201 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,201 @@
1+
# Case Study: Issue #1337 - False Positive Error Detection
2+
3+
## Executive Summary
4+
5+
**Issue:** [#1337 - False positive error detection](https://github.com/link-assistant/hive-mind/issues/1337)
6+
7+
**Status:** Fixed
8+
9+
**Root Cause:** The stderr error detection logic in `src/claude.lib.mjs` only recognizes emoji-prefixed warnings (`⚠️`). When Claude Code emits structured JSON log messages (format: `{"level":"warn","message":"..."}`) to stderr, the code fails to identify them as warnings. If the JSON message contains error-related keywords ("failed", "error", "not found"), the message is incorrectly added to `stderrErrors`, triggering a false positive failure.
10+
11+
**Impact:**
12+
13+
1. Legitimate solve sessions flagged as failures when Claude Code emits slow pre-flight warnings
14+
2. PR logs attach "failure" outcome when the actual execution succeeded
15+
3. Wasted retries and time due to phantom failures
16+
17+
---
18+
19+
## The Failing Scenario
20+
21+
The user observed:
22+
23+
```
24+
❌ Command failed: No messages processed and errors detected in stderr
25+
Stderr errors:
26+
{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."}
27+
```
28+
29+
The stderr message is a **warning**, not an error. However, the system treated it as an error because:
30+
31+
1. The message contains the word **"failed"** (in "check for **failed** or slow API requests")
32+
2. The message does NOT start with `⚠️` emoji (it's a JSON structured log)
33+
3. The detection logic only checks for `⚠️` prefix to identify warnings
34+
35+
---
36+
37+
## Timeline of Events
38+
39+
| Step | What Happened |
40+
| ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
41+
| 1 | `solve.mjs` spawned Claude Code CLI |
42+
| 2 | Claude Code CLI began BashTool pre-flight check (Haiku API call for bash prefix detection) |
43+
| 3 | Pre-flight check took > 10 seconds (slow API or VM networking) |
44+
| 4 | Claude Code SDK emitted a structured JSON warning to stderr: `{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}` |
45+
| 5 | `claude.lib.mjs` received the stderr chunk |
46+
| 6 | Trimmed the message → does NOT start with `⚠️` |
47+
| 7 | Checked for error keywords → found "**failed**" in the message text |
48+
| 8 | Added to `stderrErrors` array ← **BUG HERE** |
49+
| 9 | Claude Code produced no messages before completing (pre-flight delay meant no actual output) |
50+
| 10 | `messageCount === 0 && toolUseCount === 0 && stderrErrors.length > 0` → triggered "Command failed" |
51+
| 11 | False positive failure reported |
52+
53+
---
54+
55+
## Root Cause Analysis
56+
57+
### Primary Root Cause: Missing JSON-structured warning detection
58+
59+
**File:** `src/claude.lib.mjs`, lines 1069–1078
60+
61+
**Current Code:**
62+
63+
```javascript
64+
const trimmed = errorOutput.trim();
65+
// Exclude warnings (messages starting with ⚠️) from being treated as errors
66+
// Example: "⚠️ [BashTool] Pre-flight check is taking longer than expected..."
67+
const isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('');
68+
// Issue #1165: Also detect "command not found" errors
69+
if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
70+
stderrErrors.push(trimmed);
71+
}
72+
```
73+
74+
**Problem:** The `isWarning` check ONLY recognizes `⚠️` emoji-prefixed messages. The Claude Code SDK and its Anthropic SDK dependency can emit JSON-structured log messages with format:
75+
76+
```json
77+
{ "level": "warn", "message": "[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests." }
78+
```
79+
80+
When this message is processed:
81+
82+
- `trimmed.startsWith('⚠️')``false` (starts with `{`)
83+
- `trimmed.includes('failed')``true` (appears in the message text)
84+
- Result: Added to `stderrErrors`**false positive**
85+
86+
### Secondary Issue: ANTHROPIC_LOG=debug not enabled in verbose mode
87+
88+
The warning message itself suggests:
89+
90+
> "Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."
91+
92+
Currently, `--verbose` mode does NOT set `ANTHROPIC_LOG=debug`. This means users who run with `--verbose` still don't get the detailed API request logs that would help diagnose slow API calls.
93+
94+
---
95+
96+
## Evidence
97+
98+
### Confirmed with Experiment
99+
100+
Running `experiments/test-issue-1337-json-warning.mjs`:
101+
102+
```
103+
❌ FALSE POSITIVE: JSON warn format - BashTool pre-flight warning (THE ISSUE #1337)
104+
Message: "{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}"
105+
Expected: NOT error, Detected: IS error
106+
107+
❌ FALSE POSITIVE: JSON warn format with "error" in message (should be warn, NOT error)
108+
Message: "{"level":"warn","message":"Possible error-like condition detected, but not critical"}"
109+
Expected: NOT error, Detected: IS error
110+
111+
❌ FALSE POSITIVE: JSON warn with "not found" in message (should be warn)
112+
Message: "{"level":"warn","message":"Some resource not found but non-critical"}"
113+
Expected: NOT error, Detected: IS error
114+
115+
Current behavior: 3 false positives, 0 false negatives
116+
117+
❌ ISSUE #1337 CONFIRMED: JSON-format warnings are incorrectly treated as errors!
118+
```
119+
120+
### Source of JSON-Format Warnings
121+
122+
The JSON format originates from the Anthropic TypeScript SDK's structured logging when `ANTHROPIC_LOG` environment variable or `DEBUG` is set. The SDK uses a logger interface that can output structured JSON:
123+
124+
- [Issue #157 in claude-agent-sdk-typescript](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol — debug logs should go to stderr
125+
- [Issue #4859 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr
126+
- [Issue #2294 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/2294): Excessive API debug logs appearing in terminal output
127+
- [Issue #25025 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/25025): [BUG] [BashTool] Pre-flight check console.warn corrupts JSON output
128+
129+
The BashTool pre-flight check timeout is 10 seconds. When the Haiku API call for bash command prefix detection exceeds 10 seconds (e.g., slow network, VM/gvisor networking, API latency), Claude Code emits a warning. This warning appears in two formats depending on the Claude Code version and logging configuration:
130+
131+
1. **Emoji format** (handled): `⚠️ [BashTool] Pre-flight check is taking longer than expected...`
132+
2. **JSON format** (NOT handled — this issue): `{"level":"warn","message":"[BashTool] Pre-flight check..."}`
133+
134+
---
135+
136+
## Proposed Solutions
137+
138+
### Solution 1 (Implemented): Parse JSON-structured stderr messages
139+
140+
When a stderr message is valid JSON with a `"level"` field:
141+
142+
- If level is `"warn"`, `"info"`, or `"debug"` → treat as non-error (isWarning = true)
143+
- If level is `"error"` or `"fatal"` → treat as error
144+
- If JSON parse fails → fall back to existing keyword matching
145+
146+
**Code change at lines 1069–1078 of `src/claude.lib.mjs`:**
147+
148+
```javascript
149+
const trimmed = errorOutput.trim();
150+
// Exclude warnings from being treated as errors.
151+
// Detection 1: Emoji-prefixed warnings (existing)
152+
// Example: "⚠️ [BashTool] Pre-flight check is taking longer than expected..."
153+
// Detection 2: JSON-structured log messages (Issue #1337)
154+
// Example: {"level":"warn","message":"[BashTool] Pre-flight check..."}
155+
let isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('');
156+
if (!isWarning && trimmed.startsWith('{')) {
157+
try {
158+
const parsed = JSON.parse(trimmed);
159+
if (parsed && typeof parsed.level === 'string') {
160+
const level = parsed.level.toLowerCase();
161+
// Only "error" and "fatal" levels should be treated as errors
162+
// "warn", "warning", "info", "debug", "trace" are non-error levels
163+
if (level !== 'error' && level !== 'fatal') {
164+
isWarning = true;
165+
}
166+
}
167+
} catch {
168+
// Not valid JSON — fall through to keyword matching
169+
}
170+
}
171+
if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
172+
stderrErrors.push(trimmed);
173+
}
174+
```
175+
176+
### Solution 2 (Implemented): Enable ANTHROPIC_LOG=debug in --verbose mode
177+
178+
In `getClaudeEnv()` in `src/config.lib.mjs`, add `ANTHROPIC_LOG: 'debug'` when the verbose flag is set.
179+
180+
Or in `src/claude.lib.mjs`, conditionally add to `claudeEnv`:
181+
182+
```javascript
183+
if (argv.verbose) {
184+
claudeEnv.ANTHROPIC_LOG = 'debug';
185+
}
186+
```
187+
188+
---
189+
190+
## Related Issues
191+
192+
- [Issue #477](https://github.com/link-assistant/hive-mind/issues/477): Original warning detection for emoji-prefixed BashTool warnings
193+
- [Issue #873](https://github.com/link-assistant/hive-mind/issues/873): Phantom error detection for `--tool agent`
194+
- [Issue #886](https://github.com/link-assistant/hive-mind/issues/886): False positive error detection for `--agent tool`
195+
- [Issue #1276](https://github.com/link-assistant/hive-mind/issues/1276): False positive error detection when agent recovers
196+
197+
## External References
198+
199+
- [anthropics/claude-code #25025](https://github.com/anthropics/claude-code/issues/25025): BashTool pre-flight check console.warn corrupts JSON output
200+
- [anthropics/claude-code #4859](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr
201+
- [anthropics/claude-agent-sdk-typescript #157](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol - debug logs should go to stderr
Lines changed: 160 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,160 @@
1+
#!/usr/bin/env node
2+
// Test for issue #1337: Ensure JSON-format warnings in stderr are not treated as errors
3+
// This tests BOTH the old behavior (which had false positives) AND the new fixed behavior
4+
5+
// CURRENT FIXED LOGIC (from claude.lib.mjs after Issue #1337 fix)
6+
function fixedDetectionLogic(message) {
7+
const trimmed = message.trim();
8+
// Detection 1: Emoji-prefixed warnings (Issue #477)
9+
let isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠');
10+
// Detection 2: JSON-structured log messages (Issue #1337)
11+
if (!isWarning && trimmed.startsWith('{')) {
12+
try {
13+
const parsed = JSON.parse(trimmed);
14+
if (parsed && typeof parsed.level === 'string') {
15+
const level = parsed.level.toLowerCase();
16+
if (level !== 'error' && level !== 'fatal') {
17+
isWarning = true;
18+
}
19+
}
20+
} catch {
21+
// Not valid JSON — fall through to keyword matching
22+
}
23+
}
24+
if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
25+
return true;
26+
}
27+
return false;
28+
}
29+
30+
const testCases = [
31+
// === ISSUE #1337 CASES (JSON-structured warnings) ===
32+
{
33+
name: 'JSON warn format - BashTool pre-flight warning (THE ISSUE #1337)',
34+
message: '{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."}',
35+
shouldBeError: false,
36+
},
37+
{
38+
name: 'JSON error format - real error (should still be detected)',
39+
message: '{"level":"error","message":"API Error: 500 Internal Server Error"}',
40+
shouldBeError: true,
41+
},
42+
{
43+
name: 'JSON fatal format - fatal error (should still be detected)',
44+
message: '{"level":"fatal","message":"Connection failed permanently"}',
45+
shouldBeError: true,
46+
},
47+
{
48+
name: 'JSON info format - informational (should NOT be error)',
49+
message: '{"level":"info","message":"Session started successfully"}',
50+
shouldBeError: false,
51+
},
52+
{
53+
name: 'JSON debug format - debug message (should NOT be error)',
54+
message: '{"level":"debug","message":"Sending request to API, timeout may occur"}',
55+
shouldBeError: false,
56+
},
57+
{
58+
name: 'JSON warn with "error" keyword in message',
59+
message: '{"level":"warn","message":"Possible error-like condition detected, but not critical"}',
60+
shouldBeError: false,
61+
},
62+
{
63+
name: 'JSON warn with "not found" keyword in message',
64+
message: '{"level":"warn","message":"Some resource not found but non-critical"}',
65+
shouldBeError: false,
66+
},
67+
{
68+
name: 'JSON warn with "failed" keyword in message',
69+
message: '{"level":"warn","message":"Request failed but will retry automatically"}',
70+
shouldBeError: false,
71+
},
72+
{
73+
name: 'JSON message without level field (should fall through to keyword matching)',
74+
message: '{"message":"Something failed"}',
75+
shouldBeError: true,
76+
},
77+
// === EXISTING CASES (emoji-prefixed warnings, Issue #477) ===
78+
{
79+
name: 'Emoji-prefixed warning with "failed" word (existing Issue #477 handling)',
80+
message: '⚠️ [BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests.',
81+
shouldBeError: false,
82+
},
83+
{
84+
name: 'Emoji-prefixed warning (alternative emoji)',
85+
message: '⚠ [BashTool] Another warning message',
86+
shouldBeError: false,
87+
},
88+
// === REAL ERROR CASES (should still be detected) ===
89+
{
90+
name: 'Real error with "Error:" prefix',
91+
message: 'Error: Something went wrong',
92+
shouldBeError: true,
93+
},
94+
{
95+
name: 'Real npm error',
96+
message: 'npm error code ENOENT',
97+
shouldBeError: true,
98+
},
99+
{
100+
name: 'Real failure message',
101+
message: 'Command failed with exit code 1',
102+
shouldBeError: true,
103+
},
104+
{
105+
name: 'Command not found error',
106+
message: '/bin/sh: 1: claude: not found',
107+
shouldBeError: true,
108+
},
109+
// === EDGE CASES ===
110+
{
111+
name: 'Empty message',
112+
message: '',
113+
shouldBeError: false,
114+
},
115+
{
116+
name: 'Warning with spaces before emoji',
117+
message: ' ⚠️ Some warning text with failed in it',
118+
shouldBeError: false,
119+
},
120+
{
121+
name: 'Invalid JSON that starts with { (falls through to keyword matching)',
122+
message: '{not valid json: failed}',
123+
shouldBeError: true,
124+
},
125+
];
126+
127+
console.log('Testing stderr warning detection for issue #1337 (FIXED BEHAVIOR)...\n');
128+
129+
let passed = 0;
130+
let failed = 0;
131+
132+
for (const testCase of testCases) {
133+
const detected = fixedDetectionLogic(testCase.message);
134+
const expected = testCase.shouldBeError;
135+
const correct = detected === expected;
136+
const result = correct ? '✅ PASS' : '❌ FAIL';
137+
138+
if (correct) {
139+
passed++;
140+
} else {
141+
failed++;
142+
}
143+
144+
console.log(`${result}: ${testCase.name}`);
145+
if (!correct) {
146+
console.log(` Message: "${testCase.message.substring(0, 100)}${testCase.message.length > 100 ? '...' : ''}"`);
147+
console.log(` Expected error: ${expected}, Detected as error: ${detected}`);
148+
}
149+
console.log('');
150+
}
151+
152+
console.log(`\nResults: ${passed} passed, ${failed} failed`);
153+
154+
if (failed > 0) {
155+
console.log('❌ Some tests failed!');
156+
process.exit(1);
157+
} else {
158+
console.log('✅ All tests passed! Issue #1337 fix verified.');
159+
process.exit(0);
160+
}

0 commit comments

Comments
 (0)