link-assistant
diff --git a/‎.changeset/fix-json-stderr-warning-detection.md‎
Lines changed: 11 additions & 0 deletions b/‎.changeset/fix-json-stderr-warning-detection.md‎
Lines changed: 11 additions & 0 deletions
diff --git a/‎.github/workflows/release.yml‎
Lines changed: 5 additions & 0 deletions b/‎.github/workflows/release.yml‎
Lines changed: 5 additions & 0 deletions
diff --git a/‎docs/case-studies/issue-1337/README.md‎
Lines changed: 201 additions & 0 deletions b/‎docs/case-studies/issue-1337/README.md‎
Lines changed: 201 additions & 0 deletions
diff --git a/‎experiments/test-issue-1337-json-warning.mjs‎
Lines changed: 160 additions & 0 deletions b/‎experiments/test-issue-1337-json-warning.mjs‎
Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+'@link-assistant/hive-mind': patch
+---
+
+fix: prevent false positive error detection for JSON-structured stderr warnings (Issue #1337)
+
+Claude Code SDK can emit structured JSON log messages to stderr with format `{"level":"warn","message":"..."}`. When these messages contained error-related keywords like "failed", the detection logic incorrectly flagged them as errors.
+
+Added JSON parsing for stderr messages starting with `{`. If the parsed JSON has a `level` field that is not `"error"` or `"fatal"`, the message is treated as a warning (non-error), preserving existing emoji-prefix detection as a fallback.
+
+Also enables `ANTHROPIC_LOG=debug` when running with `--verbose` flag, allowing users to see detailed API request information as suggested by the BashTool pre-flight warning.
@@ -504,6 +504,11 @@ jobs:
         echo "Running test suite for usage limit detection and parsing..."
         node tests/test-usage-limit.mjs
 
+    - name: Run stderr warning detection test (Issue #1337)
+      run: |
+        echo "Running test suite for stderr warning detection (isStderrError)..."
+        node tests/test-issue-1337-stderr-warning-detection.mjs
+
   # === EXECUTION TESTS ===
   test-execution:
     runs-on: ubuntu-latest
 
@@ -0,0 +1,201 @@
+# Case Study: Issue #1337 - False Positive Error Detection
+
+## Executive Summary
+
+**Issue:** [#1337 - False positive error detection](https://github.com/link-assistant/hive-mind/issues/1337)
+
+**Status:** Fixed
+
+**Root Cause:** The stderr error detection logic in `src/claude.lib.mjs` only recognizes emoji-prefixed warnings (`⚠️`). When Claude Code emits structured JSON log messages (format: `{"level":"warn","message":"..."}`) to stderr, the code fails to identify them as warnings. If the JSON message contains error-related keywords ("failed", "error", "not found"), the message is incorrectly added to `stderrErrors`, triggering a false positive failure.
+
+**Impact:**
+
+1. Legitimate solve sessions flagged as failures when Claude Code emits slow pre-flight warnings
+2. PR logs attach "failure" outcome when the actual execution succeeded
+3. Wasted retries and time due to phantom failures
+
+---
+
+## The Failing Scenario
+
+The user observed:
+
+```
+❌ Command failed: No messages processed and errors detected in stderr
+Stderr errors:
+   {"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."}
+```
+
+The stderr message is a **warning**, not an error. However, the system treated it as an error because:
+
+1. The message contains the word **"failed"** (in "check for **failed** or slow API requests")
+2. The message does NOT start with `⚠️` emoji (it's a JSON structured log)
+3. The detection logic only checks for `⚠️` prefix to identify warnings
+
+---
+
+## Timeline of Events
+
+| Step | What Happened                                                                                                                                             |
+| ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 1    | `solve.mjs` spawned Claude Code CLI                                                                                                                       |
+| 2    | Claude Code CLI began BashTool pre-flight check (Haiku API call for bash prefix detection)                                                                |
+| 3    | Pre-flight check took > 10 seconds (slow API or VM networking)                                                                                            |
+| 4    | Claude Code SDK emitted a structured JSON warning to stderr: `{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}` |
+| 5    | `claude.lib.mjs` received the stderr chunk                                                                                                                |
+| 6    | Trimmed the message → does NOT start with `⚠️`                                                                                                            |
+| 7    | Checked for error keywords → found "**failed**" in the message text                                                                                       |
+| 8    | Added to `stderrErrors` array ← **BUG HERE**                                                                                                              |
+| 9    | Claude Code produced no messages before completing (pre-flight delay meant no actual output)                                                              |
+| 10   | `messageCount === 0 && toolUseCount === 0 && stderrErrors.length > 0` → triggered "Command failed"                                                        |
+| 11   | False positive failure reported                                                                                                                           |
+
+---
+
+## Root Cause Analysis
+
+### Primary Root Cause: Missing JSON-structured warning detection
+
+**File:** `src/claude.lib.mjs`, lines 1069–1078
+
+**Current Code:**
+
+```javascript
+const trimmed = errorOutput.trim();
+// Exclude warnings (messages starting with ⚠️) from being treated as errors
+// Example: "⚠️  [BashTool] Pre-flight check is taking longer than expected..."
+const isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠');
+// Issue #1165: Also detect "command not found" errors
+if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
+  stderrErrors.push(trimmed);
+}
+```
+
+**Problem:** The `isWarning` check ONLY recognizes `⚠️` emoji-prefixed messages. The Claude Code SDK and its Anthropic SDK dependency can emit JSON-structured log messages with format:
+
+```json
+{ "level": "warn", "message": "[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests." }
+```
+
+When this message is processed:
+
+- `trimmed.startsWith('⚠️')` → `false` (starts with `{`)
+- `trimmed.includes('failed')` → `true` (appears in the message text)
+- Result: Added to `stderrErrors` → **false positive**
+
+### Secondary Issue: ANTHROPIC_LOG=debug not enabled in verbose mode
+
+The warning message itself suggests:
+
+> "Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."
+
+Currently, `--verbose` mode does NOT set `ANTHROPIC_LOG=debug`. This means users who run with `--verbose` still don't get the detailed API request logs that would help diagnose slow API calls.
+
+---
+
+## Evidence
+
+### Confirmed with Experiment
+
+Running `experiments/test-issue-1337-json-warning.mjs`:
+
+```
+❌ FALSE POSITIVE: JSON warn format - BashTool pre-flight warning (THE ISSUE #1337)
+   Message: "{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected..."}"
+   Expected: NOT error, Detected: IS error
+
+❌ FALSE POSITIVE: JSON warn format with "error" in message (should be warn, NOT error)
+   Message: "{"level":"warn","message":"Possible error-like condition detected, but not critical"}"
+   Expected: NOT error, Detected: IS error
+
+❌ FALSE POSITIVE: JSON warn with "not found" in message (should be warn)
+   Message: "{"level":"warn","message":"Some resource not found but non-critical"}"
+   Expected: NOT error, Detected: IS error
+
+Current behavior: 3 false positives, 0 false negatives
+
+❌ ISSUE #1337 CONFIRMED: JSON-format warnings are incorrectly treated as errors!
+```
+
+### Source of JSON-Format Warnings
+
+The JSON format originates from the Anthropic TypeScript SDK's structured logging when `ANTHROPIC_LOG` environment variable or `DEBUG` is set. The SDK uses a logger interface that can output structured JSON:
+
+- [Issue #157 in claude-agent-sdk-typescript](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol — debug logs should go to stderr
+- [Issue #4859 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr
+- [Issue #2294 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/2294): Excessive API debug logs appearing in terminal output
+- [Issue #25025 in anthropics/claude-code](https://github.com/anthropics/claude-code/issues/25025): [BUG] [BashTool] Pre-flight check console.warn corrupts JSON output
+
+The BashTool pre-flight check timeout is 10 seconds. When the Haiku API call for bash command prefix detection exceeds 10 seconds (e.g., slow network, VM/gvisor networking, API latency), Claude Code emits a warning. This warning appears in two formats depending on the Claude Code version and logging configuration:
+
+1. **Emoji format** (handled): `⚠️  [BashTool] Pre-flight check is taking longer than expected...`
+2. **JSON format** (NOT handled — this issue): `{"level":"warn","message":"[BashTool] Pre-flight check..."}`
+
+---
+
+## Proposed Solutions
+
+### Solution 1 (Implemented): Parse JSON-structured stderr messages
+
+When a stderr message is valid JSON with a `"level"` field:
+
+- If level is `"warn"`, `"info"`, or `"debug"` → treat as non-error (isWarning = true)
+- If level is `"error"` or `"fatal"` → treat as error
+- If JSON parse fails → fall back to existing keyword matching
+
+**Code change at lines 1069–1078 of `src/claude.lib.mjs`:**
+
+```javascript
+const trimmed = errorOutput.trim();
+// Exclude warnings from being treated as errors.
+// Detection 1: Emoji-prefixed warnings (existing)
+// Example: "⚠️  [BashTool] Pre-flight check is taking longer than expected..."
+// Detection 2: JSON-structured log messages (Issue #1337)
+// Example: {"level":"warn","message":"[BashTool] Pre-flight check..."}
+let isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠');
+if (!isWarning && trimmed.startsWith('{')) {
+  try {
+    const parsed = JSON.parse(trimmed);
+    if (parsed && typeof parsed.level === 'string') {
+      const level = parsed.level.toLowerCase();
+      // Only "error" and "fatal" levels should be treated as errors
+      // "warn", "warning", "info", "debug", "trace" are non-error levels
+      if (level !== 'error' && level !== 'fatal') {
+        isWarning = true;
+      }
+    }
+  } catch {
+    // Not valid JSON — fall through to keyword matching
+  }
+}
+if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
+  stderrErrors.push(trimmed);
+}
+```
+
+### Solution 2 (Implemented): Enable ANTHROPIC_LOG=debug in --verbose mode
+
+In `getClaudeEnv()` in `src/config.lib.mjs`, add `ANTHROPIC_LOG: 'debug'` when the verbose flag is set.
+
+Or in `src/claude.lib.mjs`, conditionally add to `claudeEnv`:
+
+```javascript
+if (argv.verbose) {
+  claudeEnv.ANTHROPIC_LOG = 'debug';
+}
+```
+
+---
+
+## Related Issues
+
+- [Issue #477](https://github.com/link-assistant/hive-mind/issues/477): Original warning detection for emoji-prefixed BashTool warnings
+- [Issue #873](https://github.com/link-assistant/hive-mind/issues/873): Phantom error detection for `--tool agent`
+- [Issue #886](https://github.com/link-assistant/hive-mind/issues/886): False positive error detection for `--agent tool`
+- [Issue #1276](https://github.com/link-assistant/hive-mind/issues/1276): False positive error detection when agent recovers
+
+## External References
+
+- [anthropics/claude-code #25025](https://github.com/anthropics/claude-code/issues/25025): BashTool pre-flight check console.warn corrupts JSON output
+- [anthropics/claude-code #4859](https://github.com/anthropics/claude-code/issues/4859): code CLI debug and verbose modes do not output to stderr
+- [anthropics/claude-agent-sdk-typescript #157](https://github.com/anthropics/claude-agent-sdk-typescript/issues/157): ANTHROPIC_LOG=debug corrupts SDK protocol - debug logs should go to stderr
@@ -0,0 +1,160 @@
+#!/usr/bin/env node
+// Test for issue #1337: Ensure JSON-format warnings in stderr are not treated as errors
+// This tests BOTH the old behavior (which had false positives) AND the new fixed behavior
+
+// CURRENT FIXED LOGIC (from claude.lib.mjs after Issue #1337 fix)
+function fixedDetectionLogic(message) {
+  const trimmed = message.trim();
+  // Detection 1: Emoji-prefixed warnings (Issue #477)
+  let isWarning = trimmed.startsWith('⚠️') || trimmed.startsWith('⚠');
+  // Detection 2: JSON-structured log messages (Issue #1337)
+  if (!isWarning && trimmed.startsWith('{')) {
+    try {
+      const parsed = JSON.parse(trimmed);
+      if (parsed && typeof parsed.level === 'string') {
+        const level = parsed.level.toLowerCase();
+        if (level !== 'error' && level !== 'fatal') {
+          isWarning = true;
+        }
+      }
+    } catch {
+      // Not valid JSON — fall through to keyword matching
+    }
+  }
+  if (trimmed && !isWarning && (trimmed.includes('Error:') || trimmed.includes('error') || trimmed.includes('failed') || trimmed.includes('not found'))) {
+    return true;
+  }
+  return false;
+}
+
+const testCases = [
+  // === ISSUE #1337 CASES (JSON-structured warnings) ===
+  {
+    name: 'JSON warn format - BashTool pre-flight warning (THE ISSUE #1337)',
+    message: '{"level":"warn","message":"[BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests."}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON error format - real error (should still be detected)',
+    message: '{"level":"error","message":"API Error: 500 Internal Server Error"}',
+    shouldBeError: true,
+  },
+  {
+    name: 'JSON fatal format - fatal error (should still be detected)',
+    message: '{"level":"fatal","message":"Connection failed permanently"}',
+    shouldBeError: true,
+  },
+  {
+    name: 'JSON info format - informational (should NOT be error)',
+    message: '{"level":"info","message":"Session started successfully"}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON debug format - debug message (should NOT be error)',
+    message: '{"level":"debug","message":"Sending request to API, timeout may occur"}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON warn with "error" keyword in message',
+    message: '{"level":"warn","message":"Possible error-like condition detected, but not critical"}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON warn with "not found" keyword in message',
+    message: '{"level":"warn","message":"Some resource not found but non-critical"}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON warn with "failed" keyword in message',
+    message: '{"level":"warn","message":"Request failed but will retry automatically"}',
+    shouldBeError: false,
+  },
+  {
+    name: 'JSON message without level field (should fall through to keyword matching)',
+    message: '{"message":"Something failed"}',
+    shouldBeError: true,
+  },
+  // === EXISTING CASES (emoji-prefixed warnings, Issue #477) ===
+  {
+    name: 'Emoji-prefixed warning with "failed" word (existing Issue #477 handling)',
+    message: '⚠️  [BashTool] Pre-flight check is taking longer than expected. Run with ANTHROPIC_LOG=debug to check for failed or slow API requests.',
+    shouldBeError: false,
+  },
+  {
+    name: 'Emoji-prefixed warning (alternative emoji)',
+    message: '⚠ [BashTool] Another warning message',
+    shouldBeError: false,
+  },
+  // === REAL ERROR CASES (should still be detected) ===
+  {
+    name: 'Real error with "Error:" prefix',
+    message: 'Error: Something went wrong',
+    shouldBeError: true,
+  },
+  {
+    name: 'Real npm error',
+    message: 'npm error code ENOENT',
+    shouldBeError: true,
+  },
+  {
+    name: 'Real failure message',
+    message: 'Command failed with exit code 1',
+    shouldBeError: true,
+  },
+  {
+    name: 'Command not found error',
+    message: '/bin/sh: 1: claude: not found',
+    shouldBeError: true,
+  },
+  // === EDGE CASES ===
+  {
+    name: 'Empty message',
+    message: '',
+    shouldBeError: false,
+  },
+  {
+    name: 'Warning with spaces before emoji',
+    message: '  ⚠️  Some warning text with failed in it',
+    shouldBeError: false,
+  },
+  {
+    name: 'Invalid JSON that starts with { (falls through to keyword matching)',
+    message: '{not valid json: failed}',
+    shouldBeError: true,
+  },
+];
+
+console.log('Testing stderr warning detection for issue #1337 (FIXED BEHAVIOR)...\n');
+
+let passed = 0;
+let failed = 0;
+
+for (const testCase of testCases) {
+  const detected = fixedDetectionLogic(testCase.message);
+  const expected = testCase.shouldBeError;
+  const correct = detected === expected;
+  const result = correct ? '✅ PASS' : '❌ FAIL';
+
+  if (correct) {
+    passed++;
+  } else {
+    failed++;
+  }
+
+  console.log(`${result}: ${testCase.name}`);
+  if (!correct) {
+    console.log(`   Message: "${testCase.message.substring(0, 100)}${testCase.message.length > 100 ? '...' : ''}"`);
+    console.log(`   Expected error: ${expected}, Detected as error: ${detected}`);
+  }
+  console.log('');
+}
+
+console.log(`\nResults: ${passed} passed, ${failed} failed`);
+
+if (failed > 0) {
+  console.log('❌ Some tests failed!');
+  process.exit(1);
+} else {
+  console.log('✅ All tests passed! Issue #1337 fix verified.');
+  process.exit(0);
+}