Skip to content

SessionStart hooks fail on cold start: worker-start exits 137, context hook races worker #1505

@nimesh-kumar-sh

Description

@nimesh-kumar-sh

Summary

Two of the three SessionStart hooks produce non-zero exit codes on cold start (worker not running), causing Claude Code to display SessionStart:startup hook error twice on every fresh session.

Root Cause

Since Claude Code runs all matching hooks in parallel, the three SessionStart hooks race each other:

  1. smart-install — always succeeds (exit 0)
  2. worker-start (bun-runner.js worker-service.cjs start) — the start subcommand forks a daemon, then sends SIGKILL to its own process group. This kills both the bun parent AND the node wrapper (bun-runner.js) before any close event handler can fire. Result: exit 137, no stdout output.
  3. context (bun-runner.js worker-service.cjs hook claude-code context) — runs simultaneously with worker-start, tries to reach localhost:37777 before the worker is listening. Result: connection refused, non-zero exit.

Despite the errors, the worker daemon does start successfully — the exit code is a false positive from the parent process being killed.

Reproduction

# Kill the worker
kill -9 $(lsof -t -i :37777)
sleep 3

# Run worker-start — exits 137 despite starting the daemon
_R="$HOME/.claude/plugins/cache/thedotmack/claude-mem/10.6.2"
CLAUDE_PLUGIN_ROOT="$_R" node "$_R/scripts/bun-runner.js" "$_R/scripts/worker-service.cjs" start
echo "EXIT: $?"  # prints 137

# Worker IS running:
lsof -i :37777  # shows bun listening

Suggested Fix

Option A: Fix worker-service.cjs start exit behavior

The start command should not SIGKILL its own process group. After forking the daemon, the parent should exit cleanly with code 0. This is the proper fix.

Option B: Fix the hooks.json commands (workaround)

Wrap the worker-start command with ; continuation and a health-check wait loop so the shell continues after the SIGKILL and returns 0:

{
  "type": "command",
  "command": "... node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" start; for i in 1 2 3 4 5 6 7 8; do curl -sf http://localhost:37777/health >/dev/null 2>&1 && break; sleep 1; done; echo '{\"continue\":true,\"suppressOutput\":true}'",
  "timeout": 60
}

And add a similar health-check wait to the context hook so it doesn't race the worker:

{
  "type": "command",  
  "command": "... for i in 1 2 3 4 5 6 7 8; do curl -sf http://localhost:37777/health >/dev/null 2>&1 && break; sleep 1; done; node \"$_R/scripts/bun-runner.js\" \"$_R/scripts/worker-service.cjs\" hook claude-code context",
  "timeout": 60
}

Environment

  • Claude Code 2.1.85
  • claude-mem 10.6.2
  • macOS (Darwin 25.3.0, arm64)
  • Bun 1.3.11, Node v25.8.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    root:worker-lifecycleWorker startup, shutdown, zombie processes

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions