Troubleshooting Guide

Version: 1.0 | Last Updated: 2026-01-25 | Category: Reference

This guide provides systematic troubleshooting procedures for common Babysitter issues. Each section lists symptoms, diagnosis steps, and solutions, and many also include prevention tips.




Installation Issues

Node.js Not Found

Symptoms:

command not found: npm
command not found: node

Diagnosis:

which node
which npm
node --version

Solutions:

  1. Install Node.js via nvm (recommended):

    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
    source ~/.bashrc  # or ~/.zshrc
    nvm install 22
    nvm use 22
  2. Direct installation:

    • Download from nodejs.org
    • Install the LTS version (22.x)
  3. Verify PATH:

    echo $PATH | grep -E "(node|npm)"

Prevention: Use nvm to manage Node.js versions.


Permission Denied During npm Install

Symptoms:

npm ERR! EACCES: permission denied, mkdir '/usr/local/lib/node_modules'

Diagnosis:

npm config get prefix
ls -la $(npm config get prefix)/lib/node_modules/

Solutions:

  1. Fix npm permissions (recommended):

    mkdir ~/.npm-global
    npm config set prefix '~/.npm-global'
    echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.zshrc  # or ~/.bashrc
    source ~/.zshrc  # or ~/.bashrc
  2. Reinstall packages:

    npm install -g @a5c-ai/babysitter-sdk@latest

Prevention: Never use sudo npm install -g. Configure npm for user installs.


SDK Module Not Found

Symptoms:

Error: Cannot find module '@a5c-ai/babysitter-sdk'

Diagnosis:

npm list -g @a5c-ai/babysitter-sdk
which babysitter
echo $PATH

Solutions:

  1. Install globally:

    npm install -g @a5c-ai/babysitter-sdk@latest
  2. Use npx (works without a global install):

    npx -y @a5c-ai/babysitter-sdk@latest --version
  3. Check PATH includes npm global bin:

    echo "$(npm prefix -g)/bin"
    # Add this directory to your PATH if it is missing (on Windows, add the prefix itself)

Prevention: Verify installation with npm list -g after installing.


Version Mismatch Errors

Symptoms:

Error: Incompatible version: sdk@0.0.120 requires babysitter@^0.0.120

Diagnosis:

npm list -g @a5c-ai/babysitter-sdk

Solutions:

Update all packages to the latest versions:

npm install -g @a5c-ai/babysitter-sdk@latest

Prevention: Update all packages together, not individually.
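Why a small version gap breaks things: for 0.0.x versions, npm's caret range is effectively an exact pin. ^0.0.120 accepts only 0.0.120, so updating one package without the other fails the check. The sketch below implements npm's caret rule for plain MAJOR.MINOR.PATCH strings (prerelease tags are not handled) to make the bounds concrete; it is an illustration, not the check Babysitter itself runs.

```javascript
// Sketch of npm's caret (^) range rule for plain MAJOR.MINOR.PATCH versions.
// Prerelease tags and build metadata are intentionally not handled.
function parseVersion(v) {
  const parts = v.replace(/^\^/, "").split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version: ${v}`);
  }
  return parts;
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// True if `version` falls inside the caret range `range` (e.g. "^0.0.120").
function satisfiesCaret(version, range) {
  const v = parseVersion(version);
  const [major, minor, patch] = parseVersion(range);
  // The upper bound bumps the left-most non-zero component of the range.
  const upper =
    major > 0 ? [major + 1, 0, 0] : minor > 0 ? [0, minor + 1, 0] : [0, 0, patch + 1];
  return compare(v, [major, minor, patch]) >= 0 && compare(v, upper) < 0;
}

console.log(satisfiesCaret("0.0.120", "^0.0.120")); // true
console.log(satisfiesCaret("0.0.121", "^0.0.120")); // false: ^0.0.x pins the patch
console.log(satisfiesCaret("1.4.0", "^1.2.3"));     // true
```

For 1.x and later, the same rule allows any newer minor or patch release, which is why this kind of mismatch is mostly a 0.0.x problem.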


jq Not Found

Symptoms:

command not found: jq
jq: command not found

Many of the diagnostic commands in this guide use jq to process JSON. Without it, commands such as journal analysis and status inspection will fail.

Diagnosis:

which jq
jq --version

Solutions:

Install jq for your platform:

# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Fedora/RHEL/CentOS
sudo dnf install jq

# Arch Linux
sudo pacman -S jq

# Windows (Chocolatey)
choco install jq

# Windows (Scoop)
scoop install jq

# Windows (WSL) - use Ubuntu instructions
sudo apt-get install jq

Verify installation:

jq --version
# Expected: jq-1.6 or higher

Prevention: Install jq as part of your initial setup. It's a standard tool for working with JSON on the command line.


Plugin Issues

Plugin Not Appearing in /skills

Symptoms:

  • /skills command doesn't show "babysit"
  • "Skill not found: babysit" error

Diagnosis:

claude plugin list
claude plugin list | grep babysitter

Step-by-Step Solution:

  1. Add marketplace (if not added):

    claude plugin marketplace add a5c-ai/babysitter
  2. Install plugin:

    claude plugin install --scope user babysitter@a5c.ai
  3. Enable plugin:

    claude plugin enable --scope user babysitter@a5c.ai
  4. Restart Claude Code completely:

    • Close all Claude Code windows
    • Wait a few seconds
    • Reopen Claude Code
  5. Verify:

    /skills
    # Should show "babysit"

Plugin Install Fails

Symptoms:

Error: Plugin 'babysitter@a5c.ai' not found

Diagnosis:

claude plugin marketplace list

Solutions:

  1. Add marketplace first:

    claude plugin marketplace add a5c-ai/babysitter
  2. Check network connectivity:

    curl -I https://github.com/a5c-ai/babysitter
  3. Try updating marketplace:

    claude plugin marketplace update a5c.ai

Plugin Conflicts

Symptoms:

  • Claude Code behaves unexpectedly
  • Other plugins stop working after Babysitter install

Diagnosis:

claude plugin list --all

Solutions:

  1. Disable and re-enable:

    claude plugin disable babysitter@a5c.ai
    claude plugin enable babysitter@a5c.ai
  2. Check for conflicts:

    • Temporarily disable other plugins
    • Test Babysitter alone
    • Re-enable plugins one by one

Run Execution Issues

Run Encountered an Error

Symptoms:

Error: Run encountered an error
  at processIteration (process.js:123)

Diagnosis:

# Check journal for error details
babysitter run:events <runId> --filter-type RUN_FAILED --json

# View last events
babysitter run:events <runId> --limit 10 --reverse --json

Solutions:

  1. Identify the cause:

    jq -s 'map(select(.type == "RUN_FAILED")) | sort_by(.seq) | last | .data.error' .a5c/runs/<runId>/journal/*.json
  2. If journal conflict:

    • Ensure no other sessions are running the same workflow
    • Delete state cache and rebuild:
      rm .a5c/runs/<runId>/state/state.json
      babysitter run:status <runId>
  3. If task failure:

    • Fix the underlying issue (missing deps, file not found, etc.)
    • Resume:
      /babysitter:call resume --run-id <runId>
  4. Ask Claude to analyze:

    Analyze the babysitter run error for <runId> and try to recover
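If jq is not available, or you want the failure picked by event order rather than file modification time, a small Node.js script can do the same lookup. This is a sketch that assumes each journal file holds one JSON event with seq, type, and data fields, as the jq commands in this guide imply; the script name latest-failure.js is arbitrary.

```javascript
// Print the error from the most recent RUN_FAILED event, ordered by `seq`
// rather than by file modification time.
// Assumes one JSON event per file with `seq`, `type`, and `data` fields.
const fs = require("node:fs");
const path = require("node:path");

function readJournal(journalDir) {
  return fs
    .readdirSync(journalDir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => JSON.parse(fs.readFileSync(path.join(journalDir, name), "utf8")));
}

// Returns the RUN_FAILED event with the highest seq, or null if there is none.
function latestFailure(events) {
  const failures = events
    .filter((e) => e.type === "RUN_FAILED")
    .sort((a, b) => a.seq - b.seq);
  return failures.length > 0 ? failures[failures.length - 1] : null;
}

// Usage: node latest-failure.js .a5c/runs/<runId>/journal
if (process.argv[2]) {
  const failure = latestFailure(readJournal(process.argv[2]));
  console.log(failure ? JSON.stringify(failure.data.error, null, 2) : "No RUN_FAILED event found");
}
```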
    

Task Execution Failed: ENOENT

Symptoms:

Task failed: test-task-001
Error: ENOENT: no such file or directory

Diagnosis:

# Check task details
cat .a5c/runs/<runId>/tasks/<effectId>/task.json | jq .

# Check working directory
pwd
ls -la

Solutions:

  1. Verify file paths:

    • Check if referenced files exist
    • Ensure paths are correct for your system
  2. Install dependencies:

    npm install
  3. Check working directory:

    • Tasks run relative to the run directory
    • Verify paths in task definition
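To see which referenced paths are missing relative to a given directory, a helper like the one below resolves and checks them in one pass. It is a hypothetical sketch: the task.json format is not documented here, so you pass the paths explicitly, and you pick the base directory (the run directory or your working directory) so you can test both possibilities.

```javascript
// Hypothetical helper: resolve paths against a base directory and report
// which ones do not exist. Nothing here is part of the Babysitter SDK.
const fs = require("node:fs");
const path = require("node:path");

function findMissingPaths(baseDir, relativePaths) {
  return relativePaths
    .map((p) => ({ given: p, resolved: path.resolve(baseDir, p) }))
    .filter(({ resolved }) => !fs.existsSync(resolved));
}

// Usage: node check-paths.js <baseDir> <path> [<path> ...]
if (process.argv[2]) {
  const missing = findMissingPaths(process.argv[2], process.argv.slice(3));
  for (const m of missing) console.log(`missing: ${m.given} -> ${m.resolved}`);
  if (missing.length === 0) console.log("all paths exist");
}
```

Running it once with the run directory and once with the working directory as the base quickly shows which one the failing path was written for.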

Agent Task Timeout

Symptoms:

Task timeout: agent-task-001
Execution exceeded 120s

Diagnosis:

# Check task that timed out
cat .a5c/runs/<runId>/tasks/<effectId>/task.json | jq .

Solutions:

  1. Reduce task scope:

    • Analyze specific files instead of entire codebase
    • Break large tasks into smaller ones
  2. Increase timeout (if necessary):

    await ctx.task(analyzeTask, { /* args */ }, {
      timeout: 300000  // 5 minutes
    });
  3. Check API status:

    • Verify Claude API is accessible
    • Check for rate limiting
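Breaking a large task into smaller ones often means splitting a file list into batches and issuing one task per batch. The chunk helper below is a plain JavaScript sketch, not a Babysitter API; the commented loop assumes a files argument that your own task definition would need to accept.

```javascript
// Split a list into fixed-size batches so each task gets a bounded amount of work.
// `chunk` is a hypothetical helper, not part of the Babysitter SDK.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// In a process definition this might look like (the `files` argument is hypothetical):
//   for (const files of chunk(allFiles, 20)) {
//     await ctx.task(analyzeTask, { files });
//   }
console.log(JSON.stringify(chunk(["a.js", "b.js", "c.js", "d.js", "e.js"], 2)));
// [["a.js","b.js"],["c.js","d.js"],["e.js"]]
```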

Session Ended Unexpectedly

Symptoms:

Claude Code session terminated
Run ID: 01KFFTSF8TK8C9GT3YM9QYQ6WG
Status: interrupted

Diagnosis:

babysitter run:status <runId> --json

Solutions:

Resume the run:

/babysitter:call resume --run-id 01KFFTSF8TK8C9GT3YM9QYQ6WG

Babysitter is designed to be resumable: progress recorded in the run's journal is preserved, and resuming picks up from that point.


Quality Convergence Issues

Quality Score Not Improving

Symptoms:

Iteration 1: Quality 65/100
Iteration 2: Quality 66/100
Iteration 3: Quality 65/100
...
Iteration 5: Quality 66/100
Target not met: 85/100

Diagnosis:

# View effect resolutions
babysitter run:events <runId> --filter-type EFFECT_RESOLVED --json

# Check recommendations
jq '.recommendations' .a5c/runs/<runId>/tasks/*/result.json

Solutions:

  1. Review feedback:

    • Check what's blocking improvement
    • Look for recurring issues
  2. Lower target:

    Use babysitter with 75% quality target
    
  3. Increase iterations:

    Use babysitter with max 10 iterations
    
  4. Fix blocking issues:

    • Run tests manually: npm test
    • Check lint: npm run lint
    • Fix any consistent failures
  5. Review scoring weights:

    • Some criteria may be weighted too heavily
    • Adjust in process definition if needed
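When the overall score is a weighted average of per-criterion scores, one heavily weighted criterion that stays low can hold the total below target no matter how much the others improve. The sketch below assumes that kind of weighted average; the criterion names, scores, and weights are made up for illustration and do not reflect Babysitter's built-in scoring.

```javascript
// Hypothetical weighted quality score: sum(score * weight) / sum(weight).
// Criterion names and weights are illustrative only.
function weightedScore(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [criterion, weight] of Object.entries(weights)) {
    if (!(criterion in scores)) throw new Error(`No score for ${criterion}`);
    total += scores[criterion] * weight;
    weightSum += weight;
  }
  if (weightSum <= 0) throw new Error("Weights must sum to a positive number");
  return total / weightSum;
}

const scores = { tests: 90, lint: 80, security: 40 };
// Security weighted 2x: the single weak criterion drags the total down hard.
console.log(weightedScore(scores, { tests: 1, lint: 1, security: 2 })); // 62.5
// Equal weights: the same raw scores produce a higher total.
console.log(weightedScore(scores, { tests: 1, lint: 1, security: 1 })); // 70
```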

Iterations Never Converge

Symptoms:

  • Quality score oscillates without improvement
  • Maximum iterations reached without meeting target

Diagnosis:

# List resolved effects and their status
jq -s '[.[] | select(.type == "EFFECT_RESOLVED")] | map({effectId: .data.effectId, status: .data.status})' \
  .a5c/runs/<runId>/journal/*.json

Solutions:

  1. Implement early exit on plateau:

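    // Inside the iteration loop, after recording this iteration's result:
    if (iterationResults.length >= 3) {  // need 3 scores before judging a plateau
      const recentScores = iterationResults.slice(-3).map(r => r.quality);
      if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {
        ctx.log('Quality plateaued, stopping early');
        break;  // exits the enclosing iteration loop
      }
    }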
  2. Lower the target to an achievable level

  3. Review whether the target is realistic for the codebase
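The plateau check can also be written as a standalone function, which makes the window size and tolerance explicit and easy to test. This is a sketch; hasPlateaued and its defaults (3 iterations, 2 points) mirror the loop snippet above and are not part of the SDK.

```javascript
// True when the last `window` scores span less than `tolerance` points.
// Returns false until there are at least `window` scores to compare.
function hasPlateaued(scores, window = 3, tolerance = 2) {
  if (scores.length < window) return false; // not enough history yet
  const recent = scores.slice(-window);
  return Math.max(...recent) - Math.min(...recent) < tolerance;
}

// Scores similar to the "Quality Score Not Improving" symptoms:
console.log(hasPlateaued([65, 66, 65])); // true
console.log(hasPlateaued([65, 72, 80])); // false
console.log(hasPlateaued([65]));         // false
```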


Inconsistent Quality Scores

Symptoms:

  • Quality scores vary significantly between iterations
  • No clear trend

Diagnosis:

# Count resolved effects in the journal
jq -s '[.[] | select(.type == "EFFECT_RESOLVED")] | length' \
  .a5c/runs/<runId>/journal/*.json

Solutions:

  1. Use more deterministic scoring criteria

  2. Ensure tests are stable (no flaky tests)

  3. Review agent scoring prompts for consistency

  4. Use ctx.now() instead of Date.now() for timestamps
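To put a number on "inconsistent", compute the mean and standard deviation of the per-iteration scores. A large spread with no upward trend usually points to flaky tests or nondeterministic scoring rather than to the code itself. The sketch assumes you have already collected the scores into an array; where they are stored depends on your process definition.

```javascript
// Mean and population standard deviation of a list of quality scores.
function scoreStats(scores) {
  if (scores.length === 0) throw new Error("No scores");
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const variance = scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / scores.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

console.log(scoreStats([65, 66, 65, 66])); // { mean: 65.5, stdDev: 0.5 }  stable
console.log(scoreStats([55, 85, 55, 85])); // { mean: 70, stdDev: 15 }     inconsistent
```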


Resumption Issues

Cannot Resume Completed Run

Symptoms:

  • Resume has no effect
  • Run state shows "completed"

Diagnosis:

babysitter run:status <runId> --json | jq '.state'

Solutions:

A completed run cannot be resumed because it has already finished. Create a new run instead.


State Corruption After Manual Edits

Symptoms:

  • Run behaves unexpectedly
  • Events don't match expected behavior

Diagnosis:

# Check journal integrity
jq empty .a5c/runs/<runId>/journal/*.json 2>&1

Solutions:

  1. Never edit journal files manually

  2. If state cache corrupted:

    rm .a5c/runs/<runId>/state/state.json
    babysitter run:status <runId>  # Rebuilds state
  3. If journal corrupted:

    • Back up the journal first for analysis
    • You may need to start a new run
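For a closer look than jq empty gives, the read-only script below lists files that fail to parse, files without a numeric seq, and any seq value that appears more than once. It is a sketch that assumes one JSON event per file and that seq values are meant to be unique; it never writes to the journal.

```javascript
// Read-only journal check: unparseable files, missing seq, duplicate seq values.
// Assumes one JSON event per *.json file with a numeric `seq` field.
const fs = require("node:fs");
const path = require("node:path");

function checkJournal(journalDir) {
  const invalid = [];
  const seqToFiles = new Map();
  for (const name of fs.readdirSync(journalDir).filter((n) => n.endsWith(".json"))) {
    try {
      const event = JSON.parse(fs.readFileSync(path.join(journalDir, name), "utf8"));
      if (typeof event?.seq !== "number") {
        invalid.push({ file: name, error: "missing numeric seq" });
        continue;
      }
      const files = seqToFiles.get(event.seq) ?? [];
      files.push(name);
      seqToFiles.set(event.seq, files);
    } catch (err) {
      invalid.push({ file: name, error: err.message });
    }
  }
  const duplicateSeqs = [...seqToFiles]
    .filter(([, files]) => files.length > 1)
    .map(([seq]) => seq);
  return { invalid, duplicateSeqs };
}

// Usage: node check-journal.js .a5c/runs/<runId>/journal
if (process.argv[2]) {
  console.log(JSON.stringify(checkJournal(process.argv[2]), null, 2));
}
```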

Missing Run Directory

Symptoms:

Error: Run not found: 01KFFTSF8TK8C9GT3YM9QYQ6WG

Diagnosis:

ls -la .a5c/runs/ | grep <runId>

Solutions:

  • Run directory was deleted or never created
  • Create a new run instead
  • Check if you're in the correct working directory

Performance Issues

Slow Iteration Execution

Symptoms:

  • Each iteration takes several minutes
  • Tasks run sequentially when they could be parallel

Diagnosis:

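# Find slow tasks (sub() strips fractional seconds, which fromdateiso8601 may reject)
jq -s 'def ts: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
  [.[] | select(.type == "EFFECT_RESOLVED" and .data.startedAt and .data.finishedAt)] | map({effectId: .data.effectId, duration: ((.data.finishedAt | ts) - (.data.startedAt | ts))}) | sort_by(.duration) | reverse | .[0:5]' \
  .a5c/runs/<runId>/journal/*.json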
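If jq is not installed, or its fromdateiso8601 rejects timestamps that carry fractional seconds, the same ranking can be done in Node.js with Date.parse, which accepts ISO 8601 timestamps with milliseconds. The sketch assumes the data.effectId, data.startedAt, and data.finishedAt fields used by the jq command above, and skips events that lack them.

```javascript
// Rank EFFECT_RESOLVED events by duration (milliseconds), slowest first.
// Assumes one JSON event per file with data.startedAt / data.finishedAt timestamps.
const fs = require("node:fs");
const path = require("node:path");

function slowestEffects(events, limit = 5) {
  return events
    .filter((e) => e.type === "EFFECT_RESOLVED" && e.data?.startedAt && e.data?.finishedAt)
    .map((e) => ({
      effectId: e.data.effectId,
      durationMs: Date.parse(e.data.finishedAt) - Date.parse(e.data.startedAt),
    }))
    .filter((e) => Number.isFinite(e.durationMs)) // drop unparseable timestamps
    .sort((a, b) => b.durationMs - a.durationMs)
    .slice(0, limit);
}

// Usage: node slowest.js .a5c/runs/<runId>/journal
if (process.argv[2]) {
  const dir = process.argv[2];
  const events = fs
    .readdirSync(dir)
    .filter((n) => n.endsWith(".json"))
    .map((n) => JSON.parse(fs.readFileSync(path.join(dir, n), "utf8")));
  console.table(slowestEffects(events));
}
```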

Solutions:

  1. Use parallel execution:

    const [coverage, lint, security] = await ctx.parallel.all([
      () => ctx.task(coverageTask, {}),
      () => ctx.task(lintTask, {}),
      () => ctx.task(securityTask, {})
    ]);
  2. Reduce agent task scope

  3. Set lower iteration limits

  4. Cache research results between iterations


High Disk Usage

Symptoms:

  • .a5c/ directory growing large
  • Disk space warnings

Diagnosis:

du -sh .a5c/
du -sh .a5c/runs/* | sort -h | tail -10

Solutions:

  1. Clean old runs:

    # List runs by size
    du -sh .a5c/runs/* | sort -h
    
    # Delete old runs
    rm -rf .a5c/runs/<old-run-id>
  2. Archive completed runs:

    tar -czf run-archive.tar.gz .a5c/runs/<runId>
    rm -rf .a5c/runs/<runId>
  3. Add cleanup to workflow:

    • Periodically clean old runs
    • Set up automated cleanup
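For automated cleanup, start with a dry run that only lists candidates. The sketch below prints run directories under .a5c/runs whose modification time is older than a given number of days; it deletes nothing. Directory modification time is only a rough proxy for a run's last activity, so review the list before removing or archiving anything.

```javascript
// Dry-run helper: list run directories whose mtime is older than maxAgeDays.
// A directory's mtime changes when entries directly inside it are added or
// removed, not when nested files change, so treat it as a rough signal.
const fs = require("node:fs");
const path = require("node:path");

function findOldRuns(runsDir, maxAgeDays, now = Date.now()) {
  const cutoff = now - maxAgeDays * 24 * 60 * 60 * 1000;
  return fs
    .readdirSync(runsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => ({
      runId: entry.name,
      mtimeMs: fs.statSync(path.join(runsDir, entry.name)).mtimeMs,
    }))
    .filter((run) => run.mtimeMs < cutoff)
    .map((run) => run.runId);
}

// Usage: node old-runs.js .a5c/runs 30
if (process.argv[2]) {
  for (const runId of findOldRuns(process.argv[2], Number(process.argv[3] ?? 30))) {
    console.log(runId);
  }
}
```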

Memory Issues

Symptoms:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

Solutions:

  1. Increase Node.js memory:

    export NODE_OPTIONS="--max-old-space-size=4096"
  2. Reduce concurrent tasks

  3. Process smaller batches of files
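Reducing concurrent work usually means capping how many jobs are in flight at once. The helper below is a generic JavaScript concurrency limiter for your own Node.js scripts (for example, file-processing helpers you run alongside Babysitter); it is not a Babysitter API, and whether it fits around ctx.task calls depends on how your process definition schedules work.

```javascript
// Run async jobs with at most `limit` in flight, preserving result order.
// Generic pattern; not part of the Babysitter SDK.
async function mapWithConcurrency(items, limit, fn) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is exhausted.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

mapWithConcurrency(["a.js", "b.js", "c.js"], 2, async (file) => file.toUpperCase())
  .then((out) => console.log(JSON.stringify(out))); // ["A.JS","B.JS","C.JS"]
```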


Journal and State Issues

Journal Conflict

Symptoms:

Error: Journal conflict detected

Diagnosis:

# Check whether multiple babysitter processes are running ([b] stops grep matching itself)
ps aux | grep '[b]abysitter'

Solutions:

  1. Ensure single writer:

    • Only one session should run a workflow at a time
  2. Wait and retry:

    • If the previous session crashed, wait a moment
    • Resume the run
  3. Rebuild state:

    rm .a5c/runs/<runId>/state/state.json
    babysitter run:status <runId>

Events Out of Order

Symptoms:

  • Events appear in unexpected order
  • State doesn't match expectations

Diagnosis:

# Check sequence numbers
jq '.seq' .a5c/runs/<runId>/journal/*.json | sort -n

Solutions:

  • Always use CLI or sort by seq field:

    jq -s 'sort_by(.seq)' .a5c/runs/<runId>/journal/*.json
  • File system listings and shell globs order files by name, which may not match event order
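A Node.js version of the seq-based ordering, useful when jq is unavailable, is sketched below. It assumes one JSON event per file with a numeric seq field. The first log line shows why name order is unreliable: string sorting puts "10" before "2".

```javascript
// Load journal events and order them by their numeric `seq` field, not file name.
const fs = require("node:fs");
const path = require("node:path");

// Returns a new array sorted by seq; the input is not mutated.
function sortBySeq(events) {
  return [...events].sort((a, b) => a.seq - b.seq);
}

function loadJournalInOrder(journalDir) {
  const events = fs
    .readdirSync(journalDir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => JSON.parse(fs.readFileSync(path.join(journalDir, name), "utf8")));
  return sortBySeq(events);
}

console.log(JSON.stringify(["1", "10", "2"].sort())); // ["1","10","2"] (string order)
console.log(JSON.stringify(sortBySeq([{ seq: 10 }, { seq: 2 }, { seq: 1 }]).map((e) => e.seq))); // [1,2,10]
```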


State Cache Missing

Symptoms:

  • CLI commands are slow on first access
  • State needs rebuilding

Diagnosis:

ls .a5c/runs/<runId>/state/

Solutions:

This is normal. The state cache is derived from the journal:

# Trigger rebuild
babysitter run:status <runId>

The cache will be created automatically.


Diagnostic Commands Reference

Run Status and Events

# Check run status
babysitter run:status <runId> --json

# View all events
babysitter run:events <runId> --json

# View last N events
babysitter run:events <runId> --limit 10 --reverse --json

# Filter by event type
babysitter run:events <runId> --filter-type EFFECT_RESOLVED --json

# Find failures
babysitter run:events <runId> --filter-type RUN_FAILED --json

Task Information

# List pending tasks
babysitter task:list <runId> --pending --json

# View task details
cat .a5c/runs/<runId>/tasks/<effectId>/task.json | jq .

# View task result
cat .a5c/runs/<runId>/tasks/<effectId>/result.json | jq .

System Checks

# Check the latest published SDK version (npx runs the @latest release)
npx -y @a5c-ai/babysitter-sdk@latest --version

# Check installed packages
npm list -g @a5c-ai/babysitter @a5c-ai/babysitter-sdk

# Check plugin status
claude plugin list | grep babysitter

Journal Analysis

# Count events by type
jq -s 'group_by(.type) | map({type: .[0].type, count: length})' \
  .a5c/runs/<runId>/journal/*.json

# Find failed tasks
jq 'select(.type == "EFFECT_RESOLVED" and .data.status == "error")' \
  .a5c/runs/<runId>/journal/*.json

# Check effect resolutions
jq 'select(.type == "EFFECT_RESOLVED") | {effectId: .data.effectId, status: .data.status}' \
  .a5c/runs/<runId>/journal/*.json
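On machines without jq, the first two queries above can be reproduced with Node.js, which Babysitter already requires. This sketch assumes one JSON event per file with type and data fields, as the jq queries do; countByType returns an object rather than jq's array of {type, count} pairs.

```javascript
// jq-free journal summary: event counts by type and failed effect IDs.
// Assumes one JSON event per *.json file with `type` and `data` fields.
const fs = require("node:fs");
const path = require("node:path");

function countByType(events) {
  const counts = {};
  for (const e of events) counts[e.type] = (counts[e.type] ?? 0) + 1;
  return counts;
}

function failedEffects(events) {
  return events.filter((e) => e.type === "EFFECT_RESOLVED" && e.data?.status === "error");
}

// Usage: node journal-summary.js .a5c/runs/<runId>/journal
if (process.argv[2]) {
  const dir = process.argv[2];
  const events = fs
    .readdirSync(dir)
    .filter((n) => n.endsWith(".json"))
    .map((n) => JSON.parse(fs.readFileSync(path.join(dir, n), "utf8")));
  console.log(countByType(events));
  console.log(failedEffects(events).map((e) => e.data.effectId));
}
```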

When to Contact Support

Contact support if you experience:

  1. Persistent crashes that cannot be resolved with troubleshooting
  2. Data corruption that affects the journal or state
  3. Security concerns about the tool behavior
  4. Bugs that are reproducible with clear steps

How to Report Issues

  1. Gather information:

    • OS and version
    • Node.js version: node --version
    • Claude Code version: claude --version
    • Babysitter SDK version: npx @a5c-ai/babysitter-sdk --version
    • Full error message and stack trace
    • Steps to reproduce
  2. Include diagnostic output:

    babysitter run:status <runId> --json > diagnostic-status.json
    babysitter run:events <runId> --limit 50 --reverse --json > diagnostic-events.json
  3. Create issue at: GitHub Issues

  4. For discussions: GitHub Discussions




Document Status: Complete | Last Updated: 2026-01-25