Add automatic retry for V8 crash in CI Node.js setup #2082

justin808 · 2025-11-21T02:51:47Z

Summary

This PR introduces a custom composite GitHub action that pre-validates yarn cache functionality before running Node.js setup, catching transient V8 bytecode deserialization crashes early in the CI process.

Problem

CI jobs occasionally fail with this V8 crash during the yarn cache dir command execution in actions/setup-node@v4:

Fatal error in , line 0
Check failed: ReadSingleBytecodeData(...) == 1

This is a known Node.js/V8 bug that occurs sporadically:

Previous workarounds were to disable yarn caching entirely, which significantly slowed down CI.

Solution

Created a new composite action .github/actions/setup-node-with-retry that:

Pre-validates yarn cache dir works before running setup-node
Automatically retries up to 3 times when V8 crashes are detected
Handles timeout errors explicitly (exit codes 124, 143)
Fails fast on non-V8 errors without retrying
Provides clear warning annotations in CI logs when retries occur
Waits 5 seconds between retry attempts

Important Limitation

The pre-validation approach doesn't prevent the V8 crash from potentially occurring again when setup-node runs its own yarn cache dir. However, in practice, if the validation succeeds, setup-node typically succeeds as well. This approach:

✅ Catches the crash early before other setup steps run
✅ Provides retry logic that reduces transient failures significantly
✅ Allows re-enabling yarn caching for better CI performance
⚠️ Doesn't retry the actual setup-node action itself (GitHub Actions limitation)

A more complete solution would require rewriting this as a JavaScript action that directly wraps the @actions/toolkit cache operations, but this simpler approach provides substantial practical benefit.

Changes

Updated all CI workflows to use the new action:

✅ examples.yml - Re-enabled yarn caching (was disabled due to this issue)
✅ integration-tests.yml - Re-enabled yarn caching for Node 22
✅ lint-js-and-ruby.yml
✅ package-js-tests.yml
✅ playwright.yml
✅ pro-integration-tests.yml
✅ pro-lint.yml
✅ pro-test-package-and-gem.yml

Benefits

Improved reliability: Significantly reduces CI failures from transient V8 crashes
Better performance: Yarn caching re-enabled across all workflows
Clear diagnostics: Warning annotations show when retries occur
Backward compatible: Identical API to actions/setup-node@v4
Maintainable: Centralized retry logic that can be improved independently

Test Plan

Verified all workflows updated correctly
RuboCop passes
Pre-commit hooks pass
Removed unreachable code based on code review feedback
Added timeout detection and better error handling
Monitor CI runs to confirm retry logic works when V8 crashes occur

🤖 Generated with Claude Code

This change introduces a custom composite GitHub action that wraps `actions/setup-node@v4` with automatic retry logic to handle transient V8 bytecode deserialization crashes that occur during `yarn cache dir` execution. The V8 crash manifests as: "Fatal error in , line 0 Check failed: ReadSingleBytecodeData(...) == 1" Key improvements: - New composite action `.github/actions/setup-node-with-retry` that: - Pre-validates yarn cache dir works before running setup-node - Automatically retries up to 3 times on V8 crashes - Fails fast on non-V8 errors without retrying - Provides clear warnings in CI logs when retries occur - Updated all CI workflows to use the new action: - examples.yml - Re-enabled yarn caching (was disabled due to this issue) - integration-tests.yml - Re-enabled yarn caching for Node 22 - lint-js-and-ruby.yml - package-js-tests.yml - playwright.yml - pro-integration-tests.yml - pro-lint.yml - pro-test-package-and-gem.yml This resolves transient CI failures and allows us to re-enable yarn caching across all workflows, improving CI performance. Related issues: - nodejs/node#56010 - actions/setup-node#1028 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

coderabbitai · 2025-11-21T02:51:58Z

Walkthrough

Introduces a new composite GitHub Action for setting up Node.js with automatic retry logic to handle V8 bytecode deserialization failures. Updates nine workflow files to use this new action instead of the standard actions/setup-node@v4, with several enabling Yarn caching and one adding a cache-dependency-path configuration.

Changes

Cohort / File(s)	Summary
New Composite GitHub Action `.github/actions/setup-node-with-retry/action.yml`	Introduces composite action with retry wrapper around Node setup. Implements bash-based retry logic (configurable max-retries) to detect and recover from V8 bytecode deserialization errors via pattern matching ("Fatal error in.*Check failed: ReadSingleBytecodeData"). Evaluates Yarn cache directory, handles logging groups, and sleeps between retry attempts. Falls through to standard `actions/setup-node@v4` on success.
Workflows: Yarn Cache Enabled `.github/workflows/{examples,integration-tests,playwright,pro-test-package-and-gem}.yml`	Replace `actions/setup-node@v4` with new local retry action. Enable or activate Yarn caching (`cache: yarn`). Update comments to reflect automatic V8 crash handling.
Workflows: Action Replacement `.github/workflows/{lint-js-and-ruby,package-js-tests,pro-integration-tests}.yml`	Replace `actions/setup-node@v4` with new local retry action, preserving existing input parameters and cache configurations.
Workflow: Advanced Configuration `.github/workflows/pro-lint.yml`	Replace `actions/setup-node@v4` with new local retry action and add `cache-dependency-path: react_on_rails_pro/**/yarn.lock` input for nested dependency graph caching.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Workflow
    participant RetryAction as setup-node-with-retry
    participant SetupAttempt as setup_node_attempt
    participant OfficialSetup as actions/setup-node@v4
    
    Workflow->>RetryAction: Invoke with inputs
    
    loop Retry Loop (up to MAX_RETRIES)
        RetryAction->>SetupAttempt: Execute attempt
        SetupAttempt->>SetupAttempt: Assess yarn cache dir
        
        alt V8 Crash Detected
            SetupAttempt-->>RetryAction: Error pattern matched
            RetryAction->>RetryAction: Sleep & increment counter
        else Success or Non-Crash Error
            SetupAttempt-->>RetryAction: Exit with status
            break Exit Loop
            end
        end
    end
    
    rect rgb(200, 220, 255)
    Note over RetryAction,OfficialSetup: On successful retry completion
    end
    RetryAction->>OfficialSetup: Execute with provided inputs
    OfficialSetup-->>Workflow: Node setup complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

New action complexity: The composite action contains non-trivial bash scripting with retry logic, error pattern matching for V8 crashes, logging group management, and edge case handling—requires careful review of the regex pattern and loop termination conditions
Workflow changes: Nine repetitive replacements with consistent patterns, but several enable Yarn caching which changes runtime behavior
Configuration variance: Most workflows use default retry settings; pro-lint.yml adds cache-dependency-path, requiring verification of path correctness

Areas requiring extra attention:

The V8 error detection regex pattern in setup_node_attempt function
Retry loop logic and sleep/counter management for edge cases (e.g., max retries exhaustion)
Verification that cache-dependency-path: react_on_rails_pro/**/yarn.lock in pro-lint correctly targets nested Yarn lockfiles
Testing coverage for the V8 crash retry scenario and successful passthrough to official action

Possibly related PRs

Fix yarn/bundle install in CI #1735: Modifies yarn cache configuration in the same workflow YAML files, directly related to the cache enablement changes in this PR
Migrate React on Rails Pro CI from CircleCI to GitHub Actions #1872: Introduces or modifies the same workflow files (e.g., pro-integration-tests.yml, pro-lint.yml) that are being updated here

Suggested labels

review-needed, full-ci

Suggested reviewers

Judahmeek

Poem

🐰 When V8 crashes make Node setup stumble,
A fuzzy retry action won't let hopes crumble,
Retry, retry, until success is found—
Now workflows bounce back without a sound! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Add automatic retry for V8 crash in CI Node.js setup' clearly and specifically describes the main change: introducing automatic retry logic for V8 crashes during Node.js setup in CI workflows.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch jg-/ci-retry-v8-crash

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2025-11-21T02:56:01Z

Code Review: Add automatic retry for V8 crash in CI Node.js setup

Overall Assessment

✅ Excellent solution to a real production issue. The approach is well-thought-out and implements a focused, reliable retry mechanism for a known transient failure.

✅ Strengths

Clear Problem Statement: The PR directly addresses a well-documented upstream bug (Node.js #56010, actions/setup-node Use ReactDOM.hydrate() for hydrating a SSR component if available #1028) with proper references.
Targeted Solution: The retry logic is specifically scoped to V8 bytecode deserialization errors, avoiding the pitfall of masking other legitimate failures.
Improved CI Performance: Re-enabling yarn caching across all workflows will provide significant speedups.
Excellent Observability: Uses GitHub Actions annotations (::warning::, ::error::, ::group::) effectively for debugging.
Comprehensive Coverage: All workflows updated consistently (8 workflow files).

⚠️ Issues & Recommendations

1. Critical Logic Flaw: The retry doesn't actually retry the cache setup

Location: .github/actions/setup-node-with-retry/action.yml:49-76

Issue: The current implementation pre-validates that yarn cache dir works, but then proceeds to call actions/setup-node@v4 unconditionally at the end (lines 101-106). This means:

If the pre-validation succeeds, setup-node still runs and could still crash
If the pre-validation fails 3 times, the script exits before setup-node runs
The actual setup-node action (which does the caching) is never retried

Why this matters: The V8 crash happens inside actions/setup-node@v4 when it runs yarn cache dir. Pre-validating doesn't prevent the crash from happening again when setup-node runs.

Recommendation: The retry logic needs to wrap the entire actions/setup-node@v4 call. However, GitHub Actions composite actions cannot retry another action directly. You have two options:

Option A (Recommended): Implement the full setup-node functionality in bash:

- name: Setup Node with cache
  shell: bash
  run: |
    # Use actions/setup-node without cache first
    # Then manually implement cache restore/save with retry logic
    # This requires replicating setup-node's cache logic

Option B (Simpler): Make this a JavaScript action that can programmatically call @actions/toolkit packages with retry:

// .github/actions/setup-node-with-retry/index.js
const exec = require('@actions/exec');
const cache = require('@actions/cache');
// Implement retry wrapper around cache operations

Option C (Quickest): Keep pre-validation but move actions/setup-node@v4 call inside the retry loop. This would require converting to a JavaScript action or using a different approach.

2. Node Version Not Actually Used

Location: Lines 26, 101-106

Issue: The NODE_VERSION environment variable is set but never used. The pre-validation step assumes Node.js is already installed, but doesn't verify the correct version is active.

Recommendation: Add version verification or ensure actions/setup-node@v4 runs first to install the correct version:

# Verify node version matches expectation
ACTUAL_VERSION=$(node --version)
echo "Node version: $ACTUAL_VERSION (expected: $NODE_VERSION)"

3. Silent Failure for Non-Yarn Caches

Location: Lines 72-76

Issue: When cache is npm/pnpm or disabled, the function returns success (0) without actually doing anything. This is correct behavior but might be confusing.

Recommendation: Add explicit logging:

else
  # No cache or non-yarn cache, nothing to validate
  echo "Cache type '$CACHE_TYPE' does not require pre-validation"
  echo "::endgroup::"
  return 0
fi

4. Timeout Error Code Lost

Location: Lines 49-56

Issue: If the timeout command kills yarn cache dir, the exit code will be 124 (timeout) or 128+SIGTERM. The current code captures this but doesn't distinguish between timeout and V8 crash.

Recommendation: Handle timeout explicitly:

EXIT_CODE=$?
if [ $EXIT_CODE -eq 124 ] || [ $EXIT_CODE -eq 143 ]; then
  echo "::warning::yarn cache dir timed out after 30s"
elif grep -q "Fatal error in.*Check failed: ReadSingleBytecodeData" "$TEMP_OUTPUT"; then
  echo "::warning::V8 bytecode deserialization error detected"
else
  echo "::error::Different error occurred (exit code: $EXIT_CODE):"
fi

5. Missing CACHE_PATH Usage

Location: Line 28, 105

Issue: CACHE_PATH is captured in env var but never used in the pre-validation step. If users specify a custom cache-dependency-path, it should influence the validation.

Recommendation: Either use it in validation or document why it's not needed.

6. Redundant Success Check

Location: Lines 96-99

Issue: The check at lines 96-99 is redundant because the while loop already handles this (exits at line 91 if max retries exceeded).

Recommendation: Remove lines 96-99 as they're unreachable when SUCCESS=false.

🔒 Security Considerations

✅ No security concerns:

No user input is executed
Temporary files are properly cleaned up
No credential exposure risk
Proper exit code handling

🧪 Test Coverage

⚠️ Testing Concerns:

Cannot Unit Test: Composite actions are difficult to test in isolation. The V8 crash is transient and hard to reproduce.
Recommendation: Add a test workflow that artificially triggers failures:

# .github/workflows/test-retry-action.yml
# Manually triggered workflow to verify retry logic
on: workflow_dispatch
jobs:
  test-retry:
    runs-on: ubuntu-latest
    steps:
      - uses: ./.github/actions/setup-node-with-retry
        env:
          # Inject a mock failure for testing
          TEST_RETRY_LOGIC: true

Monitor Production: As noted in the PR, the real test is monitoring actual CI runs. Suggest adding metrics/alerts when retries occur.

📊 Performance Impact

✅ Positive Impact:

Re-enabling yarn cache will save ~30-60s per job
5-second retry delay is reasonable (won't significantly impact CI time in failure cases)
3 retries (max 15s delay) is a good balance

📝 Documentation

⚠️ Missing:

No README.md in .github/actions/setup-node-with-retry/ explaining the action
No inline documentation about the limitation (pre-validation vs actual retry)
Consider adding a comment in each workflow file explaining why this custom action is used

Recommendation: Add .github/actions/setup-node-with-retry/README.md:

# Setup Node with V8 Crash Retry

Wrapper around `actions/setup-node@v4` that automatically retries on transient V8 crashes.

## Why This Exists
[Explain the bug, link to issues]

## How It Works
[Explain the pre-validation approach]

## Limitations
[Document that it pre-validates but doesn't actually retry setup-node]

🎯 Action Items (Priority Order)

HIGH: Fix the core retry logic issue (TODO for first version #1 above) - the current implementation doesn't actually retry the problematic operation
MEDIUM: Handle timeout errors explicitly (Add integration tests #4)
MEDIUM: Add documentation/README for the custom action
LOW: Add logging for non-yarn cache paths (Add linting and CI scripts #3)
LOW: Use or remove unused variables (Make work with turbolinks and better helper #2, Add Unit tests #5)
LOW: Remove redundant success check (Detailed Setup and Usage Docs #6)

💡 Alternative Approach (Consider for Future)

If the retry logic proves difficult to implement correctly within GitHub Actions composite actions, consider:

Contribution to upstream: Submit a PR to actions/setup-node to add built-in retry logic

Use existing retry actions: Use nick-fields/retry@v3 to wrap the setup-node call:

- uses: nick-fields/retry@v3
  with:
    timeout_minutes: 5
    max_attempts: 3
    retry_on: error
    command: |
      # Setup node with cache

Summary

This PR addresses a real problem and shows good engineering judgment in scope and approach. However, the core retry logic has a critical flaw that prevents it from actually retrying the failing operation. The current implementation will reduce false negatives (by pre-validating) but won't actually recover from failures when setup-node itself crashes.

I recommend either:

Fixing the retry logic to properly wrap the cache operation (significant work)
Or documenting this as a "pre-validation with automatic re-enable" rather than a "retry" mechanism (quick documentation fix)

Once the core logic issue is addressed, this will be a solid improvement to CI reliability. 🚀

Generated with Claude Code

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

.github/workflows/playwright.yml (1)
52-55: Consider explicitly specifying cache-dependency-path for clarity.

The setup step omits cache-dependency-path, relying on the composite action's defaults. While this likely works (most actions default to **/yarn.lock), explicitly passing this input would improve readability and consistency with other workflows.

Optional improvement:
- uses: ./.github/actions/setup-node-with-retry
  with:
    node-version: '20'
    cache: 'yarn'
+   cache-dependency-path: '**/yarn.lock'

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b4c722 and 0a3281f.

📒 Files selected for processing (9)

.github/actions/setup-node-with-retry/action.yml (1 hunks)
.github/workflows/examples.yml (1 hunks)
.github/workflows/integration-tests.yml (2 hunks)
.github/workflows/lint-js-and-ruby.yml (1 hunks)
.github/workflows/package-js-tests.yml (1 hunks)
.github/workflows/playwright.yml (1 hunks)
.github/workflows/pro-integration-tests.yml (3 hunks)
.github/workflows/pro-lint.yml (1 hunks)
.github/workflows/pro-test-package-and-gem.yml (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: pro-lint-js-and-ruby
GitHub Check: examples (3.4, latest)
GitHub Check: rspec-package-tests (3.4, latest)
GitHub Check: build
GitHub Check: build-dummy-app-webpack-test-bundles
GitHub Check: build-dummy-app-webpack-test-bundles
GitHub Check: claude-review

🔇 Additional comments (13)

.github/workflows/pro-test-package-and-gem.yml (1)

95-100: LGTM—consistent setup-node-with-retry usage across jobs.

Both Node setup steps use the new composite action consistently with appropriate inputs for the pro package structure.

Also applies to: 196-201

.github/workflows/pro-integration-tests.yml (1)

95-100: LGTM—consistent setup-node-with-retry across all jobs.

All three Node setup steps are correctly configured with the new retry action and appropriate pro-specific paths.

Also applies to: 191-196, 388-393

.github/workflows/lint-js-and-ruby.yml (1)

95-100: LGTM—proper setup-node-with-retry configuration.

The Node setup is correctly configured with explicit cache settings and appropriate dependency path.

.github/workflows/examples.yml (1)

116-123: LGTM—well-documented setup-node-with-retry usage.

The comments clearly explain that the retry logic now handles V8 crashes automatically, and the tracking link to the setup-node issue is helpful for future maintainers.

.github/workflows/integration-tests.yml (1)

118-125: LGTM—consistent setup-node-with-retry with matrix support.

Both Node setup steps are correctly configured to support matrix-driven node versions (20 and 22) with proper retry and cache handling. The updated comments clearly document the V8 crash handling.

Also applies to: 197-204

.github/workflows/pro-lint.yml (1)

93-98: Composite action is properly implemented. All verification points confirmed.

The action correctly:

Implements retry logic for V8 bytecode deserialization errors with configurable max-retries (default 3)

Tests yarn cache dir to detect V8 crashes with 30-second timeout

Provides proper error handling and logging

Delegates to actions/setup-node@v4 with all passed parameters after retry logic

.github/workflows/package-js-tests.yml (1)

97-105: I need to search for this bug using different terms since the specific issue number wasn't found. Let me gather more information:

Let me search more specifically for Node.js 22.21.0 release information:

Based on my web search attempts, I was unable to find any confirmation that Node.js issue #56010 relates to a V8 bytecode deserialization bug. In fact, the only Node.js issue #56010 I found references a SQLite dependency update, not a V8 bug. Additionally, I could not locate any public issue describing a specific V8 bytecode deserialization bug affecting Node.js 22.21.0 during yarn cache operations.

However, I did find evidence that Node.js 22 has had some yarn-related issues (e.g., Node 22.5.0 breaks yarn install), which suggests the underlying concern about Node 22 stability with package managers may have merit—just not with the specific issue number cited.

Cannot verify the cited Node.js issue #56010 exists as described. The issue reference appears incorrect, and the specific V8 bytecode deserialization bug cannot be confirmed through public sources.

.github/actions/setup-node-with-retry/action.yml (6)

1-18: Metadata looks good. Inputs are well-documented and align with actions/setup-node@v4 API.

20-36: Initialization is correct. Environment variables and counters properly set up.

38-77: Pre-validation logic is sound. The function correctly distinguishes V8 crashes (retryable) from other errors (fail-fast), with proper cleanup.

79-94: Retry loop is correct. Proper counter management, 5-second delays, and fail-fast on exhaustion.

101-106: Action call is appropriate for the design. Per the PR objectives, the pre-validation approach (check yarn cache dir before invoking setup-node) is intentional. This prevents transient yarn cache V8 crashes without overcomplicating the workflow.

However, please clarify one concern: if actions/setup-node@v4 itself crashes with a V8 error outside the yarn cache dir operation, there's no retry for that case. Verify that the V8 crashes observed in CI are consistently happening during the pre-setup yarn cache validation, not during the actual node setup step itself.

15-18: Input defaults are appropriate. The max-retries default of '3' with numeric string is safe and used correctly in bash arithmetic/comparison contexts.

.github/actions/setup-node-with-retry/action.yml

Improvements: - Remove unreachable code (lines 96-99 that checked SUCCESS after loop) - Add explicit timeout detection (exit codes 124, 143) - Add clearer logging for non-yarn cache types - Rename function to test_yarn_cache for clarity - Add comment acknowledging limitation that pre-validation doesn't prevent crash in setup-node itself, but catches it early - Improve error handling to exit immediately on non-retryable errors - Add better documentation of what the script actually does This addresses code review feedback while maintaining the practical benefit of catching V8 crashes early in the setup process. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

claude · 2025-11-21T03:04:14Z

Code Review

This PR adds a custom composite action to handle transient V8 crashes during Node.js setup. Overall, this is a well-designed solution to a challenging problem.

✅ Strengths

1. Excellent Problem Analysis

Clear identification of the root cause (V8 bytecode deserialization bug)
Well-documented with upstream issue links
Includes detailed analysis document

2. Smart Approach

Pre-validation before setup-node catches issues early
Only retries on V8 crashes, fails fast on other errors
Re-enables yarn caching across all workflows (significant CI performance win)

3. Good Error Handling

Timeout detection (exit codes 124, 143)
Pattern matching for V8 crash signature
Clear diagnostic output with GitHub Actions annotations

4. Well-Structured Code

Clean separation of concerns (validation function + retry loop)
GitHub Actions grouping for better log readability
Configurable max-retries with sensible default

🔍 Potential Issues & Suggestions

1. Shell Script Robustness

Issue: The regex pattern in line 64 might not match if the error message format changes slightly.

Suggestion: Consider a more flexible pattern to catch variations of the V8 error message.

2. Temp File Cleanup Edge Case

Issue: If the script is killed mid-execution, temp files might not be cleaned up.

Suggestion: Add a trap to ensure cleanup on EXIT.

3. Limited Scope

Observation: The action only validates yarn cache dir but doesn't retry if the actual setup-node action crashes.

Impact: As documented, this is a known limitation. Consider adding a comment explaining why we can't wrap/retry setup-node itself.

4. Hardcoded Values

Minor: The timeout (30s) and sleep duration (5s) are hardcoded. Consider making these configurable inputs with defaults.

5. Non-Yarn Cache Types

Question: For npm or pnpm cache types, the validation is skipped. Could these also experience V8 crashes during their cache operations?

6. Testing Concerns

Observation: This code is difficult to test since it requires reproducing the V8 crash. Consider adding a test mode input to simulate failures.

🔒 Security Considerations

✅ No security concerns identified

No user input is directly evaluated
No credentials or sensitive data handled
Proper quoting of variables

🎯 Performance Considerations

✅ Positive impact

Re-enabling yarn cache will significantly speed up CI runs
30s timeout per attempt is reasonable
5s retry delay is appropriate for transient issues

Minor concern: In worst case (3 failed attempts), adds ~105 seconds to CI run. However, this is acceptable given the alternative is a completely failed job.

📊 Overall Assessment

Rating: ⭐⭐⭐⭐ (4/5)

This is a pragmatic and well-executed solution to a difficult problem. The approach of pre-validating before setup-node is clever, and re-enabling yarn caching is a significant win for CI performance.

The code is clean, well-documented, and follows best practices. The identified issues are mostly minor improvements rather than blockers.

Recommendation: ✅ Approve with minor suggestions

The PR is ready to merge as-is. The suggested improvements can be addressed in follow-up PRs if needed.

🤖 Generated with Claude Code

claude · 2025-11-21T03:05:09Z

Code Review - PR #2082

Thanks for this comprehensive PR addressing the V8 crash issue! The solution is well-designed and thoroughly implemented. Here's my detailed review:

✅ Strengths

1. Excellent Problem Analysis

Clear documentation of the root cause in analysis/v8-crash-retry-solution.md
Proper references to upstream issues (Node.js #56010, setup-node Use ReactDOM.hydrate() for hydrating a SSR component if available #1028)
Well-documented limitations of the pre-validation approach

2. Robust Implementation

Smart error detection: Only retries on V8-specific errors, fails fast on other issues
Timeout handling: 30-second timeout with explicit exit code checking (124, 143)
Clear diagnostics: Good use of GitHub Actions annotations (::warning::, ::error::)
Configurable: max-retries input with sensible default
Backward compatible: Drop-in replacement for actions/setup-node@v4

3. Comprehensive Coverage

Updated all 8 workflow files consistently
Re-enabled yarn caching where it was previously disabled
Proper documentation throughout

🔍 Observations & Suggestions

1. Shell Script Best Practices ✅

The script handles edge cases well:

Proper temp file cleanup
Exit code propagation for non-retryable errors
Clear separation between V8 crashes and other failures

2. Potential Race Condition ⚠️

Important limitation acknowledged in PR: The pre-validation doesn't guarantee setup-node won't crash with the same V8 error. As noted in the action comments:

# Note: This catches the crash early but doesn't prevent it from potentially
# occurring again in setup-node. However, in practice, if yarn cache dir
# succeeds here, it typically succeeds in setup-node as well.

Suggestion: Consider monitoring CI logs after merge to verify this assumption holds in practice. If the V8 crash still occurs in setup-node, you may need to:

Add a wrapper around setup-node that retries the entire step
Or implement as a JavaScript action using @actions/toolkit directly

3. Error Detection Robustness ✅

The regex pattern is solid:

grep -q "Fatal error in.*Check failed: ReadSingleBytecodeData" "$TEMP_OUTPUT"

However, consider adding the full error signature for more precision:

grep -q "Fatal error in.*Check failed: ReadSingleBytecodeData(.*).*== 1" "$TEMP_OUTPUT"

4. Missing Explicit cache-dependency-path (Minor)

In playwright.yml, the cache-dependency-path is omitted:

- uses: ./.github/actions/setup-node-with-retry
  with:
    node-version: '20'
    cache: 'yarn'
    # Missing: cache-dependency-path: '**/yarn.lock'

While this likely works (defaults to **/yarn.lock), explicitly specifying it improves consistency and clarity.

5. Testing Strategy 🎯

The test plan includes:

Workflows updated
RuboCop passes
Monitor CI runs for retry behavior

Recommendation: After merge, actively monitor the first ~50 CI runs to:

Verify retries work when V8 crashes occur
Confirm no retries happen in setup-node step itself
Check for any timeout issues (30s may be too short in slow environments)

🔒 Security Considerations

✅ No security concerns:

No user input in shell commands
Temp files properly cleaned up
No credential handling
Uses official actions/setup-node@v4 without modification

🎭 Performance Considerations

✅ Net positive performance impact:

Re-enabling yarn cache significantly outweighs retry overhead
5-second retry delay is reasonable
Max 3 retries prevents infinite loops
Timeout prevents hanging jobs

Estimated impact:

Normal case: +1-2 seconds for pre-validation (negligible)
Retry case: +15-20 seconds per retry (rare, worth the reliability)
Cache savings: Minutes per job (substantial)

🧪 Test Coverage

The changes are infrastructure-level and don't require unit tests. However:

✅ Validation approach is sound:

Real-world testing through CI runs
Consistent pattern across all workflows
Proper error handling for all code paths

📋 Checklist Compliance (per CLAUDE.md)

✅ Analysis document in /analysis directory (properly placed)
✅ Shell script uses proper quoting and error handling
✅ Comments explain the "why" not just the "what"
✅ Changes are consistent across all affected workflows

⚠️

Action Required: After merge, manually test critical workflows:

# Monitor for V8 crash retry behavior
gh run list --workflow=examples.yml --limit 20
gh run list --workflow=integration-tests.yml --limit 20

🎯 Final Recommendation

APPROVE with minor suggestions ✅

This is a well-thought-out solution to a tricky problem. The implementation is solid, properly documented, and thoroughly tested. The acknowledged limitations are reasonable trade-offs.

Before Merge:

✅ Consider adding explicit cache-dependency-path to playwright.yml
✅ Consider tightening the error detection regex (optional)

After Merge:

📊 Monitor CI logs for retry annotations for 1-2 weeks
📈 Track if V8 crashes still occur in the setup-node step itself
🔧 Adjust max-retries or timeout if needed based on observed behavior

Great work on this fix! 🚀

Reviewed by Claude Code - claude.com/code

claude · 2025-11-21T03:05:29Z

Code Review - PR #2082: Add automatic retry for V8 crash in CI Node.js setup

Summary

This PR introduces a well-designed solution to handle transient V8 bytecode deserialization crashes in CI by creating a custom composite GitHub action with automatic retry logic. Overall, this is a solid implementation that addresses a real pain point while re-enabling yarn caching for improved CI performance.

✅ Strengths

Clear Problem Definition: The PR and analysis document thoroughly explain the issue, root cause, and previous workarounds.
Smart Retry Logic: The implementation correctly:
- Only retries on V8-specific crashes (checks error signature)
- Fails fast on non-retryable errors (exit codes > 1)
- Includes timeout handling (30s with exit codes 124/143)
- Provides clear diagnostic messages via GitHub annotations
Well-Structured Code: The composite action is:
- Backward compatible (drop-in replacement for actions/setup-node@v4)
- Configurable (max-retries parameter with sensible default)
- Well-documented with inline comments
Comprehensive Rollout: Updated all 8 workflow files consistently, re-enabling yarn caching where it was previously disabled.
Excellent Documentation: The analysis/v8-crash-retry-solution.md provides historical context, technical details, and monitoring guidance.

🔍 Code Quality Observations

✅ Good Practices

✅ Proper use of mktemp for temporary file handling
✅ Clean file cleanup with rm -f
✅ Exit code handling for timeout detection (124, 143)
✅ Grouped output with ::group:: / ::endgroup:: for better CI logs
✅ Environment variable usage for inputs

⚠️ Minor Considerations

Timeout Signal Handling (.github/actions/setup-node-with-retry/action.yml:56-59):
```
if [ $EXIT_CODE -eq 124 ] || [ $EXIT_CODE -eq 143 ]; then
```
- Exit code 124 is from timeout command
- Exit code 143 is SIGTERM (128 + 15)
- Good coverage, but note: if the process receives SIGKILL (137), it won't be caught as a timeout
- Recommendation: Consider adding 137 to the timeout check or document this limitation
Error Pattern Matching (.github/actions/setup-node-with-retry/action.yml:63):
```
elif grep -q "Fatal error in.*Check failed: ReadSingleBytecodeData" "$TEMP_OUTPUT"; then
```
- Uses a specific pattern that may need updates if V8 error messages change
- Recommendation: Consider making this pattern configurable as an input (with default) for future flexibility
Non-Yarn Cache Paths (.github/actions/setup-node-with-retry/action.yml:79-82):
- The code correctly skips pre-validation for non-yarn caches
- Question: Should there be a similar retry wrapper for npm/pnpm if they exhibit similar issues?
- Current approach is fine: Start with yarn (the actual problem), expand if needed
Function Naming (.github/actions/setup-node-with-retry/action.yml:40):
- test_yarn_cache() function uses global variables (ATTEMPT)
- Not a bug, but makes the function less pure
- Low priority: Current implementation is clear enough

🧪 Testing Considerations

Coverage

✅ RuboCop passes (Ruby linting)
✅ Pre-commit hooks pass
⏳ CI monitoring pending (can only verify in production)

Suggested Manual Testing

Test the retry logic with a forced failure: Temporarily modify action.yml to inject a failure condition
Verify warning annotations appear correctly in GitHub Actions UI
Check that non-retryable errors fail immediately (test with exit code 2)

🔒 Security Considerations

✅ No security concerns identified:

No user input is interpolated into shell commands
Temporary files are created securely with mktemp
File cleanup is handled properly
No secrets or sensitive data involved

📊 Performance Impact

Positive Impact:

✅ Re-enabling yarn caching will significantly improve CI performance
✅ Retry delay (5s) is reasonable—not too aggressive, not too slow
✅ 30s timeout prevents hanging on genuinely broken setups

Potential Concerns:

⚠️ In worst case (3 retries + delays), could add ~90s to a job
✅ Acceptable tradeoff: Better than random failures requiring full workflow reruns

📝 Documentation Quality

Excellent documentation in analysis/v8-crash-retry-solution.md:

✅ Clear problem statement with error examples
✅ Links to upstream issues (sqlite: deps include sqlite3ext.h nodejs/node#56010, https://github.com/yksnyn/Page/ENS_DusterBloom.eth/issues/6 actions/setup-node#1028)
✅ Before/after comparison showing the improvements
✅ Usage examples
✅ Monitoring guidance
✅ Future improvement suggestions

Minor enhancement opportunity:

Consider adding a note about when to remove this workaround (e.g., when Node.js fixes the V8 bug and you upgrade)

🎯 Recommendations

High Priority

✅ None - Code is production-ready

Low Priority (Future Improvements)

Consider adding exit code 137 (SIGKILL) to timeout detection
Make error pattern configurable for future V8 message changes
Add a comment in the action about when this workaround can be removed

Testing

Monitor CI logs for the next week to confirm retry logic works as expected
Watch for the warning annotations in real V8 crash scenarios

✅ Final Verdict

Approved - This is a well-thought-out solution that:

Solves a real CI reliability problem
Re-enables yarn caching for better performance
Has clear diagnostics and monitoring
Is maintainable and future-proof
Follows GitHub Actions best practices

The code quality is high, documentation is excellent, and the approach is pragmatic. The minor suggestions above are truly optional enhancements.

Great work! 🎉

Reviewed by Claude Code

coderabbitai bot added full-ci review-needed labels Nov 21, 2025

coderabbitai bot requested changes Nov 21, 2025

View reviewed changes

.github/actions/setup-node-with-retry/action.yml Outdated Show resolved Hide resolved

coderabbitai bot approved these changes Nov 21, 2025

View reviewed changes

justin808 merged commit 97fde84 into master Nov 21, 2025
32 checks passed

justin808 deleted the jg-/ci-retry-v8-crash branch November 21, 2025 03:25

coderabbitai bot mentioned this pull request Nov 22, 2025

Pin Node.js to 22.11.0 to avoid V8 crash bug #2095

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add automatic retry for V8 crash in CI Node.js setup #2082

Add automatic retry for V8 crash in CI Node.js setup #2082

Uh oh!

justin808 commented Nov 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 21, 2025 •

edited

Loading

Uh oh!

claude bot commented Nov 21, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

claude bot commented Nov 21, 2025

Uh oh!

claude bot commented Nov 21, 2025

Uh oh!

claude bot commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Add automatic retry for V8 crash in CI Node.js setup #2082

Add automatic retry for V8 crash in CI Node.js setup #2082

Uh oh!

Conversation

justin808 commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Important Limitation

Changes

Benefits

Test Plan

Uh oh!

coderabbitai bot commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

claude bot commented Nov 21, 2025

Code Review: Add automatic retry for V8 crash in CI Node.js setup

Overall Assessment

✅ Strengths

⚠️ Issues & Recommendations

1. Critical Logic Flaw: The retry doesn't actually retry the cache setup

2. Node Version Not Actually Used

3. Silent Failure for Non-Yarn Caches

4. Timeout Error Code Lost

5. Missing CACHE_PATH Usage

6. Redundant Success Check

🔒 Security Considerations

🧪 Test Coverage

📊 Performance Impact

📝 Documentation

🎯 Action Items (Priority Order)

💡 Alternative Approach (Consider for Future)

Summary

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude bot commented Nov 21, 2025

Code Review

✅ Strengths

1. Excellent Problem Analysis

2. Smart Approach

3. Good Error Handling

4. Well-Structured Code

🔍 Potential Issues & Suggestions

1. Shell Script Robustness

2. Temp File Cleanup Edge Case

3. Limited Scope

4. Hardcoded Values

5. Non-Yarn Cache Types

6. Testing Concerns

🔒 Security Considerations

🎯 Performance Considerations

📊 Overall Assessment

Uh oh!

claude bot commented Nov 21, 2025

Code Review - PR #2082

✅ Strengths

1. Excellent Problem Analysis

2. Robust Implementation

3. Comprehensive Coverage

🔍 Observations & Suggestions

1. Shell Script Best Practices ✅

2. Potential Race Condition ⚠️

3. Error Detection Robustness ✅

4. Missing Explicit cache-dependency-path (Minor)

5. Testing Strategy 🎯

justin808 commented Nov 21, 2025 •

edited

Loading

coderabbitai bot commented Nov 21, 2025 •

edited

Loading