20 changes: 20 additions & 0 deletions .github/workflows/ci.yml
@@ -123,6 +123,26 @@ jobs:
- name: 🧪 Test
run: yarn test-ci --minWorkers=1 --maxWorkers=${{ steps.cpu-cores.outputs.count }}

# Run timeout diagnostics if tests failed on Windows
- name: 🔍 Diagnose Timeout Issues (Windows)
if: failure() && matrix.os == 'windows-latest'
run: |
echo "::group::Running timeout diagnostics for Windows"
cd packages/api-server
yarn diagnose:timeouts 10 30000
echo "::endgroup::"
continue-on-error: true

# Upload diagnostic report if it was generated
- name: 📋 Upload Timeout Diagnostic Report
if: failure() && matrix.os == 'windows-latest'
uses: actions/upload-artifact@v4
with:
name: timeout-diagnostic-report-${{ matrix.os }}
path: packages/api-server/timeout-diagnostic-report.json
retention-days: 30
continue-on-error: true

build-lint-test-skip:
needs: detect-changes
if: needs.detect-changes.outputs.code == 'false'
119 changes: 119 additions & 0 deletions .github/workflows/diagnose-timeouts.yml
@@ -0,0 +1,119 @@
name: 🔍 Diagnose Timeout Issues

on:
workflow_dispatch:
inputs:
os:
description: 'Operating system to test'
required: true
default: 'windows-latest'
type: choice
options:
- ubuntu-latest
- windows-latest
- macos-latest
iterations:
description: 'Number of test iterations to run'
required: true
default: '20'
type: string
timeout_ms:
description: 'Timeout in milliseconds'
required: true
default: '30000'
type: string
enable_debug:
description: 'Enable detailed debug logging'
required: false
default: true
type: boolean

permissions:
contents: read

jobs:
timeout-diagnostics:
name: 🔍 Timeout Diagnostics / ${{ inputs.os }}
runs-on: ${{ inputs.os }}

steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4

- name: Set up job
uses: ./.github/actions/set-up-job

- name: Display test configuration
run: |
echo "::notice::Running timeout diagnostics with the following configuration:"
echo "::notice::OS: ${{ inputs.os }}"
echo "::notice::Iterations: ${{ inputs.iterations }}"
echo "::notice::Timeout: ${{ inputs.timeout_ms }}ms"
echo "::notice::Debug logging: ${{ inputs.enable_debug }}"
echo "::notice::Platform: ${{ runner.os }}"
echo "::notice::Node version: $(node --version)"

- name: Run timeout diagnostics
id: diagnostics
run: |
cd packages/api-server
yarn diagnose:timeouts ${{ inputs.iterations }} ${{ inputs.timeout_ms }}
env:
CEDAR_DEBUG_TIMEOUT: ${{ inputs.enable_debug && '1' || '0' }}
continue-on-error: true

- name: Upload diagnostic report
if: always()
uses: actions/upload-artifact@v4
with:
name: timeout-diagnostic-report-${{ inputs.os }}-${{ github.run_number }}
path: packages/api-server/timeout-diagnostic-report.json
retention-days: 30

- name: Display quick summary
if: always()
run: |
if [ -f packages/api-server/timeout-diagnostic-report.json ]; then
echo "::group::Diagnostic Summary"
cd packages/api-server
node -e "
const report = JSON.parse(require('fs').readFileSync('timeout-diagnostic-report.json', 'utf8'));
console.log('Total iterations:', report.metadata.iterations);
console.log('Success rate:', report.summary.successRate + '%');
console.log('Timeout rate:', report.summary.timeoutRate + '%');
console.log('Average time:', report.summary.avgTime.toFixed(2) + 'ms');
console.log('Max time:', report.summary.maxTime.toFixed(2) + 'ms');
if (report.summary.timeoutRate > 0) {
console.log('::warning::Timeouts detected - this confirms the issue exists');
} else {
console.log('::notice::No timeouts detected - issue may be environment-specific');
}
"
echo "::endgroup::"
else
echo "::error::No diagnostic report generated"
fi
shell: bash

- name: Set job conclusion
if: always()
run: |
if [ -f packages/api-server/timeout-diagnostic-report.json ]; then
cd packages/api-server
TIMEOUT_RATE=$(node -e "
const report = JSON.parse(require('fs').readFileSync('timeout-diagnostic-report.json', 'utf8'));
console.log(report.summary.timeoutRate);
")

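# Compare in Node rather than with bc, which may not be available in Git Bash on Windows runners
if node -e "process.exit(Number(process.argv[1]) > 0 ? 0 : 1)" "$TIMEOUT_RATE"; then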
echo "::warning::Timeout diagnostics completed with $TIMEOUT_RATE% timeout rate"
exit 1
else
echo "::notice::Timeout diagnostics completed successfully - no timeouts detected"
exit 0
fi
else
echo "::error::Diagnostic script failed to generate report"
exit 1
fi
shell: bash
1 change: 1 addition & 0 deletions packages/api-server/package.json
@@ -23,6 +23,7 @@
"build:pack": "yarn pack -o cedarjs-api-server.tgz",
"build:types": "tsc --build --verbose tsconfig.build.json",
"build:watch": "nodemon --watch src --ext \"js,jsx,ts,tsx\" --ignore dist --exec \"yarn build && yarn fix:permissions\"",
"diagnose:timeouts": "yarn build && node scripts/diagnose-timeouts.mjs",
"fix:permissions": "chmod +x dist/index.js; chmod +x dist/watch.js",
"prepublishOnly": "NODE_ENV=production yarn build",
"test": "vitest run",
218 changes: 218 additions & 0 deletions packages/api-server/scripts/README.md
@@ -0,0 +1,218 @@
# API Server Scripts

This directory contains diagnostic and utility scripts for the CedarJS API Server package.

## Timeout Diagnostics

### Overview

The `diagnose-timeouts.mjs` script helps identify and analyze intermittent timeouts that can occur during server creation, particularly on Windows.

### Background

We've observed intermittent timeouts in CI. The issue:

- Occurs primarily on Windows
- Happens during `createServer()` calls in tests
- Manifests as 10-second timeouts in `beforeAll` hooks
- Is sporadic rather than consistent

### Usage

#### Local Development

```bash
# Run basic diagnostics (10 iterations)
yarn diagnose:timeouts

# Run with custom parameters
yarn diagnose:timeouts [iterations] [timeout-ms]

# Examples:
yarn diagnose:timeouts 20 30000 # 20 iterations, 30s timeout
yarn diagnose:timeouts 5 15000 # 5 iterations, 15s timeout
```

#### With Debug Logging

```bash
# Enable detailed logging to see where timeouts occur
CEDAR_DEBUG_TIMEOUT=1 yarn diagnose:timeouts 10
```

#### In CI/CD

The diagnostic script runs in CI in two ways:

1. Automatically, when tests fail on Windows in the main CI workflow
2. Manually, via the "Diagnose Timeout Issues" workflow

### Script Behavior

The diagnostic script:

1. **Creates multiple server instances** in sequence
2. **Measures timing** for each creation attempt
3. **Detects timeouts** and other errors
4. **Generates a detailed report** with statistics
5. **Provides recommendations** based on findings
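
The sequence above can be pictured with a minimal sketch. This is not the actual contents of `diagnose-timeouts.mjs`; the names `withTimeout` and `runIterations` and the shape of each result entry are assumptions made for illustration. It races each creation attempt against a timer and records whether it succeeded, timed out, or failed with another error.

```javascript
// Hypothetical helper: reject with 'TIMEOUT' if the task takes longer than timeoutMs.
function withTimeout(task, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('TIMEOUT')), timeoutMs);
  });
  // Promise.resolve().then(task) turns synchronous throws into rejections.
  return Promise.race([Promise.resolve().then(task), timeout]).finally(() =>
    clearTimeout(timer),
  );
}

// Hypothetical loop: run createFn sequentially and time every attempt.
async function runIterations(createFn, iterations, timeoutMs) {
  const details = [];
  for (let i = 1; i <= iterations; i++) {
    const start = performance.now();
    let status = 'success';
    try {
      await withTimeout(createFn, timeoutMs);
    } catch (err) {
      status = err.message === 'TIMEOUT' ? 'timeout' : 'error';
    }
    details.push({ iteration: i, status, timeMs: performance.now() - start });
  }
  return details;
}
```

In the real script `createFn` would wrap a `createServer()` call plus cleanup; here it can be any function that returns a promise.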

### Output

#### Console Output

- Real-time progress for each iteration
- Success/timeout/error status
- Final summary with statistics
- Platform-specific recommendations

#### Report File

- `timeout-diagnostic-report.json` (CI)
- `timeout-diagnostic-TIMESTAMP.json` (local)

Contains:

```json
{
"metadata": {
"platform": "win32",
"nodeVersion": "v20.x.x",
"iterations": 10,
"timeoutMs": 30000
},
"summary": {
"successful": 8,
"timeouts": 2,
"errors": 0,
"successRate": 80.0,
"timeoutRate": 20.0,
"avgTime": 1250.45,
"maxTime": 2100.23
},
"details": [...]
}
```
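
As a rough illustration of how the `summary` fields relate to per-iteration results, the sketch below derives them from a list of entries. The entry shape (`status`, `timeMs`) and the choice to average only successful runs are assumptions; the real script may compute these values differently.

```javascript
// Hypothetical: build the summary object from per-iteration results.
// Each entry is assumed to look like { status: 'success' | 'timeout' | 'error', timeMs: number }.
function summarize(details) {
  const total = details.length;
  const count = (status) => details.filter((d) => d.status === status).length;
  const pct = (n) => (total === 0 ? 0 : (n / total) * 100);

  // Assumption: timing statistics only consider successful runs.
  const times = details.filter((d) => d.status === 'success').map((d) => d.timeMs);
  const sum = times.reduce((acc, t) => acc + t, 0);

  const successful = count('success');
  const timeouts = count('timeout');
  return {
    successful,
    timeouts,
    errors: count('error'),
    successRate: pct(successful),
    timeoutRate: pct(timeouts),
    avgTime: times.length === 0 ? 0 : sum / times.length,
    maxTime: times.length === 0 ? 0 : Math.max(...times),
  };
}
```

With 8 successful entries and 2 timeouts, this yields a `successRate` of 80 and a `timeoutRate` of 20, matching the example report above.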

### Debug Logging

When `CEDAR_DEBUG_TIMEOUT=1` is set, detailed logs show:

- Server creation steps
- Plugin registration timing
- Function loading progress
- GraphQL import status
- Hook registration timing

Example debug output:

```
[CEDAR_DEBUG] 2024-01-15T10:30:00.000Z - createServer: Starting
[CEDAR_DEBUG] 2024-01-15T10:30:00.100Z - createServer: Options resolved
[CEDAR_DEBUG] 2024-01-15T10:30:00.150Z - redwoodFastifyAPI: Loading functions from dist
[CEDAR_DEBUG] 2024-01-15T10:30:01.200Z - setLambdaFunctions: Import of hello completed
```
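
A logger that produces lines in this format could look roughly like the sketch below. The function name `createDebugLogger` and its parameters are hypothetical and not taken from the package source. Note that it checks for the exact value `'1'`, since the manual workflow sets the variable to `'0'` when debug logging is disabled.

```javascript
// Hypothetical debug logger gated on CEDAR_DEBUG_TIMEOUT.
// `env` and `write` are injectable so the behavior can be checked without touching globals.
function createDebugLogger(env = process.env, write = console.log) {
  // Compare against '1' explicitly: a value of '0' is a non-empty string and would be truthy.
  const enabled = env.CEDAR_DEBUG_TIMEOUT === '1';
  return function debugLog(message) {
    if (!enabled) return;
    write(`[CEDAR_DEBUG] ${new Date().toISOString()} - ${message}`);
  };
}

const debugLog = createDebugLogger();
debugLog('createServer: Starting'); // Prints only when CEDAR_DEBUG_TIMEOUT=1
```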

### CI Integration

#### Automatic Triggering

The diagnostic runs automatically when tests fail on Windows (simplified from `ci.yml`, which first changes into `packages/api-server`):

```yaml
- name: 🔍 Diagnose Timeout Issues (Windows)
if: failure() && matrix.os == 'windows-latest'
run: yarn diagnose:timeouts 10 30000
```

#### Manual Workflow

Use the "Diagnose Timeout Issues" workflow dispatch to:

- Test specific operating systems
- Adjust iteration count and timeout values
- Enable/disable debug logging
- Run diagnostics on-demand

#### Artifacts

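Both workflows upload the diagnostic report as an artifact (the main CI workflow only after a test failure on Windows):

- Retention: 30 days
- Name: `timeout-diagnostic-report-{os}` (main CI workflow) or `timeout-diagnostic-report-{os}-{run-number}` (manual workflow)
- Location: GitHub Actions artifacts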

### Interpreting Results

#### Success Rate < 100%

- **High timeout rate (>10%)**: Likely environment-specific timing issue
- **Occasional timeouts (<5%)**: May be acceptable, consider increasing timeouts
- **Consistent errors**: Check for configuration or dependency issues
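
The timeout-rate thresholds above can be applied to a report programmatically. The sketch below is only an illustration: the function name and the returned labels are made up, and the `'moderate'` band for rates between 5% and 10% is an assumption, since the guidance above does not cover that range.

```javascript
// Hypothetical classifier for report.summary.timeoutRate (a percentage).
function classifyTimeoutRate(timeoutRate) {
  if (timeoutRate === 0) return 'none';
  if (timeoutRate > 10) return 'high'; // likely an environment-specific timing issue
  if (timeoutRate < 5) return 'occasional'; // may be acceptable; consider longer timeouts
  return 'moderate'; // assumption: the 5-10% range is not defined above
}
```

For the example report shown earlier, `classifyTimeoutRate(20.0)` returns `'high'`.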

#### Common Patterns

- **Windows timeout clusters**: Often related to file system or port binding
- **Slow function imports**: May indicate disk I/O or module resolution issues
- **GraphQL import hangs**: Check for missing exports or circular dependencies

### Recommendations

#### For Windows Timeout Issues

1. **Increase hook timeout** in `vitest.config.mts`
2. **Check antivirus settings** (real-time scanning can slow file operations)
3. **Verify system resources** (CPU, memory, disk I/O)
4. **Consider retry logic** for CI environments
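
For the retry suggestion, a minimal sketch of a generic retry wrapper is shown below. The `retry` helper, its option names, and the defaults are hypothetical; they are not part of the API server package, and a test runner's built-in retry support may be a better fit where available.

```javascript
// Hypothetical retry helper: re-run an async task a limited number of times.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retry(task, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

A `beforeAll` hook could wrap its server setup in `retry(...)`, though retries can mask the underlying problem, so they are best limited to CI and paired with the debug logging described above.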

#### For Consistent Failures

1. **Review debug logs** to identify hanging operations
2. **Check fixture files** for missing exports or imports
3. **Verify test environment** setup and teardown
4. **Monitor resource cleanup** between test iterations

### Troubleshooting

#### Script Won't Run

```bash
# Ensure the package is built
yarn build

# Check Node.js version
node --version

# Verify fixture files exist
ls -la src/__tests__/fixtures/graphql/cedar-app/
```

#### No Report Generated

- Check file permissions in output directory
- Verify script has write access
- Look for uncaught exceptions in console output

#### High Error Rate

- Review error details in console output
- Check test fixture integrity
- Verify all dependencies are installed

### Contributing

When modifying the diagnostic script:

1. **Test locally** on multiple platforms
2. **Verify CI integration** doesn't break existing workflows
3. **Update documentation** for new features or parameters
4. **Add appropriate logging** for new diagnostic points

### Related Files

- `../src/createServer.ts` - Main server creation logic with debug logging
- `../src/plugins/api.ts` - API plugin with debug logging
- `../src/plugins/lambdaLoader.ts` - Function loading with debug logging
- `../vitest.config.mts` - Test configuration with hook timeout
- `../../../.github/workflows/ci.yml` - CI integration
- `../../../.github/workflows/diagnose-timeouts.yml` - Manual diagnostic workflow