Skip to content

Commit 8dcbbc0

Browse files
authored
fix: workflows cancelling on wrong scope of timeout (#3023)
<!-- Thank you for your Pull Request! Please describe the problem this PR fixes and a summary of the changes made. Link to any relevant issues, code snippets, or other PRs. For trivial changes, this template can be ignored in favor of a short description of the changes. --> ## Problem <!-- Describe the issue this PR is solving --> <img width="1439" height="755" alt="Screenshot 2025-10-14 at 11 14 03" src="https://github.com/user-attachments/assets/ce006e1b-3cbc-4064-bd5c-254f28e46c59" /> Some workflows have timeouts that are too short to account for retries in GitHub action `run_with_e2e_account`. That is causing test runs with retries to occasionally cancel themselves early. ## Changes <!-- Summarize the changes introduced in this PR. This is a good place to call out critical or potentially problematic parts of the change. --> Implemented per-attempt timeouts within the `e2e` runner and increased overall timeouts. Acknowledged risk: since we increase the total timeout of the e2e runs - if a task within the job hangs (and isn't the job being retried), it will hang for the rest of the timeout. This was true before as well, but increasing this number exacerbates it. ## Validation <!-- Describe how changes in this PR have been validated. This may include added or updated unit, integration and/or E2E tests, test workflow runs, or manual verification. If manual verification is the only way changes in this PR have been validated, you will need to write some automated tests before this PR is ready to merge. For changes to test infra, or non-functional changes, tests are not always required. Instead, you should call out _why_ you think tests are not required here. If changes affect a GitHub workflow that is not included in the PR checks, include a link to a passing test run of the modified workflow. ---> The workflow run (with the `run-e2e` label) on this PR will validate that the change doesn't break anything, but validation that it works in the long term remains to be seen. ## Checklist <!-- These items must be completed before a PR is ready to be merged. Feel free to publish a draft PR before these items are complete. --> - [X] If this PR includes a functional change to the runtime behavior of the code, I have added or updated automated test coverage for this change. - [X] If this PR requires a change to the [Project Architecture README](../PROJECT_ARCHITECTURE.md), I have included that update in this PR. - [X] If this PR requires a docs update, I have linked to that docs PR above. - [X] If this PR modifies E2E tests, makes changes to resource provisioning, or makes SDK calls, I have run the PR checks with the `run-e2e` label set. _By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license._
1 parent 603e831 commit 8dcbbc0

File tree

2 files changed

+82
-22
lines changed

2 files changed

+82
-22
lines changed

.github/actions/run_with_e2e_account/action.yml

Lines changed: 68 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,10 @@ inputs:
2424
default: false
2525
cdk-lib-version:
2626
required: true
27+
attempt_timeout_minutes:
28+
description: Timeout for each attempt in minutes
29+
required: false
30+
default: '25'
2731
retry_attempts:
2832
description: Number of retry attempts for the main script
2933
required: false
@@ -98,31 +102,80 @@ runs:
98102
# Retry logic
99103
RETRY_ATTEMPTS=${{ inputs.retry_attempts }}
100104
RETRY_DELAY=${{ inputs.retry_delay }}
105+
TIMEOUT_MINUTES=${{ inputs.attempt_timeout_minutes }}
101106
ATTEMPT=1
102107
108+
# Function to run command with timeout (cross-platform)
109+
run_with_timeout() {
110+
local cmd="$1"
111+
local shell_type="$2"
112+
local timeout_seconds=$((TIMEOUT_MINUTES * 60))
113+
114+
echo "Running command with timeout of $TIMEOUT_MINUTES minutes ($timeout_seconds seconds)"
115+
116+
# Create a temporary file for the exit code
117+
local temp_exit_code=$(mktemp)
118+
119+
# Run the command in background
120+
if [ "$shell_type" = "bash" ]; then
121+
(bash -c "$cmd"; echo $? > "$temp_exit_code") &
122+
elif [ "$shell_type" = "pwsh" ]; then
123+
(pwsh -c "$cmd"; echo $? > "$temp_exit_code") &
124+
else
125+
# Default to bash for other shells
126+
(bash -c "$cmd"; echo $? > "$temp_exit_code") &
127+
fi
128+
129+
local cmd_pid=$!
130+
local elapsed=0
131+
local sleep_interval=5
132+
133+
# Monitor the process
134+
while [ $elapsed -lt $timeout_seconds ]; do
135+
if ! kill -0 $cmd_pid 2>/dev/null; then
136+
# Process has finished
137+
wait $cmd_pid 2>/dev/null
138+
local exit_code
139+
if [ -f "$temp_exit_code" ]; then
140+
exit_code=$(cat "$temp_exit_code")
141+
rm -f "$temp_exit_code"
142+
else
143+
exit_code=$?
144+
fi
145+
return $exit_code
146+
fi
147+
148+
sleep $sleep_interval
149+
elapsed=$((elapsed + sleep_interval))
150+
done
151+
152+
# Timeout reached, kill the process
153+
echo "Timeout reached, terminating process..."
154+
kill -TERM $cmd_pid 2>/dev/null
155+
sleep 2
156+
kill -KILL $cmd_pid 2>/dev/null
157+
wait $cmd_pid 2>/dev/null
158+
rm -f "$temp_exit_code"
159+
return 124 # Standard timeout exit code
160+
}
161+
103162
while [ $ATTEMPT -le $RETRY_ATTEMPTS ]; do
104163
echo "Attempt $ATTEMPT of $RETRY_ATTEMPTS"
105164
106-
if [ "$SHELL_CMD" = "bash" ]; then
107-
if bash -c '${{ inputs.run }}'; then
108-
echo "Script succeeded on attempt $ATTEMPT"
109-
exit 0
110-
fi
111-
elif [ "$SHELL_CMD" = "pwsh" ]; then
112-
if pwsh -c '${{ inputs.run }}'; then
113-
echo "Script succeeded on attempt $ATTEMPT"
114-
exit 0
115-
fi
165+
if run_with_timeout '${{ inputs.run }}' "$SHELL_CMD"; then
166+
echo "Script succeeded on attempt $ATTEMPT"
167+
exit 0
116168
else
117-
# Default to bash for other shells
118-
if bash -c '${{ inputs.run }}'; then
119-
echo "Script succeeded on attempt $ATTEMPT"
120-
exit 0
169+
EXIT_CODE=$?
170+
if [ $EXIT_CODE -eq 124 ]; then
171+
echo "Script timed out after $TIMEOUT_MINUTES minutes on attempt $ATTEMPT"
172+
else
173+
echo "Script failed with exit code $EXIT_CODE on attempt $ATTEMPT"
121174
fi
122175
fi
123176
124177
if [ $ATTEMPT -lt $RETRY_ATTEMPTS ]; then
125-
echo "Script failed on attempt $ATTEMPT. Retrying in $RETRY_DELAY seconds..."
178+
echo "Retrying in $RETRY_DELAY seconds..."
126179
sleep $RETRY_DELAY
127180
else
128181
echo "Script failed after $RETRY_ATTEMPTS attempts"

.github/workflows/health_checks.yml

Lines changed: 14 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -357,7 +357,7 @@ jobs:
357357
e2e_iam_access_drift:
358358
if: needs.do_include_e2e.outputs.run_e2e == 'true'
359359
runs-on: ubuntu-latest
360-
timeout-minutes: 25
360+
timeout-minutes: 120
361361
needs:
362362
- do_include_e2e
363363
- build
@@ -384,13 +384,14 @@ jobs:
384384
e2e_test_accounts: ${{ vars.E2E_TEST_ACCOUNTS }}
385385
node_version: 18
386386
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
387+
attempt_timeout_minutes: 25
387388
run: npm run test:dir packages/integration-tests/lib/test-e2e/iam_access_drift.test.js
388389
env:
389390
BASELINE_DIR: ${{ steps.setup_baseline_version.outputs.baseline_dir }}
390391
e2e_amplify_outputs_backwards_compatibility:
391392
if: needs.do_include_e2e.outputs.run_e2e == 'true'
392393
runs-on: ubuntu-latest
393-
timeout-minutes: 25
394+
timeout-minutes: 120
394395
needs:
395396
- do_include_e2e
396397
- build
@@ -417,13 +418,14 @@ jobs:
417418
e2e_test_accounts: ${{ vars.E2E_TEST_ACCOUNTS }}
418419
node_version: 18
419420
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
421+
attempt_timeout_minutes: 25
420422
run: npm run test:dir packages/integration-tests/lib/test-e2e/amplify_outputs_backwards_compatibility.test.js
421423
env:
422424
BASELINE_DIR: ${{ steps.setup_baseline_version.outputs.baseline_dir }}
423425
e2e_hosting:
424426
if: needs.do_include_e2e.outputs.run_e2e == 'true'
425427
runs-on: ubuntu-latest
426-
timeout-minutes: 25
428+
timeout-minutes: 120
427429
needs:
428430
- do_include_e2e
429431
- build
@@ -440,6 +442,7 @@ jobs:
440442
e2e_test_accounts: ${{ vars.E2E_TEST_ACCOUNTS }}
441443
node_version: 18
442444
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
445+
attempt_timeout_minutes: 120
443446
run: npm run test:dir packages/integration-tests/lib/test-e2e/hosting.test.js
444447
env:
445448
AMPLIFY_BACKEND_TESTS_HOSTING_TEST_BRANCH_NAME: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.ref || github.ref_name }}
@@ -477,7 +480,7 @@ jobs:
477480
matrix: ${{ fromJson(needs.e2e_generate_deployment_tests_matrix.outputs.matrix) }}
478481
runs-on: ${{ matrix.os }}
479482
name: e2e_deployment ${{ matrix.displayNames }} ${{ matrix.node-version }} ${{ matrix.os }}
480-
timeout-minutes: ${{ matrix.os == 'windows-2025' && 35 || 25 }}
483+
timeout-minutes: ${{ matrix.os == 'windows-2025' && 160 || 120 }}
481484
needs:
482485
- do_include_e2e
483486
- build
@@ -506,6 +509,7 @@ jobs:
506509
node_version: ${{ matrix.node-version }}
507510
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
508511
link_cli: true
512+
attempt_timeout_minutes: ${{ matrix.os == 'windows-2025' && 35 || 25 }}
509513
run: npm run test:dir ${{ matrix.testPaths }}
510514
e2e_generate_sandbox_tests_matrix:
511515
if: needs.do_include_e2e.outputs.run_e2e == 'true'
@@ -540,7 +544,7 @@ jobs:
540544
matrix: ${{ fromJson(needs.e2e_generate_sandbox_tests_matrix.outputs.matrix) }}
541545
runs-on: ${{ matrix.os }}
542546
name: e2e_sandbox ${{ matrix.displayNames }} ${{ matrix.node-version }} ${{ matrix.os }}
543-
timeout-minutes: ${{ matrix.os == 'windows-2025' && 35 || 25 }}
547+
timeout-minutes: ${{ matrix.os == 'windows-2025' && 160 || 120 }}
544548
needs:
545549
- do_include_e2e
546550
- build
@@ -564,6 +568,7 @@ jobs:
564568
node_version: ${{ matrix.node-version }}
565569
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
566570
link_cli: true
571+
attempt_timeout_minutes: ${{ matrix.os == 'windows-2025' && 35 || 25 }}
567572
run: npm run test:dir ${{ matrix.testPaths }}
568573
e2e_notices:
569574
if: needs.do_include_e2e.outputs.run_e2e == 'true'
@@ -603,7 +608,7 @@ jobs:
603608
e2e_backend_output:
604609
if: needs.do_include_e2e.outputs.run_e2e == 'true'
605610
runs-on: ubuntu-latest
606-
timeout-minutes: 25
611+
timeout-minutes: 120
607612
needs:
608613
- do_include_e2e
609614
- build
@@ -621,6 +626,7 @@ jobs:
621626
node_version: 18
622627
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
623628
link_cli: true
629+
attempt_timeout_minutes: 25
624630
run: npm run test:dir packages/integration-tests/lib/test-e2e/backend_output.test.js
625631
e2e_create_amplify:
626632
if: needs.do_include_e2e.outputs.run_e2e == 'true' && needs.do_include_e2e.outputs.include_create_amplify_e2e == 'true'
@@ -670,7 +676,7 @@ jobs:
670676
env:
671677
PACKAGE_MANAGER: ${{ matrix.pkg-manager }}
672678
runs-on: ${{ matrix.os }}
673-
timeout-minutes: ${{ matrix.os == 'windows-2025' && 40 || 25 }}
679+
timeout-minutes: ${{ matrix.os == 'windows-2025' && 180 || 120 }}
674680
needs:
675681
- build
676682
- do_include_e2e
@@ -694,6 +700,7 @@ jobs:
694700
node_version: ${{ matrix.node-version }}
695701
cdk-lib-version: ${{ needs.resolve_inputs.outputs.cdk_lib_version }}
696702
shell: bash
703+
attempt_timeout_minutes: ${{ matrix.os == 'windows-2025' && 40 || 25 }}
697704
run: PACKAGE_MANAGER=${{matrix.pkg-manager}} npm run test:dir packages/integration-tests/src/package_manager_sanity_checks.test.ts
698705
lint:
699706
runs-on: ubuntu-latest

0 commit comments

Comments
 (0)