Commit ce9e398

hack/ginkgo-e2e.sh: forward TERM/INT to Ginkgo
Currently, when a job such as pull-kubernetes-e2e-kind times out, ginkgo-e2e.sh gets killed with SIGTERM. The signal is not propagated to the E2E test suite processes, so, depending on timing during process shutdown, there may be no "Interrupted by User" report and no JUnit file.

Running the Ginkgo CLI with job control enabled puts it into a new process group, which can then be used to kill both the Ginkgo CLI and the E2E test suite processes.

With these changes, more information is produced. Some of it seems a bit redundant, but it's better than nothing:

*** hack/ginkgo-e2e.sh: received termination signal -> asking Ginkgo to stop.
***
*** Beware that a timeout may have been caused by some earlier test,
*** not necessarily the one which gets interrupted now.
*** See the "Spec runtime" for information about how long the
*** interrupted test was running.

------------------------------
Interrupted by User
First interrupt received; Ginkgo will run any cleanup and reporting nodes but will skip all remaining specs. Interrupt again to skip cleanup.
Here's a current progress report:
  [sig-node] DRA [Feature:DynamicResourceAllocation] [FeatureGate:DynamicResourceAllocation] [Beta] ResourceSlice Controller creates slices (Spec Runtime: 9.065s)
    k8s.io/kubernetes/test/e2e/dra/dra.go:812
    In [It] (Node Runtime: 9.044s)
      k8s.io/kubernetes/test/e2e/dra/dra.go:812
      At [By Step] Creating slices (Step Runtime: 8.884s)
        k8s.io/kubernetes/test/e2e/dra/dra.go:847
...
      Begin Additional Progress Reports >>
        There is no failure as the matcher passed to Consistently has not yet failed
      << End Additional Progress Reports
------------------------------
• [INTERRUPTED] [11.955 seconds]
[sig-node] DRA [Feature:DynamicResourceAllocation] [FeatureGate:DynamicResourceAllocation] [Beta] ResourceSlice Controller [It] creates slices
[sig-node, Feature:DynamicResourceAllocation, FeatureGate:DynamicResourceAllocation, Feature:Beta]
k8s.io/kubernetes/test/e2e/dra/dra.go:812

  Timeline >>
  STEP: Creating a kubernetes client @ 01/09/25 17:18:59.769
  ...
  [FAILED] in [It] - k8s.io/kubernetes/test/e2e/dra/dra.go:881 @ 01/09/25 17:19:08.835
  I0109 17:19:11.703212 302727 helper.go:125] Waiting up to 7m0s for all (but 0) nodes to be ready
  STEP: dump namespace information after failure @ 01/09/25 17:19:11.706
  STEP: Collecting events from namespace "dra-7998". @ 01/09/25 17:19:11.706
  STEP: Found 0 events. @ 01/09/25 17:19:11.708
  ...
  STEP: Destroying namespace "dra-7998" for this suite. @ 01/09/25 17:19:11.72
  << Timeline

  [INTERRUPTED] Interrupted by User
  In [It] at: k8s.io/kubernetes/test/e2e/dra/dra.go:812 @ 01/09/25 17:19:08.833

  This is the Progress Report generated when the interrupt was received:
    [sig-node] DRA [Feature:DynamicResourceAllocation] [FeatureGate:DynamicResourceAllocation] [Beta] ResourceSlice Controller creates slices (Spec Runtime: 9.065s)
  ...

  [FAILED] An interrupt occurred and then the following failure was recorded in the interrupted node before it exited:
  Context was cancelled (cause: Interrupted by User) after 0.329s.
  There is no failure as the matcher passed to Consistently has not yet failed
  In [It] at: k8s.io/kubernetes/test/e2e/dra/dra.go:881 @ 01/09/25 17:19:08.835
------------------------------
Checking for custom logdump instances, if any
----------------------------------------------------------------------------------------------------
k/k version of the log-dump.sh script is deprecated!
Please migrate your test job to use test-infra's repo version of log-dump.sh!
Migration steps can be found in the readme file.
----------------------------------------------------------------------------------------------------
Sourcing kube-util.sh
Detecting project
Skeleton Provider: detect-project not implemented
Dumping logs from master locally to '/tmp/test'
Master SSH not supported for local
Dumping logs from nodes locally to '/tmp/test'
Node SSH not supported for local

Summarizing 1 Failure:
  [INTERRUPTED] [sig-node] DRA [Feature:DynamicResourceAllocation] [FeatureGate:DynamicResourceAllocation] [Beta] ResourceSlice Controller [It] creates slices [sig-node, Feature:DynamicResourceAllocation, FeatureGate:DynamicResourceAllocation, Feature:Beta]
  k8s.io/kubernetes/test/e2e/dra/dra.go:812

Ran 1 of 6644 Specs in 12.208 seconds
FAIL! - Interrupted by User -- 0 Passed | 1 Failed | 0 Pending | 6643 Skipped
--- FAIL: TestE2E (12.74s)
FAIL

Ginkgo ran 1 suite in 13.379078611s
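The process-group mechanism described in the commit message can be tried out in isolation. The following is a minimal bash sketch, not code from this commit, assuming bash on a system with POSIX process groups (Linux or macOS). `cli`, `worker` and `demo_group_kill` are hypothetical stand-ins for the Ginkgo CLI and the e2e.test processes. With `set -m`, the background job becomes leader of a new process group, so one `kill` with a negative PID reaches the CLI and every worker it forked.

```shell
#!/usr/bin/env bash
# Sketch: with job control on (set -m), bash starts each background job in a
# new process group whose ID equals the job's PID, so "kill -- -PID" reaches
# the job and every process it forked. All names here are hypothetical.
set -u

worker() {
  # Stand-in for one e2e.test process: report on TERM, then exit.
  trap "echo 'worker $1: got TERM'; exit 0" TERM
  while :; do sleep 0.1; done
}

cli() {
  # Stand-in for the Ginkgo CLI: fork two workers and wait for them.
  # Job control is off inside this subshell, so the workers join its group.
  trap "echo 'cli: got TERM'; wait; exit 0" TERM
  worker 1 &
  worker 2 &
  wait
}

demo_group_kill() {
  local pid status=0
  set -m            # the next background job gets its own process group
  cli &
  pid=$!
  set +m
  sleep 1           # let the workers start and install their traps

  # kill -0 on a negative PID succeeds only if that process group exists.
  if kill -0 -- "-$pid" 2>/dev/null; then
    echo "cli leads process group $pid"
  fi

  kill -TERM -- "-$pid"   # one signal for the CLI and all of its workers
  wait "$pid" || status=$?
  echo "cli exited with status $status"
}

demo_group_kill
```

Without `set -m`, the background job would stay in the calling shell's process group, so no group with the job's PID would exist and the negative-PID `kill` would typically fail; the workers would then have to be signalled one by one.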
1 parent 2d0a4f7 commit ce9e398

1 file changed, +48 -1 lines changed

hack/ginkgo-e2e.sh

Lines changed: 48 additions & 1 deletion
@@ -204,6 +204,49 @@ fi
 # is not used.
 suite_args+=(--report-complete-ginkgo --report-complete-junit)
 
+# When SIGTERM doesn't reach the E2E test suite binaries, ginkgo will exit
+# without collecting information about the currently running and
+# potentially stuck tests. This seems to happen when Prow shuts down a test
+# job because of a timeout.
+#
+# It's useful to print one final progress report in that case,
+# so GINKGO_PROGRESS_REPORT_ON_SIGTERM (enabled by default when CI=true)
+# catches SIGTERM and forwards it to all processes spawned by ginkgo.
+#
+# Manual invocations can trigger a similar report with `killall -USR1 e2e.test`
+# without having to kill the test run.
+GINKGO_CLI_PID=
+signal_handler() {
+  if [ -n "${GINKGO_CLI_PID}" ]; then
+    cat <<EOF
+
+*** $0: received $1 signal -> asking Ginkgo to stop.
+***
+*** Beware that a timeout may have been caused by some earlier test,
+*** not necessarily the one which gets interrupted now.
+*** See the "Spec runtime" for information about how long the
+*** interrupted test was running.
+
+EOF
+    # This goes to the process group, which is important because we
+    # need to reach the e2e.test processes forked by the Ginkgo CLI.
+    kill -TERM "-${GINKGO_CLI_PID}" || true
+
+    echo "Waiting for Ginkgo with pid ${GINKGO_CLI_PID}..."
+    wait "${GINKGO_CLI_PID}"
+    echo "Ginkgo terminated."
+  fi
+}
+case "${GINKGO_PROGRESS_REPORT_ON_SIGTERM:-${CI:-no}}" in
+  y|yes|true)
+    kube::util::trap_add "signal_handler INT" INT
+    kube::util::trap_add "signal_handler TERM" TERM
+    # Job control is needed to make the Ginkgo CLI and all workers run
+    # in their own process group.
+    set -m
+    ;;
+esac
+
 # The following invocation is fairly complex. Let's dump it to simplify
 # determining what the final options are. Enabled by default in CI
 # environments like Prow.
@@ -236,4 +279,8 @@ case "${GINKGO_SHOW_COMMAND:-${CI:-no}}" in y|yes|true) set -x ;; esac
   ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} \
   ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} \
   "${suite_args[@]:+${suite_args[@]}}" \
-  "${@}"
+  "${@}" &
+
+set +x
+GINKGO_CLI_PID=$!
+wait "${GINKGO_CLI_PID}"
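The wrapper logic in the diff can be exercised end to end with a small self-contained sketch. It is not part of the commit and makes several assumptions: `demo_forwarding`, the temporary `wrapper.sh` and the fake CLI and worker are hypothetical, and plain `trap` stands in for `kube::util::trap_add` (which appends to existing traps instead of replacing them) so that the sketch runs without the Kubernetes hack libraries. The sketch sends TERM only to the wrapper, mimicking how ginkgo-e2e.sh itself is killed on a job timeout, and checks that the worker still gets to write its final report.

```shell
#!/usr/bin/env bash
# Sketch of the signal-forwarding pattern from hack/ginkgo-e2e.sh, with
# hypothetical stand-ins for the Ginkgo CLI and its e2e.test workers.
set -u

demo_forwarding() {
  local dir wrapper wrapper_pid status=0
  dir=$(mktemp -d)
  wrapper="$dir/wrapper.sh"

  # The wrapper mirrors the diff: trap TERM/INT, enable job control, start
  # the CLI in the background, remember its PID and wait for it.
  cat > "$wrapper" <<'EOF'
CLI_PID=
signal_handler() {
  if [ -n "${CLI_PID}" ]; then
    echo "wrapper: received $1 signal -> asking the CLI to stop"
    # Negative PID: signal the CLI's whole process group.
    kill -TERM -- "-${CLI_PID}" || true
    wait "${CLI_PID}"
    echo "wrapper: CLI terminated"
  fi
}
trap "signal_handler INT" INT
trap "signal_handler TERM" TERM
set -m   # the next background job becomes leader of a new process group

# Fake CLI: on TERM it waits for its worker, which writes a final report.
# Job control is off inside subshells, so the worker joins the CLI's group.
(
  trap 'wait; exit 1' TERM
  ( trap 'echo "worker: writing final report"; exit 1' TERM
    while :; do sleep 0.1; done ) &
  wait
) &
CLI_PID=$!
wait "${CLI_PID}"
EOF

  bash "$wrapper" &
  wrapper_pid=$!
  sleep 1                     # give the fake CLI and its worker time to start
  kill -TERM "$wrapper_pid"   # signal only the wrapper, like a job timeout
  wait "$wrapper_pid" || status=$?
  echo "wrapper exit status: $status"
  rm -rf "$dir"
}

demo_forwarding
```

After the handler returns, the wrapper's top-level `wait` reports the interrupting signal, so the wrapper's exit status is greater than 128 even though the fake CLI itself exited with status 1.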
