Fix hang in ServiceConsoleTests.serviceShutdown #673

jakepetroules · 2025-07-25T04:06:18Z

This hang occurred only in CI environments and only on Linux. Here's the sequence of events:

- Test terminates swbuild using SIGKILL
- OS reparents SWBBuildService (a subprocess of swbuild) to launchd (Darwin) / init (others)
- OS closes the file descriptors for the I/O pipes swbuild has connected to SWBBuildService
- SWBBuildService's read() loop indicates EOF due to the broken pipe
- SWBBuildService causes itself to exit

At this point, the getpgid loop should fail with ESRCH and terminate the test. However, SWBBuildService lingers as a zombie for an extended period without init reaping the pid, so getpgid never reaches that terminal state and the test hangs indefinitely.
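For illustration only (this is not code from the PR): a minimal C sketch of why a getpgid-based check keeps seeing a zombie on Linux. The helper names `pid_is_visible` and `zombie_visibility_demo` are hypothetical, and the sketch uses a direct child of the calling process rather than a reparented grandchild like SWBBuildService.

```c
#define _XOPEN_SOURCE 700 /* expose waitid() and getpgid() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if getpgid() still resolves `pid`, 0 if it fails with ESRCH,
 * and -1 on any other error. (Hypothetical helper.) */
static int pid_is_visible(pid_t pid) {
    if (getpgid(pid) != (pid_t)-1) return 1;
    return errno == ESRCH ? 0 : -1;
}

/* Forks a child that exits immediately, then records whether getpgid()
 * can still see it (a) while it is an unreaped zombie and (b) after the
 * parent has reaped it. Returns 0 on success, -1 on failure. */
static int zombie_visibility_demo(int *visible_as_zombie, int *visible_after_reap) {
    assert(visible_as_zombie != NULL && visible_after_reap != NULL);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0); /* child: exit right away */

    /* Block until the child has exited. WNOWAIT leaves it unreaped,
     * so it stays a zombie after this call returns. */
    siginfo_t info;
    int rc;
    do {
        rc = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    } while (rc != 0 && errno == EINTR);
    if (rc != 0) return -1;

    *visible_as_zombie = pid_is_visible(pid);

    /* Reap the zombie; only now does the pid disappear. */
    if (waitpid(pid, NULL, 0) != pid) return -1;
    *visible_after_reap = pid_is_visible(pid);
    return 0;
}
```

On Linux I would expect `visible_as_zombie` to come back as 1 and `visible_after_reap` as 0, which is the gap the old loop was stuck in: the process has exited, but ESRCH only shows up once something reaps it.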

The fix has two parts:

- A timeout is added around the termination monitoring loop; it forces the exit promise to be fulfilled with an error if 30 seconds elapse without the process exiting.
- We switch from a getpgid loop to a waitid loop, where the terminal state is that the process has _exited_. We don't care whether init has collected the zombie, only that the process is no longer running.
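A rough C sketch of the two parts combined, under stated assumptions: it is not the Swift code in this PR, the names `wait_for_exit`, `spawn_and_wait`, and the `shutdown_wait` enum are made up for illustration, and the 10 ms poll interval is arbitrary. Note that waitid only reports on children of the calling process, so the sketch demonstrates the idea on a direct child.

```c
#define _XOPEN_SOURCE 700 /* expose waitid(), clock_gettime(), nanosleep() */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum shutdown_wait { SHUTDOWN_EXITED = 0, SHUTDOWN_TIMED_OUT = 1, SHUTDOWN_FAILED = -1 };

static long long monotonic_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Polls until `pid` has exited or `timeout_ms` elapses. The terminal state
 * is "exited", not "reaped": WNOWAIT leaves the zombie in place. */
static enum shutdown_wait wait_for_exit(pid_t pid, int timeout_ms) {
    assert(timeout_ms >= 0);
    const long long deadline = monotonic_ms() + timeout_ms;
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (;;) {
        siginfo_t info;
        memset(&info, 0, sizeof info); /* so si_pid == 0 means "still running" */
        if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0) {
            if (errno == EINTR) continue;
            return SHUTDOWN_FAILED; /* e.g. ECHILD if pid is not our child */
        }
        if (info.si_pid == pid) return SHUTDOWN_EXITED;
        if (monotonic_ms() >= deadline) return SHUTDOWN_TIMED_OUT;
        nanosleep(&poll_interval, NULL);
    }
}

/* Demo helper: forks a child that sleeps for `child_ms`, waits for it with
 * a timeout, then kills and reaps it so nothing is left behind. */
static enum shutdown_wait spawn_and_wait(int child_ms, int timeout_ms) {
    pid_t pid = fork();
    if (pid < 0) return SHUTDOWN_FAILED;
    if (pid == 0) {
        struct timespec nap = { child_ms / 1000, (child_ms % 1000) * 1000000L };
        nanosleep(&nap, NULL);
        _exit(0);
    }
    enum shutdown_wait result = wait_for_exit(pid, timeout_ms);
    kill(pid, SIGKILL);    /* harmless if the child is already a zombie */
    waitpid(pid, NULL, 0); /* reap */
    return result;
}
```

For example, `spawn_and_wait(50, 2000)` should report `SHUTDOWN_EXITED`, while `spawn_and_wait(2000, 100)` should report `SHUTDOWN_TIMED_OUT` instead of blocking. The PR uses a 30-second timeout and surfaces the timeout as an error on the exit promise.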

This fixes the hang for both the Jenkins-based CI and GitHub Actions. It also insulates us against future hangs: the test will now terminate with a timeout error instead of hanging indefinitely, so we at least know _which_ test is the problem.

jakepetroules · 2025-07-25T04:06:25Z

@swift-ci test

jakepetroules · 2025-07-25T06:29:32Z

@swift-ci test

jakepetroules · 2025-07-25T08:45:23Z

@swift-ci test

jakepetroules · 2025-07-26T00:24:11Z

@swift-ci test


jakepetroules · 2025-07-26T07:33:10Z

@swift-ci test

cmcgee1024 · 2025-07-28T17:50:25Z

Tests/SwiftBuildTests/ConsoleCommands/ServiceConsoleTests.swift

+import SystemPackage
+#endif
+
+@Suite(.skipHostOS(.windows))


question: Will this cause the entire suite to be skipped on Windows, including individual test functions that aren't marked as skip for Windows?

Yes, but note that this isn't new code. Many of these were broken on Windows which is why they were skipped en masse. We should work on getting them passing.


jakepetroules force-pushed the eng/PR-hanging-tests branch from 8677081 to 8a8a566 Compare July 25, 2025 06:29

jakepetroules force-pushed the eng/PR-hanging-tests branch from 8a8a566 to d59ebe0 Compare July 25, 2025 08:42

jakepetroules force-pushed the eng/PR-hanging-tests branch 12 times, most recently from 48448c2 to c95aeae Compare July 26, 2025 00:23

jakepetroules changed the title ~~Experiment: disable hanging tests~~ Fix hang in ServiceConsoleTests.serviceShutdown Jul 26, 2025

jakepetroules marked this pull request as ready for review July 26, 2025 00:24

jakepetroules requested review from aciidgh, mhrawdon, mirza-garibovic, neonichu and owenv as code owners July 26, 2025 00:24

jakepetroules force-pushed the eng/PR-hanging-tests branch from c95aeae to 996bcb1 Compare July 26, 2025 07:06

jakepetroules force-pushed the eng/PR-hanging-tests branch from 996bcb1 to d852b97 Compare July 26, 2025 07:33

jakepetroules enabled auto-merge (rebase) July 26, 2025 08:14

cmcgee1024 reviewed Jul 28, 2025


jakepetroules mentioned this pull request Jul 28, 2025

[SWUtil] PropertyListItem: Clarify use of init overloads that take arra… #679

Merged

neonichu approved these changes Jul 28, 2025


jakepetroules merged commit bb039da into swiftlang:main Jul 28, 2025
22 of 24 checks passed

jakepetroules deleted the eng/PR-hanging-tests branch July 28, 2025 18:35

jakepetroules mentioned this pull request Jul 29, 2025

Enable some skipped tests #685

Merged

jakepetroules added a commit that referenced this pull request Jul 30, 2025

Enable some skipped tests

a678e46

Some of these were formerly skipped in GitHub actions, but are passing now. Likely the culprit was ServiceConsoleTests.serviceShutdown all along, which is fixed in #673
