Skip to content

Commit ee9c89e

Browse files
committed
Fix linux-sandbox actions being killed prematurely when using virtual threads
This change fixes a bug where linux-sandbox actions that take longer than 30 seconds can be killed prematurely and cause the build to fail. There are two things that contribute to this bug: Part one: When using virtual threads with the JavaSubprocessFactory, the process is forked from the currently running virtual thread's carrier thread. The virtual thread then waits for the subprocess to complete, which causes the carrier thread to detach. If there is no other work for the carrier thread to perform, it becomes idle. If it is idle for long enough it can be killed by the ForkJoinPool it is a member of. The default virtual thread scheduler in JDK 25 uses a ForkJoinPool with an idle TTL of 30 seconds. Part two: The linux-sandbox process sends SIGKILL to the child process when its parent dies. This is setup like so: `prctl(PR_SET_PDEATHSIG, SIGKILL)`. These two things combined cause problems for Bazel builds if you: * Use the linux-sandbox strategy * Use virtual threads via --experimental_async_execution * Have an action which takes longer than 30 seconds to complete If the Bazel server isn't very busy while that 30 second action is running, the parent carrier thread can be killed due to inactivity, which causes the linux-sandbox process to die, which causes the build to fail. This is only an issue if you're using virtual threads. When using platform threads they directly wait on the subprocess to complete. The thread is considered busy while waiting, which prevents it from being killed. This change fixes the issue by always forking from a long-lived platform thread. That way there is no risk of the platform thread being killed prematurely. The fix does so while still enabling the linux-sandbox processes to be properly killed when the Bazel server is killed.
1 parent 01cacec commit ee9c89e

File tree

1 file changed

+43
-1
lines changed

1 file changed

+43
-1
lines changed

src/main/java/com/google/devtools/build/lib/shell/JavaSubprocessFactory.java

Lines changed: 43 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,9 @@
1515
package com.google.devtools.build.lib.shell;
1616

1717
import com.google.common.base.Preconditions;
18+
import com.google.common.base.Throwables;
1819
import com.google.common.collect.Lists;
20+
import com.google.common.util.concurrent.Uninterruptibles;
1921
import com.google.devtools.build.lib.shell.SubprocessBuilder.StreamAction;
2022
import com.google.devtools.build.lib.util.OS;
2123
import com.google.devtools.build.lib.util.StringEncoding;
@@ -24,6 +26,10 @@
2426
import java.io.InputStream;
2527
import java.io.OutputStream;
2628
import java.lang.ProcessBuilder.Redirect;
29+
import java.util.concurrent.ExecutionException;
30+
import java.util.concurrent.ExecutorService;
31+
import java.util.concurrent.Executors;
32+
import java.util.concurrent.Future;
2733
import java.util.concurrent.TimeUnit;
2834
import java.util.concurrent.atomic.AtomicBoolean;
2935
import java.util.concurrent.locks.ReentrantLock;
@@ -118,6 +124,17 @@ public long getProcessId() {
118124
public static final JavaSubprocessFactory INSTANCE = new JavaSubprocessFactory();
119125
private final ReentrantLock lock = new ReentrantLock();
120126

127+
// Subprocesses fork from this platform thread to ensure the subprocess's parent is long lived,
128+
// which prevents actions from being killed erroneously. See the doc comment on
129+
// {@link startOnPlatformThread} for more details.
130+
private static final ExecutorService forkExecutor =
131+
Executors.newSingleThreadExecutor(
132+
r -> {
133+
Thread t = new Thread(r, "subprocess-fork");
134+
t.setDaemon(true);
135+
return t;
136+
});
137+
121138
private JavaSubprocessFactory() {
122139
// We are a singleton
123140
}
@@ -143,7 +160,7 @@ private JavaSubprocessFactory() {
143160
private Process start(ProcessBuilder builder) throws IOException {
144161
lock.lock();
145162
try {
146-
return builder.start();
163+
return startOnPlatformThread(builder);
147164
} catch (IOException e) {
148165
if (e.getMessage().contains("Failed to exec spawn helper")) {
149166
// Detect permanent failures due to an upgrade of the underlying JDK version,
@@ -159,6 +176,31 @@ private Process start(ProcessBuilder builder) throws IOException {
159176
}
160177
}
161178

179+
/**
180+
* Starts a subprocess by forking from a long-lived platform thread, which prevents the parent
181+
* thread from being killed when using virtual threads.
182+
*
183+
* <p>A virtual thread forks from its carrier thread. The carrier thread is detached while the
184+
* virtual thread waits for the forked process to complete. If there isn't other work for the
185+
* carrier thread to do, then it is idle and can be killed due to inactivity. The parent carrier
186+
* thread being killed causes the linux-sandbox to die because it self destructs when its parent
187+
* dies.
188+
*
189+
* <p>The default virtual thread scheduler uses a {@link java.util.concurrent.ForkJoinPool} for
190+
* its carrier threads with an idle TTL of 30 seconds. Without a long-lived platform thread as
191+
* the parent, builds can fail unexpectedly when actions take longer than 30 seconds.
192+
*/
193+
private Process startOnPlatformThread(ProcessBuilder builder) throws IOException {
194+
Future<Process> future = forkExecutor.submit(builder::start);
195+
try {
196+
return Uninterruptibles.getUninterruptibly(future);
197+
} catch (ExecutionException e) {
198+
Throwables.throwIfInstanceOf(e.getCause(), IOException.class);
199+
Throwables.throwIfUnchecked(e.getCause());
200+
throw new IllegalStateException("Unexpected error starting subprocess", e.getCause());
201+
}
202+
}
203+
162204
@Override
163205
public Subprocess create(SubprocessBuilder params) throws IOException {
164206
Preconditions.checkState(

0 commit comments

Comments
 (0)