Commit 7889f62

Implement GNU jobserver pool mode.
Make Ninja provide a pool of GNU Make jobserver slots when invoked with the `--jobserver-pool` command-line option.

- Introduce JobserverState class to manage the state of the jobserver pool and client instances for a given Ninja build. In particular, the methods ShouldSetupClient() and ShouldSetupPool() clarify under which conditions the pool or client should be created, and provide explanations for the decision.
- All jobserver-related info / warnings are moved to the VERBOSE level, which keeps the output of normal invocations small and avoids having to modify the unit-tests accordingly.
- Update manual accordingly, detailing how everything works.
1 parent e19520c commit 7889f62

5 files changed: +309 -92 lines changed

doc/manual.asciidoc

Lines changed: 44 additions & 11 deletions
@@ -192,12 +192,33 @@ GNU Jobserver support

 Since version 1.13., Ninja builds can follow the
 https://www.gnu.org/software/make/manual/html_node/Job-Slots.html[GNU Make jobserver]
-client protocol. This is useful when Ninja is invoked as part of a larger
-build system controlled by a top-level GNU Make instance, or any other
-jobserver pool implementation, as it allows better coordination between
-concurrent build tasks.
+protocol.

-This feature is automatically enabled under the following conditions:
+The protocol is useful to efficiently control parallelism across a set of
+concurrent and cooperating processes. This is useful when Ninja is invoked
+as part of a larger build system controlled by a top-level Ninja or
+GNU Make instance, or any other jobserver pool implementation.
+
+Ninja becomes a protocol client automatically if it detects the right
+values in the `MAKEFLAGS` environment variable (see exact conditions below).
+
+Since version 1.14, Ninja can also be a protocol server, if needed, using
+the `--jobserver-pool` command-line flag.
+
+In jobserver-enabled builds, there is one top-level "server" process which:
+
+- Sets up a shared pool of job tokens.
+- Sets the `MAKEFLAGS` environment variable with special values
+  to reference the pool.
+- Launches child processes (concurrent sub-commands).
+
+Said child processes can be protocol clients if they:
+
+- Recognize the special `MAKEFLAGS` values specific to the protocol.
+- Use it to access the shared pool to acquire and release job tokens
+  during the build.
+
+Ninja automatically becomes a protocol client during builds when:

 - Dry-run (i.e. `-n` or `--dry-run`) is not enabled.

@@ -208,18 +229,30 @@ This feature is automatically enabled under the following conditions:
   jobserver mode using `--jobserver-auth=SEMAPHORE_NAME` on Windows, or
   `--jobserver-auth=fifo:PATH` on Posix.

-In this case, Ninja will use the jobserver pool of job slots to control
-parallelism, instead of its default parallel implementation.
-
-Note that load-average limitations (i.e. when using `-l<count>`)
-are still being enforced in this mode.
-
 IMPORTANT: On Posix, only the FIFO-based version of the protocol, which is
 implemented by GNU Make 4.4 and higher, is supported. Ninja will detect
 when a pipe-based jobserver is being used (i.e. when `MAKEFLAGS` contains
 `--jobserver-auth=<read>,<write>`) and will print a warning, but will
 otherwise ignore it.

+Using `--jobserver-pool` will make Ninja act as a protocol server, unless
+any of these are true:
+
+- An existing pool was detected, as this keeps all processes cooperating
+  properly.
+
+- `-j1` is used on the command-line, as this is asking Ninja to explicitly
+  not perform parallel builds.
+
+- Dry-run is enabled.
+
+The size of the pool setup by Ninja matches its parallel count, determined
+by the `-j<COUNT>` option, or auto-detected if that one is not provided.
+
+The load-average limitations (i.e. when using `-l<count>`) are still being
+enforced in both modes.
+
+
 Environment variables
 ~~~~~~~~~~~~~~~~~~~~~

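
Note on the manual text above: the way a client might recognize the `--jobserver-auth` forms listed there can be illustrated with a minimal Python sketch. This is not Ninja's implementation (which is C++ and not part of this diff). The names `JobserverAuth` and `parse_jobserver_auth` are hypothetical, splitting `MAKEFLAGS` on whitespace is a simplification, and treating the last `--jobserver-auth` occurrence as the relevant one follows the GNU Make manual rather than anything shown in this commit.

```python
import re
from typing import NamedTuple, Optional


class JobserverAuth(NamedTuple):
    kind: str   # "fifo", "pipe" or "semaphore" (labels chosen for this sketch)
    value: str  # FIFO path, "R,W" descriptor pair, or semaphore name


_PREFIX = "--jobserver-auth="


def parse_jobserver_auth(makeflags: str) -> Optional[JobserverAuth]:
    """Return the last --jobserver-auth value found in MAKEFLAGS, if any."""
    result = None
    for word in makeflags.split():
        if not word.startswith(_PREFIX):
            continue
        value = word[len(_PREFIX):]
        if value.startswith("fifo:"):
            # FIFO-based protocol (GNU Make 4.4+), supported on Posix.
            result = JobserverAuth("fifo", value[len("fifo:"):])
        elif re.fullmatch(r"-?\d+,-?\d+", value):
            # Pipe-based protocol: per the manual, Ninja warns and ignores it.
            result = JobserverAuth("pipe", value)
        else:
            # Anything else is treated as a Windows semaphore name here.
            result = JobserverAuth("semaphore", value)
    return result
```

A caller would then enable client mode only for the `fifo` kind on Posix or the `semaphore` kind on Windows, and only when dry-run is off and no explicit `-j<COUNT>` was given.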

misc/jobserver_test.py

Lines changed: 77 additions & 2 deletions
@@ -31,6 +31,7 @@

 # Set this to True to debug command invocations.
 _DEBUG = False
+_DEBUG = True

 default_env = dict(os.environ)
 default_env.pop("NINJA_STATUS", None)
@@ -272,7 +273,7 @@ def run_ninja_with_jobserver_pipe(args):
                 ret.check_returncode()
                 return ret.stdout, ret.stderr

-            output, error = run_ninja_with_jobserver_pipe(["all"])
+            output, error = run_ninja_with_jobserver_pipe(["-v", "all"])
             if _DEBUG:
                 print(f"OUTPUT [{output}]\nERROR [{error}]\n", file=sys.stderr)
             self.assertTrue(error.find("Pipe-based protocol is not supported!") >= 0)
@@ -282,14 +283,88 @@ def run_ninja_with_jobserver_pipe(args):

             # Using an explicit -j<N> ignores the jobserver pool.
             b.ninja_clean()
-            output, error = run_ninja_with_jobserver_pipe(["-j1", "all"])
+            output, error = run_ninja_with_jobserver_pipe(["-v", "-j1", "all"])
             if _DEBUG:
                 print(f"OUTPUT [{output}]\nERROR [{error}]\n", file=sys.stderr)
             self.assertFalse(error.find("Pipe-based protocol is not supported!") >= 0)

             max_overlaps = compute_max_overlapped_spans(b.path, task_count)
             self.assertEqual(max_overlaps, 1)

+    def test_jobserver_pool_mode(self):
+        task_count = 4
+        build_plan = generate_build_plan(task_count)
+        with BuildDir(build_plan) as b:
+            # First, run the full tasks with with {task_count} tokens, this should allow all
+            # tasks to run in parallel.
+            ret = b.ninja_run(
+                ninja_args=["--jobserver-pool", "all"],
+            )
+            max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+            self.assertEqual(max_overlaps, task_count)
+
+            # Second, use 2 tokens only, and verify that this was enforced by Ninja and
+            # that both a pool and a client were setup by Ninja.
+            b.ninja_clean()
+            ret = b.ninja_spawn(
+                ["-j2", "--jobserver-pool", "--verbose", "all"],
+                capture_output=True,
+            )
+            self.assertEqual(ret.returncode, 0)
+            self.assertTrue(
+                "ninja: Creating jobserver pool for 2 parallel jobs" in ret.stdout,
+                msg="Ninja failed to setup jobserver pool!",
+            )
+            self.assertTrue(
+                "ninja: Jobserver mode detected: " in ret.stdout,
+                msg="Ninja failed to setup jobserver client!",
+            )
+            max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+            self.assertEqual(max_overlaps, 2)
+
+            # Third, verify that --jobs=1 serializes all tasks.
+            b.ninja_clean()
+            b.ninja_run(
+                ["--jobserver-pool", "-j1", "all"],
+            )
+            max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+            self.assertEqual(max_overlaps, 1)
+
+            # On Linux, use taskset to limit the number of available cores to 1
+            # and verify that the jobserver overrides the default Ninja parallelism
+            # and that {task_count} tasks are still spawned in parallel.
+            if platform.system() == "Linux":
+                # First, run without a jobserver, with a single CPU, Ninja will
+                # use a parallelism of 2 in this case (GuessParallelism() in ninja.cc)
+                b.ninja_clean()
+                b.ninja_run(
+                    ["all"],
+                    prefix_args=["taskset", "-c", "0"],
+                )
+                max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+                self.assertEqual(max_overlaps, 2)
+
+                # Now with a jobserver with {task_count} tasks.
+                b.ninja_clean()
+                b.ninja_run(
+                    ["--jobserver-pool", f"-j{task_count}", "all"],
+                    prefix_args=["taskset", "-c", "0"],
+                )
+                max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+                self.assertEqual(max_overlaps, task_count)
+
+    def test_jobserver_pool_mode_ignored_with_existing_pool(self):
+        task_count = 4
+        build_plan = generate_build_plan(task_count)
+        with BuildDir(build_plan) as b:
+            # Setup a top-level pool with 2 jobs, and verify that `--jobserver-pool` respected it.
+            ret = b.ninja_run(
+                ninja_args=["--jobserver-pool", "all"],
+                prefix_args=[sys.executable, "-S", _JOBSERVER_POOL_SCRIPT, "--jobs=2"],
+            )
+            max_overlaps = compute_max_overlapped_spans(b.path, task_count)
+            self.assertEqual(max_overlaps, 2)
+
     def _test_MAKEFLAGS_value(
         self, ninja_args: T.List[str] = [], prefix_args: T.List[str] = []
     ):
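
Note on the tests above: they all assert on a "maximum overlap" figure returned by `compute_max_overlapped_spans(b.path, task_count)`, whose body is not part of this diff. As a rough illustration of the underlying idea only, the sketch below computes the peak number of simultaneously running tasks from a list of `(start, end)` timestamps. The function name `max_overlapped_spans` and its in-memory input format are assumptions made for this example; the real helper takes a build directory path and a task count.

```python
from typing import Iterable, Tuple


def max_overlapped_spans(spans: Iterable[Tuple[float, float]]) -> int:
    """Return the peak number of spans that are active at the same time."""
    events = []
    for start, end in spans:
        events.append((start, 1))   # a task begins
        events.append((end, -1))    # a task finishes
    # Tuples sort by time first, then by delta, so at equal timestamps an
    # end (-1) is processed before a start (+1): spans that merely touch
    # are not counted as overlapping.
    events.sort()
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best
```

With `-j2` the test expects this peak to be 2, and with `-j1` it expects 1, regardless of how many tasks the build plan contains.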

src/build.h

Lines changed: 5 additions & 1 deletion
@@ -184,8 +184,12 @@ struct BuildConfig {
   };
   Verbosity verbosity = NORMAL;
   bool dry_run = false;
+  /// Number of concurrent jobs, auto-detected or specified explicitly.
   int parallelism = 1;
-  bool disable_jobserver_client = false;
+  /// True if -j<count> was used on the command line.
+  bool explicit_parallelism = false;
+  /// True if --jobserver-pool was used on the command line.
+  bool jobserver_pool = false;
   int failures_allowed = 1;
   /// The maximum load average we must not exceed. A negative value
   /// means that we do not have any limit.
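
Note on the new `BuildConfig` fields: the commit message says `ShouldSetupPool()` decides whether a pool is created and explains why, but that method is not shown in this diff. The Python sketch below only mirrors the conditions listed in the updated manual (flag present, no existing pool, not `-j1`, not dry-run). The function signature, the `existing_pool_detected` parameter, the reason strings, and the order of the checks are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BuildConfig:
    # Mirrors the fields visible in the src/build.h hunk above.
    dry_run: bool = False
    parallelism: int = 1
    explicit_parallelism: bool = False
    jobserver_pool: bool = False


def should_setup_pool(config: BuildConfig,
                      existing_pool_detected: bool) -> Tuple[bool, str]:
    """Return (decision, reason) following the rules in the manual text."""
    if not config.jobserver_pool:
        return False, "--jobserver-pool was not used"
    if config.dry_run:
        return False, "dry-run is enabled"
    if existing_pool_detected:
        return False, "an existing pool was detected in MAKEFLAGS"
    if config.explicit_parallelism and config.parallelism == 1:
        return False, "-j1 explicitly disables parallel builds"
    return True, f"creating pool for {config.parallelism} parallel jobs"
```

Returning the reason alongside the decision matches the commit's stated goal of providing explanations, which a caller could print at the VERBOSE level.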

src/jobserver_pool.cc

Lines changed: 9 additions & 7 deletions
@@ -142,20 +142,22 @@ class PosixJobserverPool : public JobserverPool {
   // implicit job slot requirement.
   bool FillSlots(size_t slot_count, std::string* error) {
     job_count_ = slot_count;
-    for (; slot_count > 1; --slot_count) {
+    while (slot_count > 1) {
       // Write '+' into the pipe, just like GNU Make. Note that some
       // implementations write '|' instead, but so far no client or pool
       // implementation cares about the exact value, though the official spec
       // says this might change in the future.
       const char slot_char = '+';
       ssize_t ret = ::write(write_fd_, &slot_char, 1);
-      if (ret != 1) {
-        if (ret < 0 && errno == EINTR)
-          continue;
-        *error =
-            std::string("Could not fill job slots pool: ") + strerror(errno);
-        return false;
+      if (ret == 1) {
+        slot_count--;
+        continue;
       }
+      if (ret < 0 && errno == EINTR)
+        continue;
+
+      *error = std::string("Could not fill job slots pool: ") + strerror(errno);
+      return false;
     }
     return true;
   }
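
Note on the `FillSlots` change: in the removed `for` loop, a `continue` after `EINTR` still ran the `--slot_count` step, so an interrupted write consumed a slot without writing a token. The new `while` loop decrements only after a successful write. The Python sketch below reproduces that retry structure. It is an illustration, not a port of the C++ code: `fill_slots` and its injectable `write` parameter are hypothetical, and the parameter exists only because Python's own `os.write` already retries on `EINTR` (PEP 475), so an interruption has to be simulated.

```python
import os
from typing import Callable


def fill_slots(write_fd: int, slot_count: int,
               write: Callable[[int, bytes], int] = os.write) -> None:
    """Write slot_count - 1 tokens, retrying writes that were interrupted."""
    remaining = slot_count
    # One slot is implicit (held by the pool owner), hence "> 1".
    while remaining > 1:
        try:
            written = write(write_fd, b"+")
        except InterruptedError:
            # Interrupted before anything was written: retry the same slot
            # without decrementing the counter.
            continue
        if written != 1:
            raise OSError("Could not fill job slots pool")
        remaining -= 1
```

For example, calling `fill_slots(w, 5)` on the write end of a fresh `os.pipe()` leaves four `+` bytes readable on the other end.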
