Implement jobserver pool mode in Ninja with new `--jobserver-pool` flag. by digit-google · Pull Request #2634 · ninja-build/ninja

digit-google · 2025-07-01T15:54:24Z

Enable GNU jobserver "server mode" with a new Ninja flag named `--jobserver-pool`.

This sets up a jobserver pool of job tokens, whose size is determined by the current parallel count,
and makes it available both to Ninja and the sub-commands it launches.

There are no other configuration knobs. On Posix, this only implements the FIFO-based scheme.
On Windows, this implements the semaphore-based one. Both schemes are already supported by
Ninja when it acts as a jobserver client.
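
A minimal sketch of the FIFO-based scheme on POSIX, written in Python for illustration and not taken from the PR. It assumes GNU Make's convention that a pool for N jobs holds N - 1 tokens, because every participant owns one implicit slot; this thread does not state how many tokens the PR writes. The function name `create_fifo_pool` and the temporary directory layout are hypothetical.

```python
import os
import tempfile


def create_fifo_pool(jobs):
    """Create a FIFO preloaded with job tokens.

    Returns (makeflags, fifo_path, (read_fd, write_fd)).
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    path = os.path.join(tempfile.mkdtemp(prefix="demo-jobserver-"), "fifo")
    os.mkfifo(path)
    # Open the read end non-blocking first, so that opening the write end
    # does not block while waiting for a reader.
    read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    write_fd = os.open(path, os.O_WRONLY)
    # Assumption: jobs - 1 tokens, following GNU Make's convention.
    os.write(write_fd, b"+" * (jobs - 1))
    makeflags = f"-j{jobs} --jobserver-auth=fifo:{path}"
    return makeflags, path, (read_fd, write_fd)
```

A caller would export the returned string as MAKEFLAGS before launching sub-commands, and keep both descriptors open for the lifetime of the build so the FIFO never reports end-of-file to clients.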

eli-schwartz · 2025-07-01T16:03:10Z

doc/manual.asciidoc

+Pool mode is useful when Ninja is the top-level build tool that
+invokes sub-builds recursively in a similar setup.
+
+To enable pool mode, use `--jobserver-pool` on the command line. This also
+enables client mode.


It is trivially knowable when build.ninja is written, whether pool mode is useful. So it would be nice if we didn't need to rely on a CLI flag to enable it.

You mean a new dedicated variable in the build.ninja file like enable_jobserver_pool = 1? That sounds like a good idea, let me check if this can be implemented simply....

This seems to work, so I uploaded a new PR with a new commit to implement this specifically. Looking for @jhasse's opinion on this though.

eli-schwartz · 2025-07-01T16:46:51Z

With --jobserver-pool, the docs say: "This also enables client mode."

What exactly does that mean? Let's say I run ninja --jobserver-pool all from a Makefile that already started its own pool. What happens?

digit-google · 2025-07-01T16:53:01Z

This means it sets the MAKEFLAGS variable to point to the new pool, which will then be used by this instance of Ninja and all commands it launches, as if it was set by another program. I.e. if you use --verbose --jobserver-pool you'll see messages such as:

...
ninja: Creating jobserver pool for 32 parallel jobs
ninja: Jobserver mode detected:  -j32 --jobserver-auth=fifo:/tmp/NinjaFIFO4148659

And if a MAKEFLAGS value was already defined by another pool implementation, that one will be ignored (you are telling Ninja to implement its own pool after all).
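
To make the MAKEFLAGS handoff concrete, here is a hedged sketch of how a client might extract the FIFO path from a value like the one logged above. It is not Ninja's parser; `find_jobserver_fifo` is a hypothetical helper. It relies on the GNU Make manual's note that MAKEFLAGS may contain several `--jobserver-auth=` options and only the last one is relevant, and it deliberately ignores non-FIFO forms.

```python
def find_jobserver_fifo(makeflags):
    """Return the FIFO path named by the last --jobserver-auth option, or None."""
    prefix = "--jobserver-auth="
    path = None
    for arg in makeflags.split():
        if not arg.startswith(prefix):
            continue
        value = arg[len(prefix):]
        # Only the fifo:PATH form is handled in this sketch; any other
        # form resets the result because the last option wins.
        path = value[len("fifo:"):] if value.startswith("fifo:") else None
    return path
```

For the log line shown above, the helper returns `/tmp/NinjaFIFO4148659`.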

eli-schwartz · 2025-07-01T17:07:33Z

I cannot conceive of any scenario where a user would desire that behavior. It is outright counterproductive. The whole point of a pool is to pool jobs, so having the flag produce isolated islands of separately tracked job counts means we are right back to where we were in ninja 1.12, every tool insisting on tracking its own jobs and overcommitting to system resources.

At any rate, please delete the confusingly incorrect claim "This also enables client mode" -- since it doesn't do so, and saying it does will mislead the user into thinking this option is useful. Given all jobs participate in the (classic) internal pool, not the one from the parent process, calling ninja a client of something else seems patently false.

digit-google · 2025-07-01T17:42:54Z

Can you clarify or suggest the alternative behavior you'd like to see? E.g. ignoring --jobserver-pool / enable_jobserver_pool = 1 if MAKEFLAGS is already set (in which case, should a warning be printed, or should this be silently ignored?).

Otherwise, thank you. I'll try to rephrase "This also enables client mode". To me it means that it makes Ninja act as a client for the pool it just created, but that may be too close to implementation details for someone who is not familiar with them.

TheBrokenRail · 2025-07-01T17:58:57Z

What does this flag do if the Ninja instance is already in a pool?

I'm worried tools like Meson will place enable_jobserver_pool = 1 into their generated Ninja files, and prevent using the parent jobserver (if launched using CMake's ExternalProject for instance).

eli-schwartz · 2025-07-01T19:06:07Z

> Can you clarify or suggest the alternative behavior you'd like to see? E.g. ignoring --jobserver-pool / enable_jobserver_pool = 1 if MAKEFLAGS is already set (in which case, should a warning be printed, or should this be silently ignored?).

So my expectations when trying to use this would be, if an existing job server is detected I would always want to use that. Putting --jobserver-pool into ninja invocations (including by configuring shell scripts or cargo) is something I would want to do to indicate "this codebase should be parallelized if it isn't already".

Similarly if I (as a maintainer of a project that generates ninja files) can set a variable inside build.ninja, my intent would be "I have detected that this build is going to run other programs that themselves support the jobserver". It doesn't actually mean I specifically require ninja to be the server -- it just means that I want at least one server somewhere.

If I set it because a configured build uses LTO and thus GCC will either listen for an existing jobserver or fork its own make -j$(nproc), my intent won't be that my userbase, which includes people on a 32-core system building six FOSS projects at the same time inside a jobserver-supporting worker pool (four of which use GNU Make and two of which use ninja), suddenly gets each ninja project using its own pool. All the GNU Makes share a pool, since they gracefully detect an existing one, but each ninja makes its own. Now instead of 32 jobs with LTO seeing the existing pool, I have 96 jobs (and LTO only sees the per-ninja pool). Ouch! My actual goal was to make sure my userbase of people who run ninja in an interactive developer shell don't get clobbered by 32 ninja build edges, 2 of which are internal LTO jobs running make -j32. But as a ninja file generator I don't have a way to detect the difference between these two groups, so making any changes would be a breaking change and regress an existing user experience.
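
One way to read the 96 figure, sketched as arithmetic. It assumes the four GNU Make builds share the outer 32-slot pool while each of the two ninja builds creates a private 32-slot pool; `total_job_slots` is a hypothetical helper.

```python
def total_job_slots(shared_pool_size, private_pool_sizes):
    """Upper bound on concurrent jobs: one shared pool plus every private pool."""
    return shared_pool_size + sum(private_pool_sizes)


# Outer pool of 32 shared by the Makes, plus one private pool per ninja build.
overcommitted = total_job_slots(32, [32, 32])
# If both ninja builds joined the outer pool instead, no private pools exist.
cooperative = total_job_slots(32, [])
```

Here `overcommitted` evaluates to 96 and `cooperative` to 32, which is the gap the comment describes.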

Remember that the primary purpose of a jobserver is to limit parallelism of children, not enable more parallelism. I can always run fully unconstrained, and I don't need a job server to do so. A job server tells children to be subordinate to it and not overcommit beyond allowed resources.

I just want to tell commands that I run as ninja rules (such as GCC) that they can't run more often than ninja would otherwise allow. That is all a job server does. I need a way to unambiguously tell that to GCC without running the risk that ninja is going to go rogue and define a second job server.

tl;dr

I expect that it should be ignored. I don't have strong feelings about whether it also prints a warning. (If the idea behind a warning is to alert me that something might be wrong then a warning is totally misguided because nothing is wrong, but if the intent was to print a status message rather than a warning, telling me verbosely that something went right, I have no issues with such a status message.)

> Otherwise, thank you. I'll try to rephrase "This also enables client mode". To me it means that it makes Ninja act as a client for the pool it just created, but that may be too close to implementation details for someone who is not familiar with them.

Right, pretty much the point of confusion from my side. As a user, I would look at client mode as meaning "communicates with a server run by another program" rather than "implementation detail: passes messages internally by reusing the pool code". When reading the documentation, I'm just trying to figure out what changes for me as a user -- and for that, I need to know how my jobs are going to be divided up and how many of them can run in parallel.

Neustradamus · 2025-07-02T02:38:03Z

Nice PR!

digit-google · 2025-07-02T13:02:18Z

Thanks. I just pushed a new revision with a considerable refactor of the logic used to decide when to set up a pool and/or a client (it became ... less simple), plus clarifications in the manual. Can I ask you to review this carefully and let me know of anything that strikes you as incorrect, or suggest improvements?

Side note: the current failed builds seem unrelated to the PR (failures to install dependencies).

sw · 2025-07-02T18:39:03Z

src/jobserver_pool.cc

+        if (ret < 0 && errno == EINTR)
+          continue;


Is it correct to still decrement slot_count when we encounter EINTR? I believe that means that the character wasn't written.

Great point. And this is the type of thing that is really hard to test properly. Fixed.
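
The invariant being discussed can be sketched as follows: a slot only counts as written once the write succeeds, so an interrupted write must be retried without decrementing. This is an illustration, not the PR's fix, which is not shown in this thread. Python's own `os.write` already retries on EINTR (PEP 475), so the sketch takes a hypothetical `write_one` callable that may raise `InterruptedError` to simulate the C behavior.

```python
def fill_pool(write_one, slot_count):
    """Write slot_count tokens via write_one, retrying interrupted writes.

    Returns the number of write attempts made.
    """
    attempts = 0
    while slot_count > 0:
        attempts += 1
        try:
            write_one(b"+")
        except InterruptedError:
            # EINTR: nothing was written, so the slot is still owed.
            # Retry without touching slot_count.
            continue
        slot_count -= 1
    return attempts
```

Injecting the writer also addresses the testing concern raised above: a fake writer that fails once can verify that exactly `slot_count` tokens end up in the pool.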

sw · 2025-07-02T18:39:17Z

doc/manual.asciidoc

+Since version 1.14, Ninja can also be used as a top-level protocol server
+by using the `--jobserver-pool` flag on the command-line, or by setting
+`enable_jobserver_pool = 1` in the Ninja build file. Note that this


This more or less duplicates lines 205-207

Indeed, I simplified that a bit in the next push. Thank you.

eli-schwartz · 2025-07-04T21:36:32Z

I took a look at this, trying a relatively heavy project using ninja -j5 --jobserver-pool with a fair number of longish-running jobs and watching htop. It looks like ninja runs 5 jobs, and each of those runs an LTO link. The environment DOES have MAKEFLAGS="--jobserver-auth=fifo:/tmp/NinjaFIFO****", which is good. And make is running e.g. one to three LTO subprocesses per job, not clobbering with 5 each.

But, it looks like ninja itself isn't reserving slots? I have 5 link jobs running, and they are in turn pooling 10 lto jobs (each link job gets one "free" task it can do, and borrows from a shared pool of 5 more).

digit-google · 2025-07-04T23:10:18Z

Weird. Can you run with the --verbose flag? It will print how Ninja set up the pool and the client in your invocation.
What happens if you run the same build plan with jobserver_pool.py --jobs=5?

eli-schwartz · 2025-07-06T23:04:33Z

ninja: Creating jobserver pool for 5 parallel jobs
ninja: warning: ignoring jobserver: Explicit parallelism specified []

It looks like if I run ninja -v from a Makefile as make -j5, it is handling it fine:

ninja -v
ninja: not creating a jobserver pool: external pool detected
ninja: Jobserver mode detected:  -j5 --jobserver-auth=fifo:/tmp/GMfifo2372017

htop seems to indicate only 5 total jobs across both ninja and lto-wrapper.

I haven't tried jobserver_pool.py, my test involved passing the url of your PR branch to Gentoo Portage and having it build a modified distro package for ninja using your tree. The package doesn't install jobserver_pool.py by default. If you want me to test with that too, I suppose I can. :)

digit-google · 2025-07-07T06:00:15Z

Thank you. The "ignoring jobserver: Explicit parallelism specified []" message indicates that Ninja indeed does not act as a client because of the explicit -j5 on your command line (the default behavior before this CL). It is a bit strange that there is nothing between the brackets though. This is a bug, I'll fix it and add a unit test for this.

digit-google · 2025-07-07T12:29:50Z

The latest push fixes the issue. What happened was that Ninja did set up a pool, which was passed to subcommands, but did not use it itself and instead just scheduled 5 parallel jobs as usual. The latest patch includes a unit test to verify that both the pool and the client are set up properly when -j<COUNT> --jobserver-pool is used.

digit-google · 2025-08-19T07:24:35Z

Pinging @jhasse and @eli-schwartz to look at the latest commits if they have time (no pressure, it's August, many people are on vacation :-))

T-Wainwright · 2025-08-27T18:18:02Z

Thanks for the great work on this @digit-google! Just a note from my experience of testing and implementing this: there's a compatibility piece with make. We were running 4.2 in our Docker container but needed to build and bump to 4.4.1 to guarantee compatibility with --jobserver-auth.

With 4.4.1 and this PR built, it works exactly as expected though: set a maximum pool of workers on a shared machine, and the build and all nested sub-builds respect it. Nice work!

Andarwinux · 2025-08-29T17:41:30Z

Is there any way to set the jobserver pool separately instead of following the current parallel count? This would be useful for cmake ExternalProject, I want to be able to run as many ExternalProjects in parallel as possible to hide the delay in configure steps, but still be able to limit the parallelism in build steps.

A new class implementing a GNU Jobserver pool of job slots. This only supports FIFO mode on Posix, to match the implementation of the client Jobserver class that uses the pool.

Make Ninja provide a pool of GNU Make jobserver slots when invoked with the `--jobserver-pool` command-line option.

- Introduce JobserverState class to manage the state of the jobserver pool and client instances for a given Ninja build. In particular, the methods ShouldSetupClient() and ShouldSetupPool() clarify under which conditions the pool or client should be created, and provide explanations for the decision.
- All jobserver-related info / warnings are moved to the VERBOSE level, keeping the output of normal invocations small and avoiding the need to modify the unit tests accordingly.
- Update manual accordingly, detailing how everything works.

Another way to enable jobserver pool mode is to set `enable_jobserver_pool = 1` in the `build.ninja` file directly, which can be determined by the generator directly. This is equivalent to using `--jobserver-pool` on the command-line. Note that:

- Any value other than 1 is ignored, intentionally (the size of the pool is determined when Ninja is invoked only).
- There is no way to disable the feature from the command-line when enabled in the build plan.
- Just like `--jobserver-pool`, this is ignored if a parent jobserver pool is detected when Ninja is invoked.
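
The rules listed above can be summarized in a small decision function. This is a hedged sketch, not the PR's `ShouldSetupPool()` method, whose signature is not shown in this thread. The name `should_setup_pool`, its parameters, and treating any `--jobserver-auth=` option in MAKEFLAGS as "a parent pool was detected" are all assumptions made for illustration.

```python
def should_setup_pool(cli_flag, build_file_value, makeflags):
    """Return (create_pool, reason) for a Ninja-like invocation.

    cli_flag: True when --jobserver-pool was passed.
    build_file_value: value of enable_jobserver_pool as a string, or None.
    makeflags: contents of the MAKEFLAGS environment variable.
    """
    # Any build-file value other than "1" is intentionally ignored.
    requested = cli_flag or build_file_value == "1"
    if not requested:
        return False, "pool mode not requested"
    # Assumption: any --jobserver-auth= option means a parent pool exists.
    if any(arg.startswith("--jobserver-auth=") for arg in makeflags.split()):
        return False, "external pool detected"
    return True, "creating jobserver pool"
```

Returning the reason alongside the decision mirrors the commit message's point that the decision methods also provide an explanation, which is what the verbose log lines elsewhere in this thread report.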

digit-google · 2025-10-04T10:43:22Z

> Thanks for the great work on this @digit-google! Just a note from my experience of testing and implementing this: there's a compatibility piece with make. We were running 4.2 in our Docker container but needed to build and bump to 4.4.1 to guarantee compatibility with --jobserver-auth.
>
> With 4.4.1 and this PR built, it works exactly as expected though: set a maximum pool of workers on a shared machine, and the build and all nested sub-builds respect it. Nice work!

This is intentional, unfortunately. For more details see the long thread in #2506 where @jhasse specifically asked not to implement support for --jobserver-fds.

This is also already in line 233 of doc/manual.asciidoc (though the corresponding section is not touched by this PR).

digit-google · 2025-10-04T10:45:58Z

> Is there any way to set the jobserver pool separately instead of following the current parallel count? This would be useful for cmake ExternalProject, I want to be able to run as many ExternalProjects in parallel as possible to hide the delay in configure steps, but still be able to limit the parallelism in build steps.

You can start any jobserver pool implementation yourself before invoking your parallel builds, as long as it implements the --jobserver-auth=fifo:.. protocol on Posix. This will be picked up automatically by Ninja (and Make 4.4+).

For example, there is a misc/jobserver_pool.py script in the Ninja source tree to do that.
Another way is to make a minimal build.ninja that spawns the sub-builds, and invoke it with --jobserver-pool.
Or you could do the same with a GNUmakefile and make 4.4+.
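
For readers wondering what "implements the `--jobserver-auth=fifo:..` protocol" means from the client side, here is a minimal sketch, assuming a POSIX system. A client reads one byte from the FIFO to take a slot and writes the same byte back to return it. `FifoJobClient` is a hypothetical class, not Ninja's Jobserver client, and the demo uses a throwaway FIFO in place of a real pool.

```python
import os
import tempfile


class FifoJobClient:
    """Acquire and release job tokens through a jobserver FIFO."""

    def __init__(self, fifo_path):
        # Non-blocking read end: an empty pool raises BlockingIOError
        # instead of stalling the caller.
        self._read_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
        self._write_fd = os.open(fifo_path, os.O_WRONLY)

    def try_acquire(self):
        """Return one token byte, or None when no slot is free."""
        try:
            token = os.read(self._read_fd, 1)
        except BlockingIOError:
            return None
        return token or None

    def release(self, token):
        """Give a slot back by writing the same byte that was read."""
        os.write(self._write_fd, token)

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)


if __name__ == "__main__":
    # Demo with a throwaway FIFO standing in for a real pool.
    demo_path = os.path.join(tempfile.mkdtemp(prefix="demo-jobserver-"), "fifo")
    os.mkfifo(demo_path)
    client = FifoJobClient(demo_path)
    client.release(b"+")          # preload one token, as a pool would
    held = client.try_acquire()   # take the slot before starting extra work
    if held is not None:
        client.release(held)      # hand it back when the work is done
    client.close()
    os.unlink(demo_path)
```

A real client would keep its one implicit slot for itself and only call `try_acquire` before starting each additional parallel job.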

digit-google · 2025-10-04T10:58:58Z

Friendly ping for @jhasse to approve or comment on this PR. Thanks in advance.

digit-google · 2025-11-26T14:02:06Z

Another friendly pong to @jhasse

digit-google mentioned this pull request Jul 1, 2025

Implement GNU Make jobserver client protocol support in Ninja #2506

Merged

eli-schwartz reviewed Jul 1, 2025

digit-google force-pushed the jobserver-pool branch 2 times, most recently from 59cc767 to 7161925 Compare July 2, 2025 12:50

sw reviewed Jul 2, 2025

digit-google force-pushed the jobserver-pool branch from 7161925 to 546bff4 Compare July 3, 2025 09:41

digit-google force-pushed the jobserver-pool branch from 546bff4 to b3d934a Compare July 7, 2025 12:13

digit-google force-pushed the jobserver-pool branch from b3d934a to eb1825b Compare August 11, 2025 14:14

digit-google force-pushed the jobserver-pool branch from eb1825b to 6d7d18f Compare September 4, 2025 10:12

digit-google added 3 commits September 23, 2025 08:25

Add JobserverPool class.

32ad81c

bgemmill mentioned this pull request Sep 25, 2025

Feature request: GNU jobserver support davidlattimore/wild#1108

Closed

digit-google force-pushed the jobserver-pool branch from 6d7d18f to 454271e Compare October 4, 2025 10:47

digit-google mentioned this pull request Nov 26, 2025

[GNU Make Jobserver] Add newline when printing recipes when Make jobserver is active #2694

Open

haampie mentioned this pull request Jan 27, 2026

Jobserver wrongly assumes that it will be able to read tokens for all available jobs on setup llvm/llvm-project#170184

Open
