Implement jobserver pool mode in Ninja with new --jobserver-pool flag.#2634

Open
digit-google wants to merge 3 commits into ninja-build:master from digit-google:jobserver-pool

@digit-google
Contributor

Enable GNU jobserver "server mode" with a new Ninja flag named --jobserver-pool.

This sets up a jobserver pool of job tokens, whose size is determined by the current parallel count,
and makes it available both to Ninja and the sub-commands it launches.

There are no other configuration knobs. On Posix, this only implements the FIFO-based scheme.
On Windows, this implements the semaphore-based one. Both schemes are already supported by
Ninja when it acts as a jobserver client.

Comment on lines 202 to 206
Pool mode is useful when Ninja is the top-level build tool that
invokes sub-builds recursively in a similar setup.

To enable pool mode, use `--jobserver-pool` on the command line. This also
enables client mode.

It is trivially knowable when build.ninja is written, whether pool mode is useful. So it would be nice if we didn't need to rely on a CLI flag to enable it.

Contributor Author


You mean a new dedicated variable in the build.ninja file like enable_jobserver_pool = 1? That sounds like a good idea, let me check if this can be implemented simply....

Contributor Author


This seems to work, so I uploaded a new PR with a new commit to implement this specifically. Looking for @jhasse's opinion on this though.

@eli-schwartz

With --jobserver-pool, the docs say: "This also enables client mode."

What exactly does that mean? Let's say I run ninja --jobserver-pool all from a Makefile that already started its own pool. What happens?

@digit-google
Contributor Author

This means it sets the MAKEFLAGS variable to point to the new pool, which will then be used by this instance of Ninja and all commands it launches, as if it was set by another program. I.e. if you use --verbose --jobserver-pool you'll see messages such as:

...
ninja: Creating jobserver pool for 32 parallel jobs
ninja: Jobserver mode detected:  -j32 --jobserver-auth=fifo:/tmp/NinjaFIFO4148659

And if a MAKEFLAGS value was already defined by another pool implementation, that one will be ignored (you are telling Ninja to implement its own pool after all).
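
For illustration, here is a minimal Python sketch of how a client might locate a FIFO-based pool from a MAKEFLAGS value like the one logged above. The function name `find_fifo_jobserver` is hypothetical and this is not the PR's C++ code. It assumes that the last `--jobserver-auth=` occurrence is the relevant one, as the GNU Make documentation recommends, and that any non-FIFO form is treated as unsupported.

```python
def find_fifo_jobserver(makeflags):
    """Return the FIFO path named by MAKEFLAGS, or None.

    Hypothetical helper: scans for --jobserver-auth=fifo:PATH and lets the
    last --jobserver-auth= occurrence win.
    """
    prefix = "--jobserver-auth="
    fifo_prefix = prefix + "fifo:"
    path = None
    for arg in makeflags.split():
        if arg.startswith(fifo_prefix):
            path = arg[len(fifo_prefix):]
        elif arg.startswith(prefix):
            # Some other scheme (for example a pair of pipe descriptors):
            # treated as "no FIFO pool" in this sketch.
            path = None
    return path
```

With the value from the log above, this returns `/tmp/NinjaFIFO4148659`.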

@eli-schwartz

I cannot conceive of any scenario where a user would desire that behavior. It is outright counterproductive. The whole point of a pool is to pool jobs, so having the flag produce isolated islands of separately tracked job counts means we are right back to where we were in ninja 1.12, every tool insisting on tracking its own jobs and overcommitting to system resources.

At any rate, please delete the confusingly incorrect claim "This also enables client mode" -- since it doesn't do so, and saying it does will mislead the user into thinking this option is useful. Given all jobs participate in the (classic) internal pool, not the one from the parent process, calling ninja a client of something else seems patently false.

@digit-google
Contributor Author

digit-google commented Jul 1, 2025

Can you clarify / suggest the alternative behavior you'd like to see. E.g. ignoring --jobserver-pool / enable_jobserver_pool = 1 if MAKEFLAGS is already set (in which case should a warning be printed, or should this be silently ignored?).

Otherwise thank you. I'll try to rephrase the "This also enables client mode", to me it means that it makes Ninja act as a client for the pool it just created, but that may be too close to implementation details for someone who is not familiar with them.

@TheBrokenRail

What does this flag do if the Ninja instance is already in a pool?

I'm worried tools like Meson will place enable_jobserver_pool = 1 into their generated Ninja files, and prevent using the parent jobserver (if launched using CMake's ExternalProject for instance).

@eli-schwartz

Can you clarify / suggest the alternative behavior you'd like to see. E.g. ignoring --jobserver-pool / enable_jobserver_pool = 1 if MAKEFLAGS is already set (in which case should a warning be printed, or should this be silently ignored?).

So my expectations when trying to use this would be, if an existing job server is detected I would always want to use that. Putting --jobserver-pool into ninja invocations (including by configuring shell scripts or cargo) is something I would want to do to indicate "this codebase should be parallelized if it isn't already".

Similarly if I (as a maintainer of a project that generates ninja files) can set a variable inside build.ninja, my intent would be "I have detected that this build is going to run other programs that themselves support the jobserver". It doesn't actually mean I specifically require ninja to be the server -- it just means that I want at least one server somewhere.

If I set it because a configured build uses LTO and thus GCC will either listen for an existing jobserver or fork its own make -j$(nproc), my intent won't be that my userbase, which includes people on a 32-core system building six FOSS projects at the same time inside a jobserver-supporting worker pool (four of which use GNU Make and two of which use ninja), suddenly get each ninja project using its own pool. All the GNU Makes share a pool, since they gracefully detect an existing one, but each ninja makes its own. Now instead of 32 jobs with LTO seeing the existing pool, I have 96 jobs (and LTO only sees the per-ninja pool). Ouch! My actual goal was to make sure my userbase of people who run ninja in an interactive developer shell don't get clobbered by 32 ninja build edges, 2 of which are internal LTO jobs running make -j32. But as a ninja file generator I don't have a way to detect the difference between these two groups, so making any changes would be a breaking change and regress an existing user experience.

Remember that the primary purpose of a jobserver is to limit parallelism of children, not enable more parallelism. I can always run fully unconstrained, and I don't need a job server to do so. A job server tells children to be subordinate to it and not overcommit beyond allowed resources.

I just want to tell commands that I run as ninja rules (such as GCC) that they can't run more often than ninja would otherwise allow. That is all a job server does. I need a way to unambiguously tell that to GCC without running the risk that ninja is going to go rogue and define a second job server.

tl;dr

I expect that it should be ignored. I don't have strong feelings about whether it also prints a warning. (If the idea behind a warning is to alert me that something might be wrong then a warning is totally misguided because nothing is wrong, but if the intent was to print a status message rather than a warning, telling me verbosely that something went right, I have no issues with such a status message.)

Otherwise thank you. I'll try to rephrase the "This also enables client mode", to me it means that it makes Ninja act as a client for the pool it just created, but that may be too close to implementation details for someone who is not familiar with them.

Right, pretty much the point of confusion from my side. As a user, I would look at client mode as meaning "communicates with a server run by another program" rather than "implementation detail: passes messages internally by reusing the pool code". When reading the documentation, I'm just trying to figure out what changes for me as a user -- and for that, I need to know how my jobs are going to be divided up and how many of them can run in parallel.
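
The behavior requested in this comment (a pool request is ignored whenever an existing jobserver is detected) can be sketched in Python as follows. This is only an illustration under stated assumptions: `should_create_pool` is a hypothetical name, detection is reduced to looking for `--jobserver-auth=` in MAKEFLAGS, and the reason strings are illustrative status messages rather than the PR's actual output.

```python
def should_create_pool(pool_requested, makeflags):
    """Decide whether to create a new pool; return (decision, reason).

    Hypothetical sketch: an inherited jobserver always takes precedence
    over a request to create a new pool.
    """
    if not pool_requested:
        return False, "pool mode not requested"
    if "--jobserver-auth=" in makeflags:
        # A parent already provides a pool: join it instead of creating
        # a second, separately counted one.
        return False, "external pool detected"
    return True, "no external pool detected"
```

Returning a reason alongside the decision makes it easy to print a status message (rather than a warning) explaining why no pool was created.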

@Neustradamus

Nice PR!

@digit-google force-pushed the jobserver-pool branch 2 times, most recently from 59cc767 to 7161925 (July 2, 2025 12:50)
@digit-google
Contributor Author

Thanks. I just pushed a new PR with a considerable refactor of the logic used to decide when to set up a pool and/or a client (it became ... less simple), plus clarifications in the manual. Could you review this carefully and let me know of anything that strikes you as incorrect, or suggest improvements?

Side note: the current failed builds seem unrelated to the PR (failures to install dependencies).

Comment on lines 153 to 154
if (ret < 0 && errno == EINTR)
continue;

Is it correct to still decrement slot_count when we encounter EINTR? I believe that means that the character wasn't written.

Contributor Author


Great point. And this is the type of thing that is really hard to test properly. Fixed.
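
The point raised in this thread can be illustrated with a small Python sketch: a slot should only be counted as filled once its token was actually written. This is not the PR's C++ code. `fill_pool` and `write_token` are hypothetical names, and the writer is injected as a callable because Python's own `os.write` already retries on EINTR, which makes the interrupted case hard to show directly.

```python
def fill_pool(write_token, slot_count):
    """Write slot_count tokens, retrying interrupted writes.

    write_token is a hypothetical callable that writes one token, returns
    the number of bytes written, and raises InterruptedError on EINTR.
    """
    written = 0
    remaining = slot_count
    while remaining > 0:
        try:
            n = write_token(b"+")
        except InterruptedError:
            # EINTR: nothing was written, so retry without consuming a slot.
            continue
        if n == 1:
            # Only a successful write counts against the remaining slots.
            remaining -= 1
            written += 1
    return written
```

Injecting the writer also makes the interrupted path testable, for example with a fake writer that fails on every other call.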

Comment on lines 239 to 241
Since version 1.14, Ninja can also be used as a top-level protocol server
by using the `--jobserver-pool` flag on the command-line, or by setting
`enable_jobserver_pool = 1` in the Ninja build file. Note that this

This more or less duplicates lines 205-207

Contributor Author


Indeed, I simplified that a bit in the next push. Thank you.

@eli-schwartz

eli-schwartz commented Jul 4, 2025

I took a look at this, trying a relatively heavy project using ninja -j5 --jobserver-pool with a fair number of longish-running jobs and watching htop. It looks like ninja runs 5 jobs, and each of those runs an LTO link; the environment DOES have MAKEFLAGS="--jobserver-auth=fifo:/tmp/NinjaFIFO****", which is good. And make is running e.g. one to three LTO subprocesses per job, not clobbering with 5 each.

But, it looks like ninja itself isn't reserving slots? I have 5 link jobs running, and they are in turn pooling 10 lto jobs (each link job gets one "free" task it can do, and borrows from a shared pool of 5 more).

@digit-google
Contributor Author

Weird. Can you run with the --verbose flag? It will print how Ninja set up the pool and the client in your invocation.
What happens if you run the same build plan with jobserver_pool.py --jobs=5?

@eli-schwartz

ninja: Creating jobserver pool for 5 parallel jobs
ninja: warning: ignoring jobserver: Explicit parallelism specified []

It looks like if I run ninja -v from a Makefile as make -j5, it is handling it fine:

ninja -v
ninja: not creating a jobserver pool: external pool detected
ninja: Jobserver mode detected:  -j5 --jobserver-auth=fifo:/tmp/GMfifo2372017

htop seems to indicate only 5 total jobs across both ninja and lto-wrapper.

I haven't tried jobserver_pool.py, my test involved passing the url of your PR branch to Gentoo Portage and having it build a modified distro package for ninja using your tree. The package doesn't install jobserver_pool.py by default. If you want me to test with that too, I suppose I can. :)

@digit-google
Contributor Author

Thank you, the ignoring jobserver: Explicit parallelism specified [] message indicates that Ninja indeed does not act as a client because of the explicit -j5 on your command line (the default behavior before this CL). It is a bit strange that there is nothing between the brackets though. This is a bug, I'll fix it and add a unit-test for this.

@digit-google
Contributor Author

The latest push fixes the issue. What happened was that Ninja did set up a pool, which was passed to subcommands, but did not use it itself and instead just scheduled 5 parallel jobs as usual. The latest patch includes a unit-test to verify that both the pool and the client are set up properly when -j<COUNT> --jobserver-pool is used.

@digit-google
Contributor Author

Pinging @jhasse and @eli-schwartz to look at the latest commits if they have time (no pressure, it's August, many people are on vacation :-))

@T-Wainwright

Thanks for the great work in this @digit-google! Just a note from my experience of testing and implementing this- there's a compatibility piece with make- we were running 4.2 on our docker container but needed to build and bump to 4.4.1 to guarantee compatibility with --jobserver-auth.

4.4.1 and this PR built works exactly as expected though- set a maximum pool of workers on a shared machine, and the build and all nested sub-builds respect it. Nice work!

@Andarwinux

Is there any way to set the jobserver pool separately instead of following the current parallel count? This would be useful for cmake ExternalProject, I want to be able to run as many ExternalProjects in parallel as possible to hide the delay in configure steps, but still be able to limit the parallelism in build steps.

A new class implementing a GNU Jobserver pool of job slots.
This only supports FIFO mode on Posix, to match the implementation
of the client Jobserver class that uses the pool.

Make Ninja provide a pool of GNU Make jobserver slots when
invoked with the `--jobserver-pool` command-line option.

- Introduce JobserverState class to manage the state of
  the jobserver pool and client instances for a given
  Ninja build.

  In particular, the methods ShouldSetupClient() and
  ShouldSetupPool() clarify under which conditions
  the pool or client should be created, and provide
  explanations for the decision.

- All jobserver-related info / warnings are moved to the
  VERBOSE level, which keeps the output of normal invocations
  small and avoids having to modify the unit-tests accordingly.

- Update manual accordingly, detailing how everything works.

Another way to enable jobserver pool mode is to set
`enable_jobserver_pool = 1` in the `build.ninja` file directly,
which can be determined by the generator directly.

This is equivalent to using `--jobserver-pool` on the command-line.
Note that:

- Any value other than 1 is ignored, intentionally (the size of
  the pool is determined when Ninja is invoked only).

- There is no way to disable the feature from the command-line
  when enabled in the build plan.

- Just like `--jobserver-pool`, this is ignored if a parent
  jobserver pool is detected when Ninja is invoked.

@digit-google
Contributor Author

Thanks for the great work in this @digit-google! Just a note from my experience of testing and implementing this- there's a compatibility piece with make- we were running 4.2 on our docker container but needed to build and bump to 4.4.1 to guarantee compatibility with --jobserver-auth.

4.4.1 and this PR built works exactly as expected though- set a maximum pool of workers on a shared machine, and the build and all nested sub-builds respect it. Nice work!

This is intentional, unfortunately. For more details see the long thread in #2506 where @jhasse specifically asked not to implement support for --jobserver-fds.

This is also already in line 233 of doc/manual.asciidoc (though the corresponding section is not touched by this PR).

@digit-google
Contributor Author

Is there any way to set the jobserver pool separately instead of following the current parallel count? This would be useful for cmake ExternalProject, I want to be able to run as many ExternalProjects in parallel as possible to hide the delay in configure steps, but still be able to limit the parallelism in build steps.

You can start any jobserver pool implementation yourself before invoking your parallel builds, as long as it implements the --jobserver-auth=fifo:.. protocol on Posix. This will be picked up automatically by Ninja (and Make 4.4+).

For example, there is a misc/jobserver_pool.py script in the Ninja source tree to do that.
Another way is to make a minimal build.ninja that spawns the sub-builds, and invoke it with --jobserver-pool.
Or you could do the same with a GNUMakefile and make 4.4+.
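
As a rough illustration of what "starting a pool yourself" involves on Posix, here is a minimal Python sketch of a FIFO-based pool. It is not the code of misc/jobserver_pool.py or of this PR. `create_fifo_pool` and the FIFO file name are hypothetical, the `+` token byte is an arbitrary choice, and preloading `jobs - 1` tokens follows the GNU Make convention that each client holds one implicit slot. Opening the FIFO with `O_RDWR` so the open does not block is an assumption that holds on Linux.

```python
import os
import tempfile


def create_fifo_pool(jobs, directory=None):
    """Create a FIFO preloaded with job tokens.

    Returns (path, fd, makeflags). Hypothetical sketch; the caller is
    responsible for closing fd and removing the FIFO afterwards.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    if directory is None:
        directory = tempfile.mkdtemp()
    path = os.path.join(directory, "ExampleFIFO")  # hypothetical name
    os.mkfifo(path, 0o600)
    # O_RDWR avoids blocking in open() until a peer shows up (Linux).
    fd = os.open(path, os.O_RDWR)
    # One slot is implicit for the client, so preload jobs - 1 tokens.
    os.write(fd, b"+" * (jobs - 1))
    makeflags = f" -j{jobs} --jobserver-auth=fifo:{path}"
    return path, fd, makeflags
```

The returned `makeflags` string would then be exported as `MAKEFLAGS` in the environment of the build command, for example through the `env` argument of `subprocess.run`, and the FIFO kept open until that command exits.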

@digit-google
Contributor Author

Friendly ping for @jhasse to approve or comment on this PR. Thanks in advance.

@digit-google
Contributor Author

Another friendly pong to @jhasse
