Implement jobserver pool mode in Ninja with new --jobserver-pool flag.#2634
Implement jobserver pool mode in Ninja with new --jobserver-pool flag.#2634digit-google wants to merge 3 commits intoninja-build:masterfrom
--jobserver-pool flag.#2634Conversation
doc/manual.asciidoc
Outdated
| Pool mode is useful when Ninja is the top-level build tool that | ||
| invokes sub-builds recursively in a similar setup. | ||
|
|
||
| To enable pool mode, use `--jobserver-pool` on the command line. This also | ||
| enables client mode. |
There was a problem hiding this comment.
It is trivially knowable when build.ninja is written, whether pool mode is useful. So it would be nice if we didn't need to rely on a CLI flag to enable it.
There was a problem hiding this comment.
You mean a new dedicated variable in the build.ninja file like enable_jobserver_pool = 1? That sounds like a good idea, let me check if this can be implemented simply....
There was a problem hiding this comment.
This seems to work, so I uploaded a new PR with a new commit to implement this specifically. Looking for @jhasse's opinion on this though.
|
With What exactly does that mean? Let's say I run |
|
This means it sets the And if a |
|
I cannot conceive of any scenario where a user would desire that behavior. It is outright counterproductive. The whole point of a pool is to pool jobs, so having the flag produce isolated islands of separately tracked job counts means we are right back to where we were in ninja 1.12, every tool insisting on tracking its own jobs and overcommitting to system resources. At any rate, please delete the confusingly incorrect claim "This also enables client mode" -- since it doesn't do so, and saying it does will mislead the user into thinking this option is useful. Given all jobs participate in the (classic) internal pool, not the one from the parent process, calling ninja a client of something else seems patently false. |
|
Can you clarify / suggest the alternative behavior you'd like to see. E.g. ignoring Otherwise thank you. I'll try to rephrase the "This also enables client mode", to me it means that it makes Ninja act a client for the pool it just created, but that may be to close to implementation details for someone who is not familiar with them. |
|
What does this flag do if the Ninja instance is already in a pool? I'm worried tools like Meson will place |
So my expectations when trying to use this would be, if an existing job server is detected I would always want to use that. Putting Similarly if I (as a maintainer of a project that generates ninja files) can set a variable inside If I set it because a configured build uses LTO and thus GCC will either listen for an existing jobserver or fork its own Remember that the primary purpose of a jobserver is to limit parallism of children, not enable more parallelism. I can always run fully unconstrained, and I don't need a job server to do so. A job server tells children to be subordinate to it and not overcommit beyond allowed resources. I just want to tell commands that I run as ninja rules (such as GCC) that they can't run more often than ninja would otherwise allow. That is all a job server does. I need a way to unambiguously tell that to GCC without running the risk that ninja is going to go rogue and define a second job server. tl;dr I expect that it should be ignored. I don't have strong feelings about whether it also prints a warning. (If the idea behind a warning is to alert me that something might be wrong then a warning is totally misguided because nothing is wrong, but if the intent was to print a status message rather than a warning, telling me verbosely that something went right, I have no issues with such a status message.)
Right, pretty much the point of confusion from my side. As a user, I would look at client mode as meaning "communicates with a server run by another program" rather than "implementation detail: passes messages internally by reusing the pool code". When reading the documentation, I'm just trying to figure out what changes for me as a user -- and for that, I need to know how my jobs are going to be divided up and how many of them can run in parallel. |
|
Nice PR! |
59cc767 to
7161925
Compare
|
Thanks. I just push a new PR with a considerable refactor of the logic being used to decide when to setup a pool and/or a client (it became ... less simple), plus clarifications in the manual. Can I ask you to review this cautiously and let me know of anything that strikes you as incorrect or suggest improvements. Side note: the current failed builds seem unrelated to the PR (failures to install dependencies). |
src/jobserver_pool.cc
Outdated
| if (ret < 0 && errno == EINTR) | ||
| continue; |
There was a problem hiding this comment.
Is it correct to still decrement slot_count when we encounter EINTR? I believe that means that the character wasn't written.
There was a problem hiding this comment.
Great point. And this is the type of thing that is really hard to test properly. Fixed.
doc/manual.asciidoc
Outdated
| Since version 1.14, Ninja can also be used as a top-level protocol server | ||
| by using the `--jobserver-pool` flag on the command-line, or by setting | ||
| `enable_jobserver_pool = 1` in the Ninja build file. Note that this |
There was a problem hiding this comment.
Indeed, I simplified that a bit in the next push. Thank you.
7161925 to
546bff4
Compare
|
I took a look at this, trying a relatively heavy project using But, it looks like ninja itself isn't reserving slots? I have 5 link jobs running, and they are in turn pooling 10 lto jobs (each link job gets one "free" task it can do, and borrows from a shared pool of 5 more). |
|
Weird. Can you run with the |
It looks like if I run htop seems to indicate only 5 total jobs across both ninja and lto-wrapper. I haven't tried jobserver_pool.py, my test involved passing the url of your PR branch to Gentoo Portage and having it build a modified distro package for ninja using your tree. The package doesn't install jobserver_pool.py by default. If you want me to test with that too, I suppose I can. :) |
|
Thank you, the |
546bff4 to
b3d934a
Compare
|
The latest push fixes the issue. What happened was that Ninja did setup a pool, that was passed to subcommands, but did not use it and instead just scheduled 5 parallel jobs as usual. The latest patch includes a unit-test to verify that both pool and clients are setup properly with |
b3d934a to
eb1825b
Compare
|
Pinging @jhasse and @eli-schwartz to look at the latest commits if they have time (no pressure, it's August, may people are on vacation :-)) |
|
Thanks for the great work in this @digit-google! Just a note from my experience of testing and implementing this- there's a compatibility piece with make- we were running 4.2 on our docker container but needed to build and bump to 4.4.1 to guarantee compatibility with --jobserver-auth. 4.4.1 and this PR built works exactly as expected though- set a maximum pool of workers on a shared machine, and the build and all nested sub-builds respect it. Nice work! |
|
Is there any way to set the jobserver pool separately instead of following the current parallel count? This would be useful for cmake ExternalProject, I want to be able to run as many ExternalProjects in parallel as possible to hide the delay in configure steps, but still be able to limit the parallelism in build steps. |
eb1825b to
6d7d18f
Compare
A new class implementing a GNU Jobserver pool of job slots. This only supports FIFO mode on Posix, to match the implementation of the client Jobserver class that uses the pool.
Make Ninja provide a pool of GNU Make jobserver slots when invoked with the `--jobserver-pool` command-line option. - Introduce JobserverState class to manage the state of the jobserver pool and client instances for a given Ninja build. In particular, the methods ShouldSetupClient() and ShouldSetupPool() clarify under which conditions the pool or client should be created, and provide explanations for the decision. - All jobserver-related info / warnings are moved to the VERBOSE level, keeping the output of normal invocations small, and prevents modifying the unit-tests accordingly. - Update manual accordingly, detailing how everything works.
Another way to enable jobserver pool mode is to set `enable_jobserver_pool = 1` in the `build.ninja` file directly, which can be determined by the generator directly. This is equivalent to using `--jobserver-pool` on the command-line. Note that: - Any value other than 1 is ignored, intentionally (the size of the pool is determined when Ninja is invoked only). - There is no way to disable the feature from the command-line when enabled in the build plan. - Just like `--jobserver-pool`, this is ignored if a parent jobserver pool is detected when Ninja is invoked.
This is intentional, unfortunately. For more details see the long thread in #2506 where @jhasse specifically asked not to implement support for This is also already in line 233 of doc/manual.asciidoc (thought the corresponding section is not touched by this PR). |
You can start any jobserver pool implementation yourself before invoking your parallel builds, as long as it implements the For example, there is a |
6d7d18f to
454271e
Compare
|
Friendly ping for @jhasse to approve or comment on this PR. Thanks in advance. |
|
Another friendly pong to @jhasse |
Enable GNU jobserver "server mode" with a new Ninja flag named
--jobserver-pool.This sets up a jobserver pool of job tokens, whose size is determined by the current parallel count,
and makes it available both to Ninja and the sub-commands it launches.
There are no other configuration knobs. On Posix, this only implements the FIFO-based scheme.
On Windows, this implements the semaphore-based one. Both schemes are already supported by
Ninja when it acts as a jobserver client.