fix: Also enforce io concurrency during package linking #2020
iamthebot wants to merge 1 commit into conda:main from
Conversation
baszalmstra left a comment
Would be nice to test if this causes a performance regression.
I think this might also be a fix for #2021.
```rust
let cached_package_dir = cached_package_dir.to_path_buf();
let clobber_registry = driver.clobber_registry.clone();

let _io_permit = driver.acquire_io_permit().await.map_err(|_| {
```
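For context, a minimal sketch of the permit pattern this hunk applies, assuming `acquire_io_permit` wraps a `tokio::sync::Semaphore` sized by `io_concurrency_limit`; the `Driver` shape and the error handling here are illustrative, not the actual rattler types:

```rust
use std::sync::Arc;
use tokio::sync::{Semaphore, SemaphorePermit};

/// Illustrative driver holding the shared IO semaphore
/// (hypothetical shape, not the actual rattler type).
struct Driver {
    io_semaphore: Arc<Semaphore>,
}

impl Driver {
    fn new(io_concurrency_limit: usize) -> Self {
        Self {
            io_semaphore: Arc::new(Semaphore::new(io_concurrency_limit)),
        }
    }

    /// Acquire a permit before doing any filesystem work; holding the
    /// returned guard keeps one slot of the IO budget occupied.
    async fn acquire_io_permit(
        &self,
    ) -> Result<SemaphorePermit<'_>, tokio::sync::AcquireError> {
        self.io_semaphore.acquire().await
    }
}

async fn link_package_task(driver: &Driver) -> Result<(), String> {
    // The permit is bound to `_io_permit` so it is released when the linking
    // task finishes (RAII), like the `let _io_permit = ...` line in the diff.
    let _io_permit = driver
        .acquire_io_permit()
        .await
        .map_err(|_| "io semaphore closed".to_string())?;
    // ... perform the actual linking under the permit ...
    Ok(())
}
```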
Will this make it so `io_concurrency` has to be at least 2, given another permit will be acquired for file linking?
Yes, but that seems reasonable, no? I can add enforcement where we first capture the io_concurrency so this isn't a surprise for users.
Or is there a specific reason (the change here aside) that the lower bound must be one instead?
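A sketch of the enforcement idea mentioned above: since package linking now holds one permit for the package task while file linking acquires another, a limit of 1 could deadlock, so the captured value would be clamped to a floor of 2 (the function name is hypothetical):

```rust
/// Hypothetical guard applied where io_concurrency is first captured:
/// linking takes one permit for the package task and another for file
/// linking, so a configured limit below 2 could deadlock.
fn effective_io_concurrency(requested: usize) -> usize {
    requested.max(2)
}
```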
@baszalmstra agreed! Since we weren't even enforcing the concurrency limit before, what I did was look at what real effect the io_concurrency_limit actually has. I ran these tests on my M4 Max MBP (64GB RAM), since its 2TB NVMe SSD is actually faster than on some of the high-end AWS servers (and we'd mostly see bottlenecking on systems with very high I/O performance).

tl;dr: the benchmark uses a hot package cache and measures linking for an env with 220 packages, including pytorch, numpy, polars, etc. Since OSX has a default ulimit of 256, I ran these tests with

Important caveat: I did use Opus 4.5 to write the harness + plots (since this is kind of a throwaway), but I did review the logic, sanity-checked the results manually, etc.
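For reference, a rough reconstruction of the kind of timing harness described (hypothetical, not the actual Opus-generated harness): relink the same hot-cache env several times per io_concurrency setting and compare means and variance.

```rust
use std::time::Instant;

/// Hypothetical benchmark loop: relink the same 220-package env against a
/// hot package cache for each io_concurrency setting and report the mean
/// wall-clock time so variance across settings can be compared.
fn bench_linking(limits: &[usize], runs: usize, link_env: impl Fn(usize)) {
    for &limit in limits {
        let mut samples = Vec::with_capacity(runs);
        for _ in 0..runs {
            let start = Instant::now();
            link_env(limit); // relink the env under this concurrency limit
            samples.push(start.elapsed().as_secs_f64());
        }
        let mean = samples.iter().sum::<f64>() / samples.len() as f64;
        println!("io_concurrency={limit}: mean {mean:.2}s over {runs} runs");
    }
}
```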
So... interestingly enough, at least on OSX (I'll have to try this again on a really high-end bare-metal EC2 instance), above 100 the ulimit doesn't make a notable difference. I did sample open fds initially, and even in this test it peaked at ~280. Below 100 you just get higher variance. Note that the differences aren't statistically significant even then.
Near identical. I found this surprising, but I double-checked and yeah, access to more fds isn't the bottleneck.
This one I was curious about (maybe you get a different experience in a CLI tool because it progresses differently?), but no, super similar. I guess it's not too surprising a result, though: effectively, after ~100ish fds you're still bottlenecked on disk I/O even if you can have more open files. So the
Are you also aware that linking is already limited by rayon? We perform the linking in the rayon threadpool, where there is at most one thread per core. It could also be interesting to try removing rayon and relying solely on the io concurrency semaphore. I have played with all these things a lot, but your analysis seems much more capable.
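For illustration, one way the "drop rayon, rely solely on the semaphore" idea could look, gating each blocking link operation on the shared IO semaphore and letting tokio's blocking pool supply the threads (a sketch under assumed names, not the actual rattler code):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Sketch: instead of fanning file links out to a rayon threadpool, gate
/// each blocking link operation on the shared IO semaphore and let tokio's
/// blocking pool supply the threads.
async fn link_files(io_semaphore: Arc<Semaphore>, files: Vec<std::path::PathBuf>) {
    let mut handles = Vec::new();
    for file in files {
        let sem = Arc::clone(&io_semaphore);
        handles.push(tokio::spawn(async move {
            // One permit per file link, so total in-flight IO stays bounded
            // by the semaphore rather than by rayon's one-thread-per-core.
            let _permit = sem.acquire_owned().await.expect("semaphore closed");
            tokio::task::spawn_blocking(move || {
                // ... hard-link / reflink / copy `file` here ...
                let _ = file;
            })
            .await
            .expect("blocking task panicked");
        }));
    }
    for h in handles {
        h.await.expect("task panicked");
    }
}
```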



Description
While investigating prefix-dev/rattler-build#1638, I found that even if we did pass io_concurrency_limit from rattler-build to rattler, the link_package method that's actually used doesn't apply the concurrency limit at all, so you'd get unbounded concurrency when spawning the linking tasks.
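To make the failure mode concrete, a sketch of the unbounded pattern described above (`link_package` here is a hypothetical stand-in for the real linking routine):

```rust
/// Sketch of the problematic pattern: spawning one linking task per package
/// with no semaphore means in-flight IO is bounded only by the number of
/// packages in the environment.
async fn link_all_unbounded(packages: Vec<String>) {
    let mut handles = Vec::new();
    for package in packages {
        handles.push(tokio::spawn(async move {
            // No permit acquired here -> unbounded concurrent IO.
            link_package(&package).await;
        }));
    }
    for h in handles {
        let _ = h.await;
    }
}

async fn link_package(_name: &str) {
    // ... filesystem work ...
}
```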