-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
ignore: gracefully quit worker threads upon panic in ParallelVisitor #3010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ignore: gracefully quit worker threads upon panic in ParallelVisitor #3010
Conversation
| + std::panic::UnwindSafe | ||
| + Send | ||
| + Sync | ||
| + 'static, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I think this is plausibly too big of a breaking change for me to stomach in a semver compatible release. And I don't have the bandwidth to do a semver incompatible release right now. Which kind of makes this PR stuck.
One possible way to get this unstuck is to add a new API that avoid this problem and deprecate the old one. But I'd only want to go this route if it was a very small addition that doesn't significantly complicate the implementation. I'm not sure if such a solution exists.
Otherwise, I'm inclined to leave this bug open for now, and just something to address in the next semver incompatible release.
|
Closing, but keeping #3010 open. |
|
Also, thank you for your work on this! Even though I ended up not taking this PR, your work will be something worth building on for |
|
Wanted to confirm here that:
And I am super glad to hear I would like you to know that as usual, it is extremely difficult to improve upon the kind of work you do. I have tried to argue with your comments in my head and largely failed. I have spent months investigating this, and I have had to introduce some immense complexity in order to make a pretty minor use case more robust. Great work. I'm also looking at the API, and while I'm not done with mine yet, I would absolutely recommend considering the use of I also think the deadlock that occurred here is a symptom of a more general misalignment of the threading model to the input distribution. I think work stealing + recursive data generation (so each work item can always produce more) is necessarily prone to this deadlock. I do not have a proof of this, and I also do not have a more appropriate answer myself yet. I do understand that work stealing is a good approach to avoid having a single very large directory fill up its own queue. I also wonder whether there is any benefit to maximizing thread locality. Finally, I was informed about a very widely available syscall I am actually very confident that supporting
Again, very impressed by your engineering judgement, and I appreciate when you write a comment about decisions you didn't take too. Keep it up!! |
Fixes #3009.
Problem
WalkParallel::visit()will nondeterministically hang if theParallelVisitor::visit()implementation panics. This also occurs when providing a closure toWalkParallel::run(). Minimal repro is provided in #3009:The above code will nondeterministically hang, because of an infinite loop when no new work is available:
ripgrep/crates/ignore/src/walk.rs
Lines 1695 to 1707 in de4baa1
Solution
quit_nowflag in our wait loop.run()method and setquit_nowbefore propagating the panic.Breaking change: In order to ensure soundness, we also enforce that the filter method provided to
WalkBuilder#filter_entry()isUnwindSafe. Users can circumvent this by wrapping inAssertUnwindSafeas needed.Result
The added test
panic_in_parallel()always succeeds instead of hanging.