refactor: add WaitGroup for waiting for a dynamic set of tasks #2046
refactor: add WaitGroup for waiting for a dynamic set of tasks #2046katrinafyi wants to merge 21 commits intolycheeverse:masterfrom
Conversation
threaded synchronisation primitives (like std::sync::Mutex). if we were unlucky, these can block the tokio runtime. replace std::sync::Arc with a system based on (an even thinner wrapper around) tokio::mspc::sync::channel.
mre
left a comment
There was a problem hiding this comment.
Lgtm.
@thomas-zahner, any thoughts before we merge this?
|
Very cool PR! I don't see any problems with it. @katrinafyi Could you add a test to waiter.rs to to test if recursion would really work that way? So that the unused part ( |
i thought quicksort would be easy...
the Infallible name only really makes sense for Result<T, Infallible>
|
@thomas-zahner That's a really interesting idea. I've added a (hopefully) simple example which you can see here: https://github.com/lycheeverse/lychee/compare/74046a9..b27a34f |
thomas-zahner
left a comment
There was a problem hiding this comment.
@katrinafyi Thank you, that's very cool! This helps my understanding quite a bit.
| /// The given `waiter` will be used to detect when the work has finished and it will | ||
| /// close the channels. Additionally, `waiter` can be omitted to show that without | ||
| /// the [`WaitGroup`], the tasks would not terminate. | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Either move into mod test or annotate with #[cfg(test)], this should resolve the #[allow(dead_code)], right?
There was a problem hiding this comment.
Hmm I put it outside of mod tests because I wanted it to be included in the docs.
I tried a few things and it's a bit tricky. We need compiling to work (1) normally, (2) in docs, and (3) in tests.
#[cfg(any(test, doc))]on the examples: dev-dependencies is not available indoc, so tokio-stream has to be in dependencies. but then, it's flagged as an unused crate in normal compilation.- putting inside mod test and making test module
#[cfg(any(test, doc))]has the same problems
Do you have thoughts or other suggestions?
| /// | ||
| /// This is wildly inefficient because it does not cache any results. Computing | ||
| /// `F(n)` would generate `O(2^n)` channel items. | ||
| #[allow(dead_code)] |
| strum = { version = "0.27.2", features = ["derive"] } | ||
| thiserror = "2.0.18" | ||
| tokio = { version = "1.49.0", features = ["full"] } | ||
| tokio-stream = "0.1.18" |
There was a problem hiding this comment.
This is only depended on in the new test, right? If so, move it to the [dev-dependencies] group
Co-authored-by: Thomas Zahner <thomas.zahner@protonmail.ch>
This is a small step towards recursion. It will enable waiting for a dynamic number of tasks which can spawn other tasks, which allows for correct and deadlock-free termination in the presence of (eventual) recursion.
I think recursion is not an astoundingly difficult feature but, because lychee is async and stream, it needs the combination of a few parts which might be tricky to get right. This is one of those parts. In fact, I had two different deadlock bugs while working on this PR before reaching the final implementation. If this was all done in one big recursion PR, then I think these bugs would be a lot harder to find and fix.
The old implementation in #1603 used
Arc<AtomicUsize>to implement this requirement. This new PR uses a RAII-based guard value to represent tasks, which should make it much easier to write correct code - it's all handled by the borrow checker and Drop, so it's impossible to forget to adjust the counter.In terms of testing, changes the check.rs command to rely on WaitGroup for termination. You can read about this in the comments. This means that every run of lychee goes through the WaitGroup code and acts as a test for WaitGroup. There is a risk that if it is buggy, the impacts would be big. However, I think the unit tests are big enough that they're a decent exercise of the WaitGroup. My second deadlock bug was found by
make test.You can also see the docs at https://waitgroup.lychee-docs-katrinafyi.pages.dev/lychee_lib/waiter/
Related work: