Skip to content

refactor: add WaitGroup for waiting for a dynamic set of tasks #2046

Open
katrinafyi wants to merge 21 commits intolycheeverse:masterfrom
rina-forks:waitgroup
Open

refactor: add WaitGroup for waiting for a dynamic set of tasks #2046
katrinafyi wants to merge 21 commits intolycheeverse:masterfrom
rina-forks:waitgroup

Conversation

@katrinafyi
Copy link
Member

@katrinafyi katrinafyi commented Feb 14, 2026

This is a small step towards recursion. It will enable waiting for a dynamic number of tasks which can spawn other tasks, which allows for correct and deadlock-free termination in the presence of (eventual) recursion.

I think recursion is not an astoundingly difficult feature but, because lychee is async and stream, it needs the combination of a few parts which might be tricky to get right. This is one of those parts. In fact, I had two different deadlock bugs while working on this PR before reaching the final implementation. If this was all done in one big recursion PR, then I think these bugs would be a lot harder to find and fix.

The old implementation in #1603 used Arc<AtomicUsize> to implement this requirement. This new PR uses a RAII-based guard value to represent tasks, which should make it much easier to write correct code - it's all handled by the borrow checker and Drop, so it's impossible to forget to adjust the counter.

In terms of testing, changes the check.rs command to rely on WaitGroup for termination. You can read about this in the comments. This means that every run of lychee goes through the WaitGroup code and acts as a test for WaitGroup. There is a risk that if it is buggy, the impacts would be big. However, I think the unit tests are big enough that they're a decent exercise of the WaitGroup. My second deadlock bug was found by make test.

You can also see the docs at https://waitgroup.lychee-docs-katrinafyi.pages.dev/lychee_lib/waiter/

Related work:

threaded synchronisation primitives (like std::sync::Mutex). if we were
unlucky, these can block the tokio runtime. replace std::sync::Arc with
a system based on (an even thinner wrapper around)
tokio::mspc::sync::channel.
Copy link
Member

@mre mre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm.

@thomas-zahner, any thoughts before we merge this?

@thomas-zahner
Copy link
Member

Very cool PR! I don't see any problems with it.

@katrinafyi Could you add a test to waiter.rs to to test if recursion would really work that way? So that the unused part (// unused for now, but will be used for recursion eventually.) is covered and tested already. This would be great documentation and help me (and others) understand how this is used in a simpler scenario without lychee's additional complexity.

i thought quicksort would be easy...
the Infallible name only really makes sense for Result<T, Infallible>
@katrinafyi
Copy link
Member Author

@thomas-zahner That's a really interesting idea. I've added a (hopefully) simple example which you can see here: https://github.com/lycheeverse/lychee/compare/74046a9..b27a34f

Copy link
Member

@thomas-zahner thomas-zahner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katrinafyi Thank you, that's very cool! This helps my understanding quite a bit.

/// The given `waiter` will be used to detect when the work has finished and it will
/// close the channels. Additionally, `waiter` can be omitted to show that without
/// the [`WaitGroup`], the tasks would not terminate.
#[allow(dead_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either move into mod test or annotate with #[cfg(test)], this should resolve the #[allow(dead_code)], right?

Copy link
Member Author

@katrinafyi katrinafyi Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I put it outside of mod tests because I wanted it to be included in the docs.

I tried a few things and it's a bit tricky. We need compiling to work (1) normally, (2) in docs, and (3) in tests.

  • #[cfg(any(test, doc))] on the examples: dev-dependencies is not available in doc, so tokio-stream has to be in dependencies. but then, it's flagged as an unused crate in normal compilation.
  • putting inside mod test and making test module #[cfg(any(test, doc))] has the same problems

Do you have thoughts or other suggestions?

///
/// This is wildly inefficient because it does not cache any results. Computing
/// `F(n)` would generate `O(2^n)` channel items.
#[allow(dead_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

strum = { version = "0.27.2", features = ["derive"] }
thiserror = "2.0.18"
tokio = { version = "1.49.0", features = ["full"] }
tokio-stream = "0.1.18"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only depended on in the new test, right? If so, move it to the [dev-dependencies] group

Co-authored-by: Thomas Zahner <thomas.zahner@protonmail.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments