Skip to content

Conversation

@jayshrivastava
Copy link
Collaborator

@jayshrivastava jayshrivastava commented Jul 30, 2025

Previously, wait_for would deadlock when used in an async runtime (ie. any async function calls wait_for directly or indirectly). Now, wait_for returns an error if called in an async runtime. There's no reason to call it in an async runtime / function because you can simply await the future. This change updates a few places where wait_for is called to be async functions and just wait on the future.

Why did the old Spawner deadlock? Handle::current().block_on(f) and the std::sync::mpsc::channel would both block an executor thread in tokio (this is a very bad practice generally). If there's no executer threads available, you can't run anything. Now, the closure just does f.await instead of block_on, which does not block an executor. The channel is now a tokio::sync::mpsc::channel as well with a buffer size of 1, so the async send will not block (I think a buffered std channel would have worked, but it's better to use tokio implementations generally). The receiver of the channel is in a sync runtime, so it does a blocking receive.

Testing

  • unit tests now pass
  • added a unit test for wait_for returning an error in an async runtime
  • test for nested wait_for calls which now return an error

Closes https://github.com/datafusion-contrib/datafusion-distributed/issues/63

@jayshrivastava jayshrivastava force-pushed the js/ntran/tests/copy branch 2 times, most recently from 545c4f4 to 269516f Compare July 30, 2025 19:40
@jayshrivastava jayshrivastava changed the title Js/ntran/tests/copy 269516f12aee624930c6035bb56265ad8465cc08 Jul 30, 2025
Previously, `wait_for` would deadlock when used in an async runtime (ie. any async function calls `wait_for` directly or indirectly). Now, `wait_for` returns an error if called in an async runtime. There's no reason to call it in an async runtime / function because you can simply await the future. This change updates a few places where `wait_for` is called to be async functions and just wait on the future.

Why did the old `Spawner` deadlock? `Handle::current().block_on(f)` and the `std::sync::mpsc::channel` would both block an executor thread in tokio (this is a very bad practice generally). If there's no executer threads available, you can't run anything. Now, the closure just does `f.await` instead of `block_on`, which does not block an executor. The channel is now a `tokio::sync::mpsc::channel` as well with a buffer size of 1, so the async send will not block (I think a buffered std channel would have worked, but it's better to use tokio implementations generally). The receiver of the channel is in a sync runtime, so it does a blocking receive.

Testing
- unit tests now pass
- added a unit test for `wait_for` returning an error in an async runtime
- test for nested `wait_for` calls which now return an error

Closes https://github.com/datafusion-contrib/datafusion-distributed/issues/63
@jayshrivastava jayshrivastava changed the title 269516f12aee624930c6035bb56265ad8465cc08 util: ensure wait_for can only be used in a non-async runtime Jul 30, 2025
@jayshrivastava jayshrivastava changed the base branch from main to ntran/tests July 30, 2025 19:41
@jayshrivastava jayshrivastava marked this pull request as ready for review July 30, 2025 19:42
@jayshrivastava jayshrivastava changed the base branch from ntran/tests to main July 30, 2025 20:21
@jayshrivastava jayshrivastava changed the base branch from main to ntran/tests July 30, 2025 20:28
@gabotechs gabotechs deleted the js/ntran/tests/copy branch August 4, 2025 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants