Skip to content

Conversation

@mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Dec 18, 2025

Which issue does this PR close?

N/A.

Rationale for this change

The shuffle writer was calling futures::executor::block_on() from within an async function running on Tokio's runtime. This blocks the Tokio worker thread and prevents other tasks from running.

Specific Problems

  1. Blocks worker threads: Uses thread::park() to block the current Tokio worker thread, preventing Tokio from scheduling other tasks on it
  2. Bypasses Tokio runtime: Uses its own polling infrastructure instead of Tokio's scheduler
  3. Hot path performance issue: Runs for every batch during shuffle operations

Sources

I am not a Tokio optimization expert, so I'll defer to some sources I read and maybe others can tell me if I've misunderstood:

From Rust Async Book:

"calling a blocking function in a synchronous method would block the whole thread, [while] blocked Futures will yield control of the thread, allowing other Futures to run."

"The .await operator doesn't block the current thread, but instead asynchronously waits for the future to complete, allowing other tasks to run if the future is currently unable to make progress."

From futures-rs source code, block_on internally uses thread::park() to block the current thread when waiting for the future to complete.

From Tokio documentation:

"issuing a blocking call or performing a lot of compute in a future without yielding is problematic, as it may prevent the executor from driving other tasks forward."

What changes are included in this PR?

  • Replaced block_on(repartitioner.insert_batch(batch?))? with repartitioner.insert_batch(batch?).await?

This maintains the same sequential behavior (waiting for each batch to complete before getting the next one) but does so cooperatively without blocking the thread.

How are these changes tested?

Existing tests. I'll try to run a benchmark too.

@mbutrovich mbutrovich requested a review from andygrove December 18, 2025 20:40
@mbutrovich mbutrovich changed the title perf: [WIP] Don't use block_on in async shuffle writer perf: Use await instead of block_on in async shuffle writer Dec 18, 2025
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mbutrovich mbutrovich changed the title perf: Use await instead of block_on in async shuffle writer perf: Use await instead of block_on in native shuffle writer Dec 18, 2025
@codecov-commenter
Copy link

codecov-commenter commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.61%. Comparing base (f09f8af) to head (c80af44).
⚠️ Report is 776 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2937      +/-   ##
============================================
+ Coverage     56.12%   59.61%   +3.48%     
- Complexity      976     1368     +392     
============================================
  Files           119      167      +48     
  Lines         11743    15430    +3687     
  Branches       2251     2550     +299     
============================================
+ Hits           6591     9198    +2607     
- Misses         4012     4946     +934     
- Partials       1140     1286     +146     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mbutrovich mbutrovich merged commit 7ff3cbe into apache:main Dec 18, 2025
123 checks passed
@mbutrovich mbutrovich deleted the native_shuffle_block branch December 18, 2025 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants