fix: RangePartitioning with native shuffle #2258

mbutrovich · 2025-08-28T16:12:06Z

Which issue does this PR close?

Closes #1906.

Rationale for this change

#1862 tried to implement RangePartitioning with native shuffle. The implementation didn't work because executors calculated their own partition boundaries.

What changes are included in this PR?

This changes the flow so that the driver calculates the boundaries, as Spark does. At a high level:

Hoist code from Spark's ShuffleExchangeExec for using Spark's RangePartitioner to calculate boundary rows.
Serialize boundary rows to native side.
Deserialize boundary rows and pass as part of the partitioning scheme. Each executor should have the boundary values now.
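To illustrate why every executor needs the same boundary values, here is a minimal sketch of range-partition assignment. It is not the code in this PR: `partition_for` and `assign` are hypothetical names, keys are simplified to `i64` rather than full boundary rows, and the tie-breaking rule (a key equal to a boundary goes to the lower partition) is an assumption for the example.

```rust
/// Returns the partition index for `key`, given `bounds` sorted ascending.
/// N boundaries define N + 1 partitions. Assumed convention for this sketch:
/// a key equal to a boundary lands in the lower partition.
fn partition_for(key: i64, bounds: &[i64]) -> usize {
    // Binary search: counts how many boundaries are strictly less than `key`.
    bounds.partition_point(|&b| b < key)
}

/// Assigns every key in `rows` to a partition using the given boundaries.
fn assign(rows: &[i64], bounds: &[i64]) -> Vec<usize> {
    rows.iter().map(|&k| partition_for(k, bounds)).collect()
}
```

With shared boundaries such as `[10, 20]`, two executors agree that key 11 belongs to partition 1 and key 25 to partition 2. If each executor derived boundaries from its own data instead, say `[5, 12]` on one and `[15, 22]` on the other, key 15 could land in partition 2 on the first while key 11 lands in partition 0 on the second, so the partitions would no longer hold disjoint, ordered ranges.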

How are these changes tested?

A new test reproduces the scenario from #1906 (RangePartitioning does not yield correct results with native shuffle).

Remaining concerns

What is the performance implication of iterating over these batches to calculate the bounds at the driver? Are we introducing significant overhead? It's what Spark does, but how does this approach affect Comet?
Should we remove the code for calculating boundary rows in native code? It's possible we (or someone else) might want this, so I left the tests and annotated them to allow dead code.
I changed the default to enable this feature, but we can discuss setting it to false. For now I want to see how it does in CI.
I need to generate new golden plans if we decide to enable this feature by default.

codecov-commenter · 2025-08-28T16:34:20Z

Codecov Report

❌ Patch coverage is 86.66667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.29%. Comparing base (f09f8af) to head (522ef80).
⚠️ Report is 430 commits behind head on main.

Files with missing lines	Patch %	Lines
...t/execution/shuffle/CometNativeShuffleWriter.scala	73.52%	4 Missing and 5 partials ⚠️
...t/execution/shuffle/CometShuffleExchangeExec.scala	96.29%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##               main    #2258       +/-   ##
=============================================
- Coverage     56.12%   44.29%   -11.83%     
- Complexity      976     1106      +130     
=============================================
  Files           119      143       +24     
  Lines         11743    13373     +1630     
  Branches       2251     2397      +146     
=============================================
- Hits           6591     5924      -667     
- Misses         4012     6420     +2408     
+ Partials       1140     1029      -111

mbutrovich · 2025-08-28T21:00:46Z

The next challenge to figure out is adding some flexibility for dictionary-encoded columns. The current approach with one schema is too rigid.

mbutrovich added 6 commits August 28, 2025 10:35

Use Spark's RangePartitioning to compute boundary rows and serialize for native shuffle to consume. Added new test to represent apache#1906.

709d6e9

Fix warnings and benchmark compilation.

4ee3d8e

Fix benchmark bug.

6f34e35

Minor refactor.

332e76a

Cleanup to make it more clear what code came from Spark.

0eb1134

Fix errant comment.

bb67f73

mbutrovich self-assigned this Aug 28, 2025

mbutrovich changed the title ~~fix: RangePartitioning boundaries with native shuffle~~ fix: RangePartitioning with native shuffle Aug 28, 2025

mbutrovich added 2 commits August 28, 2025 13:50

Override partitioning scheme at serialization when num_partitions is 1.

abd8958

Override partitioning scheme at serialization when computed bounds result in 1 partition.

967d1a1

mbutrovich added 2 commits August 29, 2025 10:42

Merge branch 'main' into fix_range_partitioning

7af9474

Remove string and binary range partitioning types until we sort out how to handle dictionary encoding.

522ef80
