Contributor

@mcheshkov mcheshkov commented Apr 14, 2025

Check List

  • Tests have been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

LogicalPlan::Join and CrossJoin do not semantically preserve ordering.
When planned as a HashJoin, the join outputs batches in the same order as they arrive from the right stream.
But both Join and CrossJoin have the same partitioning as the right input (even when repartition_joins is disabled), and CoalescePartitions can collect these partitions in arbitrary order.
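
A minimal sketch of the partition argument above, as a toy model rather than DataFusion code. The names `coalesce` and `is_globally_sorted` are hypothetical, and partitions are modelled as plain `Vec<i32>` values that are each sorted on their own:

```rust
// Toy model only: this is NOT DataFusion's CoalescePartitions implementation.
// Each partition is a Vec<i32> that is already sorted internally.

/// Concatenate partitions in the order given by `arrival`, the way a
/// coalescing step might if it reads whichever partition is ready first.
fn coalesce(partitions: &[Vec<i32>], arrival: &[usize]) -> Vec<i32> {
    arrival
        .iter()
        .flat_map(|&i| partitions[i].iter().copied())
        .collect()
}

/// True when every row is less than or equal to the row after it.
fn is_globally_sorted(rows: &[i32]) -> bool {
    rows.windows(2).all(|pair| pair[0] <= pair[1])
}
```

With partitions `[1, 2, 3]`, `[4, 5, 6]` and `[7, 8, 9]`, the arrival order `[0, 1, 2]` happens to keep the output sorted, while `[2, 0, 1]` does not, even though each partition stays sorted internally. That is why a sort below the join cannot stand in for a sort above it.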

Side note: Substrait says that for both Join and Cross Product

Orderedness is empty post operation

See https://substrait.io/relations/logical_relations/#join-operation
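
For illustration, the intended behaviour of the rule can be sketched on a toy plan type. Everything here is hypothetical (the `Plan` enum, `push_sort_down`, and the assumption that a projection keeps the sort key available); it is not the cubesql rewrite rule or DataFusion's `LogicalPlan`:

```rust
// Hypothetical toy plan type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Projection(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
    CrossJoin(Box<Plan>, Box<Plan>),
    Sort { key: String, input: Box<Plan> },
}

/// Push a sort on `key` as deep as ordering-preserving nodes allow.
fn push_sort_down(key: &str, plan: Plan) -> Plan {
    match plan {
        // Assumption of this toy model: a projection keeps row order and
        // still exposes `key`, so the sort may move below it.
        Plan::Projection(input) => {
            Plan::Projection(Box::new(push_sort_down(key, *input)))
        }
        // Join and CrossJoin (and every other node here) give no ordering
        // guarantee, so the sort stays directly above them.
        other => Plan::Sort {
            key: key.to_string(),
            input: Box::new(other),
        },
    }
}
```

Given `Projection(Join(a, b))`, this sketch yields `Projection(Sort(Join(a, b)))`: the sort moves below the projection but is never pushed into either side of the join.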

@mcheshkov mcheshkov changed the title from "fix(cubesql): Fix SortPushDown pushing sort over joins" to "fix(cubesql): Fix SortPushDown pushing sort through joins" Apr 14, 2025

codecov bot commented Apr 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.54%. Comparing base (52da601) to head (fc2339b).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #9464       +/-   ##
===========================================
+ Coverage   58.98%   80.54%   +21.56%     
===========================================
  Files         153      382      +229     
  Lines       12973    96521    +83548     
  Branches     2192     2192               
===========================================
+ Hits         7652    77743    +70091     
- Misses       5010    18467    +13457     
  Partials      311      311               
Flag           Coverage Δ
cube-backend   58.98% <ø> (ø)
cubesql        83.89% <100.00%> (?)

@mcheshkov mcheshkov marked this pull request as ready for review April 14, 2025 14:30
@mcheshkov mcheshkov requested a review from a team as a code owner April 14, 2025 14:30
LogicalPlan::Join and CrossJoin do not semantically preserve ordering.
When planned as a HashJoin, the join outputs batches in the same order as they arrive from the right stream.
But both Join and CrossJoin have the same partitioning as the right input (even when repartition_joins is disabled), and CoalescePartitions can collect these partitions in arbitrary order.
See https://github.com/apache/datafusion/blob/7.0.0/datafusion/src/physical_plan/hash_join.rs#L282-L284
See https://github.com/apache/datafusion/blob/7.0.0/datafusion/src/physical_plan/cross_join.rs#L141-L143

Also, Substrait says that for both Join and Cross Product
> Orderedness is empty post operation
See https://substrait.io/relations/logical_relations/#join-operation
@mcheshkov mcheshkov force-pushed the sort-push-down-join branch from b9303ee to fc2339b April 22, 2025 09:21
@mcheshkov mcheshkov merged commit fed08e1 into master Apr 22, 2025
81 checks passed
@mcheshkov mcheshkov deleted the sort-push-down-join branch April 22, 2025 10:18
marianore-muttdata pushed a commit to MuttData/cube that referenced this pull request Jun 17, 2025
3 participants