Commit 09d9338
fix: adjust CometNativeScan's doCanonicalize and hashCode for AQE, use DataSourceScanExec trait (#1578)
## Which issue does this PR close?
Addresses another failure in #1441.
## Rationale for this change
`CometExecSuite.explain native plan` fails with the experimental `native_datafusion` scan. It's an interesting query that does a self-join of two columns from the same table. The root cause is that when AQE is enabled, it reuses the shuffle output from one scan as the output of the other scan:
```
+- == Initial Plan ==
CometProject [_1#6], [_1#6]
+- CometSortMergeJoin [_1#6], [_2#11], Inner
:- CometSort [_1#6], [_1#6 ASC NULLS FIRST]
: +- CometExchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=304]
: +- CometFilter [_1#6], isnotnull(_1#6)
: +- CometNativeScan: [_1#6]
+- CometSort [_2#11], [_2#11 ASC NULLS FIRST]
+- CometExchange hashpartitioning(_2#11, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=308]
+- CometFilter [_2#11], isnotnull(_2#11)
+- CometNativeScan: [_2#11]
```
AQE incorrectly adds a `ReusedExchange` on the right side of the join that points at the left side's exchange (same `plan_id=304`), even though the two sides scan different columns.
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
*(1) CometColumnarToRow
+- CometProject [_1#6], [_1#6]
+- CometBroadcastHashJoin [_1#6], [_2#11], Inner, BuildRight
:- AQEShuffleRead coalesced
: +- ShuffleQueryStage 0
: +- CometExchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=304]
: +- CometFilter [_1#6], isnotnull(_1#6)
: +- CometNativeScan: [_1#6]
+- BroadcastQueryStage 2
+- CometBroadcastExchange [_2#11]
+- AQEShuffleRead local
+- ShuffleQueryStage 1
+- ReusedExchange [_2#11], CometExchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=304]
```
The reason is that `hashCode()` for `CometNativeScan` was derived only from the node's output, so the `TrieMap` used in AQE (which hashes the `SparkPlan`) gave both stages the same hash value after canonicalization, making AQE think that one stage could be reused for the other.
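The collision can be illustrated with a minimal, Spark-free sketch. `ToyScan` and `ReuseCollisionDemo` are hypothetical names, not Comet classes, and the sketch assumes that canonicalization normalizes attribute names and ids so that both scans' outputs end up looking identical (written here as `none#0`).

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a scan node; NOT the real CometNativeScan.
final class ToyScan(val output: Seq[String], val readSchema: String) {
  // Hash and equality look only at the output, as in the behavior described above.
  override def hashCode(): Int = output.hashCode()
  override def equals(other: Any): Boolean = other match {
    case that: ToyScan => this.output == that.output
    case _             => false
  }
}

object ReuseCollisionDemo {
  // Assumption: after canonicalization both outputs are indistinguishable,
  // even though the scans read different columns.
  val left  = new ToyScan(Seq("none#0"), "struct<_1:int>")
  val right = new ToyScan(Seq("none#0"), "struct<_2:int>")

  // Toy version of a stage cache keyed on the (canonicalized) plan.
  val stageCache = TrieMap.empty[ToyScan, String]
  stageCache.put(left, "exchange plan_id=304")

  // Looking up the right side finds the left side's exchange.
  val reused: Option[String] = stageCache.get(right)
}
```

Here `ReuseCollisionDemo.reused` holds the left side's exchange even though `right` reads a different schema, which mirrors the incorrect `ReusedExchange` in the final plan.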
## What changes are included in this PR?
- Expand `hashCode` to include the original `FileSourceScanExec` and `serializedPlanOpt`, which carry more information about the node. I'd like to understand whether this hashes too much information and makes stages that could be reused appear distinct, but I need to dig into AQE behavior more.
- Expand `equals` to check more than just the plan output.
- Expand `doCanonicalize` based on behavior seen in the `CometScan` node. Similar to above: I'd like to understand whether this canonicalizes the right information, but I need to dig into AQE behavior more.
- `CometNativeScan` now uses the `DataSourceScanExec` trait. The benefit here is that we get more detailed information in the Spark plan. For example, explain before (note the `CometNativeScan`):
```
CometProject [_1#6], [_1#6]
+- CometSortMergeJoin [_1#6], [_2#11], Inner
:- CometSort [_1#6], [_1#6 ASC NULLS FIRST]
: +- CometExchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=304]
: +- CometFilter [_1#6], isnotnull(_1#6)
: +- CometNativeScan: [_1#6]
+- CometSort [_2#11], [_2#11 ASC NULLS FIRST]
+- CometExchange hashpartitioning(_2#11, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=308]
+- CometFilter [_2#11], isnotnull(_2#11)
+- CometNativeScan: [_2#11]
```
and explain now (note the `CometNativeScan`):
```
CometProject [_1#6], [_1#6]
+- CometSortMergeJoin [_1#6], [_2#11], Inner
:- CometSort [_1#6], [_1#6 ASC NULLS FIRST]
: +- CometExchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=91]
: +- CometFilter [_1#6], isnotnull(_1#6)
: +- CometNativeScan parquet [_1#6] Batched: true, DataFilters: [isnotnull(_1#6)], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/12/4pf3d5zn72n7q2_0ks3bkh7c0000gn/T/spark-8f..., PartitionFilters: [], PushedFilters: [IsNotNull(_1)], ReadSchema: struct<_1:int>
+- CometSort [_2#11], [_2#11 ASC NULLS FIRST]
+- CometExchange hashpartitioning(_2#11, 10), ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=95]
+- CometFilter [_2#11], isnotnull(_2#11)
+- CometNativeScan parquet [_2#11] Batched: true, DataFilters: [isnotnull(_2#11)], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/12/4pf3d5zn72n7q2_0ks3bkh7c0000gn/T/spark-8f..., PartitionFilters: [], PushedFilters: [IsNotNull(_2)], ReadSchema: struct<_2:int>
```
This better represents a corresponding Spark plan with its `FileScan` node:
```
Project [_1#6]
+- SortMergeJoin [_1#6], [_2#11], Inner
:- Sort [_1#6 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(_1#6, 10), ENSURE_REQUIREMENTS, [plan_id=126]
: +- Filter isnotnull(_1#6)
: +- FileScan parquet [_1#6] Batched: true, DataFilters: [isnotnull(_1#6)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/12/4pf3d5zn72n7q2_0ks3bkh7c0000gn/T/spark-8f..., PartitionFilters: [], PushedFilters: [IsNotNull(_1)], ReadSchema: struct<_1:int>
+- Sort [_2#11 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(_2#11, 10), ENSURE_REQUIREMENTS, [plan_id=127]
+- Filter isnotnull(_2#11)
+- FileScan parquet [_2#11] Batched: true, DataFilters: [isnotnull(_2#11)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/12/4pf3d5zn72n7q2_0ks3bkh7c0000gn/T/spark-8f..., PartitionFilters: [], PushedFilters: [IsNotNull(_2)], ReadSchema: struct<_2:int>
```
- `doCanonicalize` reused a method from `CometScanExec` so I moved it to a new common `CometScanUtils`.
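The effect of widening `equals` and `hashCode` can be sketched without Spark. Everything below is illustrative: `ToyNativeScan` and `ExpandedHashDemo` are hypothetical names, and plain strings stand in for the original `FileSourceScanExec` and for `serializedPlanOpt`; the actual fields and hashing helper used in the PR may differ.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the scan node; NOT the real CometNativeScan.
final class ToyNativeScan(
    val output: Seq[String],
    val originalPlan: String,              // stand-in for the wrapped FileSourceScanExec
    val serializedPlanOpt: Option[String]) // stand-in for the serialized native plan
{
  // Equality now checks more than just the output.
  override def equals(other: Any): Boolean = other match {
    case that: ToyNativeScan =>
      output == that.output &&
        originalPlan == that.originalPlan &&
        serializedPlanOpt == that.serializedPlanOpt
    case _ => false
  }

  // Hash the same fields that equals compares, so the two stay consistent.
  override def hashCode(): Int =
    (output, originalPlan, serializedPlanOpt).hashCode()
}

object ExpandedHashDemo {
  // Same canonical-looking output, but different underlying scans.
  val left  = new ToyNativeScan(Seq("none#0"), "scan ReadSchema: struct<_1:int>", Some("plan-a"))
  val right = new ToyNativeScan(Seq("none#0"), "scan ReadSchema: struct<_2:int>", Some("plan-b"))
  // A structurally identical copy of the left scan.
  val leftCopy = new ToyNativeScan(Seq("none#0"), "scan ReadSchema: struct<_1:int>", Some("plan-a"))

  val stageCache = TrieMap.empty[ToyNativeScan, String]
  stageCache.put(left, "exchange plan_id=304")

  val crossLookup: Option[String] = stageCache.get(right)    // different scan: no reuse
  val sameLookup: Option[String]  = stageCache.get(leftCopy) // identical scan: reuse still works
}
```

The trade-off raised in the first bullet shows up here as well: the more fields that feed into `equals` and `hashCode`, the fewer false matches occur, but any field that differs between two genuinely equivalent scans would prevent a legitimate reuse.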
## How are these changes tested?
Existing tests. Enabled one previously skipped test for `native_datafusion`.

Parent commit: 68199c2
4 files changed (+69 -20) under `spark/src`, in `main/scala/org/apache/spark/sql/comet` and `test/scala/org/apache/comet/exec`. Per-file changes, in page order: 35 additions & 7 deletions; 1 addition & 8 deletions; 33 additions & 0 deletions; 0 additions & 5 deletions.