Skip to content

fuzz test failure: corr null vs Nan #2646

@andygrove

Description

@andygrove

Describe the bug

SQL

SELECT c3, c42, corr(c20, c6) FROM test0 GROUP BY c3,c42 ORDER BY c3, c42;

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91129]
            +- *(2) HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91101]
                        +- *(1) HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91083]
      +- HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
         +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91080]
            +- HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
               +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) CometColumnarToRow
   +- CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91263]
               +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91217]
                           +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
                              +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91198]
      +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
         +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91196]
            +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
               +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>

First difference at row 150:
Spark: 1190973260,[3333-01-21T01:11:48.781],NULL
Comet: 1190973260,[3333-01-21T01:11:48.781],NaN

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions