[SPARK-53805][SQL] Push Variant into DSv2 scan #52522

huaxingao · 2025-10-06T03:42:56Z

What changes were proposed in this pull request?

Push Variant into DSv2 scan

Why are the changes needed?

With this change, a DSv2 scan only needs to fetch the shredded variant columns that the plan actually requires.
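To illustrate the motivation, here is a toy Scala model (not Spark code). It collects the paths that the plan actually extracts from each variant column, so a shredding-aware scan could read only those fields instead of the whole column. `VariantAccess` and `requiredPaths` are hypothetical names, and paths are simplified to strings such as `$.b.c`.

```scala
// Hypothetical toy model; VariantAccess and requiredPaths are NOT Spark APIs.
// One extraction from a variant column, e.g. variant_get(v, '$.a', 'int').
final case class VariantAccess(column: String, path: String, targetType: String)

object VariantPathPruning {
  // Group the plan's extractions by variant column and keep the distinct paths.
  // A scan that understands shredding could then fetch only these fields.
  def requiredPaths(accesses: Seq[VariantAccess]): Map[String, Set[String]] =
    accesses
      .groupBy(_.column)
      .map { case (column, accs) => column -> accs.map(_.path).toSet }

  def main(args: Array[String]): Unit = {
    val accesses = Seq(
      VariantAccess("v", "$.a", "int"),
      VariantAccess("v", "$.b.c", "string"),
      VariantAccess("v", "$.a", "int"), // duplicate extraction, needed only once
      VariantAccess("w", "$.id", "long")
    )
    // One line per variant column listing its distinct paths (line order may vary).
    requiredPaths(accesses).foreach { case (col, paths) =>
      println(s"$col -> ${paths.toSeq.sorted.mkString(", ")}")
    }
  }
}
```

Without this kind of pruning, the scan would have to read every shredded field of `v` even though only two paths are used.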

Does this PR introduce any user-facing change?

No

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun

Thank you so much, @huaxingao .

cc @chenhao-db and @cloud-fan from SPARK-53805.

#49235

dongjoon-hyun

+1, LGTM from my side.

dongjoon-hyun · 2025-10-07T21:15:02Z

cc @peter-toth , too.

singhpk234 · 2025-10-07T21:52:03Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PushVariantIntoScan.scala

      hadoopFsRelation@HadoopFsRelation(_, _, _, _, _: ParquetFileFormat, _), _)) =>
        rewritePlan(p, projectList, filters, relation, hadoopFsRelation)
+      case p@PhysicalOperation(projectList, filters, relation: DataSourceV2Relation) =>
+        rewriteV2RelationPlan(p, projectList, filters, relation.output, relation)


If we are already passing the relation, do we need to pass relation.output separately?

huaxingao · Oct 7, 2025

I overlooked this. Removed.
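The point of this review comment is that the relation already carries its output, so passing `relation.output` alongside it is redundant and lets the two arguments drift apart. A minimal pure-Scala sketch of the resulting shape follows; it uses hypothetical stand-in types rather than Spark's `LogicalPlan`, `PhysicalOperation`, or `DataSourceV2Relation`. The rule dispatches on the relation kind, and the V2 rewrite reads the output from the relation itself.

```scala
// Hypothetical stand-ins for Spark plan nodes; not the real Catalyst classes.
final case class Attr(name: String, isVariant: Boolean)

sealed trait Relation { def output: Seq[Attr] }
final case class V1ParquetRelation(output: Seq[Attr]) extends Relation
final case class V2Relation(table: String, output: Seq[Attr]) extends Relation
final case class OtherRelation(output: Seq[Attr]) extends Relation

object PushVariantSketch {
  // Dispatch by relation kind, mirroring the two match cases in the diff.
  def rewrite(relation: Relation): String = relation match {
    case r: V1ParquetRelation => rewriteV1(r)
    case r: V2Relation        => rewriteV2RelationPlan(r)
    case _                    => "unchanged"
  }

  private def rewriteV1(r: V1ParquetRelation): String =
    s"v1: ${variantColumns(r.output).mkString(",")}"

  // Takes only the relation: the output is read from it rather than passed
  // as a separate parameter that could disagree with relation.output.
  private def rewriteV2RelationPlan(r: V2Relation): String =
    s"v2(${r.table}): ${variantColumns(r.output).mkString(",")}"

  private def variantColumns(output: Seq[Attr]): Seq[String] =
    output.filter(_.isVariant).map(_.name)

  def main(args: Array[String]): Unit = {
    val rel = V2Relation("t", Seq(Attr("id", isVariant = false), Attr("v", isVariant = true)))
    println(rewrite(rel)) // prints "v2(t): v"
  }
}
```

Keeping a single source of truth for the output means a later change to how the relation computes its output cannot silently diverge from what the rewrite sees.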

singhpk234 · 2025-10-07T22:35:47Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala

      SchemaPruning,
      GroupBasedRowLevelOperationScanPlanning,
      V1Writes,
+      PushVariantIntoScan,


PushVariantIntoScan now runs before PruneFileSourcePartitions, which I think applies only to v1 sources. Does this ordering matter, or was the rule originally placed later simply because it was a new rule?

huaxingao · Oct 7, 2025

I don't think variant columns will ever be used in the partition schema. Schema transformations by PushVariantIntoScan shouldn't affect partition pruning in v1 sources.
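This argument can be checked on a toy model. If the variant rewrite only changes variant-typed data columns, and partition pruning only reads partition columns, the two steps touch disjoint state and therefore commute. The `Scan` type and both functions below are hypothetical simplifications, not Spark's rules.

```scala
// Hypothetical toy model of a file scan; not Spark's API.
final case class Scan(
    dataColumns: Map[String, String],      // column name -> type, e.g. "v" -> "variant"
    partitionColumns: Map[String, String], // partition column name -> type
    partitions: Seq[Map[String, String]]   // partition values per directory
)

object RuleOrdering {
  // Stand-in for PushVariantIntoScan: rewrite requested variant data columns
  // to a struct of the requested fields. Partition columns are never touched.
  def pushVariant(scan: Scan, requested: Map[String, Seq[String]]): Scan =
    scan.copy(dataColumns = scan.dataColumns.map {
      case (name, "variant") if requested.contains(name) =>
        name -> requested(name).mkString("struct<", ",", ">")
      case other => other
    })

  // Stand-in for partition pruning: keep partitions matching an equality filter.
  def prunePartitions(scan: Scan, column: String, value: String): Scan =
    scan.copy(partitions = scan.partitions.filter(_.get(column).contains(value)))

  def main(args: Array[String]): Unit = {
    val scan = Scan(
      dataColumns = Map("id" -> "int", "v" -> "variant"),
      partitionColumns = Map("dt" -> "string"),
      partitions = Seq(Map("dt" -> "2024-01-01"), Map("dt" -> "2024-01-02"))
    )
    val requested = Map("v" -> Seq("a", "b"))
    val pushFirst = prunePartitions(pushVariant(scan, requested), "dt", "2024-01-02")
    val pruneFirst = pushVariant(prunePartitions(scan, "dt", "2024-01-02"), requested)
    println(pushFirst == pruneFirst) // prints "true"
  }
}
```

The model only holds under the stated assumption that no variant column appears in the partition schema; if one did, the two steps would no longer operate on disjoint columns.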

[SPARK-53805][SQL] Push Variant into DSv2 scan

cd8e0d7

github-actions bot added the SQL label Oct 6, 2025

add new line at end of file

c8b9df5

dongjoon-hyun reviewed Oct 7, 2025

huaxingao mentioned this pull request Oct 7, 2025

Spark 4.0: Add variant round trip test for Spark apache/iceberg#14276

Open

dongjoon-hyun approved these changes Oct 7, 2025

singhpk234 reviewed Oct 7, 2025

address comments

2092fce