gengliangwang

What changes were proposed in this pull request?

Enable column pruning and predicate pushdown for DSv2 streaming sources.
The pushdown happens during analysis rather than in the optimizer, because streaming execution needs a concrete V2 Scan early in order to materialize a SparkDataStream via Scan.toMicroBatchStream or Scan.toContinuousStream.
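
To make the idea concrete, here is a minimal, self-contained sketch of what "push filters, prune columns, then build the scan eagerly" could look like. It is a toy model under stated assumptions: `PushdownSketch`, `ToyScanBuilder`, `ToyScan`, `GreaterThan`, and `toStream` are hypothetical names invented for illustration, not Spark's connector interfaces, and the code in this PR may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration only. None of these types are Spark's connector API.
final class PushdownSketch {

    // Hypothetical predicate of the form "column > value".
    static final class GreaterThan {
        final String column;
        final int value;
        GreaterThan(String column, int value) { this.column = column; this.value = value; }
    }

    // Hypothetical scan: the columns and filters the source agreed to handle.
    static final class ToyScan {
        final List<String> columns;
        final List<GreaterThan> pushedFilters;
        ToyScan(List<String> columns, List<GreaterThan> pushedFilters) {
            this.columns = columns;
            this.pushedFilters = pushedFilters;
        }
        // Stand-in for turning the scan into a stream; returns a description only.
        String toStream(String checkpointLocation) {
            return "stream over " + columns + " with " + pushedFilters.size()
                + " pushed filter(s), checkpoint=" + checkpointLocation;
        }
    }

    static final class ToyScanBuilder {
        private final List<String> sourceColumns;
        private final Set<String> filterableColumns;
        private List<String> requiredColumns;
        private final List<GreaterThan> pushed = new ArrayList<>();

        ToyScanBuilder(List<String> sourceColumns, Set<String> filterableColumns) {
            this.sourceColumns = sourceColumns;
            this.filterableColumns = filterableColumns;
            this.requiredColumns = new ArrayList<>(sourceColumns);
        }

        // Keeps the filters the source can evaluate and returns the rest,
        // which the engine must still apply after the scan.
        List<GreaterThan> pushFilters(List<GreaterThan> filters) {
            List<GreaterThan> residual = new ArrayList<>();
            for (GreaterThan f : filters) {
                if (filterableColumns.contains(f.column)) {
                    pushed.add(f);
                } else {
                    residual.add(f);
                }
            }
            return residual;
        }

        // Narrows the read to the required columns, preserving source order.
        void pruneColumns(List<String> required) {
            List<String> kept = new ArrayList<>();
            for (String c : sourceColumns) {
                if (required.contains(c)) {
                    kept.add(c);
                }
            }
            requiredColumns = kept;
        }

        ToyScan build() {
            return new ToyScan(new ArrayList<>(requiredColumns), new ArrayList<>(pushed));
        }
    }

    public static void main(String[] args) {
        ToyScanBuilder builder = new ToyScanBuilder(
            Arrays.asList("id", "name", "ts", "payload"),
            new HashSet<>(Arrays.asList("ts")));

        // Query shape: SELECT id FROM source WHERE ts > 100 AND id > 5
        List<GreaterThan> residual = builder.pushFilters(
            Arrays.asList(new GreaterThan("ts", 100), new GreaterThan("id", 5)));
        builder.pruneColumns(Arrays.asList("id"));

        // Building eagerly yields a concrete scan from which a stream can be created.
        ToyScan scan = builder.build();
        System.out.println(scan.toStream("/tmp/checkpoint"));
        System.out.println("filters left for the engine: " + residual.size());
    }
}
```

In this sketch the source accepts the filter on `ts` and hands back the filter on `id`, so the engine would still have to evaluate `id > 5` after the scan. The required columns passed to `pruneColumns` therefore have to include every column that a residual filter references, not only the projected ones.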

Why are the changes needed?

Pushing filters and the required-column projection into DSv2 readers reduces the data read and the compute spent in streaming queries, and aligns streaming with the batch DSv2 capabilities.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@gengliangwang

cc @jerrypeng
