Skip to content

Conversation

josh0yeh
Copy link

@josh0yeh josh0yeh commented Aug 8, 2025

Before Spark 3.4.3, ParquetFilters were added originally so we could shade Parquet in Comet. Since, ParquetFilters has updated In/NotIn pushdown which comet needs. Remove ParquetFilters from Comet. Also refactor comet native filter functionalities into CometNativeFilter.scala

Before Spark 3.4.3, `ParquetFilters` were added originally so we could shade Parquet in Comet. Since, `ParquetFilters` has updated In/NotIn pushdown which comet needs. Remove `ParquetFilters` from Comet. Also refactor comet native filter functionalities into CometNativeFilter.scala
@josh0yeh josh0yeh marked this pull request as draft August 8, 2025 21:13
@josh0yeh josh0yeh marked this pull request as ready for review August 8, 2025 21:15
@josh0yeh
Copy link
Author

josh0yeh commented Aug 8, 2025

CC: @andygrove @parthchandra (please feel free to cc others I might overlook)

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @josh0yeh

}

predicate match {
case sources.IsNull(name) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we used to have canMakeFilterOn. Do we have equivalent checks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like canMakeFilterOn was a private method in ParquetFilters and there is an identical implementation in Spark's ParquetFilters

@codecov-commenter
Copy link

codecov-commenter commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 70.96774% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.87%. Comparing base (f09f8af) to head (0a3ca0d).
⚠️ Report is 408 commits behind head on main.

Files with missing lines Patch % Lines
.../org/apache/comet/parquet/CometNativeFilters.scala 70.65% 9 Missing and 18 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2100      +/-   ##
============================================
+ Coverage     56.12%   58.87%   +2.74%     
- Complexity      976     1115     +139     
============================================
  Files           119      140      +21     
  Lines         11743    12623     +880     
  Branches       2251     2185      -66     
============================================
+ Hits           6591     7432     +841     
- Misses         4012     4075      +63     
+ Partials       1140     1116      -24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@parthchandra
Copy link
Contributor

@josh0yeh The PR is causing a regressions. Specifically, tests are failing because of -

[info]   Cause: java.lang.UnsupportedOperationException: visit in is not supported.
[info]   at org.apache.parquet.filter2.predicate.FilterPredicate$Visitor.visit(FilterPredicate.java:81)
[info]   at org.apache.parquet.filter2.predicate.Operators$In.accept(Operators.java:323)
[info]   at org.apache.comet.parquet.BloomFilterReader.visit(BloomFilterReader.java:125)
[info]   at org.apache.comet.parquet.BloomFilterReader.visit(BloomFilterReader.java:47)
[info]   at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:573)
[info]   at org.apache.comet.parquet.RowGroupFilter.visit(RowGroupFilter.java:102)
[info]   at org.apache.comet.parquet.RowGroupFilter.visit(RowGroupFilter.java:36)
[info]   at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:157)
[info]   at org.apache.comet.parquet.RowGroupFilter.filterRowGroups(RowGroupFilter.java:50)
[info]   at org.apache.comet.parquet.FileReader.filterRowGroups(FileReader.java:827)
[info]   at org.apache.comet.parquet.FileReader.filterRowGroups(FileReader.java:806)
[info]   at org.apache.comet.parquet.FileReader.<init>(FileReader.java:218)
[info]   at org.apache.comet.parquet.BatchReader.init(BatchReader.java:264)

Can you investigate ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants