-
Notifications
You must be signed in to change notification settings - Fork 3
refactor: Extract split filter provider interface to improve flexibility for user customization. #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Extract split filter provider interface to improve flexibility for user customization. #46
Conversation
WalkthroughReplaces the legacy metadata-filter mechanism with a new split-filter framework. Introduces ClpSplitFilterProvider (with a MySQL implementation), updates wiring (connector, module, optimizer, provider), adjusts config and error codes, deletes ClpMetadataFilterProvider, updates docs, and adds tests and a JSON resource for the new split-filter configuration. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Planner as Presto Planner
participant Optimizer as ClpPlanOptimizer
participant Rewriter as Plan Rewriter
participant Provider as ClpSplitFilterProvider
Note over Planner,Optimizer: Physical planning phase
User->>Planner: Submit query
Planner->>Optimizer: optimize(plan)
Optimizer->>Rewriter: rewrite(plan)
Rewriter->>Rewriter: visitTableScan -> collect tableScope
alt Filter on TableScan
Rewriter->>Provider: getColumnNames(tableScope)
Rewriter->>Provider: remapSplitFilterPushDownExpression(tableScope, pushdownExpr)
Rewriter-->>Optimizer: updated Filter/TableScan nodes
else No filter visited but tables seen
Optimizer->>Provider: checkContainsRequiredFilters({tableScopes}, "")
Provider-->>Optimizer: ok or throws
end
Optimizer-->>Planner: optimized plan
sequenceDiagram
autonumber
participant Guice as Guice Binder
participant Config as ClpConfig
participant Module as ClpModule
participant Provider as ClpSplitFilterProvider
Note over Module,Provider: Startup binding
Guice->>Config: read clp.split-filter-provider-type
alt type == MYSQL
Guice->>Module: bind(ClpSplitFilterProvider -> ClpMySqlSplitFilterProvider)
Guice->>Provider: inject with clp.split-filter-config
else unsupported
Module->>Guice: throw CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
|
Note that this PR exceeds the 400 lines is because I renamed some files and added a lot of new docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(0 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java(5 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java(1 hunks)presto-clp/src/test/resources/test-mysql-metadata-filter.json(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(2 hunks)
🧠 Learnings (11)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
💤 Files with no reviewable changes (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
🔇 Additional comments (24)
presto-clp/src/test/resources/test-mysql-metadata-filter.json (1)
15-20: LGTM! The nested structure supports the new extensible architecture.The addition of the
metadataDatabaseSpecificwrapper aroundrangeMappingaligns well with the new abstractClpMetadataFilterProviderdesign, enabling database-specific filter configurations while maintaining backwards compatibility with the existing range mapping logic.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
19-20: Import updates reflect the package restructuring correctly.The imports have been properly updated to reference the new
metadata.filterpackage location for the restructured filter provider classes.
48-48: Conditional binding maintains consistency with the existing pattern.The conditional binding of
ClpMetadataFilterProvidertoClpMySqlMetadataFilterProviderfollows the same pattern as the metadata provider binding above, ensuring consistency in the dependency injection configuration.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)
17-17: Import correctly updated for the new package structure.The import has been properly updated to reference the
ClpMetadataFilterProviderfrom its new location in themetadata.filterpackage.
98-98: Method name change improves clarity and specificity.The method name change from
remapFilterSqltoremapMetadataFilterPushDownbetter describes the specific purpose of remapping metadata filter pushdown expressions, making the code more self-documenting.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (2)
18-18: Import correctly updated for the package restructuring.The import statement properly reflects the new package location for
ClpMetadataFilterProvider.
44-44: Dependency injection properly implemented with standard patterns.The
ClpMetadataFilterProviderdependency is correctly:
- Declared as a final field
- Injected via constructor parameter
- Null-checked with appropriate error message
- Passed to the
ClpPlanOptimizerProviderThis follows established dependency injection patterns in the codebase.
Also applies to: 54-54, 62-62, 68-68
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (2)
16-16: Import correctly added for the new dependency.The import statement properly includes the
ClpMetadataFilterProviderfrom the restructured package location.
32-32: Dependency injection implemented consistently with existing pattern.The
ClpMetadataFilterProviderdependency is properly:
- Declared as a final field
- Accepted via constructor parameter
- Stored in the field
- Passed to the
ClpPlanOptimizerconstructorThe implementation follows the same pattern as the existing
functionManagerandfunctionResolutiondependencies in this class.Also applies to: 35-35, 39-39, 51-51
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (4)
14-14: LGTM - Package structure follows new organization.The package change to
com.facebook.presto.plugin.clp.metadata.filteraligns with the new metadata filter provider structure.
36-36: LGTM - Class rename reflects focused scope.The rename to
TestClpMetadataFilterConfigCommonappropriately indicates this class tests common functionality shared across different metadata filter provider implementations.
43-45: LGTM - Resource file update aligns with new JSON structure.The change from
"test-metadata-filter.json"to"test-mysql-metadata-filter.json"reflects the new filter configuration format that nestsrangeMappinginsidemetadataDatabaseSpecific.
56-56: LGTM - Provider class updated consistently.All instances of
ClpMetadataFilterProviderhave been correctly replaced withClpMySqlMetadataFilterProvider, maintaining consistency with the new MySQL-specific implementation.Also applies to: 71-71, 86-86, 93-93
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (3)
26-30: LGTM - Clear and well-documented purpose.The class documentation clearly explains its role in deserializing the
metadataDatabaseSpecificfield based on the configured implementation class.
36-39: LGTM - Constructor design supports polymorphic deserialization.The constructor accepts the specific implementation class, enabling flexible deserialization based on the metadata provider type configuration.
42-48: LGTM - Deserialize implementation is correct.The method correctly:
- Reads the JSON content as an ObjectNode
- Uses Jackson's
treeToValueto convert to the specified implementation class- Handles the polymorphic nature of the
MetadataDatabaseSpecificinterfacepresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (3)
18-31: LGTM - Comprehensive and clear documentation.The JavaDoc provides excellent documentation for each field, explaining their purpose and the optional nature of some fields with default values.
34-41: LGTM - Appropriate data model design.The public fields with Jackson annotations follow the simple data holder pattern commonly used for configuration objects. The field types are appropriate for their intended use.
43-44: LGTM - Marker interface enables polymorphic design.The empty
MetadataDatabaseSpecificinterface serves as a type-safe marker for database-specific filter details, enabling the polymorphic deserialization pattern used throughout the system.presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (2)
28-42: LGTM - Proper test setup follows established pattern.The setup method correctly loads the MySQL-specific filter configuration file and handles potential exceptions appropriately.
44-70: LGTM - Comprehensive test coverage for remapping logic.The test method thoroughly covers all comparison operators and verifies the correct remapping behaviour:
>and>=operators correctly map toend_timestamp<and<=operators correctly map tobegin_timestamp=operator correctly generates a range check with both boundsThe test assertions validate the MySQL-specific remapping logic for timestamp range queries.
presto-docs/src/main/sphinx/connector/clp.rst (3)
134-146: LGTM - Clear documentation of the new filter structure.The updated description accurately reflects the new filter configuration model:
- Clarifies that filters specify how data columns are used for querying, not just column mapping
- Properly documents all three fields with their purposes and optional nature
- Explains the extensible design through the
metadataDatabaseSpecificfield
151-157: LGTM - MySQL-specific section appropriately scoped.The renamed section clearly indicates this is MySQL-specific documentation, aligning with the extensible design where different database types can have different metadata-specific fields.
184-189: LGTM - JSON example matches new implementation structure.The updated JSON example correctly nests
rangeMappinginsidemetadataDatabaseSpecific, which matches the structure expected by the newClpMetadataFilterclass and the custom deserializer.
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
| public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression) | ||
| { | ||
| boolean hasRequiredMetadataFilterColumns = true; | ||
| ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); | ||
| for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) { | ||
| if (!metadataFilterPushDownExpression.contains(columnName)) { | ||
| hasRequiredMetadataFilterColumns = false; | ||
| notFoundListBuilder.add(columnName); | ||
| } | ||
| } | ||
| if (!hasRequiredMetadataFilterColumns) { | ||
| throw new PrestoException( | ||
| CLP_MANDATORY_METADATA_FILTER_NOT_VALID, | ||
| notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix unreliable column name matching and improve error message.
The current implementation has two issues:
- Using
contains()for column name matching can cause false positives (e.g., "timestamp" would match "begin_timestamp") - The error message grammar is incorrect when multiple columns are missing
Consider using a more reliable matching approach, such as parsing the SQL expression or using regex with word boundaries. For the immediate fix, improve the error message:
if (!hasRequiredMetadataFilterColumns) {
+ List<String> missingColumns = notFoundListBuilder.build();
+ String columnText = missingColumns.size() == 1 ? "column" : "columns";
+ String verbText = missingColumns.size() == 1 ? "is" : "are";
throw new PrestoException(
CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
- notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
+ String.format("Required metadata filter %s %s %s not present in the filter expression",
+ columnText, missingColumns, verbText));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression) | |
| { | |
| boolean hasRequiredMetadataFilterColumns = true; | |
| ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); | |
| for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) { | |
| if (!metadataFilterPushDownExpression.contains(columnName)) { | |
| hasRequiredMetadataFilterColumns = false; | |
| notFoundListBuilder.add(columnName); | |
| } | |
| } | |
| if (!hasRequiredMetadataFilterColumns) { | |
| throw new PrestoException( | |
| CLP_MANDATORY_METADATA_FILTER_NOT_VALID, | |
| notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid"); | |
| } | |
| } | |
| public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression) | |
| { | |
| boolean hasRequiredMetadataFilterColumns = true; | |
| ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); | |
| for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) { | |
| if (!metadataFilterPushDownExpression.contains(columnName)) { | |
| hasRequiredMetadataFilterColumns = false; | |
| notFoundListBuilder.add(columnName); | |
| } | |
| } | |
| if (!hasRequiredMetadataFilterColumns) { | |
| List<String> missingColumns = notFoundListBuilder.build(); | |
| String columnText = missingColumns.size() == 1 ? "column" : "columns"; | |
| String verbText = missingColumns.size() == 1 ? "is" : "are"; | |
| throw new PrestoException( | |
| CLP_MANDATORY_METADATA_FILTER_NOT_VALID, | |
| String.format( | |
| "Required metadata filter %s %s %s not present in the filter expression", | |
| columnText, | |
| missingColumns, | |
| verbText)); | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
around lines 107 to 122, the current use of String.contains() to check for
required column names can cause false positives by matching substrings within
other column names. Replace this with a more reliable matching method, such as
using regex with word boundaries to ensure exact column name matches.
Additionally, update the error message to correctly handle pluralization when
multiple columns are missing, ensuring proper grammar in the exception message.
...main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
...main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
| protected static class ClpMySqlMetadataDatabaseSpecific | ||
| implements MetadataDatabaseSpecific | ||
| { | ||
| @JsonProperty("rangeMapping") | ||
| public RangeMapping rangeMapping; | ||
|
|
||
| public static class RangeMapping | ||
| { | ||
| @JsonProperty("lowerBound") | ||
| public String lowerBound; | ||
|
|
||
| @JsonProperty("upperBound") | ||
| public String upperBound; | ||
|
|
||
| @Override | ||
| public boolean equals(Object o) | ||
| { | ||
| if (this == o) { | ||
| return true; | ||
| } | ||
| if (!(o instanceof RangeMapping)) { | ||
| return false; | ||
| } | ||
| RangeMapping that = (RangeMapping) o; | ||
| return Objects.equals(lowerBound, that.lowerBound) && | ||
| Objects.equals(upperBound, that.upperBound); | ||
| } | ||
|
|
||
| @Override | ||
| public int hashCode() | ||
| { | ||
| return Objects.hash(lowerBound, upperBound); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding field validation and toString method.
The inner classes use public fields and lack validation. While this might be acceptable for simple DTOs, consider adding validation and a toString method for better debugging.
public static class RangeMapping
{
- @JsonProperty("lowerBound")
- public String lowerBound;
+ private final String lowerBound;
+ private final String upperBound;
- @JsonProperty("upperBound")
- public String upperBound;
+ public RangeMapping(@JsonProperty("lowerBound") String lowerBound,
+ @JsonProperty("upperBound") String upperBound)
+ {
+ this.lowerBound = requireNonNull(lowerBound, "lowerBound is null");
+ this.upperBound = requireNonNull(upperBound, "upperBound is null");
+ }
+
+ public String getLowerBound()
+ {
+ return lowerBound;
+ }
+
+ public String getUpperBound()
+ {
+ return upperBound;
+ }
@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (!(o instanceof RangeMapping)) {
return false;
}
RangeMapping that = (RangeMapping) o;
return Objects.equals(lowerBound, that.lowerBound) &&
Objects.equals(upperBound, that.upperBound);
}
@Override
public int hashCode()
{
return Objects.hash(lowerBound, upperBound);
}
+
+ @Override
+ public String toString()
+ {
+ return "RangeMapping{" +
+ "lowerBound='" + lowerBound + '\'' +
+ ", upperBound='" + upperBound + '\'' +
+ '}';
+ }
}Note: You'll also need to update the references to these fields in the remapMetadataFilterPushDown method to use the getters.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java
between lines 123 and 157, the inner classes use public fields without
validation or a toString method. Add private fields with appropriate getters and
setters that include validation logic to ensure data integrity. Implement a
toString method in each class to improve debugging output. Finally, update all
references to these fields, especially in the remapMetadataFilterPushDown
method, to use the new getter methods instead of direct field access.
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me, just a few small questions/suggestions for improvements.
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
...main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Devin Gibson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)
58-80: Fix error message grammar as identified in previous reviews.The error message in line 78 has grammatical issues that were flagged in previous reviews but remain unaddressed.
Apply this diff to fix the grammar:
- throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); + throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file");
107-122: Address unreliable column name matching flagged in previous reviews.The current implementation using
String.contains()for column name matching can cause false positives, and the error message has grammatical issues when multiple columns are missing. These issues were identified in previous reviews but remain unaddressed.The
contains()method will match substrings, potentially causing false positives (e.g., "timestamp" would match "begin_timestamp"). Additionally, the error message has grammatical issues with pluralization.Consider implementing a more precise matching approach and fix the error message:
if (!hasRequiredMetadataFilterColumns) { + List<String> missingColumns = notFoundListBuilder.build(); + String columnText = missingColumns.size() == 1 ? "column" : "columns"; + String verbText = missingColumns.size() == 1 ? "is" : "are"; throw new PrestoException( CLP_MANDATORY_METADATA_FILTER_NOT_VALID, - notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid"); + String.format("Required metadata filter %s %s %s not present in the filter expression", + columnText, missingColumns, verbText)); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-186)presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
ClpConnectorFactory(32-70)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)
ClpMetadataFilter(32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)
161-174: Good null-safe implementation.The helper methods correctly handle null filter lists by returning empty sets, which prevents
NullPointerExceptions and maintains consistent behaviour.
143-159: Consider bounds checking for scope splitting.The method splits the scope string and accesses array elements without bounds checking, which could lead to
ArrayIndexOutOfBoundsExceptionif the scope format is unexpected.Add bounds checking to make the method more robust:
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor) { String[] splitScope = scope.split("\\."); + if (splitScope.length == 0) { + return ImmutableSet.of(); + } + ImmutableSet.Builder<String> builder = ImmutableSet.builder(); builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); - if (splitScope.length > 1) { + if (splitScope.length > 1 && splitScope.length >= 2) { builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); } - if (splitScope.length == 3) { + if (splitScope.length >= 3) { builder.addAll(extractor.apply(filterMap.get(scope))); } return builder.build(); }⛔ Skipped due to learnings
Learnt from: anlowee PR: y-scope/presto#0 File: :0-0 Timestamp: 2025-06-19T08:41:46.288Z Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
|
@gibber9809 I fixed the regex to let it also match the negative numbers, decimals and scientific numbers. I also added the unit tests for them. |
.../src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one small change suggestion for a docstring, but besides that LGTM. Might also be worth adopting some of rabbit's suggestions for improving the docs, but not critical.
Will approve once the suggested change for the docstring is in.
…ta/filter/ClpMetadataFilterProvider.java Co-authored-by: Devin Gibson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)
51-52: Update documentation to be more specific about database customization.The current documentation mentions "different fields" but doesn't clearly convey that customization happens within the
metadataDatabaseSpecificfield.- * For different metadata databases filter configs can include different fields. + * Different metadata databases can customize filter configurations through the + * {@code metadataDatabaseSpecific} field within each {@link ClpMetadataFilter}.
107-122: Fix unreliable column name matching and improve error message.The current implementation still has the issues identified in previous reviews:
- Using
contains()for column name matching can cause false positives (e.g., "timestamp" would match "begin_timestamp")- The error message grammar is incorrect when multiple columns are missing
Consider using a more reliable matching approach, such as parsing the SQL expression or using regex with word boundaries. For the immediate fix, improve the error message:
if (!hasRequiredMetadataFilterColumns) { + List<String> missingColumns = notFoundListBuilder.build(); + String columnText = missingColumns.size() == 1 ? "column" : "columns"; + String verbText = missingColumns.size() == 1 ? "is" : "are"; throw new PrestoException( CLP_MANDATORY_METADATA_FILTER_NOT_VALID, - notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid"); + String.format("Required metadata filter %s %s %s not present in the filter expression", + columnText, missingColumns, verbText)); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)
Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)
78-78: Good fix on the error message grammar and exception chaining.The error message grammar has been corrected from the previous awkward phrasing, and the IOException is properly chained as the cause, which will provide valuable debugging information for configuration file issues.
131-131: Good fix on the documentation grammar.The grammatical error has been corrected from "recommend to implementing" to "recommend implementing".
| private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor) | ||
| { | ||
| String[] splitScope = scope.split("\\."); | ||
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | ||
|
|
||
| builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); | ||
|
|
||
| if (splitScope.length > 1) { | ||
| builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); | ||
| } | ||
|
|
||
| if (splitScope.length == 3) { | ||
| builder.addAll(extractor.apply(filterMap.get(scope))); | ||
| } | ||
|
|
||
| return builder.build(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding null safety checks for scope operations.
The method splits the scope string and accesses array elements without validating the input format or handling potential null/empty values from the filterMap.
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor)
{
+ if (scope == null || scope.isEmpty()) {
+ return ImmutableSet.of();
+ }
+
String[] splitScope = scope.split("\\.");
+ if (splitScope.length == 0) {
+ return ImmutableSet.of();
+ }
+
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
- builder.addAll(extractor.apply(filterMap.get(splitScope[0])));
+ List<ClpMetadataFilter> catalogFilters = filterMap.get(splitScope[0]);
+ if (catalogFilters != null) {
+ builder.addAll(extractor.apply(catalogFilters));
+ }
if (splitScope.length > 1) {
- builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
+ List<ClpMetadataFilter> schemaFilters = filterMap.get(splitScope[0] + "." + splitScope[1]);
+ if (schemaFilters != null) {
+ builder.addAll(extractor.apply(schemaFilters));
+ }
}
if (splitScope.length == 3) {
- builder.addAll(extractor.apply(filterMap.get(scope)));
+ List<ClpMetadataFilter> tableFilters = filterMap.get(scope);
+ if (tableFilters != null) {
+ builder.addAll(extractor.apply(tableFilters));
+ }
}
return builder.build();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor) | |
| { | |
| String[] splitScope = scope.split("\\."); | |
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); | |
| if (splitScope.length > 1) { | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); | |
| } | |
| if (splitScope.length == 3) { | |
| builder.addAll(extractor.apply(filterMap.get(scope))); | |
| } | |
| return builder.build(); | |
| } | |
| private Set<String> collectColumnNamesFromScopes( | |
| String scope, | |
| Function<List<ClpMetadataFilter>, Set<String>> extractor) | |
| { | |
| if (scope == null || scope.isEmpty()) { | |
| return ImmutableSet.of(); | |
| } | |
| String[] splitScope = scope.split("\\."); | |
| if (splitScope.length == 0) { | |
| return ImmutableSet.of(); | |
| } | |
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | |
| List<ClpMetadataFilter> catalogFilters = filterMap.get(splitScope[0]); | |
| if (catalogFilters != null) { | |
| builder.addAll(extractor.apply(catalogFilters)); | |
| } | |
| if (splitScope.length > 1) { | |
| List<ClpMetadataFilter> schemaFilters = filterMap.get(splitScope[0] + "." + splitScope[1]); | |
| if (schemaFilters != null) { | |
| builder.addAll(extractor.apply(schemaFilters)); | |
| } | |
| } | |
| if (splitScope.length == 3) { | |
| List<ClpMetadataFilter> tableFilters = filterMap.get(scope); | |
| if (tableFilters != null) { | |
| builder.addAll(extractor.apply(tableFilters)); | |
| } | |
| } | |
| return builder.build(); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
around lines 143 to 159, the method collectColumnNamesFromScopes splits the
scope string and accesses elements without checking if the scope is null or
empty, and also does not verify if filterMap.get(...) returns null before
applying the extractor. Add null and empty checks for the scope string before
splitting, and add null checks for the results of filterMap.get(...) before
passing them to the extractor function to avoid potential NullPointerExceptions.
…resto into xwei/refactor-metadatafilter-1
|
@gibber9809 Addressed your comment and some of coderabbitai's comments. |
d3a8ff3 to
9289bd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
🔭 Outside diff range comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
16-33: Fix Checkstyle import-order violations to unblock the buildThe pipeline flagged import order issues. Reorder regular and static imports and add the missing static import used below if you adopt the fail-fast suggestion. Example fix:
@@ -import com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterProvider; -import com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider; -import com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider; -import com.facebook.presto.plugin.clp.split.ClpSplitProvider; +import com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider; +import com.facebook.presto.plugin.clp.split.ClpSplitProvider; +import com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider; +import com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterProvider; @@ -import static com.facebook.presto.plugin.clp.ClpConfig.SplitFilterProviderType; -import static com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType; -import static com.facebook.presto.plugin.clp.ClpConfig.SplitProviderType; +import static com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType; +import static com.facebook.presto.plugin.clp.ClpConfig.SplitFilterProviderType; +import static com.facebook.presto.plugin.clp.ClpConfig.SplitProviderType; +import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION; import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE; import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
124-131: Log concrete strings instead of Optional wrappersThese debug logs currently print Optional[…]. Log the unwrapped strings for clarity.
- log.debug("Metadata SQL query: %s", metadataSqlQuery); + log.debug("Metadata SQL query: %s", metadataSqlQuery.get()); @@ - log.debug("KQL query: %s", kqlQuery); + log.debug("KQL query: %s", kqlQuery.get());presto-docs/src/main/sphinx/connector/clp.rst (1)
27-36: Fix property name: use clp.split-filter-config instead of clp.metadata-filter-configThe example still shows the old key, which will mislead users.
- clp.metadata-filter-config=/path/to/metadata-filter-config.json + clp.split-filter-config=/path/to/split-filter-config.json
♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
48-50: Fail fast on unsupported split-filter provider type and align comparison styleAdd an else branch like the other provider bindings and keep comparison style consistent with the rest of this class.
Apply:
- if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) { + if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) { binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON); } + else { + throw new PrestoException( + CLP_UNSUPPORTED_CONFIG_OPTION, + "Unsupported split filter provider type: " + config.getSplitFilterProviderType()); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)
66-71: Required-filter validation misses tables without FilterNodes (joins); track per-table filters and validate the remainderUsing a single hasVisitedFilter flag suppresses validation for all tables if any FilterNode is seen. In join plans where only some table scans have a FilterNode, required-filter checks are skipped for other tables. Track the set of tables that actually had a FilterNode and validate only the remainder after rewriting.
Apply these diffs:
- Validate only unfiltered table scopes in optimize(...):
- // Throw exception if any required split filters are missing - if (!rewriter.tableScopeSet.isEmpty() && !rewriter.hasVisitedFilter) { - splitFilterProvider.checkContainsRequiredFilters(rewriter.tableScopeSet, ""); - } + // Validate tables that did not have a FilterNode associated with them + if (!rewriter.tableScopeSet.isEmpty()) { + Set<String> unfilteredScopes = new HashSet<>(rewriter.tableScopeSet); + unfilteredScopes.removeAll(rewriter.tableScopesWithFilter); + if (!unfilteredScopes.isEmpty()) { + splitFilterProvider.checkContainsRequiredFilters(unfilteredScopes, ""); + } + }
- Track table scopes that actually had a FilterNode:
private class Rewriter extends ConnectorPlanRewriter<Void> { private final PlanNodeIdAllocator idAllocator; private final Set<String> tableScopeSet; + private final Set<String> tableScopesWithFilter; private boolean hasVisitedFilter; public Rewriter(PlanNodeIdAllocator idAllocator) { this.idAllocator = idAllocator; hasVisitedFilter = false; tableScopeSet = new HashSet<>(); + tableScopesWithFilter = new HashSet<>(); }
- Unify scope formatting and record scopes with filters in visitFilter(...):
- String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString(); + String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName()); + tableScopesWithFilter.add(tableScope);Note: with the above change, hasVisitedFilter becomes redundant; you can remove it in a follow-up cleanup to simplify state.
Also applies to: 76-86, 102-114
108-114: Use consistent scope formatting helperMinor consistency fix: use format("%s.%s", ...) here as elsewhere to avoid subtle string differences and future drift.
- String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString(); + String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());presto-docs/src/main/sphinx/connector/clp.rst (4)
141-143: Minor grammar: remove superfluous commaReadability polish.
-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent +For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent
219-221: Singular “implementation” and remove extra commaConsistency and grammar.
-If you prefer to use a different format for the ``customOptions``, you can provide your own implementations of the -``ClpSplitFilterProvider`` interface, and configure the connector accordingly. +If you prefer to use a different format for the ``customOptions``, you can provide your own implementation of the +``ClpSplitFilterProvider`` interface and configure the connector accordingly.
151-154: Clarify enforcement even when no FilterNode is producedReflects the behavioural change noted in the PR narrative.
- query. If a required filter is missing or cannot be added to the split filter query, the original query will be - rejected. + query. If a required filter is missing or cannot be added to the split filter query, the original query will be + rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).
50-57: Use Presto type names and fix grammarAlign with Presto docs and remove the article mismatch.
- When enabled, type annotations are added to conflicting field names to - distinguish between types. For example, if an ``id`` column appears with - both an ``int`` and ``string`` types, the connector will create two - columns named ``id_bigint`` and ``id_varchar``. - - Supported type annotations include ``bigint``, ``varchar``, ``double``, - ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For + When enabled, type annotations are added to conflicting field names to + distinguish between types. For example, if an ``id`` column appears as + both ``BIGINT`` and ``VARCHAR``, the connector will create two columns + named ``id_bigint`` and ``id_varchar``. + + Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``, + ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For columns with only one type, the original column name is used.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (14)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java(4 hunks)presto-clp/src/test/resources/test-mysql-split-filter.json(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(9 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
🧬 Code Graph Analysis (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(34-158)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1)
Test(37-97)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
ClpConnectorFactory(32-70)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
ClpConnectorFactory(32-70)presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (1)
Test(29-87)
🪛 GitHub Actions: maven checks
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
[error] 20-20: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.
[error] 21-21: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.
[error] 29-29: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (8)
presto-clp/src/test/resources/test-mysql-split-filter.json (1)
15-19: Config nesting under customOptions looks correct for provider-driven deserializationMoving rangeMapping under customOptions aligns with ClpSplitFilterConfig and the provider-specific options class. This should deserialize via the new CustomSplitFilterOptions deserializer as intended.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
16-52: Wiring change to ClpSplitFilterProvider looks correctConstructor injection and the physical plan optimizer wiring correctly pass the new provider through to ClpPlanOptimizer.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
18-18: Wiring to ClpSplitFilterProvider looks correctConstructor injection, null-checks, and plan optimiser provider hand-off are consistent with the new provider. No functional concerns here.
Also applies to: 44-45, 54-55, 62-63, 68-69
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
36-39: Confirm migration docs and deprecation path for old metadata-filter configsThis introduces new keys (split-filter-*) and removes/renames the old metadata-filter keys. Ensure docs, examples, and release notes clearly call out the migration path and failure modes when old keys are present.
Also applies to: 154-158, 195-198
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (1)
52-63: Good coverage for numeric variants (ints, decimals, scientific), including negativesThe remapping assertions exercise operator combinations and numeric formats well. Looks solid.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
143-159: Scope collector is clean; null-safety handled downstreamUse of extractor.apply on possibly-null lists is safe given the extractor methods return empty sets on null. Good readability and behaviour across catalog/schema/table levels.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (2)
82-96: Good coverage of empty and invalid config pathsConstructing the provider with no config and with an invalid path covers the error path (CLP_SPLIT_FILTER_CONFIG_NOT_FOUND). Assertions are precise.
53-66: Test validates required filters on the original (pre-remap) expression — consider adding a post-remap assertion if you change validationConfirmed: the existing test (checkRequiredFilters) exercises presence checking against the original/pre-remap expression. ClpPlanOptimizer calls checkContainsRequiredFilters before remapping, while remapSplitFilterPushDownExpression performs the transform.
Files to note:
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java — method checkRequiredFilters
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java — calls checkContainsRequiredFilters(...) before remap in visitFilter()
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java — remapSplitFilterPushDownExpression(...)
Optional suggestion (for future-proofing if you move validation to post-remap):
Add an assertion that the remapped SQL would also be accepted (only add if you change checkContainsRequiredFilters to validate post-remap):String remapped = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", "("msg.timestamp" > 1234 AND "msg.timestamp" < 5678)");
filterProvider.checkContainsRequiredFilters(testTableScopeSet, remapped); // add only if validation is performed after remapNote: with the current behaviour this assertion would fail (required column names are checked against the pre-remap names).
| @Config("clp.split-filter-config") | ||
| public ClpConfig setSplitFilterConfig(String splitFilterConfig) | ||
| { | ||
| this.splitFilterConfig = splitFilterConfig; | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add @ConfigDescription to document the new config key
Annotate the setter to improve self-documentation and config help output.
- @Config("clp.split-filter-config")
+ @Config("clp.split-filter-config")
+ @com.facebook.airlift.configuration.ConfigDescription("Path to the split-filter configuration JSON file")
public ClpConfig setSplitFilterConfig(String splitFilterConfig)If you prefer an explicit import, add:
import com.facebook.airlift.configuration.ConfigDescription;🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java around
lines 159 to 164, the setter for splitFilterConfig is missing a
@ConfigDescription annotation; add @ConfigDescription("description of what
clp.split-filter-config controls") above the setSplitFilterConfig method (and
add the import com.facebook.airlift.configuration.ConfigDescription if not
already present) so the config key is documented in help output and
self-documentation.
| @Config("clp.split-filter-provider-type") | ||
| public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType) | ||
| { | ||
| this.splitFilterProviderType = splitFilterProviderType; | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add @ConfigDescription for provider type
Clarify what values are accepted and what it controls.
- @Config("clp.split-filter-provider-type")
+ @Config("clp.split-filter-provider-type")
+ @com.facebook.airlift.configuration.ConfigDescription("Split-filter provider type (e.g., MYSQL)")
public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Config("clp.split-filter-provider-type") | |
| public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType) | |
| { | |
| this.splitFilterProviderType = splitFilterProviderType; | |
| return this; | |
| } | |
| @Config("clp.split-filter-provider-type") | |
| @com.facebook.airlift.configuration.ConfigDescription("Split-filter provider type (e.g., MYSQL)") | |
| public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType) | |
| { | |
| this.splitFilterProviderType = splitFilterProviderType; | |
| return this; | |
| } |
| CLP_SPLIT_FILTER_CONFIG_NOT_FOUND(10, USER_ERROR), | ||
| CLP_MANDATORY_SPLIT_FILTER_NOT_VALID(11, USER_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Renamed error codes align with split-filter refactor; verify all call sites updated
Names now match the split-filter terminology and keep the same numeric codes and types. Please ensure all throw sites and references have been migrated.
Run this quick search to catch any lingering references to the old constants or method names:
🏁 Script executed:
#!/bin/bash
# Find references to old metadata-filter error codes or method names
rg -n --fixed-strings \
'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql'Length of output: 117
🏁 Script executed:
#!/bin/bash
echo "Search old constants and remapFilterSql..."
rg -n --hidden --no-ignore-vcs -S 'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql' || echo "[no matches]"
echo "Search new constants (split)..."
rg -n --hidden --no-ignore-vcs -S 'CLP_SPLIT_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_SPLIT_FILTER_NOT_VALID' || echo "[no matches]"
echo "Search generic split/metadata/filter strings..."
rg -n --hidden --no-ignore-vcs -S 'splitFilter|SplitFilter|split-filter|SPLIT_FILTER|METADATA_FILTER|metadata-filter' || echo "[no matches]"
echo "Search remapFilter*..."
rg -n --hidden --no-ignore-vcs -S 'remapFilter|remap.*Filter' || echo "[no matches]"
echo "Search ClpErrorCode usages..."
rg -n --hidden --no-ignore-vcs -S 'ClpErrorCode' || echo "[no matches]"
echo "Print ClpErrorCode.java (if present)..."
if [ -f presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java ]; then
sed -n '1,240p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
else
echo "[file not found: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java]"
fiLength of output: 300579
Split-filter rename verified — code updated; docs contain lingering metadata-filter-config
Checked the repository: old constants/method names (CLP_METADATA_FILTER_CONFIG_NOT_FOUND, CLP_MANDATORY_METADATA_FILTER_NOT_VALID, remapFilterSql) are not present; the new split-filter constants and methods are used.
Verified usages (non-exhaustive):
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java — CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, CLP_MANDATORY_SPLIT_FILTER_NOT_VALID
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java — imports and throws using the new CLP_SPLIT_* codes
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java — remapSplitFilterPushDownExpression implementation
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java — calls splitFilterProvider.remapSplitFilterPushDownExpression(...)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java — clp.split-filter-config property and SplitFilterProviderType
- Tests: presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig*.java — exercise remapSplitFilterPushDownExpression
Needs attention:
- presto-docs/src/main/sphinx/connector/clp.rst: line ~33 still references clp.metadata-filter-config=/path/to/metadata-filter-config.json — update documentation to use the new clp.split-filter-config (or otherwise reconcile the naming) so docs match the refactor.
🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around line ~33, the docs still
reference the old property name clp.metadata-filter-config; update all
occurrences to the new property clp.split-filter-config (including example usage
clp.split-filter-config=/path/to/split-filter-config.json), update any
explanatory text to use "split filter" terminology (and the corresponding file
name in examples), and run a quick repo-wide search/update for any remaining
"metadata-filter-config" references in the docs to ensure consistency with the
refactor.
| import com.fasterxml.jackson.core.JsonParser; | ||
| import com.fasterxml.jackson.databind.DeserializationContext; | ||
| import com.fasterxml.jackson.databind.JsonDeserializer; | ||
| import com.fasterxml.jackson.databind.ObjectMapper; | ||
| import com.fasterxml.jackson.databind.node.ObjectNode; | ||
|
|
||
| import java.io.IOException; | ||
|
|
||
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | ||
|
|
||
| /** | ||
| * Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the | ||
| * {@code "customOptions"} field in the filter. The implementation is originally chosen | ||
| * by the {@code clp.split-filter-provider-type} config option. | ||
| */ | ||
| public class ClpCustomSplitFilterOptionsDeserializer | ||
| extends JsonDeserializer<CustomSplitFilterOptions> | ||
| { | ||
| private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass; | ||
|
|
||
| public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | ||
| { | ||
| this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass; | ||
| } | ||
|
|
||
| @Override | ||
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | ||
| { | ||
| ObjectNode node = p.getCodec().readTree(p); | ||
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | ||
|
|
||
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle null values and avoid narrowing casts in deserializer
Make deserialization resilient to null customOptions and avoid assuming the codec is an ObjectMapper. Also validate the constructor argument.
@@
-import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.ObjectCodec;
@@
-import com.fasterxml.jackson.databind.JsonDeserializer;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.databind.JsonDeserializer;
+import com.fasterxml.jackson.databind.JsonNode;
@@
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
+import static java.util.Objects.requireNonNull;
@@
public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
- this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+ this.actualCustomSplitFilterOptionsClass = requireNonNull(
+ actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}
@@
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
- ObjectNode node = p.getCodec().readTree(p);
- ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
- return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+ if (p.getCurrentToken() == JsonToken.VALUE_NULL) {
+ return null;
+ }
+ JsonNode node = p.getCodec().readTree(p);
+ ObjectCodec codec = p.getCodec();
+ return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.databind.ObjectMapper; | |
| import com.fasterxml.jackson.databind.node.ObjectNode; | |
| import java.io.IOException; | |
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | |
| /** | |
| * Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the | |
| * {@code "customOptions"} field in the filter. The implementation is originally chosen | |
| * by the {@code clp.split-filter-provider-type} config option. | |
| */ | |
| public class ClpCustomSplitFilterOptionsDeserializer | |
| extends JsonDeserializer<CustomSplitFilterOptions> | |
| { | |
| private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass; | |
| public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | |
| { | |
| this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass; | |
| } | |
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| ObjectNode node = p.getCodec().readTree(p); | |
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | |
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } | |
| } | |
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.core.JsonToken; | |
| import com.fasterxml.jackson.core.ObjectCodec; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.databind.JsonNode; | |
| import java.io.IOException; | |
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | |
| import static java.util.Objects.requireNonNull; | |
| /** | |
| * Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the | |
| * {@code "customOptions"} field in the filter. The implementation is originally chosen | |
| * by the {@code clp.split-filter-provider-type} config option. | |
| */ | |
| public class ClpCustomSplitFilterOptionsDeserializer | |
| extends JsonDeserializer<CustomSplitFilterOptions> | |
| { | |
| private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass; | |
| public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | |
| { | |
| this.actualCustomSplitFilterOptionsClass = requireNonNull( | |
| actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null"); | |
| } | |
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| if (p.getCurrentToken() == JsonToken.VALUE_NULL) { | |
| return null; | |
| } | |
| JsonNode node = p.getCodec().readTree(p); | |
| ObjectCodec codec = p.getCodec(); | |
| return codec.treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java
around lines 16-49, validate that the constructor argument
actualCustomSplitFilterOptionsClass is non-null, avoid the unsafe cast to
ObjectMapper by using the JsonParser's ObjectCodec, and make deserialization
handle null customOptions: obtain ObjectCodec codec = p.getCodec(); read the
tree into a JsonNode, return null immediately if the node is null or represents
JSON null, and call codec.treeToValue(node, actualCustomSplitFilterOptionsClass)
to produce the result; also throw an informative IllegalArgumentException in the
constructor if the provided class is null.
| Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0]))); | ||
|
|
||
| if (1 < splitScope.length) { | ||
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1]))); | ||
| } | ||
|
|
||
| if (3 == splitScope.length) { | ||
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Scope aggregation logic is fine; minor style tweak optional
Current logic correctly layers catalog → schema → table mappings. You can simplify the conditions slightly for readability, but this is optional.
- if (1 < splitScope.length) {
+ if (splitScope.length > 1) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0]))); | |
| if (1 < splitScope.length) { | |
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1]))); | |
| } | |
| if (3 == splitScope.length) { | |
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope))); | |
| } | |
| Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0]))); | |
| if (splitScope.length > 1) { | |
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1]))); | |
| } | |
| if (3 == splitScope.length) { | |
| mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope))); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
around lines 63 to 71, the existing numeric comparisons (1 < splitScope.length
and 3 == splitScope.length) are correct but slightly unconventional; change them
to the clearer forms splitScope.length >= 2 and splitScope.length == 3, and keep
the same mapping.putAll calls so the catalog→schema→table layering remains
unchanged for readability and consistency.
| public class ClpSplitFilterConfig | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Harden JSON parsing against future/unknown fields
Add JsonIgnoreProperties(ignoreUnknown = true) to make the config resilient to forward-compatible changes in the JSON.
Apply this diff:
-public class ClpSplitFilterConfig
+@com.fasterxml.jackson.annotation.JsonIgnoreProperties(ignoreUnknown = true)
+public class ClpSplitFilterConfigIf you prefer explicit imports, add this import at the top of the file:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
around lines 32-33, the class lacks a JsonIgnoreProperties(ignoreUnknown = true)
annotation which would make JSON deserialization resilient to added/unknown
fields; add the annotation to the class declaration (and the import
com.fasterxml.jackson.annotation.JsonIgnoreProperties if explicit imports are
used) so Jackson will ignore unknown properties during parsing.
| @JsonProperty("columnName") | ||
| public String columnName; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Make columnName explicitly required in JSON
Column name is fundamental for remapping and validation. Mark it as required to fail fast on malformed configs.
- @JsonProperty("columnName")
+ @JsonProperty(value = "columnName", required = true)
public String columnName;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @JsonProperty("columnName") | |
| public String columnName; | |
| @JsonProperty(value = "columnName", required = true) | |
| public String columnName; |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
around lines 34-36, mark the columnName JSON property as required by changing
the @JsonProperty annotation to include required = true and add a null
check/validation so deserialization fails fast (e.g., throw
IllegalArgumentException or use Objects.requireNonNull in the constructor or a
@PostConstruct/validation method); this ensures malformed configs without
columnName are rejected immediately.
...to-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
Show resolved
Hide resolved
| public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression) | ||
| { | ||
| boolean hasRequiredSplitFilterColumns = true; | ||
| ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); | ||
| for (String tableScope : tableScopeSet) { | ||
| for (String columnName : getRequiredColumnNames(tableScope)) { | ||
| if (!splitFilterPushDownExpression.contains(columnName)) { | ||
| hasRequiredSplitFilterColumns = false; | ||
| notFoundListBuilder.add(columnName); | ||
| } | ||
| } | ||
| } | ||
| if (!hasRequiredSplitFilterColumns) { | ||
| throw new PrestoException( | ||
| CLP_MANDATORY_SPLIT_FILTER_NOT_VALID, | ||
| notFoundListBuilder.build() + " is a mandatory split filter column but not valid"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Required-filter detection via String.contains is error-prone (false positives/negatives)
Using contains(columnName) can:
- false-positive: "id" matches in "user_id"
- false-negative: different quoting/casing or dereferencing variants
Also, this NPEs if splitFilterPushDownExpression is null. Switch to a robust, null-safe check using regex with quoting and deduplicate missing columns for a clearer error.
- public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
+ public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
{
- boolean hasRequiredSplitFilterColumns = true;
- ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
+ boolean hasRequiredSplitFilterColumns = true;
+ // Deduplicate to avoid repeated column names across multiple tables/scopes
+ ImmutableSet.Builder<String> notFoundSetBuilder = ImmutableSet.builder();
+ String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression;
for (String tableScope : tableScopeSet) {
for (String columnName : getRequiredColumnNames(tableScope)) {
- if (!splitFilterPushDownExpression.contains(columnName)) {
+ if (!containsRequiredColumn(expr, columnName)) {
hasRequiredSplitFilterColumns = false;
- notFoundListBuilder.add(columnName);
+ notFoundSetBuilder.add(columnName);
}
}
}
if (!hasRequiredSplitFilterColumns) {
throw new PrestoException(
CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
- notFoundListBuilder.build() + " is a mandatory split filter column but not valid");
+ "Required split filter columns missing from predicate: " + notFoundSetBuilder.build());
}
}Add this helper within the same class (outside the selected range):
// Performs a strict match on a quoted identifier to avoid substring false-positives.
// Example match: "\"msg.timestamp\"" in the expression.
private static boolean containsRequiredColumn(String expression, String columnName)
{
// Match the exact quoted identifier; Pattern.quote handles dots and other meta chars.
String regex = "\\Q\"" + columnName + "\"\\E";
return java.util.regex.Pattern.compile(regex).matcher(expression).find();
}| private <T> void testRange(T lowerBound, T upperBound, ClpMySqlSplitFilterProvider filterProvider) | ||
| { | ||
| String splitFilterSql1 = format("(\"msg.timestamp\" > %s AND \"msg.timestamp\" < %s)", lowerBound, upperBound); | ||
| String remappedSql1 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql1); | ||
| assertEquals(remappedSql1, format("(end_timestamp > %s AND begin_timestamp < %s)", lowerBound, upperBound)); | ||
|
|
||
| String splitFilterSql2 = format("(\"msg.timestamp\" >= %s AND \"msg.timestamp\" <= %s)", lowerBound, upperBound); | ||
| String remappedSql2 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql2); | ||
| assertEquals(remappedSql2, format("(end_timestamp >= %s AND begin_timestamp <= %s)", lowerBound, upperBound)); | ||
|
|
||
| String splitFilterSql3 = format("(\"msg.timestamp\" > %s AND \"msg.timestamp\" <= %s)", lowerBound, upperBound); | ||
| String remappedSql3 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql3); | ||
| assertEquals(remappedSql3, format("(end_timestamp > %s AND begin_timestamp <= %s)", lowerBound, upperBound)); | ||
|
|
||
| String splitFilterSql4 = format("(\"msg.timestamp\" >= %s AND \"msg.timestamp\" < %s)", lowerBound, upperBound); | ||
| String remappedSql4 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql4); | ||
| assertEquals(remappedSql4, format("(end_timestamp >= %s AND begin_timestamp < %s)", lowerBound, upperBound)); | ||
|
|
||
| String splitFilterSql5 = format("(\"msg.timestamp\" = %s)", lowerBound); | ||
| String remappedSql5 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql5); | ||
| assertEquals(remappedSql5, format("((begin_timestamp <= %s AND end_timestamp >= %s))", lowerBound, lowerBound)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Optional: Parameterise to reduce duplication and add edge-case scenarios
Consider a TestNG DataProvider to drive the operator combinations and add a couple of edge cases (e.g., whitespace around operators, mixed-case column quoting), keeping assertions concise.
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java
around lines 65 to 86, the testRange method repeats nearly identical assertions
for multiple operator combinations; convert these into a TestNG data-driven test
by adding a @DataProvider that yields tuples (inputSqlTemplate,
expectedSqlTemplate, lowerBound, upperBound) including edge-case rows for
whitespace around operators and mixed-case/quoted column names, then change the
test method to accept parameters and annotate it with @Test(dataProvider =
"yourProviderName"); inside the test simply call
filterProvider.remapSplitFilterPushDownExpression and assertEquals against the
formatted expected string to eliminate duplication and keep assertions concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
52-64: Nit: use consistent enum comparison style throughout the classAbove (Lines 48–50) you use the constant-first style (
Type.X == value). For consistency, align the two conditions below to the same style.Apply this diff:
- if (config.getMetadataProviderType() == MetadataProviderType.MYSQL) { + if (MetadataProviderType.MYSQL == config.getMetadataProviderType()) { binder.bind(ClpMetadataProvider.class).to(ClpMySqlMetadataProvider.class).in(Scopes.SINGLETON); } else { throw new PrestoException(CLP_UNSUPPORTED_METADATA_SOURCE, "Unsupported metadata provider type: " + config.getMetadataProviderType()); } - if (config.getSplitProviderType() == SplitProviderType.MYSQL) { + if (SplitProviderType.MYSQL == config.getSplitProviderType()) { binder.bind(ClpSplitProvider.class).to(ClpMySqlSplitProvider.class).in(Scopes.SINGLETON); } else { throw new PrestoException(CLP_UNSUPPORTED_SPLIT_SOURCE, "Unsupported split provider type: " + config.getSplitProviderType()); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(34-158)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
21-23: Imports for split-filter provider look correctThe new imports align with the refactor and are used below for conditional binding.
29-29: Static import for SplitFilterProviderType is appropriateUsed directly in the new conditional; no issues.
|
Finished the E2E testing again, and should be ready for review.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)
102-105: Unify dash style and remove superfluous comma.Replace double hyphens with an em dash for readability and remove the extra comma before “and configure”.
Apply this diff:
-If you prefer to use a different source--or the same source with a custom implementation--you can provide your own -implementations of the ``ClpMetadataProvider`` and ``ClpSplitProvider`` interfaces, and configure the connector +If you prefer to use a different source—or the same source with a custom implementation—you can provide your own +implementations of the ``ClpMetadataProvider`` and ``ClpSplitProvider`` interfaces and configure the connector accordingly.
♻️ Duplicate comments (4)
presto-docs/src/main/sphinx/connector/clp.rst (4)
142-144: Remove superfluous comma after the parenthetical.Minor grammar polish for smoother reading.
Apply this diff:
-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent +For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent
220-222: Singular “implementation” and remove extra comma.Matches wording used elsewhere and reads more cleanly.
Apply this diff:
-If you prefer to use a different format for the ``customOptions``, you can provide your own implementations of the -``ClpSplitFilterProvider`` interface, and configure the connector accordingly. +If you prefer to use a different format for the ``customOptions``, you can provide your own implementation of the +``ClpSplitFilterProvider`` interface and configure the connector accordingly.
152-154: Clarify enforcement of required filters even when no FilterNode is produced.Docs should reflect that required-filter checking still occurs (e.g., LIMIT-only queries).
Apply this diff:
- query. If a required filter is missing or cannot be added to the split filter query, the original query will be - rejected. + query. If a required filter is missing or cannot be added to the split filter query, the original query will be + rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... + LIMIT 1``).
52-58: Use Presto type names and tighten grammar.
- Prefer Presto type names (uppercase) for consistency across the docs.
- Fix “both an int and string types” grammar.
- Standardise casing in the “See …” parenthetical.
Apply this diff:
- distinguish between types. For example, if an ``id`` column appears with - both an ``int`` and ``string`` types, the connector will create two + distinguish between types. For example, if an ``id`` column appears with + both ``BIGINT`` and ``VARCHAR`` types, the connector will create two columns named ``id_bigint`` and ``id_varchar``. @@ - Supported type annotations include ``bigint``, ``varchar``, ``double``, - ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For + Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``, + ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For columns with only one type, the original column name is used.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
presto-docs/src/main/sphinx/connector/clp.rst(9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (4)
presto-docs/src/main/sphinx/connector/clp.rst (4)
139-141: Clear explanation of mapping from data columns to metadata columns.The phrasing crisply distinguishes data columns vs. metadata columns and sets reader expectations for remapping.
192-197: Good alignment with provider-specific options.The example correctly nests MySQL-specific rangeMapping under customOptions with clear bounds.
226-231: Scope and examples are appropriate.The supported-expression list is concrete and avoids over-promising. Looks good.
236-251: Data type mapping table reads cleanly and is consistent with Presto types.The mappings are correct and the unsupported catch-all helps set expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
presto-docs/src/main/sphinx/connector/clp.rst (3)
142-143: Remove superfluous comma after parentheticalThe comma after the parenthetical breaks the flow.
-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent +For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent
152-154: Document enforcement when no FilterNode is producedThe PR enforces required filters even when the planner does not produce a FilterNode (e.g., SELECT * ... LIMIT 1). Make this explicit to avoid surprises.
- query. If a required filter is missing or cannot be added to the split filter query, the original query will be - rejected. + query. If a required filter is missing or cannot be added to the split-filter query, the original query will be + rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).
52-58: Use Presto type names and fix article/pluralisation; align with Data Types section
- Replace informal types with Presto types and correct grammar.
- Keep terminology consistent with the Data Types table (use uppercase and ARRAY(...) notation).
- distinguish between types. For example, if an ``id`` column appears with - both an ``int`` and ``string`` types, the connector will create two + distinguish between types. For example, if an ``id`` column appears as + both ``BIGINT`` and ``VARCHAR`` types, the connector will create two columns named ``id_bigint`` and ``id_varchar``. - Supported type annotations include ``bigint``, ``varchar``, ``double``, - ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For + Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``, + ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For columns with only one type, the original column name is used.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
presto-docs/src/main/sphinx/connector/clp.rst(9 hunks)
🔇 Additional comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)
112-116: Nice clarification of the split-filter flowClear, concise explanation of how split filters reduce scanned data and how predicates are translated.
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor nit agreeing with a rabbit comment, but otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
32-35: Split-filter error codes added/renamed — numeric IDs stable; looks goodThe new constants and their ErrorType choices are appropriate, and IDs remain stable. No concerns here.
Optional: run a quick scan for any lingering references to the old metadata-filter constants or property names in code/docs.
#!/bin/bash # 1) Ensure old metadata-filter error codes/usages are gone rg -n --hidden --no-ignore-vcs -S 'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql' || true # 2) Ensure docs/examples use split-filter terminology rg -n --hidden --no-ignore-vcs -S 'metadata-filter-config' || true # 3) Sanity-check new constants are referenced rg -n --hidden --no-ignore-vcs -S 'CLP_SPLIT_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_SPLIT_FILTER_NOT_VALID|CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE' || true
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(34-158)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(51-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
21-23: Imports align with split-filter wiringCorrect classes and static imports for the new split-filter provider and error code. All good.
Also applies to: 29-33
| if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) { | ||
| binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON); | ||
| } | ||
| else { | ||
| throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType()); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Fail-fast on unsupported split-filter provider type — good; align comparison style with rest of file
The explicit PrestoException improves debuggability. For consistency with the other provider checks in this class, favour config.getX() == Type.Y comparison style.
Apply:
- if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) {
+ if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) {
binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON);
}
else {
throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType());
}Note: with only MYSQL in SplitFilterProviderType today, the else branch is currently unreachable; keeping it is fine for future-proofing.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) { | |
| binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON); | |
| } | |
| else { | |
| throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType()); | |
| } | |
| if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) { | |
| binder.bind(ClpSplitFilterProvider.class) | |
| .to(ClpMySqlSplitFilterProvider.class) | |
| .in(Scopes.SINGLETON); | |
| } | |
| else { | |
| throw new PrestoException( | |
| CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, | |
| "Unsupported split filter provider type: " + config.getSplitFilterProviderType()); | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java around
lines 49 to 55, change the equality check style to match the rest of the file by
using config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL
instead of SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType();
keep the existing binder.bind(...) for MYSQL and preserve the else branch that
throws the PrestoException so unsupported types still fail fast.
gibber9809
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. For PR title how about
refactor: Extract split filter provider interface to improve flexibility for user customization.
| * {@code "customOptions"} field in the filter. The implementation is originally chosen | ||
| * by the {@code clp.split-filter-provider-type} config option. | ||
| */ | ||
| public class ClpCustomSplitFilterOptionsDeserializer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ClpSplitFilterConfigrCustomOptionsDeserializer? It's long, but fits better with ClpSplitFilterConfig.
...ava/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java
Outdated
Show resolved
Hide resolved
...p/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
Outdated
Show resolved
Hide resolved
...p/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
Outdated
Show resolved
Hide resolved
...p/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
Outdated
Show resolved
Hide resolved
Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
♻️ Duplicate comments (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (2)
16-17: Import JsonIgnoreProperties to harden forward-compatibilityAdds resilience to unknown fields.
import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
32-36: Ignore unknown fields and make columnName requiredFail fast on malformed configs; ignore extra fields for forward compatibility.
-public class ClpSplitFilterConfig +@JsonIgnoreProperties(ignoreUnknown = true) +public class ClpSplitFilterConfig { - @JsonProperty("columnName") + @JsonProperty(value = "columnName", required = true) public String columnName;presto-docs/src/main/sphinx/connector/clp.rst (3)
53-58: Use Presto type names and fix grammarAligns terminology and fixes pluralisation.
- both an ``int`` and ``string`` types, the connector will create two + both ``BIGINT`` and ``VARCHAR`` types, the connector will create two columns named ``id_bigint`` and ``id_varchar``. - Supported type annotations include ``bigint``, ``varchar``, ``double``, - ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For + Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``, + ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (See `Data Types`_ for details). For columns with only one type, the original column name is used.
142-143: Minor grammar: remove superfluous commaImproves readability.
-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent +For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent
152-154: Document enforcement when no FilterNode is producedReflects the behavioural change so users know queries without filters are still validated.
- - ``required`` *(optional, defaults to false)*: Whether the filter **must** be present in the generated metadata query. - If a required filter is missing or cannot be added to the metadata query, the original query will be rejected. + - ``required`` *(optional, defaults to false)*: Whether the filter **must** be present in the generated metadata query. + If a required filter is missing or cannot be added to the metadata query, the original query will be rejected. This + requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
101-118: Required-filter detection via String.contains is brittle and null-unsafeFalse positives (“id” in “user_id”), case/quoting variants, and NPE if expression is null.
- public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression) - { - boolean hasRequiredSplitFilterColumns = true; - ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); - for (String tableScope : tableScopeSet) { - for (String columnName : getRequiredColumnNames(tableScope)) { - if (!splitFilterPushDownExpression.contains(columnName)) { - hasRequiredSplitFilterColumns = false; - notFoundListBuilder.add(columnName); - } - } - } - if (!hasRequiredSplitFilterColumns) { - throw new PrestoException( - CLP_MANDATORY_SPLIT_FILTER_NOT_VALID, - notFoundListBuilder.build() + " is a mandatory split filter column but not valid"); - } - } + public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression) + { + boolean hasAll = true; + ImmutableSet.Builder<String> missing = ImmutableSet.builder(); + final String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression; + for (String tableScope : tableScopeSet) { + for (String columnName : getRequiredColumnNames(tableScope)) { + if (!containsRequiredColumn(expr, columnName)) { + hasAll = false; + missing.add(columnName); + } + } + } + if (!hasAll) { + throw new PrestoException( + CLP_MANDATORY_SPLIT_FILTER_NOT_VALID, + "Required split filter columns missing from predicate: " + missing.build()); + } + }Add this helper in the class:
// Performs a strict match on a quoted identifier to avoid substring false-positives. private static boolean containsRequiredColumn(String expression, String columnName) { String regex = "\\Q\"" + columnName + "\"\\E"; return java.util.regex.Pattern.compile(regex).matcher(expression).find(); }Quick check:
#!/bin/bash # Sanity check: ensure the helper is present and method updated rg -n 'containsRequiredColumn\(' -C2 rg -n 'CLP_MANDATORY_SPLIT_FILTER_NOT_VALID' -C2presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)
63-69: Style: conventional length checks improve readabilityNo behaviour change.
- if (1 < splitScope.length) { + if (splitScope.length > 1) { mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1]))); } - if (3 == splitScope.length) { + if (splitScope.length == 3) { mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope))); }
71-84: Quote column names in regex and tolerate flexible whitespace; handle null inputUnescaped dots in keys (e.g., msg.timestamp) act as wildcards and will mis-match; also avoid NPE when there’s no predicate to remap.
- String remappedSql = pushDownExpression; + if (pushDownExpression == null) { + return null; + } + String remappedSql = pushDownExpression; for (Map.Entry<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> entry : mappings.entrySet()) { String key = entry.getKey(); ClpMySqlCustomSplitFilterOptions.RangeMapping value = entry.getValue(); - remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(>=?)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key), - format("%s $2 $3", value.upperBound)); - remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(<=?)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key), - format("%s $2 $3", value.lowerBound)); - remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(=)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key), - format("(%s <= $3 AND %s >= $3)", value.lowerBound, value.upperBound)); + String qKey = java.util.regex.Pattern.quote(key); + remappedSql = remappedSql.replaceAll( + format("\"(%s)\"\\s*(>=?)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey), + format("%s $2 $3", value.upperBound)); + remappedSql = remappedSql.replaceAll( + format("\"(%s)\"\\s*(<=?)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey), + format("%s $2 $3", value.lowerBound)); + remappedSql = remappedSql.replaceAll( + format("\"(%s)\"\\s*(=)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey), + format("(%s <= $3 AND %s >= $3)", value.lowerBound, value.upperBound)); }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
| private Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> getAllMappingsFromFilters(List<ClpSplitFilterConfig> filters) | ||
| { | ||
| return null != filters | ||
| ? filters.stream() | ||
| .filter(filter -> | ||
| filter.customOptions instanceof ClpMySqlCustomSplitFilterOptions && | ||
| ((ClpMySqlCustomSplitFilterOptions) filter.customOptions).rangeMapping != null) | ||
| .collect(toImmutableMap( | ||
| filter -> filter.columnName, | ||
| filter -> ((ClpMySqlCustomSplitFilterOptions) filter.customOptions).rangeMapping)) | ||
| : ImmutableMap.of(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Validate mapped metadata column identifiers to mitigate injection/config errors
Range-mapped metadata column names are interpolated into SQL. Validate they match a safe identifier pattern before use, or reject the config.
I recommend validating lowerBound/upperBound against a safe identifier regex (e.g., reuse ClpConfig.SAFE_SQL_TABLE_NAME_PATTERN or define a SAFE_SQL_IDENTIFIER_PATTERN) during config load and throwing a PrestoException if invalid. Would you like me to draft this validation?
| import com.fasterxml.jackson.core.JsonParser; | ||
| import com.fasterxml.jackson.databind.DeserializationContext; | ||
| import com.fasterxml.jackson.databind.JsonDeserializer; | ||
| import com.fasterxml.jackson.databind.ObjectMapper; | ||
| import com.fasterxml.jackson.databind.node.ObjectNode; | ||
|
|
||
| import java.io.IOException; | ||
|
|
||
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use JsonNode/ObjectCodec and add requireNonNull import; remove unnecessary ObjectMapper cast
Avoid casting the codec to ObjectMapper (can throw ClassCastException with alternate codecs) and don’t assume an ObjectNode.
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.databind.JsonNode;
import java.io.IOException;
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
+import static java.util.Objects.requireNonNull;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.databind.ObjectMapper; | |
| import com.fasterxml.jackson.databind.node.ObjectNode; | |
| import java.io.IOException; | |
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | |
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.databind.JsonNode; | |
| import java.io.IOException; | |
| import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions; | |
| import static java.util.Objects.requireNonNull; |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
around lines 16 to 25, remove the unsafe cast to ObjectMapper and the assumption
that the codec returns an ObjectNode: use parser.getCodec() (ObjectCodec) and
call readTree(parser) to obtain a JsonNode, import and use
java.util.Objects.requireNonNull to assert the node is not null, then use the
codec (or ObjectMapper obtained safely via ObjectCodec if needed) to convert the
JsonNode to ClpSplitFilterConfig.CustomSplitFilterOptions via treeToValue or
equivalent; also add the requireNonNull import and remove any ObjectMapper cast
to avoid ClassCastException and unsafe ObjectNode assumptions.
| public class ClpSplitFilterConfigrCustomOptionsDeserializer | ||
| extends JsonDeserializer<CustomSplitFilterOptions> | ||
| { | ||
| private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass; | ||
|
|
||
| public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | ||
| { | ||
| this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Fix class name typo, make constructor null-safe, and rename the file accordingly
The public class name is misspelled (“Configr”). This leaks into the API and downstream references. Also, guard against null target type.
Apply this diff and rename the file to ClpSplitFilterCustomOptionsDeserializer.java:
-public class ClpSplitFilterConfigrCustomOptionsDeserializer
+public class ClpSplitFilterCustomOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;
- public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+ public ClpSplitFilterCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
- this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+ this.actualCustomSplitFilterOptionsClass = requireNonNull(actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}Run this to find and update all references:
🏁 Script executed:
#!/bin/bash
# Find references to the old, misspelled class name
rg -n 'ClpSplitFilterConfigrCustomOptionsDeserializer' -S -C2Length of output: 2550
🏁 Script executed:
#!/bin/bash
# Find any occurrence of the typo 'Configr' across the repository, including imports and docs
rg -n 'Configr'Length of output: 665
🏁 Script executed:
#!/bin/bash
# Display the top 20 lines to check for existing imports
sed -n '1,20p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.javaLength of output: 1022
Apply critical fixes to correct the class name, guard against null, and update all references
The class ClpSplitFilterConfigrCustomOptionsDeserializer is misspelled, which leaks into the public API. We should also make the constructor parameter null-safe and ensure imports are added for requireNonNull. Specifically:
• Rename the Java file
– From
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
– To
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterCustomOptionsDeserializer.java
• In that file, update the class and constructor names:
-public class ClpSplitFilterConfigrCustomOptionsDeserializer
+public class ClpSplitFilterCustomOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;
- public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+ public ClpSplitFilterCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
- this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+ this.actualCustomSplitFilterOptionsClass =
+ requireNonNull(actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}• Add the static import at the top of the file for null checks:
import static java.util.Objects.requireNonNull;• Update all call sites (e.g., in ClpSplitFilterProvider.java):
- new ClpSplitFilterConfigrCustomOptionsDeserializer(getCustomSplitFilterOptionsClass()));
+ new ClpSplitFilterCustomOptionsDeserializer(getCustomSplitFilterOptionsClass()));After these changes, rebuild and run the ripgrep search again to confirm there are no remaining references to the old, misspelled name.
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter around
lines 31-39, rename the file and class from
ClpSplitFilterConfigrCustomOptionsDeserializer to
ClpSplitFilterCustomOptionsDeserializer, change the constructor to match the new
class name, add a static import "import static
java.util.Objects.requireNonNull;" at the top, and guard the constructor
parameter with this.actualCustomSplitFilterOptionsClass =
requireNonNull(actualCustomSplitFilterOptionsClass,
"actualCustomSplitFilterOptionsClass");; then update all call sites (e.g.,
ClpSplitFilterProvider and any other references) to use the new class name,
rebuild and run ripgrep to verify no references to the old misspelled name
remain.
| @Override | ||
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | ||
| { | ||
| ObjectNode node = p.getCodec().readTree(p); | ||
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | ||
|
|
||
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Safer deserialization via ObjectCodec; don’t depend on concrete mapper
Keeps the deserializer robust if a non-ObjectMapper codec is used.
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
- ObjectNode node = p.getCodec().readTree(p);
- ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
- return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+ JsonNode node = p.getCodec().readTree(p);
+ return p.getCodec().treeToValue(node, actualCustomSplitFilterOptionsClass);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| ObjectNode node = p.getCodec().readTree(p); | |
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | |
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } | |
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| JsonNode node = p.getCodec().readTree(p); | |
| return p.getCodec().treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
around lines 41 to 48, the deserializer casts the parser codec to ObjectMapper
which is unsafe if a non-ObjectMapper ObjectCodec is used; replace the cast with
the codec directly and call treeToValue on the ObjectCodec (e.g., ObjectCodec
codec = p.getCodec(); return codec.treeToValue(node,
actualCustomSplitFilterOptionsClass);) so deserialization does not depend on a
concrete ObjectMapper.
...to-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
Show resolved
Hide resolved
| catch (IOException e) { | ||
| throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Improve exception message with file path for diagnosability
Helps operators quickly pinpoint misconfiguration.
- catch (IOException e) {
- throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e);
+ catch (IOException e) {
+ throw new PrestoException(
+ CLP_SPLIT_FILTER_CONFIG_NOT_FOUND,
+ "Failed to read split filter config file: " + config.getSplitFilterConfig(),
+ e);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| catch (IOException e) { | |
| throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e); | |
| } | |
| catch (IOException e) { | |
| throw new PrestoException( | |
| CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, | |
| "Failed to read split filter config file: " + config.getSplitFilterConfig(), | |
| e); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
around lines 73 to 75, the PrestoException message thrown on IOException omits
the config file path, making diagnosis harder; change the exception to include
the path variable (or the resolved file path string) in the message and pass the
original exception as the cause so the log shows both the file path and the
underlying error details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)
73-75: Include the config file path in the exception for faster diagnosisSurfacing the file path drastically improves operability when configurations are mispointed or unreadable.
- catch (IOException e) { - throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e); + catch (IOException e) { + throw new PrestoException( + CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, + "Failed to read split filter config file: " + config.getSplitFilterConfig(), + e); }
101-118: Robust required-filter detection (null-safe, no false positives, deduplicated message)Current String.contains:
- NPEs if splitFilterPushDownExpression is null.
- False-positives (e.g., "id" in "user_id").
- Repeats columns across scopes.
Switch to a quoted-identifier regex match, deduplicate missing columns, and improve the message.
public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression) { - boolean hasRequiredSplitFilterColumns = true; - ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder(); + boolean hasRequiredSplitFilterColumns = true; + ImmutableSet.Builder<String> notFoundSetBuilder = ImmutableSet.builder(); + String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression; for (String tableScope : tableScopeSet) { for (String columnName : getRequiredColumnNames(tableScope)) { - if (!splitFilterPushDownExpression.contains(columnName)) { + if (!containsRequiredColumn(expr, columnName)) { hasRequiredSplitFilterColumns = false; - notFoundListBuilder.add(columnName); + notFoundSetBuilder.add(columnName); } } } if (!hasRequiredSplitFilterColumns) { throw new PrestoException( CLP_MANDATORY_SPLIT_FILTER_NOT_VALID, - notFoundListBuilder.build() + " is a mandatory split filter column but not valid"); + "Required split filter columns missing from predicate: " + notFoundSetBuilder.build()); } }Add this helper within the class (outside the selected range):
// Performs a strict match on a quoted identifier to avoid substring false-positives. // Example: matches "\"msg.timestamp\"" in the expression, where columnName is msg.timestamp. private static boolean containsRequiredColumn(String expression, String columnName) { String regex = "\\Q\"" + columnName + "\"\\E"; return java.util.regex.Pattern.compile(regex).matcher(expression).find(); }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-204)presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
ClpSplitFilterConfig(32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
62-67: Approval: Deserializer wiring verifiedNo stale or typoed references to the old deserializer class names were found in the repository. The newly registered ClpSplitFilterConfigCustomOptionsDeserializer with the correct CustomSplitFilterOptions class can be shipped as-is.
| import com.fasterxml.jackson.core.JsonParser; | ||
| import com.fasterxml.jackson.databind.DeserializationContext; | ||
| import com.fasterxml.jackson.databind.JsonDeserializer; | ||
| import com.fasterxml.jackson.databind.ObjectMapper; | ||
| import com.fasterxml.jackson.databind.node.ObjectNode; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Adjust imports to match the refactor (optional but cleaner)
If you apply the previous refactor, update imports accordingly to remove unused ones and add needed types.
@@
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.core.ObjectCodec;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.databind.JsonNode;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.databind.ObjectMapper; | |
| import com.fasterxml.jackson.databind.node.ObjectNode; | |
| import com.fasterxml.jackson.core.JsonParser; | |
| import com.fasterxml.jackson.databind.DeserializationContext; | |
| import com.fasterxml.jackson.databind.JsonDeserializer; | |
| import com.fasterxml.jackson.core.ObjectCodec; | |
| import com.fasterxml.jackson.core.JsonToken; | |
| import com.fasterxml.jackson.databind.JsonNode; |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 16-21, update the import list to match the recent refactor: remove
any imports that are no longer referenced (e.g., ObjectMapper, ObjectNode,
JsonDeserializer if unused) and add the specific Jackson types the new
implementation uses (for example com.fasterxml.jackson.databind.JsonNode and/or
any exception/utility types and the exact JsonParser/DeserializationContext
remains); ensure only the actually referenced classes are imported so there are
no unused imports and the compiler imports match the current code.
| /** | ||
| * Uses the given {@link CustomSplitFilterOptions} implementation to deserialize the | ||
| * {@code "customOptions"} field in a {@link ClpSplitFilterConfig}. The implementation is determined | ||
| * by the implemented {@link ClpSplitFilterProvider}. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Clarify null-handling behaviour in Javadoc
Document that deserialisation returns null when customOptions is absent or explicitly null.
/**
* Uses the given {@link CustomSplitFilterOptions} implementation to deserialize the
* {@code "customOptions"} field in a {@link ClpSplitFilterConfig}. The implementation is determined
* by the implemented {@link ClpSplitFilterProvider}.
+ * <p>
+ * Returns {@code null} if the {@code "customOptions"} field is absent or {@code null}.
+ * </p>
*/🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 26 to 30, update the class Javadoc to explicitly state the
null-handling behaviour: document that deserialization will return null when the
"customOptions" field is missing or explicitly set to null, and clarify that a
non-null CustomSplitFilterOptions implementation is only returned when the field
is present and successfully deserialized; keep the existing references to
CustomSplitFilterOptions and ClpSplitFilterConfig/ClpSplitFilterProvider but add
a short sentence describing these null cases.
| public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | ||
| { | ||
| this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validate constructor argument to fail fast
Defensively enforce a non-null class reference to avoid late failures.
- public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+ public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
- this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+ this.actualCustomSplitFilterOptionsClass = java.util.Objects.requireNonNull(
+ actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | |
| { | |
| this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass; | |
| } | |
| public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass) | |
| { | |
| this.actualCustomSplitFilterOptionsClass = java.util.Objects.requireNonNull( | |
| actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null"); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 36 to 39, the constructor does not validate its
actualCustomSplitFilterOptionsClass parameter; add a defensive null check (e.g.,
Objects.requireNonNull or explicit if-check) at the start of the constructor and
throw a clear NullPointerException with a message like
"actualCustomSplitFilterOptionsClass is null" so the class reference is
validated immediately and failures are fast and clear.
| @Override | ||
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | ||
| { | ||
| ObjectNode node = p.getCodec().readTree(p); | ||
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | ||
|
|
||
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Null-safe deserialisation; avoid brittle ObjectMapper cast
Guard against null tokens and map via ObjectCodec to avoid the hard cast to ObjectMapper. This prevents potential NPEs when customOptions is null/missing and future-proofs against non-ObjectMapper codecs.
Apply this diff:
@@
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
- ObjectNode node = p.getCodec().readTree(p);
- ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
- return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+ if (p.getCurrentToken() == com.fasterxml.jackson.core.JsonToken.VALUE_NULL) {
+ return null;
+ }
+ com.fasterxml.jackson.core.ObjectCodec codec = p.getCodec();
+ com.fasterxml.jackson.databind.JsonNode node = codec.readTree(p);
+ if (node == null || node.isNull()) {
+ return null;
+ }
+ return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
}If you prefer, also add explicit imports for ObjectCodec, JsonToken, and JsonNode (see separate import note).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| ObjectNode node = p.getCodec().readTree(p); | |
| ObjectMapper mapper = (ObjectMapper) p.getCodec(); | |
| return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } | |
| @Override | |
| public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException | |
| { | |
| // Handle explicit JSON null | |
| if (p.getCurrentToken() == com.fasterxml.jackson.core.JsonToken.VALUE_NULL) { | |
| return null; | |
| } | |
| // Use ObjectCodec rather than casting to ObjectMapper | |
| com.fasterxml.jackson.core.ObjectCodec codec = p.getCodec(); | |
| com.fasterxml.jackson.databind.JsonNode node = codec.readTree(p); | |
| // Guard against missing or explicit null node | |
| if (node == null || node.isNull()) { | |
| return null; | |
| } | |
| // Delegate to codec for mapping into the target class | |
| return codec.treeToValue(node, actualCustomSplitFilterOptionsClass); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 41-48, the current code casts the parser's codec to ObjectMapper
and doesn't guard against null/missing values; change it to be null-safe and
codec-agnostic by: obtain the current token and return null immediately if the
token is JsonToken.VALUE_NULL, use ObjectCodec codec = p.getCodec() instead of
casting to ObjectMapper, read a JsonNode via codec.readTree(p), return null if
the node is null or node.isNull(), and finally call codec.treeToValue(node,
actualCustomSplitFilterOptionsClass) to deserialize; also add imports for
ObjectCodec, JsonToken, and JsonNode.
| private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor) | ||
| { | ||
| String[] splitScope = scope.split("\\."); | ||
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | ||
|
|
||
| builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); | ||
|
|
||
| if (splitScope.length > 1) { | ||
| builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); | ||
| } | ||
|
|
||
| if (splitScope.length == 3) { | ||
| builder.addAll(extractor.apply(filterMap.get(scope))); | ||
| } | ||
|
|
||
| return builder.build(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Defensive null check on scope to prevent accidental NPEs
If a caller ever passes a null scope, scope.split(...) will NPE. Cheap guard improves resilience.
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor)
{
- String[] splitScope = scope.split("\\.");
+ requireNonNull(scope, "scope is null");
+ String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor) | |
| { | |
| String[] splitScope = scope.split("\\."); | |
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); | |
| if (splitScope.length > 1) { | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); | |
| } | |
| if (splitScope.length == 3) { | |
| builder.addAll(extractor.apply(filterMap.get(scope))); | |
| } | |
| return builder.build(); | |
| } | |
| private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor) | |
| { | |
| requireNonNull(scope, "scope is null"); | |
| String[] splitScope = scope.split("\\."); | |
| ImmutableSet.Builder<String> builder = ImmutableSet.builder(); | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0]))); | |
| if (splitScope.length > 1) { | |
| builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1]))); | |
| } | |
| if (splitScope.length == 3) { | |
| builder.addAll(extractor.apply(filterMap.get(scope))); | |
| } | |
| return builder.build(); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
around lines 139-155, the method collectColumnNamesFromScopes calls
scope.split(...) without guarding for a null scope which will throw an NPE; add
a defensive null check at the start of the method (e.g., if (scope == null)
return ImmutableSet.of();) so the method returns an empty set for null input
rather than throwing, keeping the rest of the logic unchanged.
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving docs. Deferring to @gibber9809's review of the code.
73342c6
into
y-scope:release-0.293-clp-connector
Description
In this PR we mainly extract an interface of a split filter provider, so that the user can extend their own config format for their own metadata database. Since the metadata filter is only used for split filtering, we decide to rename it to split filter.
We can add a new package: split.filter, and a base class
ClpSplitFilterProvider.In this base class we keep all functions for processing scope, which is a concept that in the previous metadata filter design (i.e., is the filter for all schemas and all tables, or for tables under a certain schema, or only for the certain table). And also two abstract functions:
getCustomSplitFilterOptionsClass, which is for the user to return their own implementation forCustomSplitFilterOptions class. This is also a guard so that the user won’t forget to implement theCustomSplitFilterOptions class(see below for the definition of CustomSplitFilterOptions).remapSplitFilterPushDown, which is renamed from the old function remapFilterSql. This function is for rewriting the split filter push down expression with split-filter-specific stuff, so the user can only extend this function to handle split-filter-specific logic.For the current filter structure, we move it to a separate class
ClpMetadataFilterwhich defined the basic structure for filters of all types of metadata database:”columnName”: the filter name.”customOptions”: the split-filter-specific data structure which should be implemented by the user, where we can put all split-filter-specific fields in it (e.g.,”rangeMapping”). We provide an implementation for MySql metadata database inClpMySqlSplitFilterProvider.SplitDatabaseSpecific.”required”: does the filter have to exist after push down.As discussed, we will move this field into
”customOptions”in the next PR.We also add a new class
ClpCustomSplitFilterOptionsDeserializerwhich is for deserializing the”customOptions”field with the class given bygetCustomSplitFilterOptionsClass.For the changes in the config file for MySql metadata database, we only move the
“rangeMapping”into the field”customOptions”:{ "clp.default.table_1": [ { "columnName": "msg.timestamp", "customOptions": { "rangeMapping": { "lowerBound": "begin_timestamp", "upperBound": "end_timestamp" } }, "required": true }, { "columnName": "file_name" } ] }Checklist
breaking change.
Validation performed
timestampas a required metadata filter:SELECT 1 = 1 FROM default LIMIT 1, throw exception due to missing, because even it is not querying on any columns it still scans the table.SELECT query_id + 1 FROM default LIMIT 1, throw exception due to missing.SELECT * FROM default WHERE ps = 'startup' LIMIT 1, throw exception due to missing.SELECT COUNT(*) FROM default WHERE timestamp > FROM_UNIXTIME(0) AND timestamp < FROM_UNIXTIME(9999999999.1234);, return correctly.Summary by CodeRabbit