Skip to content

Conversation

@anlowee
Copy link

@anlowee anlowee commented Jul 21, 2025

Description

In this PR we mainly extract an interface of a split filter provider, so that the user can extend their own config format for their own metadata database. Since the metadata filter is only used for split filtering, we decide to rename it to split filter.

We can add a new package: split.filter, and a base class ClpSplitFilterProvider.

In this base class we keep all functions for processing scope, which is a concept that in the previous metadata filter design (i.e., is the filter for all schemas and all tables, or for tables under a certain schema, or only for the certain table). And also two abstract functions:

  1. getCustomSplitFilterOptionsClass, which is for the user to return their own implementation for CustomSplitFilterOptions class. This is also a guard so that the user won’t forget to implement the CustomSplitFilterOptions class (see below for the definition of CustomSplitFilterOptions).
  2. remapSplitFilterPushDown, which is renamed from the old function remapFilterSql. This function is for rewriting the split filter push down expression with split-filter-specific stuff, so the user can only extend this function to handle split-filter-specific logic.

For the current filter structure, we move it to a separate class ClpMetadataFilter which defined the basic structure for filters of all types of metadata database:

  1. ”columnName”: the filter name.
  2. ”customOptions”: the split-filter-specific data structure which should be implemented by the user, where we can put all split-filter-specific fields in it (e.g., ”rangeMapping”). We provide an implementation for MySql metadata database in ClpMySqlSplitFilterProvider.SplitDatabaseSpecific.
  3. ”required”: does the filter have to exist after push down.
    As discussed, we will move this field into ”customOptions” in the next PR.

We also add a new class ClpCustomSplitFilterOptionsDeserializer which is for deserializing the ”customOptions” field with the class given by getCustomSplitFilterOptionsClass.

For the changes in the config file for MySql metadata database, we only move the “rangeMapping” into the field ”customOptions”:

{
  "clp.default.table_1": [
      {
        "columnName": "msg.timestamp",
        "customOptions": {
          "rangeMapping": {
            "lowerBound": "begin_timestamp",
            "upperBound": "end_timestamp"
          }
        },
        "required": true
      },
      {
        "columnName": "file_name"
      }
  ]
}

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. E2E test w/ w/o metadata filters. Using pregsql datasets and configuring the timestamp as a required metadata filter:
  • SELECT 1 = 1 FROM default LIMIT 1, throw exception due to missing, because even it is not querying on any columns it still scans the table.
  • SELECT query_id + 1 FROM default LIMIT 1, throw exception due to missing.
  • SELECT * FROM default WHERE ps = 'startup' LIMIT 1, throw exception due to missing.
  • SELECT COUNT(*) FROM default WHERE timestamp > FROM_UNIXTIME(0) AND timestamp < FROM_UNIXTIME(9999999999.1234);, return correctly.
  1. Passed CI.

Summary by CodeRabbit

  • New Features
    • Introduces Split Filter support with a configurable provider (MySQL).
    • New settings: clp.split-filter-config and clp.split-filter-provider-type.
    • Enhanced split-pruning via range-mapping for numeric filters; validates required filters.
  • Refactor
    • Terminology shift from “Metadata Filter” to “Split Filter”.
    • Old clp.metadata-filter-config removed; update configurations accordingly.
    • Error identifiers updated to Split Filter terminology; adds unsupported provider error.
  • Documentation
    • Overhauled connector docs: new Split Filter configuration, examples, and provider guidance.
    • Expanded type mapping, SQL support notes, and revised configuration walkthroughs.

@coderabbitai
Copy link

coderabbitai bot commented Jul 21, 2025

Walkthrough

Replaces the legacy metadata-filter mechanism with a new split-filter framework. Introduces ClpSplitFilterProvider (with a MySQL implementation), updates wiring (connector, module, optimizer, provider), adjusts config and error codes, deletes ClpMetadataFilterProvider, updates docs, and adds tests and a JSON resource for the new split-filter configuration.

Changes

Cohort / File(s) Summary of changes
Wiring (Connector, Module, OptimizerProvider)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java, .../clp/ClpModule.java, .../clp/ClpPlanOptimizerProvider.java
Replace ClpMetadataFilterProvider with ClpSplitFilterProvider across fields, constructors, and usage. ClpModule binds ClpSplitFilterProvider to ClpMySqlSplitFilterProvider based on ClpConfig.SplitFilterProviderType; throws on unsupported types.
Plan Optimizer
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Switch to ClpSplitFilterProvider. Track table scopes in rewriter, call checkContainsRequiredFilters when applicable, and remap pushdown via remapSplitFilterPushDownExpression. Imports and variable names updated.
Split Filter Framework (New)
.../clp/split/filter/ClpSplitFilterProvider.java, .../clp/split/filter/ClpSplitFilterConfig.java, .../clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java, .../clp/split/filter/ClpMySqlSplitFilterProvider.java
Add framework to load per-scope split-filter JSON, expose getColumnNames, required-filter checks, and provider-specific pushdown remapping. Provide MySQL-specific implementation and customOptions deserializer.
Config & Error Codes
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java, .../clp/ClpErrorCode.java
Replace metadata-filter config with split-filter config and provider type. Add SplitFilterProviderType enum. Rename error codes to split-filter variants and add CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE.
Legacy Removal
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Delete legacy metadata filter provider class.
Documentation
presto-docs/src/main/sphinx/connector/clp.rst
Update docs to split-filter terminology, configuration keys, provider model, customOptions, examples, and SQL mapping notes.
Tests & Resources
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java, .../split/filter/TestClpMySqlSplitFilterConfig.java, presto-clp/src/test/resources/test-mysql-split-filter.json
Add MySQL remapping tests and common split-filter tests; migrate from metadata-filter tests. Update test JSON to nest rangeMapping under customOptions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Planner as Presto Planner
  participant Optimizer as ClpPlanOptimizer
  participant Rewriter as Plan Rewriter
  participant Provider as ClpSplitFilterProvider
  Note over Planner,Optimizer: Physical planning phase

  User->>Planner: Submit query
  Planner->>Optimizer: optimize(plan)
  Optimizer->>Rewriter: rewrite(plan)
  Rewriter->>Rewriter: visitTableScan -> collect tableScope
  alt Filter on TableScan
    Rewriter->>Provider: getColumnNames(tableScope)
    Rewriter->>Provider: remapSplitFilterPushDownExpression(tableScope, pushdownExpr)
    Rewriter-->>Optimizer: updated Filter/TableScan nodes
  else No filter visited but tables seen
    Optimizer->>Provider: checkContainsRequiredFilters({tableScopes}, "")
    Provider-->>Optimizer: ok or throws
  end
  Optimizer-->>Planner: optimized plan
Loading
sequenceDiagram
  autonumber
  participant Guice as Guice Binder
  participant Config as ClpConfig
  participant Module as ClpModule
  participant Provider as ClpSplitFilterProvider
  Note over Module,Provider: Startup binding

  Guice->>Config: read clp.split-filter-provider-type
  alt type == MYSQL
    Guice->>Module: bind(ClpSplitFilterProvider -> ClpMySqlSplitFilterProvider)
    Guice->>Provider: inject with clp.split-filter-config
  else unsupported
    Module->>Guice: throw CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@anlowee
Copy link
Author

anlowee commented Jul 21, 2025

Note that this PR exceeds the 400 lines is because I renamed some files and added a lot of new docs

@anlowee anlowee requested a review from wraymo July 21, 2025 23:06
@anlowee anlowee marked this pull request as ready for review July 21, 2025 23:06
@anlowee anlowee changed the title refactor: Refactor metadata filter config. refactor: Refactor metadata filter config -- Phase 1. Jul 21, 2025
@anlowee anlowee changed the title refactor: Refactor metadata filter config -- Phase 1. refactor: Extract metadata filter config provider interface. Jul 21, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6613d6c and 2079250.

📒 Files selected for processing (13)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (0 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (5 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (1 hunks)
  • presto-clp/src/test/resources/test-mysql-metadata-filter.json (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (2 hunks)
🧠 Learnings (11)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

💤 Files with no reviewable changes (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

🔇 Additional comments (24)
presto-clp/src/test/resources/test-mysql-metadata-filter.json (1)

15-20: LGTM! The nested structure supports the new extensible architecture.

The addition of the metadataDatabaseSpecific wrapper around rangeMapping aligns well with the new abstract ClpMetadataFilterProvider design, enabling database-specific filter configurations while maintaining backwards compatibility with the existing range mapping logic.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)

19-20: Import updates reflect the package restructuring correctly.

The imports have been properly updated to reference the new metadata.filter package location for the restructured filter provider classes.


48-48: Conditional binding maintains consistency with the existing pattern.

The conditional binding of ClpMetadataFilterProvider to ClpMySqlMetadataFilterProvider follows the same pattern as the metadata provider binding above, ensuring consistency in the dependency injection configuration.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)

17-17: Import correctly updated for the new package structure.

The import has been properly updated to reference the ClpMetadataFilterProvider from its new location in the metadata.filter package.


98-98: Method name change improves clarity and specificity.

The method name change from remapFilterSql to remapMetadataFilterPushDown better describes the specific purpose of remapping metadata filter pushdown expressions, making the code more self-documenting.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (2)

18-18: Import correctly updated for the package restructuring.

The import statement properly reflects the new package location for ClpMetadataFilterProvider.


44-44: Dependency injection properly implemented with standard patterns.

The ClpMetadataFilterProvider dependency is correctly:

  • Declared as a final field
  • Injected via constructor parameter
  • Null-checked with appropriate error message
  • Passed to the ClpPlanOptimizerProvider

This follows established dependency injection patterns in the codebase.

Also applies to: 54-54, 62-62, 68-68

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (2)

16-16: Import correctly added for the new dependency.

The import statement properly includes the ClpMetadataFilterProvider from the restructured package location.


32-32: Dependency injection implemented consistently with existing pattern.

The ClpMetadataFilterProvider dependency is properly:

  • Declared as a final field
  • Accepted via constructor parameter
  • Stored in the field
  • Passed to the ClpPlanOptimizer constructor

The implementation follows the same pattern as the existing functionManager and functionResolution dependencies in this class.

Also applies to: 35-35, 39-39, 51-51

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMetadataFilterConfigCommon.java (4)

14-14: LGTM - Package structure follows new organization.

The package change to com.facebook.presto.plugin.clp.metadata.filter aligns with the new metadata filter provider structure.


36-36: LGTM - Class rename reflects focused scope.

The rename to TestClpMetadataFilterConfigCommon appropriately indicates this class tests common functionality shared across different metadata filter provider implementations.


43-45: LGTM - Resource file update aligns with new JSON structure.

The change from "test-metadata-filter.json" to "test-mysql-metadata-filter.json" reflects the new filter configuration format that nests rangeMapping inside metadataDatabaseSpecific.


56-56: LGTM - Provider class updated consistently.

All instances of ClpMetadataFilterProvider have been correctly replaced with ClpMySqlMetadataFilterProvider, maintaining consistency with the new MySQL-specific implementation.

Also applies to: 71-71, 86-86, 93-93

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataDatabaseSpecificDeserializer.java (3)

26-30: LGTM - Clear and well-documented purpose.

The class documentation clearly explains its role in deserializing the metadataDatabaseSpecific field based on the configured implementation class.


36-39: LGTM - Constructor design supports polymorphic deserialization.

The constructor accepts the specific implementation class, enabling flexible deserialization based on the metadata provider type configuration.


42-48: LGTM - Deserialize implementation is correct.

The method correctly:

  1. Reads the JSON content as an ObjectNode
  2. Uses Jackson's treeToValue to convert to the specified implementation class
  3. Handles the polymorphic nature of the MetadataDatabaseSpecific interface
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (3)

18-31: LGTM - Comprehensive and clear documentation.

The JavaDoc provides excellent documentation for each field, explaining their purpose and the optional nature of some fields with default values.


34-41: LGTM - Appropriate data model design.

The public fields with Jackson annotations follow the simple data holder pattern commonly used for configuration objects. The field types are appropriate for their intended use.


43-44: LGTM - Marker interface enables polymorphic design.

The empty MetadataDatabaseSpecific interface serves as a type-safe marker for database-specific filter details, enabling the polymorphic deserialization pattern used throughout the system.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/filter/TestClpMySqlMetadataFilterConfig.java (2)

28-42: LGTM - Proper test setup follows established pattern.

The setup method correctly loads the MySQL-specific filter configuration file and handles potential exceptions appropriately.


44-70: LGTM - Comprehensive test coverage for remapping logic.

The test method thoroughly covers all comparison operators and verifies the correct remapping behaviour:

  • > and >= operators correctly map to end_timestamp
  • < and <= operators correctly map to begin_timestamp
  • = operator correctly generates a range check with both bounds

The test assertions validate the MySQL-specific remapping logic for timestamp range queries.

presto-docs/src/main/sphinx/connector/clp.rst (3)

134-146: LGTM - Clear documentation of the new filter structure.

The updated description accurately reflects the new filter configuration model:

  • Clarifies that filters specify how data columns are used for querying, not just column mapping
  • Properly documents all three fields with their purposes and optional nature
  • Explains the extensible design through the metadataDatabaseSpecific field

151-157: LGTM - MySQL-specific section appropriately scoped.

The renamed section clearly indicates this is MySQL-specific documentation, aligning with the extensible design where different database types can have different metadata-specific fields.


184-189: LGTM - JSON example matches new implementation structure.

The updated JSON example correctly nests rangeMapping inside metadataDatabaseSpecific, which matches the structure expected by the new ClpMetadataFilter class and the custom deserializer.

Comment on lines 107 to 122
public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression)
{
boolean hasRequiredMetadataFilterColumns = true;
ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) {
if (!metadataFilterPushDownExpression.contains(columnName)) {
hasRequiredMetadataFilterColumns = false;
notFoundListBuilder.add(columnName);
}
}
if (!hasRequiredMetadataFilterColumns) {
throw new PrestoException(
CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix unreliable column name matching and improve error message.

The current implementation has two issues:

  1. Using contains() for column name matching can cause false positives (e.g., "timestamp" would match "begin_timestamp")
  2. The error message grammar is incorrect when multiple columns are missing

Consider using a more reliable matching approach, such as parsing the SQL expression or using regex with word boundaries. For the immediate fix, improve the error message:

         if (!hasRequiredMetadataFilterColumns) {
+            List<String> missingColumns = notFoundListBuilder.build();
+            String columnText = missingColumns.size() == 1 ? "column" : "columns";
+            String verbText = missingColumns.size() == 1 ? "is" : "are";
             throw new PrestoException(
                     CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
+                    String.format("Required metadata filter %s %s %s not present in the filter expression", 
+                                  columnText, missingColumns, verbText));
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression)
{
boolean hasRequiredMetadataFilterColumns = true;
ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) {
if (!metadataFilterPushDownExpression.contains(columnName)) {
hasRequiredMetadataFilterColumns = false;
notFoundListBuilder.add(columnName);
}
}
if (!hasRequiredMetadataFilterColumns) {
throw new PrestoException(
CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
}
}
public void checkContainsRequiredFilters(SchemaTableName schemaTableName, String metadataFilterPushDownExpression)
{
boolean hasRequiredMetadataFilterColumns = true;
ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) {
if (!metadataFilterPushDownExpression.contains(columnName)) {
hasRequiredMetadataFilterColumns = false;
notFoundListBuilder.add(columnName);
}
}
if (!hasRequiredMetadataFilterColumns) {
List<String> missingColumns = notFoundListBuilder.build();
String columnText = missingColumns.size() == 1 ? "column" : "columns";
String verbText = missingColumns.size() == 1 ? "is" : "are";
throw new PrestoException(
CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
String.format(
"Required metadata filter %s %s %s not present in the filter expression",
columnText,
missingColumns,
verbText));
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
around lines 107 to 122, the current use of String.contains() to check for
required column names can cause false positives by matching substrings within
other column names. Replace this with a more reliable matching method, such as
using regex with word boundaries to ensure exact column name matches.
Additionally, update the error message to correctly handle pluralization when
multiple columns are missing, ensuring proper grammar in the exception message.

Comment on lines 123 to 157
protected static class ClpMySqlMetadataDatabaseSpecific
implements MetadataDatabaseSpecific
{
@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;

public static class RangeMapping
{
@JsonProperty("lowerBound")
public String lowerBound;

@JsonProperty("upperBound")
public String upperBound;

@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (!(o instanceof RangeMapping)) {
return false;
}
RangeMapping that = (RangeMapping) o;
return Objects.equals(lowerBound, that.lowerBound) &&
Objects.equals(upperBound, that.upperBound);
}

@Override
public int hashCode()
{
return Objects.hash(lowerBound, upperBound);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding field validation and toString method.

The inner classes use public fields and lack validation. While this might be acceptable for simple DTOs, consider adding validation and a toString method for better debugging.

         public static class RangeMapping
         {
-            @JsonProperty("lowerBound")
-            public String lowerBound;
+            private final String lowerBound;
+            private final String upperBound;

-            @JsonProperty("upperBound")
-            public String upperBound;
+            public RangeMapping(@JsonProperty("lowerBound") String lowerBound,
+                                @JsonProperty("upperBound") String upperBound)
+            {
+                this.lowerBound = requireNonNull(lowerBound, "lowerBound is null");
+                this.upperBound = requireNonNull(upperBound, "upperBound is null");
+            }
+
+            public String getLowerBound()
+            {
+                return lowerBound;
+            }
+
+            public String getUpperBound()
+            {
+                return upperBound;
+            }

             @Override
             public boolean equals(Object o)
             {
                 if (this == o) {
                     return true;
                 }
                 if (!(o instanceof RangeMapping)) {
                     return false;
                 }
                 RangeMapping that = (RangeMapping) o;
                 return Objects.equals(lowerBound, that.lowerBound) &&
                         Objects.equals(upperBound, that.upperBound);
             }

             @Override
             public int hashCode()
             {
                 return Objects.hash(lowerBound, upperBound);
             }
+
+            @Override
+            public String toString()
+            {
+                return "RangeMapping{" +
+                        "lowerBound='" + lowerBound + '\'' +
+                        ", upperBound='" + upperBound + '\'' +
+                        '}';
+            }
         }

Note: You'll also need to update the references to these fields in the remapMetadataFilterPushDown method to use the getters.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMySqlMetadataFilterProvider.java
between lines 123 and 157, the inner classes use public fields without
validation or a toString method. Add private fields with appropriate getters and
setters that include validation logic to ensure data integrity. Implement a
toString method in each class to improve debugging output. Finally, update all
references to these fields, especially in the remapMetadataFilterPushDown
method, to use the new getter methods instead of direct field access.

@anlowee anlowee requested a review from gibber9809 July 28, 2025 17:06
Copy link

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, just a few small questions/suggestions for improvements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)

58-80: Fix error message grammar as identified in previous reviews.

The error message in line 78 has grammatical issues that were flagged in previous reviews but remain unaddressed.

Apply this diff to fix the grammar:

-            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file");

107-122: Address unreliable column name matching flagged in previous reviews.

The current implementation using String.contains() for column name matching can cause false positives, and the error message has grammatical issues when multiple columns are missing. These issues were identified in previous reviews but remain unaddressed.

The contains() method will match substrings, potentially causing false positives (e.g., "timestamp" would match "begin_timestamp"). Additionally, the error message has grammatical issues with pluralization.

Consider implementing a more precise matching approach and fix the error message:

         if (!hasRequiredMetadataFilterColumns) {
+            List<String> missingColumns = notFoundListBuilder.build();
+            String columnText = missingColumns.size() == 1 ? "column" : "columns";
+            String verbText = missingColumns.size() == 1 ? "is" : "are";
             throw new PrestoException(
                     CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
+                    String.format("Required metadata filter %s %s %s not present in the filter expression", 
+                                  columnText, missingColumns, verbText));
         }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2079250 and 13caaa5.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-186)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (32-70)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilter.java (1)
  • ClpMetadataFilter (32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)

161-174: Good null-safe implementation.

The helper methods correctly handle null filter lists by returning empty sets, which prevents NullPointerExceptions and maintains consistent behaviour.


143-159: Consider bounds checking for scope splitting.

The method splits the scope string and accesses array elements without bounds checking, which could lead to ArrayIndexOutOfBoundsException if the scope format is unexpected.

Add bounds checking to make the method more robust:

 private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor)
 {
     String[] splitScope = scope.split("\\.");
+    if (splitScope.length == 0) {
+        return ImmutableSet.of();
+    }
+    
     ImmutableSet.Builder<String> builder = ImmutableSet.builder();

     builder.addAll(extractor.apply(filterMap.get(splitScope[0])));

-    if (splitScope.length > 1) {
+    if (splitScope.length > 1 && splitScope.length >= 2) {
         builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
     }

-    if (splitScope.length == 3) {
+    if (splitScope.length >= 3) {
         builder.addAll(extractor.apply(filterMap.get(scope)));
     }

     return builder.build();
 }
⛔ Skipped due to learnings
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.

@anlowee anlowee requested a review from gibber9809 July 29, 2025 14:33
@anlowee
Copy link
Author

anlowee commented Jul 29, 2025

@gibber9809 I fixed the regex to let it also match the negative numbers, decimals and scientific numbers. I also added the unit tests for them.

Copy link

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one small change suggestion for a docstring, but besides that LGTM. Might also be worth adopting some of rabbit's suggestions for improving the docs, but not critical.

Will approve once the suggested change for the docstring is in.

…ta/filter/ClpMetadataFilterProvider.java

Co-authored-by: Devin Gibson <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)

51-52: Update documentation to be more specific about database customization.

The current documentation mentions "different fields" but doesn't clearly convey that customization happens within the metadataDatabaseSpecific field.

- * For different metadata databases filter configs can include different fields.
+ * Different metadata databases can customize filter configurations through the 
+ * {@code metadataDatabaseSpecific} field within each {@link ClpMetadataFilter}.

107-122: Fix unreliable column name matching and improve error message.

The current implementation still has the issues identified in previous reviews:

  1. Using contains() for column name matching can cause false positives (e.g., "timestamp" would match "begin_timestamp")
  2. The error message grammar is incorrect when multiple columns are missing

Consider using a more reliable matching approach, such as parsing the SQL expression or using regex with word boundaries. For the immediate fix, improve the error message:

         if (!hasRequiredMetadataFilterColumns) {
+            List<String> missingColumns = notFoundListBuilder.build();
+            String columnText = missingColumns.size() == 1 ? "column" : "columns";
+            String verbText = missingColumns.size() == 1 ? "is" : "are";
             throw new PrestoException(
                     CLP_MANDATORY_METADATA_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory metadata filter column but not valid");
+                    String.format("Required metadata filter %s %s %s not present in the filter expression", 
+                                  columnText, missingColumns, verbText));
         }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df82f0f and 36b2659.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java (2)

78-78: Good fix on the error message grammar and exception chaining.

The error message grammar has been corrected from the previous awkward phrasing, and the IOException is properly chained as the cause, which will provide valuable debugging information for configuration file issues.


131-131: Good fix on the documentation grammar.

The grammatical error has been corrected from "recommend to implementing" to "recommend implementing".

Comment on lines 143 to 159
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor)
{
String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();

builder.addAll(extractor.apply(filterMap.get(splitScope[0])));

if (splitScope.length > 1) {
builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
}

if (splitScope.length == 3) {
builder.addAll(extractor.apply(filterMap.get(scope)));
}

return builder.build();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding null safety checks for scope operations.

The method splits the scope string and accesses array elements without validating the input format or handling potential null/empty values from the filterMap.

 private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor)
 {
+    if (scope == null || scope.isEmpty()) {
+        return ImmutableSet.of();
+    }
+    
     String[] splitScope = scope.split("\\.");
+    if (splitScope.length == 0) {
+        return ImmutableSet.of();
+    }
+    
     ImmutableSet.Builder<String> builder = ImmutableSet.builder();

-    builder.addAll(extractor.apply(filterMap.get(splitScope[0])));
+    List<ClpMetadataFilter> catalogFilters = filterMap.get(splitScope[0]);
+    if (catalogFilters != null) {
+        builder.addAll(extractor.apply(catalogFilters));
+    }

     if (splitScope.length > 1) {
-        builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
+        List<ClpMetadataFilter> schemaFilters = filterMap.get(splitScope[0] + "." + splitScope[1]);
+        if (schemaFilters != null) {
+            builder.addAll(extractor.apply(schemaFilters));
+        }
     }

     if (splitScope.length == 3) {
-        builder.addAll(extractor.apply(filterMap.get(scope)));
+        List<ClpMetadataFilter> tableFilters = filterMap.get(scope);
+        if (tableFilters != null) {
+            builder.addAll(extractor.apply(tableFilters));
+        }
     }

     return builder.build();
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpMetadataFilter>, Set<String>> extractor)
{
String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
builder.addAll(extractor.apply(filterMap.get(splitScope[0])));
if (splitScope.length > 1) {
builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
}
if (splitScope.length == 3) {
builder.addAll(extractor.apply(filterMap.get(scope)));
}
return builder.build();
}
private Set<String> collectColumnNamesFromScopes(
String scope,
Function<List<ClpMetadataFilter>, Set<String>> extractor)
{
if (scope == null || scope.isEmpty()) {
return ImmutableSet.of();
}
String[] splitScope = scope.split("\\.");
if (splitScope.length == 0) {
return ImmutableSet.of();
}
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
List<ClpMetadataFilter> catalogFilters = filterMap.get(splitScope[0]);
if (catalogFilters != null) {
builder.addAll(extractor.apply(catalogFilters));
}
if (splitScope.length > 1) {
List<ClpMetadataFilter> schemaFilters = filterMap.get(splitScope[0] + "." + splitScope[1]);
if (schemaFilters != null) {
builder.addAll(extractor.apply(schemaFilters));
}
}
if (splitScope.length == 3) {
List<ClpMetadataFilter> tableFilters = filterMap.get(scope);
if (tableFilters != null) {
builder.addAll(extractor.apply(tableFilters));
}
}
return builder.build();
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/filter/ClpMetadataFilterProvider.java
around lines 143 to 159, the method collectColumnNamesFromScopes splits the
scope string and accesses elements without checking if the scope is null or
empty, and also does not verify if filterMap.get(...) returns null before
applying the extractor. Add null and empty checks for the scope string before
splitting, and add null checks for the results of filterMap.get(...) before
passing them to the extractor function to avoid potential NullPointerExceptions.

@gibber9809 gibber9809 self-requested a review July 29, 2025 21:08
@anlowee
Copy link
Author

anlowee commented Jul 29, 2025

@gibber9809 Addressed your comment and some of coderabbitai's comments.

@anlowee anlowee force-pushed the xwei/refactor-metadatafilter-1 branch from d3a8ff3 to 9289bd7 Compare August 14, 2025 21:47
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🔭 Outside diff range comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

16-33: Fix Checkstyle import-order violations to unblock the build

The pipeline flagged import order issues. Reorder regular and static imports and add the missing static import used below if you adopt the fail-fast suggestion. Example fix:

@@
-import com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterProvider;
-import com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider;
-import com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider;
-import com.facebook.presto.plugin.clp.split.ClpSplitProvider;
+import com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider;
+import com.facebook.presto.plugin.clp.split.ClpSplitProvider;
+import com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider;
+import com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterProvider;
@@
-import static com.facebook.presto.plugin.clp.ClpConfig.SplitFilterProviderType;
-import static com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType;
-import static com.facebook.presto.plugin.clp.ClpConfig.SplitProviderType;
+import static com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType;
+import static com.facebook.presto.plugin.clp.ClpConfig.SplitFilterProviderType;
+import static com.facebook.presto.plugin.clp.ClpConfig.SplitProviderType;
+import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION;
 import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE;
 import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

124-131: Log concrete strings instead of Optional wrappers

These debug logs currently print Optional[…]. Log the unwrapped strings for clarity.

-                log.debug("Metadata SQL query: %s", metadataSqlQuery);
+                log.debug("Metadata SQL query: %s", metadataSqlQuery.get());
@@
-            log.debug("KQL query: %s", kqlQuery);
+            log.debug("KQL query: %s", kqlQuery.get());
presto-docs/src/main/sphinx/connector/clp.rst (1)

27-36: Fix property name: use clp.split-filter-config instead of clp.metadata-filter-config

The example still shows the old key, which will mislead users.

-    clp.metadata-filter-config=/path/to/metadata-filter-config.json
+    clp.split-filter-config=/path/to/split-filter-config.json
♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

48-50: Fail fast on unsupported split-filter provider type and align comparison style

Add an else branch like the other provider bindings and keep comparison style consistent with the rest of this class.

Apply:

-        if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) {
+        if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) {
             binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON);
         }
+        else {
+            throw new PrestoException(
+                    CLP_UNSUPPORTED_CONFIG_OPTION,
+                    "Unsupported split filter provider type: " + config.getSplitFilterProviderType());
+        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)

66-71: Required-filter validation misses tables without FilterNodes (joins); track per-table filters and validate the remainder

Using a single hasVisitedFilter flag suppresses validation for all tables if any FilterNode is seen. In join plans where only some table scans have a FilterNode, required-filter checks are skipped for other tables. Track the set of tables that actually had a FilterNode and validate only the remainder after rewriting.

Apply these diffs:

  1. Validate only unfiltered table scopes in optimize(...):
-        // Throw exception if any required split filters are missing
-        if (!rewriter.tableScopeSet.isEmpty() && !rewriter.hasVisitedFilter) {
-            splitFilterProvider.checkContainsRequiredFilters(rewriter.tableScopeSet, "");
-        }
+        // Validate tables that did not have a FilterNode associated with them
+        if (!rewriter.tableScopeSet.isEmpty()) {
+            Set<String> unfilteredScopes = new HashSet<>(rewriter.tableScopeSet);
+            unfilteredScopes.removeAll(rewriter.tableScopesWithFilter);
+            if (!unfilteredScopes.isEmpty()) {
+                splitFilterProvider.checkContainsRequiredFilters(unfilteredScopes, "");
+            }
+        }
  1. Track table scopes that actually had a FilterNode:
     private class Rewriter
             extends ConnectorPlanRewriter<Void>
     {
         private final PlanNodeIdAllocator idAllocator;
         private final Set<String> tableScopeSet;
+        private final Set<String> tableScopesWithFilter;
         private boolean hasVisitedFilter;

         public Rewriter(PlanNodeIdAllocator idAllocator)
         {
             this.idAllocator = idAllocator;
             hasVisitedFilter = false;
             tableScopeSet = new HashSet<>();
+            tableScopesWithFilter = new HashSet<>();
         }
  1. Unify scope formatting and record scopes with filters in visitFilter(...):
-            String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString();
+            String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());
+            tableScopesWithFilter.add(tableScope);

Note: with the above change, hasVisitedFilter becomes redundant; you can remove it in a follow-up cleanup to simplify state.

Also applies to: 76-86, 102-114


108-114: Use consistent scope formatting helper

Minor consistency fix: use format("%s.%s", ...) here as elsewhere to avoid subtle string differences and future drift.

-            String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString();
+            String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());
presto-docs/src/main/sphinx/connector/clp.rst (4)

141-143: Minor grammar: remove superfluous comma

Readability polish.

-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent
+For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent

219-221: Singular “implementation” and remove extra comma

Consistency and grammar.

-If you prefer to use a different format for the ``customOptions``, you can provide your own implementations of the
-``ClpSplitFilterProvider`` interface, and configure the connector accordingly.
+If you prefer to use a different format for the ``customOptions``, you can provide your own implementation of the
+``ClpSplitFilterProvider`` interface and configure the connector accordingly.

151-154: Clarify enforcement even when no FilterNode is produced

Reflects the behavioural change noted in the PR narrative.

-  query. If a required filter is missing or cannot be added to the split filter query, the original query will be
-  rejected.
+  query. If a required filter is missing or cannot be added to the split filter query, the original query will be
+  rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).

50-57: Use Presto type names and fix grammar

Align with Presto docs and remove the article mismatch.

-                                     When enabled, type annotations are added to conflicting field names to
-                                     distinguish between types. For example, if an ``id`` column appears with
-                                     both an ``int`` and ``string`` types, the connector will create two
-                                     columns named ``id_bigint`` and ``id_varchar``.
-
-                                     Supported type annotations include ``bigint``, ``varchar``, ``double``,
-                                     ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For
+                                     When enabled, type annotations are added to conflicting field names to
+                                     distinguish between types. For example, if an ``id`` column appears as
+                                     both ``BIGINT`` and ``VARCHAR``, the connector will create two columns
+                                     named ``id_bigint`` and ``id_varchar``.
+
+                                     Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``,
+                                     ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For
                                      columns with only one type, the original column name is used.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d3a8ff3 and 9289bd7.

📒 Files selected for processing (14)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (4 hunks)
  • presto-clp/src/test/resources/test-mysql-split-filter.json (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (9 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
🧬 Code Graph Analysis (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (34-158)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1)
  • Test (37-97)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (32-70)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (32-70)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (1)
  • Test (29-87)
🪛 GitHub Actions: maven checks
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java

[error] 20-20: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.split.filter.ClpMySqlSplitFilterProvider'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.


[error] 21-21: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.split.ClpMySqlSplitProvider'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.


[error] 29-29: Checkstyle: ImportOrder violation. Wrong order for import 'com.facebook.presto.plugin.clp.ClpConfig.MetadataProviderType'. Step: Maven Checkstyle plugin (checkstyle) for module presto-clp.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (8)
presto-clp/src/test/resources/test-mysql-split-filter.json (1)

15-19: Config nesting under customOptions looks correct for provider-driven deserialization

Moving rangeMapping under customOptions aligns with ClpSplitFilterConfig and the provider-specific options class. This should deserialize via the new CustomSplitFilterOptions deserializer as intended.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

16-52: Wiring change to ClpSplitFilterProvider looks correct

Constructor injection and the physical plan optimizer wiring correctly pass the new provider through to ClpPlanOptimizer.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

18-18: Wiring to ClpSplitFilterProvider looks correct

Constructor injection, null-checks, and plan optimiser provider hand-off are consistent with the new provider. No functional concerns here.

Also applies to: 44-45, 54-55, 62-63, 68-69

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)

36-39: Confirm migration docs and deprecation path for old metadata-filter configs

This introduces new keys (split-filter-*) and removes/renames the old metadata-filter keys. Ensure docs, examples, and release notes clearly call out the migration path and failure modes when old keys are present.

Also applies to: 154-158, 195-198

presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (1)

52-63: Good coverage for numeric variants (ints, decimals, scientific), including negatives

The remapping assertions exercise operator combinations and numeric formats well. Looks solid.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)

143-159: Scope collector is clean; null-safety handled downstream

Use of extractor.apply on possibly-null lists is safe given the extractor methods return empty sets on null. Good readability and behaviour across catalog/schema/table levels.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (2)

82-96: Good coverage of empty and invalid config paths

Constructing the provider with no config and with an invalid path covers the error path (CLP_SPLIT_FILTER_CONFIG_NOT_FOUND). Assertions are precise.


53-66: Test validates required filters on the original (pre-remap) expression — consider adding a post-remap assertion if you change validation

Confirmed: the existing test (checkRequiredFilters) exercises presence checking against the original/pre-remap expression. ClpPlanOptimizer calls checkContainsRequiredFilters before remapping, while remapSplitFilterPushDownExpression performs the transform.

Files to note:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java — method checkRequiredFilters
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java — calls checkContainsRequiredFilters(...) before remap in visitFilter()
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java — remapSplitFilterPushDownExpression(...)

Optional suggestion (for future-proofing if you move validation to post-remap):
Add an assertion that the remapped SQL would also be accepted (only add if you change checkContainsRequiredFilters to validate post-remap):

String remapped = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", "("msg.timestamp" > 1234 AND "msg.timestamp" < 5678)");
filterProvider.checkContainsRequiredFilters(testTableScopeSet, remapped); // add only if validation is performed after remap

Note: with the current behaviour this assertion would fail (required column names are checked against the pre-remap names).

Comment on lines +159 to +164
@Config("clp.split-filter-config")
public ClpConfig setSplitFilterConfig(String splitFilterConfig)
{
this.splitFilterConfig = splitFilterConfig;
return this;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add @ConfigDescription to document the new config key

Annotate the setter to improve self-documentation and config help output.

-    @Config("clp.split-filter-config")
+    @Config("clp.split-filter-config")
+    @com.facebook.airlift.configuration.ConfigDescription("Path to the split-filter configuration JSON file")
     public ClpConfig setSplitFilterConfig(String splitFilterConfig)

If you prefer an explicit import, add:

import com.facebook.airlift.configuration.ConfigDescription;
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java around
lines 159 to 164, the setter for splitFilterConfig is missing a
@ConfigDescription annotation; add @ConfigDescription("description of what
clp.split-filter-config controls") above the setSplitFilterConfig method (and
add the import com.facebook.airlift.configuration.ConfigDescription if not
already present) so the config key is documented in help output and
self-documentation.

Comment on lines +171 to +176
@Config("clp.split-filter-provider-type")
public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType)
{
this.splitFilterProviderType = splitFilterProviderType;
return this;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add @ConfigDescription for provider type

Clarify what values are accepted and what it controls.

-    @Config("clp.split-filter-provider-type")
+    @Config("clp.split-filter-provider-type")
+    @com.facebook.airlift.configuration.ConfigDescription("Split-filter provider type (e.g., MYSQL)")
     public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Config("clp.split-filter-provider-type")
public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType)
{
this.splitFilterProviderType = splitFilterProviderType;
return this;
}
@Config("clp.split-filter-provider-type")
@com.facebook.airlift.configuration.ConfigDescription("Split-filter provider type (e.g., MYSQL)")
public ClpConfig setSplitFilterProviderType(SplitFilterProviderType splitFilterProviderType)
{
this.splitFilterProviderType = splitFilterProviderType;
return this;
}

Comment on lines 32 to 33
CLP_SPLIT_FILTER_CONFIG_NOT_FOUND(10, USER_ERROR),
CLP_MANDATORY_SPLIT_FILTER_NOT_VALID(11, USER_ERROR);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Renamed error codes align with split-filter refactor; verify all call sites updated

Names now match the split-filter terminology and keep the same numeric codes and types. Please ensure all throw sites and references have been migrated.

Run this quick search to catch any lingering references to the old constants or method names:


🏁 Script executed:

#!/bin/bash
# Find references to old metadata-filter error codes or method names
rg -n --fixed-strings \
  'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql'

Length of output: 117


🏁 Script executed:

#!/bin/bash
echo "Search old constants and remapFilterSql..."
rg -n --hidden --no-ignore-vcs -S 'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql' || echo "[no matches]"

echo "Search new constants (split)..."
rg -n --hidden --no-ignore-vcs -S 'CLP_SPLIT_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_SPLIT_FILTER_NOT_VALID' || echo "[no matches]"

echo "Search generic split/metadata/filter strings..."
rg -n --hidden --no-ignore-vcs -S 'splitFilter|SplitFilter|split-filter|SPLIT_FILTER|METADATA_FILTER|metadata-filter' || echo "[no matches]"

echo "Search remapFilter*..."
rg -n --hidden --no-ignore-vcs -S 'remapFilter|remap.*Filter' || echo "[no matches]"

echo "Search ClpErrorCode usages..."
rg -n --hidden --no-ignore-vcs -S 'ClpErrorCode' || echo "[no matches]"

echo "Print ClpErrorCode.java (if present)..."
if [ -f presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java ]; then
  sed -n '1,240p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
else
  echo "[file not found: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java]"
fi

Length of output: 300579


Split-filter rename verified — code updated; docs contain lingering metadata-filter-config

Checked the repository: old constants/method names (CLP_METADATA_FILTER_CONFIG_NOT_FOUND, CLP_MANDATORY_METADATA_FILTER_NOT_VALID, remapFilterSql) are not present; the new split-filter constants and methods are used.

Verified usages (non-exhaustive):

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java — CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, CLP_MANDATORY_SPLIT_FILTER_NOT_VALID
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java — imports and throws using the new CLP_SPLIT_* codes
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java — remapSplitFilterPushDownExpression implementation
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java — calls splitFilterProvider.remapSplitFilterPushDownExpression(...)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java — clp.split-filter-config property and SplitFilterProviderType
  • Tests: presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig*.java — exercise remapSplitFilterPushDownExpression

Needs attention:

  • presto-docs/src/main/sphinx/connector/clp.rst: line ~33 still references clp.metadata-filter-config=/path/to/metadata-filter-config.json — update documentation to use the new clp.split-filter-config (or otherwise reconcile the naming) so docs match the refactor.
🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around line ~33, the docs still
reference the old property name clp.metadata-filter-config; update all
occurrences to the new property clp.split-filter-config (including example usage
clp.split-filter-config=/path/to/split-filter-config.json), update any
explanatory text to use "split filter" terminology (and the corresponding file
name in examples), and run a quick repo-wide search/update for any remaining
"metadata-filter-config" references in the docs to ensure consistency with the
refactor.

Comment on lines 16 to 49
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;

/**
* Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the
* {@code "customOptions"} field in the filter. The implementation is originally chosen
* by the {@code clp.split-filter-provider-type} config option.
*/
public class ClpCustomSplitFilterOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;

public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
}

@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();

return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle null values and avoid narrowing casts in deserializer

Make deserialization resilient to null customOptions and avoid assuming the codec is an ObjectMapper. Also validate the constructor argument.

@@
-import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.ObjectCodec;
@@
-import com.fasterxml.jackson.databind.JsonDeserializer;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.databind.JsonDeserializer;
+import com.fasterxml.jackson.databind.JsonNode;
@@
 import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
+import static java.util.Objects.requireNonNull;
@@
     public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
     {
-        this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+        this.actualCustomSplitFilterOptionsClass = requireNonNull(
+                actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
     }
@@
     @Override
     public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
     {
-        ObjectNode node = p.getCodec().readTree(p);
-        ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
-        return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+        if (p.getCurrentToken() == JsonToken.VALUE_NULL) {
+            return null;
+        }
+        JsonNode node = p.getCodec().readTree(p);
+        ObjectCodec codec = p.getCodec();
+        return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.IOException;
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
/**
* Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the
* {@code "customOptions"} field in the filter. The implementation is originally chosen
* by the {@code clp.split-filter-provider-type} config option.
*/
public class ClpCustomSplitFilterOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;
public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
}
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();
return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
}
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.ObjectCodec;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonNode;
import java.io.IOException;
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
import static java.util.Objects.requireNonNull;
/**
* Uses the given implementation of {@link CustomSplitFilterOptions} to deserialize the
* {@code "customOptions"} field in the filter. The implementation is originally chosen
* by the {@code clp.split-filter-provider-type} config option.
*/
public class ClpCustomSplitFilterOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;
public ClpCustomSplitFilterOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = requireNonNull(
actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
if (p.getCurrentToken() == JsonToken.VALUE_NULL) {
return null;
}
JsonNode node = p.getCodec().readTree(p);
ObjectCodec codec = p.getCodec();
return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpCustomSplitFilterOptionsDeserializer.java
around lines 16-49, validate that the constructor argument
actualCustomSplitFilterOptionsClass is non-null, avoid the unsafe cast to
ObjectMapper by using the JsonParser's ObjectCodec, and make deserialization
handle null customOptions: obtain ObjectCodec codec = p.getCodec(); read the
tree into a JsonNode, return null immediately if the node is null or represents
JSON null, and call codec.treeToValue(node, actualCustomSplitFilterOptionsClass)
to produce the result; also throw an informative IllegalArgumentException in the
constructor if the provided class is null.

Comment on lines +63 to +71
Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0])));

if (1 < splitScope.length) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
}

if (3 == splitScope.length) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope)));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Scope aggregation logic is fine; minor style tweak optional

Current logic correctly layers catalog → schema → table mappings. You can simplify the conditions slightly for readability, but this is optional.

-        if (1 < splitScope.length) {
+        if (splitScope.length > 1) {
             mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0])));
if (1 < splitScope.length) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
}
if (3 == splitScope.length) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope)));
}
Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> mappings = new HashMap<>(getAllMappingsFromFilters(filterMap.get(splitScope[0])));
if (splitScope.length > 1) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
}
if (3 == splitScope.length) {
mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope)));
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
around lines 63 to 71, the existing numeric comparisons (1 < splitScope.length
and 3 == splitScope.length) are correct but slightly unconventional; change them
to the clearer forms splitScope.length >= 2 and splitScope.length == 3, and keep
the same mapping.putAll calls so the catalog→schema→table layering remains
unchanged for readability and consistency.

Comment on lines +32 to +33
public class ClpSplitFilterConfig
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Harden JSON parsing against future/unknown fields

Add JsonIgnoreProperties(ignoreUnknown = true) to make the config resilient to forward-compatible changes in the JSON.

Apply this diff:

-public class ClpSplitFilterConfig
+@com.fasterxml.jackson.annotation.JsonIgnoreProperties(ignoreUnknown = true)
+public class ClpSplitFilterConfig

If you prefer explicit imports, add this import at the top of the file:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
around lines 32-33, the class lacks a JsonIgnoreProperties(ignoreUnknown = true)
annotation which would make JSON deserialization resilient to added/unknown
fields; add the annotation to the class declaration (and the import
com.fasterxml.jackson.annotation.JsonIgnoreProperties if explicit imports are
used) so Jackson will ignore unknown properties during parsing.

Comment on lines +34 to +36
@JsonProperty("columnName")
public String columnName;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Make columnName explicitly required in JSON

Column name is fundamental for remapping and validation. Mark it as required to fail fast on malformed configs.

-    @JsonProperty("columnName")
+    @JsonProperty(value = "columnName", required = true)
     public String columnName;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@JsonProperty("columnName")
public String columnName;
@JsonProperty(value = "columnName", required = true)
public String columnName;
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
around lines 34-36, mark the columnName JSON property as required by changing
the @JsonProperty annotation to include required = true and add a null
check/validation so deserialization fails fast (e.g., throw
IllegalArgumentException or use Objects.requireNonNull in the constructor or a
@PostConstruct/validation method); this ensures malformed configs without
columnName are rejected immediately.

Comment on lines +105 to +122
public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
{
boolean hasRequiredSplitFilterColumns = true;
ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
for (String tableScope : tableScopeSet) {
for (String columnName : getRequiredColumnNames(tableScope)) {
if (!splitFilterPushDownExpression.contains(columnName)) {
hasRequiredSplitFilterColumns = false;
notFoundListBuilder.add(columnName);
}
}
}
if (!hasRequiredSplitFilterColumns) {
throw new PrestoException(
CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
notFoundListBuilder.build() + " is a mandatory split filter column but not valid");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Required-filter detection via String.contains is error-prone (false positives/negatives)

Using contains(columnName) can:

  • false-positive: "id" matches in "user_id"
  • false-negative: different quoting/casing or dereferencing variants

Also, this NPEs if splitFilterPushDownExpression is null. Switch to a robust, null-safe check using regex with quoting and deduplicate missing columns for a clearer error.

-    public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
+    public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
     {
-        boolean hasRequiredSplitFilterColumns = true;
-        ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
+        boolean hasRequiredSplitFilterColumns = true;
+        // Deduplicate to avoid repeated column names across multiple tables/scopes
+        ImmutableSet.Builder<String> notFoundSetBuilder = ImmutableSet.builder();
+        String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression;
         for (String tableScope : tableScopeSet) {
             for (String columnName : getRequiredColumnNames(tableScope)) {
-                if (!splitFilterPushDownExpression.contains(columnName)) {
+                if (!containsRequiredColumn(expr, columnName)) {
                     hasRequiredSplitFilterColumns = false;
-                    notFoundListBuilder.add(columnName);
+                    notFoundSetBuilder.add(columnName);
                 }
             }
         }
         if (!hasRequiredSplitFilterColumns) {
             throw new PrestoException(
                     CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory split filter column but not valid");
+                    "Required split filter columns missing from predicate: " + notFoundSetBuilder.build());
         }
     }

Add this helper within the same class (outside the selected range):

// Performs a strict match on a quoted identifier to avoid substring false-positives.
// Example match: "\"msg.timestamp\"" in the expression.
private static boolean containsRequiredColumn(String expression, String columnName)
{
    // Match the exact quoted identifier; Pattern.quote handles dots and other meta chars.
    String regex = "\\Q\"" + columnName + "\"\\E";
    return java.util.regex.Pattern.compile(regex).matcher(expression).find();
}

Comment on lines +65 to +86
private <T> void testRange(T lowerBound, T upperBound, ClpMySqlSplitFilterProvider filterProvider)
{
String splitFilterSql1 = format("(\"msg.timestamp\" > %s AND \"msg.timestamp\" < %s)", lowerBound, upperBound);
String remappedSql1 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql1);
assertEquals(remappedSql1, format("(end_timestamp > %s AND begin_timestamp < %s)", lowerBound, upperBound));

String splitFilterSql2 = format("(\"msg.timestamp\" >= %s AND \"msg.timestamp\" <= %s)", lowerBound, upperBound);
String remappedSql2 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql2);
assertEquals(remappedSql2, format("(end_timestamp >= %s AND begin_timestamp <= %s)", lowerBound, upperBound));

String splitFilterSql3 = format("(\"msg.timestamp\" > %s AND \"msg.timestamp\" <= %s)", lowerBound, upperBound);
String remappedSql3 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql3);
assertEquals(remappedSql3, format("(end_timestamp > %s AND begin_timestamp <= %s)", lowerBound, upperBound));

String splitFilterSql4 = format("(\"msg.timestamp\" >= %s AND \"msg.timestamp\" < %s)", lowerBound, upperBound);
String remappedSql4 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql4);
assertEquals(remappedSql4, format("(end_timestamp >= %s AND begin_timestamp < %s)", lowerBound, upperBound));

String splitFilterSql5 = format("(\"msg.timestamp\" = %s)", lowerBound);
String remappedSql5 = filterProvider.remapSplitFilterPushDownExpression("clp.default.table_1", splitFilterSql5);
assertEquals(remappedSql5, format("((begin_timestamp <= %s AND end_timestamp >= %s))", lowerBound, lowerBound));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Optional: Parameterise to reduce duplication and add edge-case scenarios

Consider a TestNG DataProvider to drive the operator combinations and add a couple of edge cases (e.g., whitespace around operators, mixed-case column quoting), keeping assertions concise.

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java
around lines 65 to 86, the testRange method repeats nearly identical assertions
for multiple operator combinations; convert these into a TestNG data-driven test
by adding a @DataProvider that yields tuples (inputSqlTemplate,
expectedSqlTemplate, lowerBound, upperBound) including edge-case rows for
whitespace around operators and mixed-case/quoted column names, then change the
test method to accept parameters and annotate it with @Test(dataProvider =
"yourProviderName"); inside the test simply call
filterProvider.remapSplitFilterPushDownExpression and assertEquals against the
formatted expected string to eliminate duplication and keep assertions concise.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

52-64: Nit: use consistent enum comparison style throughout the class

Above (Lines 48–50) you use the constant-first style (Type.X == value). For consistency, align the two conditions below to the same style.

Apply this diff:

-        if (config.getMetadataProviderType() == MetadataProviderType.MYSQL) {
+        if (MetadataProviderType.MYSQL == config.getMetadataProviderType()) {
             binder.bind(ClpMetadataProvider.class).to(ClpMySqlMetadataProvider.class).in(Scopes.SINGLETON);
         }
         else {
             throw new PrestoException(CLP_UNSUPPORTED_METADATA_SOURCE, "Unsupported metadata provider type: " + config.getMetadataProviderType());
         }

-        if (config.getSplitProviderType() == SplitProviderType.MYSQL) {
+        if (SplitProviderType.MYSQL == config.getSplitProviderType()) {
             binder.bind(ClpSplitProvider.class).to(ClpMySqlSplitProvider.class).in(Scopes.SINGLETON);
         }
         else {
             throw new PrestoException(CLP_UNSUPPORTED_SPLIT_SOURCE, "Unsupported split provider type: " + config.getSplitProviderType());
         }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9289bd7 and c72c0dc.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (34-158)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)

21-23: Imports for split-filter provider look correct

The new imports align with the refactor and are used below for conditional binding.


29-29: Static import for SplitFilterProviderType is appropriate

Used directly in the new conditional; no issues.

@anlowee
Copy link
Author

anlowee commented Aug 15, 2025

Finished the E2E testing again, and should be ready for review.
The main change is:

  1. Rename the "metadataProviderSpecific" to "customOptions"
  2. Rename metadata filter to split filter
  3. Add a new config option clp.split-filter-provider-type

@anlowee anlowee requested a review from kirkrodrigues August 15, 2025 14:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)

102-105: Unify dash style and remove superfluous comma.

Replace double hyphens with an em dash for readability and remove the extra comma before “and configure”.

Apply this diff:

-If you prefer to use a different source--or the same source with a custom implementation--you can provide your own
-implementations of the ``ClpMetadataProvider`` and ``ClpSplitProvider`` interfaces, and configure the connector
+If you prefer to use a different source—or the same source with a custom implementation—you can provide your own
+implementations of the ``ClpMetadataProvider`` and ``ClpSplitProvider`` interfaces and configure the connector
 accordingly.
♻️ Duplicate comments (4)
presto-docs/src/main/sphinx/connector/clp.rst (4)

142-144: Remove superfluous comma after the parenthetical.

Minor grammar polish for smoother reading.

Apply this diff:

-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent
+For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent

220-222: Singular “implementation” and remove extra comma.

Matches wording used elsewhere and reads more cleanly.

Apply this diff:

-If you prefer to use a different format for the ``customOptions``, you can provide your own implementations of the
-``ClpSplitFilterProvider`` interface, and configure the connector accordingly.
+If you prefer to use a different format for the ``customOptions``, you can provide your own implementation of the
+``ClpSplitFilterProvider`` interface and configure the connector accordingly.

152-154: Clarify enforcement of required filters even when no FilterNode is produced.

Docs should reflect that required-filter checking still occurs (e.g., LIMIT-only queries).

Apply this diff:

-  query. If a required filter is missing or cannot be added to the split filter query, the original query will be
-  rejected.
+  query. If a required filter is missing or cannot be added to the split filter query, the original query will be
+  rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ...
+  LIMIT 1``).

52-58: Use Presto type names and tighten grammar.

  • Prefer Presto type names (uppercase) for consistency across the docs.
  • Fix “both an int and string types” grammar.
  • Standardise casing in the “See …” parenthetical.

Apply this diff:

-                                     distinguish between types. For example, if an ``id`` column appears with
-                                     both an ``int`` and ``string`` types, the connector will create two
+                                     distinguish between types. For example, if an ``id`` column appears with
+                                     both ``BIGINT`` and ``VARCHAR`` types, the connector will create two
                                      columns named ``id_bigint`` and ``id_varchar``.
@@
-                                     Supported type annotations include ``bigint``, ``varchar``, ``double``,
-                                     ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For
+                                     Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``,
+                                     ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For
                                      columns with only one type, the original column name is used.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c72c0dc and bb68029.

📒 Files selected for processing (1)
  • presto-docs/src/main/sphinx/connector/clp.rst (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (4)
presto-docs/src/main/sphinx/connector/clp.rst (4)

139-141: Clear explanation of mapping from data columns to metadata columns.

The phrasing crisply distinguishes data columns vs. metadata columns and sets reader expectations for remapping.


192-197: Good alignment with provider-specific options.

The example correctly nests MySQL-specific rangeMapping under customOptions with clear bounds.


226-231: Scope and examples are appropriate.

The supported-expression list is concrete and avoids over-promising. Looks good.


236-251: Data type mapping table reads cleanly and is consistent with Presto types.

The mappings are correct and the unsupported catch-all helps set expectations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
presto-docs/src/main/sphinx/connector/clp.rst (3)

142-143: Remove superfluous comma after parenthetical

The comma after the parenthetical breaks the flow.

-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent
+For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent

152-154: Document enforcement when no FilterNode is produced

The PR enforces required filters even when the planner does not produce a FilterNode (e.g., SELECT * ... LIMIT 1). Make this explicit to avoid surprises.

-  query. If a required filter is missing or cannot be added to the split filter query, the original query will be
-  rejected.
+  query. If a required filter is missing or cannot be added to the split-filter query, the original query will be
+  rejected. This requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).

52-58: Use Presto type names and fix article/pluralisation; align with Data Types section

  • Replace informal types with Presto types and correct grammar.
  • Keep terminology consistent with the Data Types table (use uppercase and ARRAY(...) notation).
-                                     distinguish between types. For example, if an ``id`` column appears with
-                                     both an ``int`` and ``string`` types, the connector will create two
+                                     distinguish between types. For example, if an ``id`` column appears as
+                                     both ``BIGINT`` and ``VARCHAR`` types, the connector will create two
                                      columns named ``id_bigint`` and ``id_varchar``.
 
-                                     Supported type annotations include ``bigint``, ``varchar``, ``double``,
-                                     ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For
+                                     Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``,
+                                     ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (see `Data Types`_ for details). For
                                      columns with only one type, the original column name is used.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bb68029 and 4efe0b1.

📒 Files selected for processing (1)
  • presto-docs/src/main/sphinx/connector/clp.rst (9 hunks)
🔇 Additional comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)

112-116: Nice clarification of the split-filter flow

Clear, concise explanation of how split filters reduce scanned data and how predicates are translated.

@anlowee anlowee requested a review from gibber9809 August 18, 2025 14:04
Copy link

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor nit agreeing with a rabbit comment, but otherwise looks good.

@anlowee anlowee requested a review from gibber9809 August 20, 2025 16:48
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)

32-35: Split-filter error codes added/renamed — numeric IDs stable; looks good

The new constants and their ErrorType choices are appropriate, and IDs remain stable. No concerns here.

Optional: run a quick scan for any lingering references to the old metadata-filter constants or property names in code/docs.

#!/bin/bash
# 1) Ensure old metadata-filter error codes/usages are gone
rg -n --hidden --no-ignore-vcs -S 'CLP_METADATA_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_METADATA_FILTER_NOT_VALID|remapFilterSql' || true

# 2) Ensure docs/examples use split-filter terminology
rg -n --hidden --no-ignore-vcs -S 'metadata-filter-config' || true

# 3) Sanity-check new constants are referenced
rg -n --hidden --no-ignore-vcs -S 'CLP_SPLIT_FILTER_CONFIG_NOT_FOUND|CLP_MANDATORY_SPLIT_FILTER_NOT_VALID|CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE' || true
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4efe0b1 and ee4ea9b.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (34-158)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (51-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

21-23: Imports align with split-filter wiring

Correct classes and static imports for the new split-filter provider and error code. All good.

Also applies to: 29-33

Comment on lines +49 to +55
if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) {
binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON);
}
else {
throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType());
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Fail-fast on unsupported split-filter provider type — good; align comparison style with rest of file

The explicit PrestoException improves debuggability. For consistency with the other provider checks in this class, favour config.getX() == Type.Y comparison style.

Apply:

-        if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) {
+        if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) {
             binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON);
         }
         else {
             throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType());
         }

Note: with only MYSQL in SplitFilterProviderType today, the else branch is currently unreachable; keeping it is fine for future-proofing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType()) {
binder.bind(ClpSplitFilterProvider.class).to(ClpMySqlSplitFilterProvider.class).in(Scopes.SINGLETON);
}
else {
throw new PrestoException(CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE, "Unsupported split filter provider type: " + config.getSplitFilterProviderType());
}
if (config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL) {
binder.bind(ClpSplitFilterProvider.class)
.to(ClpMySqlSplitFilterProvider.class)
.in(Scopes.SINGLETON);
}
else {
throw new PrestoException(
CLP_UNSUPPORTED_SPLIT_FILTER_SOURCE,
"Unsupported split filter provider type: " + config.getSplitFilterProviderType());
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java around
lines 49 to 55, change the equality check style to match the rest of the file by
using config.getSplitFilterProviderType() == SplitFilterProviderType.MYSQL
instead of SplitFilterProviderType.MYSQL == config.getSplitFilterProviderType();
keep the existing binder.bind(...) for MYSQL and preserve the else branch that
throws the PrestoException so unsupported types still fail fast.

gibber9809
gibber9809 previously approved these changes Aug 20, 2025
Copy link

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. For PR title how about

refactor: Extract split filter provider interface to improve flexibility for user customization.

@anlowee anlowee changed the title refactor: Extract metadata filter config provider interface to improve flexibility for user customization. refactor: Extract split filter provider interface to improve flexibility for user customization. Aug 20, 2025
* {@code "customOptions"} field in the filter. The implementation is originally chosen
* by the {@code clp.split-filter-provider-type} config option.
*/
public class ClpCustomSplitFilterOptionsDeserializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about ClpSplitFilterConfigrCustomOptionsDeserializer? It's long, but fits better with ClpSplitFilterConfig.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (2)

16-17: Import JsonIgnoreProperties to harden forward-compatibility

Adds resilience to unknown fields.

 import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

32-36: Ignore unknown fields and make columnName required

Fail fast on malformed configs; ignore extra fields for forward compatibility.

-public class ClpSplitFilterConfig
+@JsonIgnoreProperties(ignoreUnknown = true)
+public class ClpSplitFilterConfig
 {
-    @JsonProperty("columnName")
+    @JsonProperty(value = "columnName", required = true)
     public String columnName;
presto-docs/src/main/sphinx/connector/clp.rst (3)

53-58: Use Presto type names and fix grammar

Aligns terminology and fixes pluralisation.

-                                     both an ``int`` and ``string`` types, the connector will create two
+                                     both ``BIGINT`` and ``VARCHAR`` types, the connector will create two
                                      columns named ``id_bigint`` and ``id_varchar``.
 
-                                     Supported type annotations include ``bigint``, ``varchar``, ``double``,
-                                     ``boolean``, and ``array(varchar)`` (See `Data Types`_ for details). For
+                                     Supported type annotations include ``BIGINT``, ``VARCHAR``, ``DOUBLE``,
+                                     ``BOOLEAN``, and ``ARRAY(VARCHAR)`` (See `Data Types`_ for details). For
                                      columns with only one type, the original column name is used.

142-143: Minor grammar: remove superfluous comma

Improves readability.

-For example, an integer data column (e.g., ``timestamp``), may be remapped to a pair of metadata columns that represent
+For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent

152-154: Document enforcement when no FilterNode is produced

Reflects the behavioural change so users know queries without filters are still validated.

- - ``required`` *(optional, defaults to false)*: Whether the filter **must** be present in the generated metadata query.
-  If a required filter is missing or cannot be added to the metadata query, the original query will be rejected.
+ - ``required`` *(optional, defaults to false)*: Whether the filter **must** be present in the generated metadata query.
+  If a required filter is missing or cannot be added to the metadata query, the original query will be rejected. This
+  requirement is enforced even for queries that do not produce a ``FilterNode`` (e.g., ``SELECT * ... LIMIT 1``).
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)

101-118: Required-filter detection via String.contains is brittle and null-unsafe

False positives (“id” in “user_id”), case/quoting variants, and NPE if expression is null.

-    public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
-    {
-        boolean hasRequiredSplitFilterColumns = true;
-        ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
-        for (String tableScope : tableScopeSet) {
-            for (String columnName : getRequiredColumnNames(tableScope)) {
-                if (!splitFilterPushDownExpression.contains(columnName)) {
-                    hasRequiredSplitFilterColumns = false;
-                    notFoundListBuilder.add(columnName);
-                }
-            }
-        }
-        if (!hasRequiredSplitFilterColumns) {
-            throw new PrestoException(
-                    CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory split filter column but not valid");
-        }
-    }
+    public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
+    {
+        boolean hasAll = true;
+        ImmutableSet.Builder<String> missing = ImmutableSet.builder();
+        final String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression;
+        for (String tableScope : tableScopeSet) {
+            for (String columnName : getRequiredColumnNames(tableScope)) {
+                if (!containsRequiredColumn(expr, columnName)) {
+                    hasAll = false;
+                    missing.add(columnName);
+                }
+            }
+        }
+        if (!hasAll) {
+            throw new PrestoException(
+                    CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
+                    "Required split filter columns missing from predicate: " + missing.build());
+        }
+    }

Add this helper in the class:

// Performs a strict match on a quoted identifier to avoid substring false-positives.
private static boolean containsRequiredColumn(String expression, String columnName)
{
    String regex = "\\Q\"" + columnName + "\"\\E";
    return java.util.regex.Pattern.compile(regex).matcher(expression).find();
}

Quick check:

#!/bin/bash
# Sanity check: ensure the helper is present and method updated
rg -n 'containsRequiredColumn\(' -C2
rg -n 'CLP_MANDATORY_SPLIT_FILTER_NOT_VALID' -C2
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)

63-69: Style: conventional length checks improve readability

No behaviour change.

-        if (1 < splitScope.length) {
+        if (splitScope.length > 1) {
             mappings.putAll(getAllMappingsFromFilters(filterMap.get(splitScope[0] + "." + splitScope[1])));
         }
 
-        if (3 == splitScope.length) {
+        if (splitScope.length == 3) {
             mappings.putAll(getAllMappingsFromFilters(filterMap.get(scope)));
         }

71-84: Quote column names in regex and tolerate flexible whitespace; handle null input

Unescaped dots in keys (e.g., msg.timestamp) act as wildcards and will mis-match; also avoid NPE when there’s no predicate to remap.

-        String remappedSql = pushDownExpression;
+        if (pushDownExpression == null) {
+            return null;
+        }
+        String remappedSql = pushDownExpression;
         for (Map.Entry<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> entry : mappings.entrySet()) {
             String key = entry.getKey();
             ClpMySqlCustomSplitFilterOptions.RangeMapping value = entry.getValue();
-            remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(>=?)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key),
-                    format("%s $2 $3", value.upperBound));
-            remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(<=?)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key),
-                    format("%s $2 $3", value.lowerBound));
-            remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(=)\\s(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", key),
-                    format("(%s <= $3 AND %s >= $3)", value.lowerBound, value.upperBound));
+            String qKey = java.util.regex.Pattern.quote(key);
+            remappedSql = remappedSql.replaceAll(
+                    format("\"(%s)\"\\s*(>=?)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey),
+                    format("%s $2 $3", value.upperBound));
+            remappedSql = remappedSql.replaceAll(
+                    format("\"(%s)\"\\s*(<=?)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey),
+                    format("%s $2 $3", value.lowerBound));
+            remappedSql = remappedSql.replaceAll(
+                    format("\"(%s)\"\\s*(=)\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)", qKey),
+                    format("(%s <= $3 AND %s >= $3)", value.lowerBound, value.upperBound));
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ee4ea9b and df37d2a.

📒 Files selected for processing (5)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +94 to +105
private Map<String, ClpMySqlCustomSplitFilterOptions.RangeMapping> getAllMappingsFromFilters(List<ClpSplitFilterConfig> filters)
{
return null != filters
? filters.stream()
.filter(filter ->
filter.customOptions instanceof ClpMySqlCustomSplitFilterOptions &&
((ClpMySqlCustomSplitFilterOptions) filter.customOptions).rangeMapping != null)
.collect(toImmutableMap(
filter -> filter.columnName,
filter -> ((ClpMySqlCustomSplitFilterOptions) filter.customOptions).rangeMapping))
: ImmutableMap.of();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Validate mapped metadata column identifiers to mitigate injection/config errors

Range-mapped metadata column names are interpolated into SQL. Validate they match a safe identifier pattern before use, or reject the config.

I recommend validating lowerBound/upperBound against a safe identifier regex (e.g., reuse ClpConfig.SAFE_SQL_TABLE_NAME_PATTERN or define a SAFE_SQL_IDENTIFIER_PATTERN) during config load and throwing a PrestoException if invalid. Would you like me to draft this validation?

Comment on lines 16 to 25
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use JsonNode/ObjectCodec and add requireNonNull import; remove unnecessary ObjectMapper cast

Avoid casting the codec to ObjectMapper (can throw ClassCastException with alternate codecs) and don’t assume an ObjectNode.

 import com.fasterxml.jackson.core.JsonParser;
 import com.fasterxml.jackson.databind.DeserializationContext;
 import com.fasterxml.jackson.databind.JsonDeserializer;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.databind.JsonNode;
 
 import java.io.IOException;
 
 import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
+import static java.util.Objects.requireNonNull;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.IOException;
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonNode;
import java.io.IOException;
import static com.facebook.presto.plugin.clp.split.filter.ClpSplitFilterConfig.CustomSplitFilterOptions;
import static java.util.Objects.requireNonNull;
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
around lines 16 to 25, remove the unsafe cast to ObjectMapper and the assumption
that the codec returns an ObjectNode: use parser.getCodec() (ObjectCodec) and
call readTree(parser) to obtain a JsonNode, import and use
java.util.Objects.requireNonNull to assert the node is not null, then use the
codec (or ObjectMapper obtained safely via ObjectCodec if needed) to convert the
JsonNode to ClpSplitFilterConfig.CustomSplitFilterOptions via treeToValue or
equivalent; also add the requireNonNull import and remove any ObjectMapper cast
to avoid ClassCastException and unsafe ObjectNode assumptions.

Comment on lines 31 to 39
public class ClpSplitFilterConfigrCustomOptionsDeserializer
extends JsonDeserializer<CustomSplitFilterOptions>
{
private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;

public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Fix class name typo, make constructor null-safe, and rename the file accordingly

The public class name is misspelled (“Configr”). This leaks into the API and downstream references. Also, guard against null target type.

Apply this diff and rename the file to ClpSplitFilterCustomOptionsDeserializer.java:

-public class ClpSplitFilterConfigrCustomOptionsDeserializer
+public class ClpSplitFilterCustomOptionsDeserializer
         extends JsonDeserializer<CustomSplitFilterOptions>
 {
     private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;
 
-    public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+    public ClpSplitFilterCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
     {
-        this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+        this.actualCustomSplitFilterOptionsClass = requireNonNull(actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
     }

Run this to find and update all references:


🏁 Script executed:

#!/bin/bash
# Find references to the old, misspelled class name
rg -n 'ClpSplitFilterConfigrCustomOptionsDeserializer' -S -C2

Length of output: 2550


🏁 Script executed:

#!/bin/bash
# Find any occurrence of the typo 'Configr' across the repository, including imports and docs
rg -n 'Configr'

Length of output: 665


🏁 Script executed:

#!/bin/bash
# Display the top 20 lines to check for existing imports
sed -n '1,20p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java

Length of output: 1022


Apply critical fixes to correct the class name, guard against null, and update all references

The class ClpSplitFilterConfigrCustomOptionsDeserializer is misspelled, which leaks into the public API. We should also make the constructor parameter null-safe and ensure imports are added for requireNonNull. Specifically:

• Rename the Java file
– From
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
– To
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterCustomOptionsDeserializer.java

• In that file, update the class and constructor names:

-public class ClpSplitFilterConfigrCustomOptionsDeserializer
+public class ClpSplitFilterCustomOptionsDeserializer
         extends JsonDeserializer<CustomSplitFilterOptions>
 {
     private final Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass;

-    public ClpSplitFilterConfigrCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+    public ClpSplitFilterCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
     {
-        this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+        this.actualCustomSplitFilterOptionsClass =
+            requireNonNull(actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
     }

• Add the static import at the top of the file for null checks:

import static java.util.Objects.requireNonNull;

• Update all call sites (e.g., in ClpSplitFilterProvider.java):

- new ClpSplitFilterConfigrCustomOptionsDeserializer(getCustomSplitFilterOptionsClass()));
+ new ClpSplitFilterCustomOptionsDeserializer(getCustomSplitFilterOptionsClass()));

After these changes, rebuild and run the ripgrep search again to confirm there are no remaining references to the old, misspelled name.

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter around
lines 31-39, rename the file and class from
ClpSplitFilterConfigrCustomOptionsDeserializer to
ClpSplitFilterCustomOptionsDeserializer, change the constructor to match the new
class name, add a static import "import static
java.util.Objects.requireNonNull;" at the top, and guard the constructor
parameter with this.actualCustomSplitFilterOptionsClass =
requireNonNull(actualCustomSplitFilterOptionsClass,
"actualCustomSplitFilterOptionsClass");; then update all call sites (e.g.,
ClpSplitFilterProvider and any other references) to use the new class name,
rebuild and run ripgrep to verify no references to the old misspelled name
remain.

Comment on lines 41 to 48
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();

return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Safer deserialization via ObjectCodec; don’t depend on concrete mapper

Keeps the deserializer robust if a non-ObjectMapper codec is used.

     @Override
     public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
     {
-        ObjectNode node = p.getCodec().readTree(p);
-        ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
-        return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+        JsonNode node = p.getCodec().readTree(p);
+        return p.getCodec().treeToValue(node, actualCustomSplitFilterOptionsClass);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();
return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
JsonNode node = p.getCodec().readTree(p);
return p.getCodec().treeToValue(node, actualCustomSplitFilterOptionsClass);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigrCustomOptionsDeserializer.java
around lines 41 to 48, the deserializer casts the parser codec to ObjectMapper
which is unsafe if a non-ObjectMapper ObjectCodec is used; replace the cast with
the codec directly and call treeToValue on the ObjectCodec (e.g., ObjectCodec
codec = p.getCodec(); return codec.treeToValue(node,
actualCustomSplitFilterOptionsClass);) so deserialization does not depend on a
concrete ObjectMapper.

Comment on lines +73 to +75
catch (IOException e) {
throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Improve exception message with file path for diagnosability

Helps operators quickly pinpoint misconfiguration.

-        catch (IOException e) {
-            throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e);
+        catch (IOException e) {
+            throw new PrestoException(
+                    CLP_SPLIT_FILTER_CONFIG_NOT_FOUND,
+                    "Failed to read split filter config file: " + config.getSplitFilterConfig(),
+                    e);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
catch (IOException e) {
throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e);
}
catch (IOException e) {
throw new PrestoException(
CLP_SPLIT_FILTER_CONFIG_NOT_FOUND,
"Failed to read split filter config file: " + config.getSplitFilterConfig(),
e);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
around lines 73 to 75, the PrestoException message thrown on IOException omits
the config file path, making diagnosis harder; change the exception to include
the path variable (or the resolved file path string) in the message and pass the
original exception as the cause so the log shows both the file path and the
underlying error details.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)

73-75: Include the config file path in the exception for faster diagnosis

Surfacing the file path drastically improves operability when configurations are mispointed or unreadable.

-        catch (IOException e) {
-            throw new PrestoException(CLP_SPLIT_FILTER_CONFIG_NOT_FOUND, "Failed to open split filter config file", e);
+        catch (IOException e) {
+            throw new PrestoException(
+                    CLP_SPLIT_FILTER_CONFIG_NOT_FOUND,
+                    "Failed to read split filter config file: " + config.getSplitFilterConfig(),
+                    e);
         }

101-118: Robust required-filter detection (null-safe, no false positives, deduplicated message)

Current String.contains:

  • NPEs if splitFilterPushDownExpression is null.
  • False-positives (e.g., "id" in "user_id").
  • Repeats columns across scopes.

Switch to a quoted-identifier regex match, deduplicate missing columns, and improve the message.

     public void checkContainsRequiredFilters(Set<String> tableScopeSet, String splitFilterPushDownExpression)
     {
-        boolean hasRequiredSplitFilterColumns = true;
-        ImmutableList.Builder<String> notFoundListBuilder = ImmutableList.builder();
+        boolean hasRequiredSplitFilterColumns = true;
+        ImmutableSet.Builder<String> notFoundSetBuilder = ImmutableSet.builder();
+        String expr = (splitFilterPushDownExpression == null) ? "" : splitFilterPushDownExpression;
         for (String tableScope : tableScopeSet) {
             for (String columnName : getRequiredColumnNames(tableScope)) {
-                if (!splitFilterPushDownExpression.contains(columnName)) {
+                if (!containsRequiredColumn(expr, columnName)) {
                     hasRequiredSplitFilterColumns = false;
-                    notFoundListBuilder.add(columnName);
+                    notFoundSetBuilder.add(columnName);
                 }
             }
         }
         if (!hasRequiredSplitFilterColumns) {
             throw new PrestoException(
                     CLP_MANDATORY_SPLIT_FILTER_NOT_VALID,
-                    notFoundListBuilder.build() + " is a mandatory split filter column but not valid");
+                    "Required split filter columns missing from predicate: " + notFoundSetBuilder.build());
         }
     }

Add this helper within the class (outside the selected range):

// Performs a strict match on a quoted identifier to avoid substring false-positives.
// Example: matches "\"msg.timestamp\"" in the expression, where columnName is msg.timestamp.
private static boolean containsRequiredColumn(String expression, String columnName)
{
    String regex = "\\Q\"" + columnName + "\"\\E";
    return java.util.regex.Pattern.compile(regex).matcher(expression).find();
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between df37d2a and 1871733.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-204)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (1)
  • ClpSplitFilterConfig (32-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)

62-67: Approval: Deserializer wiring verified

No stale or typoed references to the old deserializer class names were found in the repository. The newly registered ClpSplitFilterConfigCustomOptionsDeserializer with the correct CustomSplitFilterOptions class can be shipped as-is.

Comment on lines +16 to +21
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Adjust imports to match the refactor (optional but cleaner)

If you apply the previous refactor, update imports accordingly to remove unused ones and add needed types.

@@
-import com.fasterxml.jackson.databind.ObjectMapper;
-import com.fasterxml.jackson.databind.node.ObjectNode;
+import com.fasterxml.jackson.core.ObjectCodec;
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.databind.JsonNode;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.core.ObjectCodec;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 16-21, update the import list to match the recent refactor: remove
any imports that are no longer referenced (e.g., ObjectMapper, ObjectNode,
JsonDeserializer if unused) and add the specific Jackson types the new
implementation uses (for example com.fasterxml.jackson.databind.JsonNode and/or
any exception/utility types and the exact JsonParser/DeserializationContext
remains); ensure only the actually referenced classes are imported so there are
no unused imports and the compiler imports match the current code.

Comment on lines +26 to +30
/**
* Uses the given {@link CustomSplitFilterOptions} implementation to deserialize the
* {@code "customOptions"} field in a {@link ClpSplitFilterConfig}. The implementation is determined
* by the implemented {@link ClpSplitFilterProvider}.
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Clarify null-handling behaviour in Javadoc

Document that deserialisation returns null when customOptions is absent or explicitly null.

 /**
  * Uses the given {@link CustomSplitFilterOptions} implementation to deserialize the
  * {@code "customOptions"} field in a {@link ClpSplitFilterConfig}. The implementation is determined
  * by the implemented {@link ClpSplitFilterProvider}.
+ * <p>
+ * Returns {@code null} if the {@code "customOptions"} field is absent or {@code null}.
+ * </p>
  */
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 26 to 30, update the class Javadoc to explicitly state the
null-handling behaviour: document that deserialization will return null when the
"customOptions" field is missing or explicitly set to null, and clarify that a
non-null CustomSplitFilterOptions implementation is only returned when the field
is present and successfully deserialized; keep the existing references to
CustomSplitFilterOptions and ClpSplitFilterConfig/ClpSplitFilterProvider but add
a short sentence describing these null cases.

Comment on lines +36 to +39
public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate constructor argument to fail fast

Defensively enforce a non-null class reference to avoid late failures.

-    public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
+    public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
     {
-        this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
+        this.actualCustomSplitFilterOptionsClass = java.util.Objects.requireNonNull(
+                actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = actualCustomSplitFilterOptionsClass;
}
public ClpSplitFilterConfigCustomOptionsDeserializer(Class<? extends CustomSplitFilterOptions> actualCustomSplitFilterOptionsClass)
{
this.actualCustomSplitFilterOptionsClass = java.util.Objects.requireNonNull(
actualCustomSplitFilterOptionsClass, "actualCustomSplitFilterOptionsClass is null");
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 36 to 39, the constructor does not validate its
actualCustomSplitFilterOptionsClass parameter; add a defensive null check (e.g.,
Objects.requireNonNull or explicit if-check) at the start of the constructor and
throw a clear NullPointerException with a message like
"actualCustomSplitFilterOptionsClass is null" so the class reference is
validated immediately and failures are fast and clear.

Comment on lines +41 to +48
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();

return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Null-safe deserialisation; avoid brittle ObjectMapper cast

Guard against null tokens and map via ObjectCodec to avoid the hard cast to ObjectMapper. This prevents potential NPEs when customOptions is null/missing and future-proofs against non-ObjectMapper codecs.

Apply this diff:

@@
     @Override
     public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
     {
-        ObjectNode node = p.getCodec().readTree(p);
-        ObjectMapper mapper = (ObjectMapper) p.getCodec();
-
-        return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
+        if (p.getCurrentToken() == com.fasterxml.jackson.core.JsonToken.VALUE_NULL) {
+            return null;
+        }
+        com.fasterxml.jackson.core.ObjectCodec codec = p.getCodec();
+        com.fasterxml.jackson.databind.JsonNode node = codec.readTree(p);
+        if (node == null || node.isNull()) {
+            return null;
+        }
+        return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
     }

If you prefer, also add explicit imports for ObjectCodec, JsonToken, and JsonNode (see separate import note).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
ObjectNode node = p.getCodec().readTree(p);
ObjectMapper mapper = (ObjectMapper) p.getCodec();
return mapper.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
@Override
public CustomSplitFilterOptions deserialize(JsonParser p, DeserializationContext ctxt) throws IOException
{
// Handle explicit JSON null
if (p.getCurrentToken() == com.fasterxml.jackson.core.JsonToken.VALUE_NULL) {
return null;
}
// Use ObjectCodec rather than casting to ObjectMapper
com.fasterxml.jackson.core.ObjectCodec codec = p.getCodec();
com.fasterxml.jackson.databind.JsonNode node = codec.readTree(p);
// Guard against missing or explicit null node
if (node == null || node.isNull()) {
return null;
}
// Delegate to codec for mapping into the target class
return codec.treeToValue(node, actualCustomSplitFilterOptionsClass);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
around lines 41-48, the current code casts the parser's codec to ObjectMapper
and doesn't guard against null/missing values; change it to be null-safe and
codec-agnostic by: obtain the current token and return null immediately if the
token is JsonToken.VALUE_NULL, use ObjectCodec codec = p.getCodec() instead of
casting to ObjectMapper, read a JsonNode via codec.readTree(p), return null if
the node is null or node.isNull(), and finally call codec.treeToValue(node,
actualCustomSplitFilterOptionsClass) to deserialize; also add imports for
ObjectCodec, JsonToken, and JsonNode.

Comment on lines +139 to +155
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor)
{
String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();

builder.addAll(extractor.apply(filterMap.get(splitScope[0])));

if (splitScope.length > 1) {
builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
}

if (splitScope.length == 3) {
builder.addAll(extractor.apply(filterMap.get(scope)));
}

return builder.build();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Defensive null check on scope to prevent accidental NPEs

If a caller ever passes a null scope, scope.split(...) will NPE. Cheap guard improves resilience.

 private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor)
 {
-        String[] splitScope = scope.split("\\.");
+        requireNonNull(scope, "scope is null");
+        String[] splitScope = scope.split("\\.");
         ImmutableSet.Builder<String> builder = ImmutableSet.builder();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor)
{
String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
builder.addAll(extractor.apply(filterMap.get(splitScope[0])));
if (splitScope.length > 1) {
builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
}
if (splitScope.length == 3) {
builder.addAll(extractor.apply(filterMap.get(scope)));
}
return builder.build();
}
private Set<String> collectColumnNamesFromScopes(String scope, Function<List<ClpSplitFilterConfig>, Set<String>> extractor)
{
requireNonNull(scope, "scope is null");
String[] splitScope = scope.split("\\.");
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
builder.addAll(extractor.apply(filterMap.get(splitScope[0])));
if (splitScope.length > 1) {
builder.addAll(extractor.apply(filterMap.get(splitScope[0] + "." + splitScope[1])));
}
if (splitScope.length == 3) {
builder.addAll(extractor.apply(filterMap.get(scope)));
}
return builder.build();
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
around lines 139-155, the method collectColumnNamesFromScopes calls
scope.split(...) without guarding for a null scope which will throw an NPE; add
a defensive null check at the start of the method (e.g., if (scope == null)
return ImmutableSet.of();) so the method returns an empty set for null input
rather than throwing, keeping the rest of the logic unchanged.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving docs. Deferring to @gibber9809's review of the code.

@anlowee anlowee merged commit 73342c6 into y-scope:release-0.293-clp-connector Aug 25, 2025
30 checks passed
@anlowee anlowee deleted the xwei/refactor-metadatafilter-1 branch August 25, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet