Skip to content

Conversation

@anlowee
Copy link

@anlowee anlowee commented Jun 24, 2025

Description

The CLP metadata MySQL database already supported some metadata filtering like timestamp. It is similar to what we did before to use the date-sorted directory structure to filter archives at top level, which can significantly reduce the number of splits to scan.

Benefits:

  • Reduce the number of splits to scan and improve the performance.

Trade Offs:

  • Niche performance cost on generating metadata SQL when generating KQL.
  • Niche performance cost on string replacement when mapping to custom metadata columns.

To implement this, we added a new config property and designed a JSON format config file for metadata filter configuration. For example:

{
  "clp": [
      {
        "columnName": "level"
      }
  ],
  "clp.default": [
      {
        "columnName": "author"
      }
  ],
  "clp.default.table_1": [
      {
        "columnName": "msg.timestamp",
        "rangeMapping": {
          "lowerBound": "begin_timestamp",
          "upperBound": "end_timestamp"
        },
        "required": true
      },
      {
        "columnName": "file_name"
      }
  ]
}

Explanation:

  • "clp": Adds a filter on the column level for all schemas and tables under the clp catalog.
  • "clp.default": Adds a filter on author for all tables under the clp.default schema.
  • "clp.default.table_1": Adds two filters for the table clp.default.table_1:
    • msg.timestamp is remapped via rangeMapping and is marked as required.
    • file_name is used as-is without remapping.

For more details about this config file, please refer to the modified clp.rst doc.

We also modified the ClpExpression and ClpFilterToKqlConverter to extract the metadata filter SQL query when generating KQL pushdown.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

E2E test with sample log with format like:

"msg": {
    "timestamp": "1234",
    "message":"hahaha",
    "logLevel": "INFO"   
}

A new unit test class TestClpMetadataFilterConfig and a few unit tests added into existing classesTestClpFilterToKql and TestClpSplit.

Summary by CodeRabbit

  • New Features

    • Introduced metadata-based filtering in the CLP connector with configurable JSON filters at catalog, schema, and table levels.
    • Added a configuration property to specify the metadata filter config file path.
    • Enabled required filters and range mappings for metadata columns to enforce query constraints and optimize filtering.
    • Enhanced query optimization to integrate metadata SQL filters alongside KQL pushdown expressions.
  • Documentation

    • Added detailed documentation on metadata filter configuration, structure, usage, and examples.
  • Bug Fixes

    • Improved type mapping by treating date strings as timestamps.
  • Tests

    • Added tests validating metadata filter configuration, SQL remapping, and metadata-based split filtering.
    • Refactored existing tests for filter pushdown and added coverage for metadata SQL generation.
  • Chores

    • Enhanced logging for query construction and split listing to aid debugging and traceability.

@anlowee anlowee requested a review from wraymo June 24, 2025 18:45
@coderabbitai
Copy link

coderabbitai bot commented Jun 24, 2025

Walkthrough

This change introduces support for metadata filter pushdown in the CLP Presto connector. It adds a new configuration property for specifying a JSON metadata filter config, extends the filter conversion and planning logic to generate and validate metadata SQL filters, and updates related classes, tests, and documentation to support and verify this feature.

Changes

Files / Groups Change Summary
presto-clp/pom.xml Added jackson-core and jackson-databind dependencies for JSON handling.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java Added metadataFilterConfig property with getter/setter for config path.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java Introduced CONNECTOR_NAME constant for connector name.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java Added error codes for missing/invalid metadata filter config.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java Added metadataSqlQuery field, updated constructors, and new getter.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java Enhanced to generate metadata SQL, handle BETWEEN, propagate metadata SQL, and added utility methods.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java Updated ClpTableLayoutHandle instantiation to include new optional parameter.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java New class: Loads and manages metadata filter config, validates required filters, remaps SQL, exposes filter info.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java Bound ClpMetadataFilterProvider as a singleton in Guice.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java Injected ClpMetadataFilterProvider, integrated metadata SQL generation, validation, and remapping into planning.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java Updated constructor and optimizer instantiation to pass ClpMetadataFilterProvider.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java Added metadataSql field, updated constructor, getter, equality, and string representation.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java Mapped DateString type to TimestampType instead of VARCHAR.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java Appended metadata SQL to archive query if present, added debug logging for query and split count.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java Refactored split setup to use ArchivesTableRow with timestamp fields, updated schema and insertion logic.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java Refactored test helpers, added tests for metadata SQL generation, improved assertion logic and naming.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java New test: Validates metadata filter config loading, required filter enforcement, and SQL remapping.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java Enhanced split tests to validate metadata SQL filtering using timestamp ranges.
presto-clp/src/test/resources/test-metadata-filter.json New resource: Example JSON metadata filter config for tests.
presto-docs/src/main/sphinx/connector/clp.rst Documented clp.metadata-filter-config, described filter config structure, usage, and example.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Presto
    participant ClpPlanOptimizer
    participant ClpFilterToKqlConverter
    participant ClpMetadataFilterProvider
    participant ClpTableLayoutHandle
    participant ClpMySqlSplitProvider

    User->>Presto: Submit query with filters
    Presto->>ClpPlanOptimizer: Plan query
    ClpPlanOptimizer->>ClpFilterToKqlConverter: Convert RowExpression to KQL and metadata SQL
    ClpPlanOptimizer->>ClpMetadataFilterProvider: Validate required filters in metadata SQL
    ClpPlanOptimizer->>ClpMetadataFilterProvider: Remap metadata SQL (range mapping)
    ClpPlanOptimizer->>ClpTableLayoutHandle: Create with KQL and metadata SQL
    Presto->>ClpMySqlSplitProvider: Request splits (with layout handle)
    ClpMySqlSplitProvider->>ClpTableLayoutHandle: Retrieve metadata SQL
    ClpMySqlSplitProvider->>ClpMySqlSplitProvider: Append metadata SQL to archive query
    ClpMySqlSplitProvider->>Presto: Return filtered splits
Loading

Possibly related PRs

  • y-scope/presto#16: Introduced and extended ClpExpression and ClpFilterToKqlConverter classes, which are directly enhanced in this PR to support metadata SQL generation and validation.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8dd013 and f8b7eaf.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🧬 Code Graph Analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (33-72)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (2)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (33-72)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2)

35-51: Field injection and null checks look good.

The addition of ClpMetadataFilterProvider with proper null validation follows the established pattern.


81-98: Metadata filter integration is well-structured.

The implementation correctly:

  • Constructs the scope using the standard format
  • Validates required filters before the KQL check (preventing early returns)
  • Uses debug logging instead of info level
  • Properly remaps the metadata SQL when present
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1)

170-196: Well-refactored scope collection logic.

The extraction of common logic into collectColumnNamesFromScopes eliminates code duplication effectively while maintaining readability.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6143799 and 948ccc0.

📒 Files selected for processing (20)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (8 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (4 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (25)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

40-40: LGTM! Proper dependency injection setup.

The singleton binding for ClpMetadataFilterProvider follows the established pattern and is appropriately placed among other core component bindings.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)

18-18: Good addition for timestamp support.

The TimestampType import is properly added to support the enhanced type mapping.


125-126: Excellent semantic improvement for DateString mapping.

Changing DateString from VARCHAR to TimestampType.TIMESTAMP is semantically correct and aligns with the metadata filtering functionality that requires proper timestamp handling.

presto-clp/pom.xml (1)

68-76: Appropriate dependencies for JSON processing.

The addition of jackson-core and jackson-databind complements the existing jackson-annotations dependency and properly supports the JSON configuration processing needs of the new metadata filtering functionality.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (2)

36-36: Good practice: Extract magic string to named constant.

Creating the CONNECTOR_NAME constant improves maintainability and provides consistent access to the connector identifier throughout the codebase.


41-41: Consistent usage of the extracted constant.

Using CONNECTOR_NAME instead of the hardcoded string follows good practices and ensures consistency.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)

122-122: Proper adaptation to enhanced constructor signature.

The addition of Optional.empty() as the third parameter aligns with the enhanced ClpTableLayoutHandle constructor that now supports metadata SQL. Using Optional.empty() maintains existing behaviour while enabling the new functionality in other parts of the system.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)

21-21: LGTM! Well-designed error codes for metadata filtering.

The new error codes are appropriately named, use the correct USER_ERROR type for configuration-related errors, and follow consistent numbering. The required USER_ERROR import has been added correctly.

Also applies to: 32-33

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

44-44: LGTM! Improved SQL template for dynamic condition appending.

The "WHERE 1 = 1" base condition is a good practice that allows easy appending of additional conditions without complex conditional logic.


92-94: LGTM! Clear variable naming and appropriate debug logging.

The variable renaming to filteredSplits improves code clarity, and debug-level logging for split count is appropriate for troubleshooting.


72-77: LGTM! Proper metadata SQL integration with appropriate logging.

The implementation correctly:

  • Checks for optional metadata SQL using isPresent()
  • Safely appends the condition with AND clause and proper parentheses grouping
  • Adds info-level logging for query debugging

Verify that the metadata SQL from ClpMetadataFilterProvider is properly sanitized to prevent SQL injection attacks:

#!/bin/bash
# Description: Check if ClpMetadataFilterProvider implements SQL injection protection
# Expected: Find SQL sanitization or parameterized query usage

# Search for ClpMetadataFilterProvider implementation
ast-grep --pattern 'class ClpMetadataFilterProvider {
  $$$
}'

# Look for SQL sanitization methods
rg -A 10 -B 5 "sanitize|escape|quote|parameterized|prepared" --type java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)

78-78: LGTM! Constructor simplification aligns with refactoring.

The removal of the StorageType parameter from the ClpTableHandle constructor simplifies the test setup while maintaining the same test functionality.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

31-31: LGTM! Proper dependency injection implementation.

The ClpMetadataFilterProvider dependency is correctly:

  • Declared as a private final field
  • Injected via constructor with @Inject annotation
  • Stored in the constructor
  • Passed to the ClpPlanOptimizer constructor

This follows standard dependency injection patterns and integrates well with the metadata filtering feature.

Also applies to: 34-34, 38-38, 50-50

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)

30-85: LGTM! Clean implementation of metadata SQL support.

The addition of the metadataSql field is well-implemented with proper immutability, JSON serialization, and consistent updates to equals/hashCode/toString methods.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)

35-89: Well-structured implementation of metadata SQL support.

The constructor updates maintain backward compatibility while adding support for metadata SQL. The new two-parameter constructor provides a clean API for cases where both pushdown expression and metadata SQL are needed.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (2)

78-118: Comprehensive test coverage for metadata filter functionality.

The tests effectively validate hierarchical filter resolution and SQL remapping with various operators including range queries and equality checks.


120-126: Make tearDown more defensive.

The null check should come before accessing tempFile.exists().

 @AfterMethod
 public void tearDown()
 {
-    if (null != tempFile && tempFile.exists()) {
+    if (tempFile != null && tempFile.exists()) {
         tempFile.delete();
     }
 }

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

81-106: Well-integrated metadata filter support in plan optimization.

The implementation correctly:

  • Constructs scope for hierarchical filter resolution
  • Validates mandatory filters before proceeding
  • Remaps metadata SQL according to configuration
  • Provides appropriate logging for debugging

The comment on lines 92-93 correctly explains why validation must happen before the KQL check.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)

231-261: Good implementation of the ArchiveTableRow class

The ArchiveTableRow class is well-structured with appropriate getters for all fields. The constructor properly initializes all final fields.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)

79-138: Comprehensive test coverage for timestamp filtering

The test method thoroughly covers various timestamp filtering scenarios including:

  • No filters
  • Overlapping ranges
  • Non-overlapping ranges
  • Exact matches
  • Edge cases

This provides excellent coverage for the metadata filtering functionality.


140-167: Well-structured helper method for result comparison

The compareListSplitsResult method effectively validates split filtering results by:

  1. Creating the appropriate layout handle with metadata SQL
  2. Comparing the count of splits
  3. Verifying the actual split paths match expected paths

The use of immutable sets for comparison ensures order-independent validation.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)

238-243: Ensure consistent NULL handling in metadata SQL propagation

The method correctly propagates metadata SQL through NOT operations. The null-safe handling with Optional is appropriate.


624-667: Good implementation of metadata SQL propagation in AND operations

The implementation correctly handles metadata SQL propagation for AND operations:

  • Generates metadata SQL only when at least one child has it
  • Properly concatenates multiple metadata SQL expressions
  • Maintains the structure parallel to KQL generation
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)

194-204: Good refactoring with the tryPushDown helper method

The helper method encapsulates the conversion logic nicely and makes tests more readable. The explicit specification of metadata filter columns (ImmutableSet.of("fare")) is clear.


227-240: Well-structured helper for metadata SQL validation

The testFilterWithMetadataSql helper method properly validates both the KQL expression and the optional metadata SQL, with appropriate null handling.

private String metadataDbName;
private String metadataDbUser;
private String metadataDbPassword;
private String metadataFilterConfig;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

LGTM! Consistent configuration property implementation.

The new metadataFilterConfig property follows the established pattern with proper getter/setter methods and @config annotation.

Consider adding validation for the metadata filter config path, similar to how metadataTablePrefix is validated:

@Config("clp.metadata-filter-config")
public ClpConfig setMetadataFilterConfig(String metadataFilterConfig)
{
+   if (metadataFilterConfig != null && !metadataFilterConfig.trim().isEmpty()) {
+       // Validate file path format or existence if needed
+   }
    this.metadataFilterConfig = metadataFilterConfig;
    return this;
}

Also applies to: 111-121

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java around
lines 32 and 111-121, the metadataFilterConfig property lacks validation similar
to metadataTablePrefix. Add validation logic in the setter method for
metadataFilterConfig to ensure the config path meets expected criteria, throwing
an appropriate exception if invalid. This will maintain consistency and prevent
misconfiguration.

Comment on lines 32 to 33
Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Optimize regex pattern compilation for better performance.

Compiling regex patterns on every method call is inefficient. Consider making them static constants.

 public static class KqlUtils
 {
     private static final Logger log = Logger.get(KqlUtils.class);
+    private static final Pattern STRING_FILTER_PATTERN = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
+    private static final Pattern NUMERIC_FILTER_PATTERN = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)\\s*([><]=?|:)\\s*(-?\\d+(?:\\.\\d+)?)");

     public static String toSql(String kqlQuery)
     {
         String sqlQuery = kqlQuery;

         // Translate string filters
-        Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
-        Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery);
+        Matcher stringFilterMatcher = STRING_FILTER_PATTERN.matcher(sqlQuery);

And similarly for the numeric filter pattern.

Also applies to: 65-66

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java around
lines 32-33 and 65-66, the regex patterns are compiled inside methods causing
repeated compilation and performance overhead. To fix this, declare these
Pattern objects as static final constants at the class level so they are
compiled once and reused across method calls, improving efficiency.

Comment on lines 27 to 94
public static String toSql(String kqlQuery)
{
String sqlQuery = kqlQuery;

// Translate string filters
Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery);
StringBuilder translateStringFiltersSb = new StringBuilder();

int lastStringFilterEnd = 0;
while (stringFilterMatcher.find()) {
String columnName = stringFilterMatcher.group(1);
String value = stringFilterMatcher.group(2);

// Append everything between last match and this one
translateStringFiltersSb.append(sqlQuery, lastStringFilterEnd, stringFilterMatcher.start());

if (value.contains("*")) {
translateStringFiltersSb.append("\"")
.append(columnName)
.append("\" LIKE '")
.append(value.replace('*', '%'))
.append("'");
}
else {
translateStringFiltersSb.append("\"")
.append(columnName)
.append("\" = '")
.append(value)
.append("'");
}

lastStringFilterEnd = stringFilterMatcher.end();
}
translateStringFiltersSb.append(sqlQuery.substring(lastStringFilterEnd));
sqlQuery = translateStringFiltersSb.toString();

// Translate numeric filters
Pattern numericFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)\\s*([><]=?|:)\\s*(-?\\d+(?:\\.\\d+)?)");
Matcher numericFilterMatcher = numericFilterPattern.matcher(sqlQuery);
StringBuilder translateNumericFilterSb = new StringBuilder();

int lastNumericFilterEnd = 0;
while (numericFilterMatcher.find()) {
String columnName = numericFilterMatcher.group(1);
String operator = numericFilterMatcher.group(2);
if (":".equals(operator)) {
operator = "=";
}
String value = numericFilterMatcher.group(3);

// Append everything between last match and this one
translateNumericFilterSb.append(sqlQuery, lastNumericFilterEnd, numericFilterMatcher.start());

translateNumericFilterSb.append("\"")
.append(columnName)
.append("\" ")
.append(operator)
.append(" ")
.append(value);

lastNumericFilterEnd = numericFilterMatcher.end();
}
translateNumericFilterSb.append(sqlQuery.substring(lastNumericFilterEnd));
sqlQuery = translateNumericFilterSb.toString();

return sqlQuery;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider refactoring for improved maintainability.

The toSql method is quite long and handles multiple responsibilities. Consider extracting the string and numeric filter translation logic into separate methods for better readability and testability.

Would you like me to help refactor this into smaller, more focused methods with proper error handling for malformed input?

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java between
lines 27 and 94, the toSql method is lengthy and mixes string and numeric filter
translation logic. Refactor by extracting the string filter translation and
numeric filter translation into two separate private methods. Each method should
take the current query string and return the transformed string. Then, update
toSql to call these methods sequentially. This will improve readability,
maintainability, and allow easier unit testing of each translation step.

Comment on lines 46 to 60
@Inject
public ClpMetadataFilterProvider(ClpConfig config)
{
requireNonNull(config, "config is null");

ObjectMapper mapper = new ObjectMapper();
try {
filterMap = mapper.readValue(
new File(config.getMetadataFilterConfig()),
new TypeReference<Map<String, TableConfig>>() {});
}
catch (IOException e) {
throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve exception handling and add null check

The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.

     @Inject
     public ClpMetadataFilterProvider(ClpConfig config)
     {
         requireNonNull(config, "config is null");
+        requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");
 
         ObjectMapper mapper = new ObjectMapper();
         try {
             filterMap = mapper.readValue(
                     new File(config.getMetadataFilterConfig()),
                     new TypeReference<Map<String, TableConfig>>() {});
         }
         catch (IOException e) {
-            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e);
         }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Inject
public ClpMetadataFilterProvider(ClpConfig config)
{
requireNonNull(config, "config is null");
ObjectMapper mapper = new ObjectMapper();
try {
filterMap = mapper.readValue(
new File(config.getMetadataFilterConfig()),
new TypeReference<Map<String, TableConfig>>() {});
}
catch (IOException e) {
throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
}
}
@Inject
public ClpMetadataFilterProvider(ClpConfig config)
{
requireNonNull(config, "config is null");
requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");
ObjectMapper mapper = new ObjectMapper();
try {
filterMap = mapper.readValue(
new File(config.getMetadataFilterConfig()),
new TypeReference<Map<String, TableConfig>>() {});
}
catch (IOException e) {
throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND,
"Failed to open metadata filter config file: " + e.getMessage(), e);
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
around lines 46 to 60, add a null check to ensure the metadata filter config
path from config.getMetadataFilterConfig() is not null before reading the file.
Also, improve the exception message in the catch block to correct the grammar
and make it clearer, for example by changing it to "Failed to open metadata
filter config file."

Comment on lines 150 to 192
private static class TableConfig
{
public List<Filter> filters;
}

private static class RangeMapping
{
@JsonProperty("lowerBound")
public String lowerBound;

@JsonProperty("upperBound")
public String upperBound;

@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (!(o instanceof RangeMapping)) {
return false;
}
RangeMapping that = (RangeMapping) o;
return Objects.equals(lowerBound, that.lowerBound) &&
Objects.equals(upperBound, that.upperBound);
}

@Override
public int hashCode()
{
return Objects.hash(lowerBound, upperBound);
}
}

private static class Filter
{
@JsonProperty("filterName")
public String filterName;

@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add proper access modifiers and validation to nested classes

The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.

     private static class TableConfig
     {
-        public List<Filter> filters;
+        @JsonProperty("filters")
+        private List<Filter> filters;
+        
+        public List<Filter> getFilters()
+        {
+            return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of();
+        }
     }
 
     private static class RangeMapping
     {
         @JsonProperty("lowerBound")
-        public String lowerBound;
+        private String lowerBound;
 
         @JsonProperty("upperBound")
-        public String upperBound;
+        private String upperBound;
+        
+        public String getLowerBound()
+        {
+            return lowerBound;
+        }
+        
+        public String getUpperBound()
+        {
+            return upperBound;
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private static class TableConfig
{
public List<Filter> filters;
}
private static class RangeMapping
{
@JsonProperty("lowerBound")
public String lowerBound;
@JsonProperty("upperBound")
public String upperBound;
@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (!(o instanceof RangeMapping)) {
return false;
}
RangeMapping that = (RangeMapping) o;
return Objects.equals(lowerBound, that.lowerBound) &&
Objects.equals(upperBound, that.upperBound);
}
@Override
public int hashCode()
{
return Objects.hash(lowerBound, upperBound);
}
}
private static class Filter
{
@JsonProperty("filterName")
public String filterName;
@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;
}
}
private static class TableConfig
{
@JsonProperty("filters")
private List<Filter> filters;
public List<Filter> getFilters()
{
return filters != null
? ImmutableList.copyOf(filters)
: ImmutableList.of();
}
}
private static class RangeMapping
{
@JsonProperty("lowerBound")
private String lowerBound;
@JsonProperty("upperBound")
private String upperBound;
@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (!(o instanceof RangeMapping)) {
return false;
}
RangeMapping that = (RangeMapping) o;
return Objects.equals(lowerBound, that.lowerBound) &&
Objects.equals(upperBound, that.upperBound);
}
@Override
public int hashCode()
{
return Objects.hash(lowerBound, upperBound);
}
public String getLowerBound()
{
return lowerBound;
}
public String getUpperBound()
{
return upperBound;
}
}
private static class Filter
{
@JsonProperty("filterName")
public String filterName;
@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
between lines 150 and 192, the nested classes TableConfig, RangeMapping, and
Filter have package-private fields that risk unintended external modification.
To fix this, make the fields private and add appropriate getter methods to
provide controlled access. Additionally, consider making the classes immutable
by removing setters or making fields final where possible to enforce
immutability and encapsulation.

Copy link

@wraymo wraymo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed most of the files except ClpMetadataFilterProvider and the unit test. The main confusing part is introducing a new tryPushDown method in TestClpFilterToKql

@anlowee anlowee requested a review from wraymo June 25, 2025 13:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)

30-31: Fix typo in comment.

-    // Optinal SQL string extracted from the definition, which is only made of given metadata columns.
+    // Optional SQL string extracted from the definition, which is only made of given metadata columns.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

150-153: Fix indentation in constructor call.

Apply this diff to fix the indentation:

         ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
-                new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName), tablePath),
+                new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName), tablePath),
                 Optional.empty(),
                 metadataSql);
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)

221-229: DbHandle should have an accessor for dbPath.

The DbHandle class has a private dbPath field but no getter method. Since the field is used in other methods (e.g., line 71, 125), consider adding a getter method for better encapsulation.

     static final class DbHandle
     {
         private final String dbPath;
 
         DbHandle(String dbPath)
         {
             this.dbPath = dbPath;
         }
+
+        String getDbPath()
+        {
+            return dbPath;
+        }
     }

Then update usages:

-        final String metadataDbUrl = format(METADATA_DB_URL_TEMPLATE, dbHandle.dbPath);
+        final String metadataDbUrl = format(METADATA_DB_URL_TEMPLATE, dbHandle.getDbPath());
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)

73-87: Fix exception handling and add null check.

The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.

     @Inject
     public ClpMetadataFilterProvider(ClpConfig config)
     {
         requireNonNull(config, "config is null");
+        requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");
 
         ObjectMapper mapper = new ObjectMapper();
         try {
             filterMap = mapper.readValue(
                     new File(config.getMetadataFilterConfig()),
                     new TypeReference<Map<String, TableConfig>>() {});
         }
         catch (IOException e) {
-            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e);
         }
     }

110-140: Security concern: SQL injection risk in remapFilterSql.

The remapFilterSql method uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.

Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:

-            remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(>=?)\\s([0-9]*)", entry.getKey()),
-                    format("%s $2 $3", entry.getValue().upperBound));
+            // Escape special regex characters in column names
+            String escapedColumnName = Pattern.quote(entry.getKey());
+            // Use word boundaries to ensure exact column matches
+            remappedSql = remappedSql.replaceAll(
+                    "\\b\"" + escapedColumnName + "\"\\s+(>=?)\\s+([0-9]+)\\b",
+                    entry.getValue().upperBound + " $1 $2");

161-166: Fix typo in method name.

The method name has a typo: "Filer" should be "Filter".

-    private Set<String> getAllFilerNamesFromTableConfig(TableConfig tableConfig)
+    private Set<String> getAllFilterNamesFromTableConfig(TableConfig tableConfig)

Also update all references to this method.


179-221: Add proper access modifiers and validation to nested classes.

The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.

     private static class TableConfig
     {
-        public List<Filter> filters;
+        @JsonProperty("filters")
+        private List<Filter> filters;
+        
+        public List<Filter> getFilters()
+        {
+            return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of();
+        }
     }
 
     private static class RangeMapping
     {
         @JsonProperty("lowerBound")
-        public String lowerBound;
+        private String lowerBound;
 
         @JsonProperty("upperBound")
-        public String upperBound;
+        private String upperBound;
+        
+        public String getLowerBound()
+        {
+            return lowerBound;
+        }
+        
+        public String getUpperBound()
+        {
+            return upperBound;
+        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)

415-449: Consider extracting metadata SQL generation logic.

The pattern of checking if a variable is in metadataFilterColumns and generating metadata SQL is repeated throughout the class. Consider extracting this into a helper method to reduce duplication.

private Optional<String> generateMetadataSqlIfApplicable(String variableName, String sqlExpression)
{
    if (metadataFilterColumns.contains(variableName)) {
        return Optional.of(sqlExpression);
    }
    return Optional.empty();
}

Then use it like:

-        String metadataSql = null;
         String kql = String.format(
                 "\"%s\" >= %s AND \"%s\" <= %s",
                 variableReferenceDefinition.get(),
                 lowerBoundConstantDefinition.get(),
                 variableReferenceDefinition.get(),
                 upperBoundConstantDefinition.get());
-        if (metadataFilterColumns.contains(variableReferenceDefinition.get())) {
-            metadataSql = kql;
-        }
-        return new ClpExpression(kql, metadataSql);
+        Optional<String> metadataSql = generateMetadataSqlIfApplicable(variableReferenceDefinition.get(), kql);
+        return new ClpExpression(kql, metadataSql.orElse(null));
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

281-309: Comprehensive metadata SQL generation testing.

The new test method provides good coverage of metadata SQL generation scenarios:

  • AND operations that can generate metadata SQL
  • OR operations that cannot generate metadata SQL (correctly returns null)
  • Mixed logical operations

However, consider adding the missing @Test annotation as noted in past review comments.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 948ccc0 and e530913.

📒 Files selected for processing (11)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (9 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (4 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
  • presto-clp/src/test/resources/test-metadata-filter.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (20)
presto-clp/src/test/resources/test-metadata-filter.json (1)

1-30: Configuration structure looks well-designed for hierarchical filtering.

The JSON configuration properly defines a three-tier hierarchy (catalog → schema → table) for metadata filters. The range mapping configuration for timestamp handling is a thoughtful approach to translate point-in-time queries to archive-level timestamp ranges.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

44-44: Good use of "WHERE 1 = 1" pattern for dynamic query building.

The SQL template modification to include "WHERE 1 = 1" is a standard approach that simplifies appending additional conditions with "AND" clauses.


76-76: Appropriate debug logging level.

The debug logging aligns with the previous review feedback and provides useful troubleshooting information without being too verbose in production.


72-76: Verify SQL injection protection for metadata filters.

The metadata SQL is appended directly to the query without parameterization. Ensure that the metadata SQL generation upstream properly sanitizes inputs to prevent SQL injection vulnerabilities.

#!/bin/bash
# Search for where metadataSql is constructed to verify input sanitization
ast-grep --pattern 'metadataSql = $_' 
rg -A 5 -B 5 "metadataSql.*=" --type java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)

24-32: Escaping logic is correct and properly ordered.

The escape sequence ordering is appropriate - backslashes are escaped first to prevent double-escaping of subsequent characters. The method covers the essential KQL special characters that need escaping in string literals.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (3)

37-43: Good resource loading pattern with proper error handling.

The resource loading logic correctly handles the case where the resource file is not found and provides a clear error message.


45-57: Comprehensive test coverage for hierarchical filter aggregation.

The test properly validates that filters are correctly aggregated across the hierarchy (catalog → schema → table levels), ensuring that child levels inherit parent filters while adding their own.


59-85: Thorough validation of SQL remapping logic.

The test cases cover all essential comparison operators and validate the correct transformation from point-in-time queries to range-based queries. The equality case properly handles the overlap condition with both begin and end timestamp bounds.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (3)

35-40: Constructor implementation is correct.

The primary constructor properly initializes all fields using Optional.ofNullable, maintaining consistency with the existing pattern for handling nullable values.


45-69: Well-designed constructor overloading for backward compatibility.

The constructor overloads properly delegate to the primary constructor and maintain backward compatibility while enabling the new metadata SQL functionality. The progression from no parameters to full parameters is logical.


86-90: Consistent getter implementation.

The getMetadataSql() method follows the same pattern as other getters in the class, returning an Optional to handle the nullable field appropriately.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

45-51: LGTM!

The dependency injection of ClpMetadataFilterProvider is properly implemented with null checks.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

61-67: Well-documented timestamp values!

The synthetic timestamp generation is clearly explained with comments, making the test logic easy to understand.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)

51-57: Consistent naming convention applied.

The constant names now consistently use the ARCHIVES prefix, improving code clarity.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)

17-18: Import additions look appropriate for enhanced testing capabilities.

The new imports support the enhanced testing functionality: SemanticException for exception testing, ImmutableSet for metadata filter columns, and additional assertion methods.

Also applies to: 22-22, 26-27


40-56: Consistent refactoring to use tryPushDown helper method.

Good refactoring that provides a consistent pattern for converting SQL expressions to ClpExpression objects. The empty set parameter ImmutableSet.of() is appropriate for tests that don't require metadata filtering.


100-117: Enhanced BETWEEN test coverage with proper exception handling.

Excellent addition of semantic exception testing for invalid BETWEEN expressions. The test properly verifies that incompatible types (string range on numeric field) throw a SemanticException.


311-320: Well-structured helper method for consistent test setup.

The tryPushDown method provides a clean abstraction for creating ClpExpression objects with configurable metadata filter columns. This promotes consistency across tests and makes the intent clearer.


322-351: Improved test method signature with better parameter handling.

The updated testFilter method signature accepting ClpExpression directly is more appropriate than the previous string-based approach. The addition of the remainingToRowExpression parameter provides flexibility for different assertion strategies.


353-366: Focused helper method for metadata SQL validation.

The testFilterWithMetadataSql method provides a clean way to test both KQL generation and metadata SQL extraction. The null-safe assertion logic properly handles cases where metadata SQL is not expected.

Comment on lines +92 to +94
ImmutableList<ClpSplit> filteredSplits = splits.build();
log.debug("Number of splits: %s", filteredSplits.size());
return filteredSplits;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider using more descriptive variable names.

While functionally correct, consider renaming filteredSplits to simply splits since no actual filtering occurs at this level - the filtering happens at the SQL query level.

-        ImmutableList<ClpSplit> filteredSplits = splits.build();
-        log.debug("Number of splits: %s", filteredSplits.size());
-        return filteredSplits;
+        ImmutableList<ClpSplit> result = splits.build();
+        log.debug("Number of splits: %s", result.size());
+        return result;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ImmutableList<ClpSplit> filteredSplits = splits.build();
log.debug("Number of splits: %s", filteredSplits.size());
return filteredSplits;
ImmutableList<ClpSplit> result = splits.build();
log.debug("Number of splits: %s", result.size());
return result;
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
around lines 92 to 94, rename the variable filteredSplits to splits to better
reflect that no filtering is done at this stage and the filtering happens at the
SQL query level. Update all references accordingly to maintain clarity and
consistency.

Comment on lines 24 to 32
public static String escapeKqlSpecialCharsForStringValue(String literalString)
{
String escaped = literalString;
escaped = escaped.replace("\\", "\\\\");
escaped = escaped.replace("\"", "\\\"");
escaped = escaped.replace("?", "\\?");
escaped = escaped.replace("*", "\\*");
return escaped;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding input validation.

The method doesn't handle null input which could cause a NullPointerException. Consider adding a null check or documenting the precondition.

 public static String escapeKqlSpecialCharsForStringValue(String literalString)
 {
+    if (literalString == null) {
+        return null;
+    }
     String escaped = literalString;
     escaped = escaped.replace("\\", "\\\\");
     escaped = escaped.replace("\"", "\\\"");
     escaped = escaped.replace("?", "\\?");
     escaped = escaped.replace("*", "\\*");
     return escaped;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public static String escapeKqlSpecialCharsForStringValue(String literalString)
{
String escaped = literalString;
escaped = escaped.replace("\\", "\\\\");
escaped = escaped.replace("\"", "\\\"");
escaped = escaped.replace("?", "\\?");
escaped = escaped.replace("*", "\\*");
return escaped;
}
public static String escapeKqlSpecialCharsForStringValue(String literalString)
{
if (literalString == null) {
return null;
}
String escaped = literalString;
escaped = escaped.replace("\\", "\\\\");
escaped = escaped.replace("\"", "\\\"");
escaped = escaped.replace("?", "\\?");
escaped = escaped.replace("*", "\\*");
return escaped;
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java around
lines 24 to 32, the method escapeKqlSpecialCharsForStringValue does not handle
null input, which can cause a NullPointerException. Add a null check at the
start of the method to either return null or throw an IllegalArgumentException
with a clear message, or document that the input must not be null as a
precondition.

Comment on lines +82 to +84
String metadataFilterSql5 = "(\"msg.timestamp\" = 1234)";
String remappedSql5 = filterProvider.remapFilterSql("clp.default.table_1", metadataFilterSql5);
assertEquals(remappedSql5, "((begin_timestamp <= 1234 AND end_timestamp >= 1234))");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding edge case tests.

While the current test coverage is good, consider adding tests for:

  • Invalid SQL expressions
  • Filters without range mapping
  • Empty or malformed configuration files
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java
around lines 82 to 84, add new test cases to cover edge scenarios such as
invalid SQL expressions, filters that do not require range mapping, and cases
with empty or malformed configuration files. Create separate test methods or
extend existing ones to include these inputs and verify that the remapFilterSql
method handles them gracefully without errors or unexpected behavior.

@anlowee anlowee force-pushed the xwei/metadata-filter-v2 branch from 51b16e6 to ee181b8 Compare June 25, 2025 17:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)

39-47: Add null input validation.

The method doesn't handle null input which could cause a NullPointerException. This is consistent with past review feedback requesting null checks.

 public static String escapeKqlSpecialCharsForStringValue(String literalString)
 {
+    if (literalString == null) {
+        return null;
+    }
     String escaped = literalString;
     escaped = escaped.replace("\\", "\\\\");
     escaped = escaped.replace("\"", "\\\"");
     escaped = escaped.replace("?", "\\?");
     escaped = escaped.replace("*", "\\*");
     return escaped;
 }
presto-clp/src/test/resources/test-metadata-filter.json (1)

31-31: Add missing newline at end of file.

Consistent with past review feedback, the file should end with a newline.

 }
+
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

61-67: Improve timestamp documentation with meaningful values.

The timestamp generation logic is well-commented now, but the base timestamp values are still arbitrary. Consider using more readable values as suggested in past reviews.

-                // We generate synthetic begin_timestamp and end_timestamp values for each split
-                // by offsetting two base timestamps (1700000000000L and 1705000000000L) with a
-                // fixed increment per split (10^10 * j).
+                // Generate synthetic timestamps with a clear pattern:
+                // Begin: 1700000000000L (2023-11-15 06:13:20 UTC) + 10 billion * j  
+                // End:   1705000000000L (2024-01-12 07:06:40 UTC) + 10 billion * j
                 values.add(new ArchivesTableRow(
                         "id_" + j,
                         1700000000000L + 10000000000L * j,
                         1705000000000L + 10000000000L * j));
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)

221-229: Consider adding a getter method for dbPath.

The DbHandle class has a private dbPath field that's accessed directly in other methods. Adding a getter would improve encapsulation, as suggested in past reviews.

 static final class DbHandle
 {
     private final String dbPath;

     DbHandle(String dbPath)
     {
         this.dbPath = dbPath;
     }
+
+    String getDbPath()
+    {
+        return dbPath;
+    }
 }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1)

182-182: Fix typo in method name

The method name has a typo: "Filer" should be "Filter".

-    private Set<String> getAllFilerNamesFromTableConfig(TableConfig tableConfig)
+    private Set<String> getAllFilterNamesFromTableConfig(TableConfig tableConfig)

Also update the references to this method on lines 170, 173, and 177.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee181b8 and a9c4d8d.

📒 Files selected for processing (8)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (9 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (4 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
  • presto-clp/src/test/resources/test-metadata-filter.json (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (3 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-docs/src/main/sphinx/connector/clp.rst (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (3)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
  • ClpSchemaTree (44-211)
presto-common/src/main/java/com/facebook/presto/common/type/DoubleType.java (1)
  • DoubleType (31-173)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
  • ClpUtils (23-60)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (3)
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/SemanticException.java (1)
  • SemanticException (24-81)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
  • Test (40-170)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1)
  • Test (29-86)
🪛 GitHub Actions: maven checks
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java

[error] 155-156: Checkstyle indentation errors: '||' has incorrect indentation level 12, expected level should be 16.

⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (:presto-docs)
🔇 Additional comments (16)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)

56-59: LGTM - type checking logic is correct.

The method correctly identifies the numeric types supported by CLP as documented in the Javadoc reference to ClpSchemaTree.

presto-clp/src/test/resources/test-metadata-filter.json (1)

1-30: Well-structured metadata filter configuration.

The JSON configuration correctly demonstrates the hierarchical filter scopes (catalog, schema, table) and includes a good example of range mapping for timestamp filtering.

presto-docs/src/main/sphinx/connector/clp.rst (2)

32-32: Good addition of the new configuration property.

The metadata filter configuration property is properly documented in the configuration table.

Also applies to: 72-72


99-185: Excellent comprehensive documentation.

The metadata filter configuration documentation is thorough and well-structured. It clearly explains:

  • Hierarchical scopes (catalog, schema, table)
  • Filter definitions and range mappings
  • Practical examples with detailed explanations
  • How metadata filtering enables split pruning

This addresses previous feedback about adding examples and provides users with clear guidance.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)

82-141: Comprehensive test coverage for timestamp filtering scenarios.

The test method now covers a wide range of timestamp filtering scenarios including edge cases, boundary conditions, and non-overlapping ranges. This provides thorough validation of the metadata filtering logic.


143-169: Well-structured helper method for result comparison.

The helper method appropriately encapsulates the test logic for comparing expected vs actual split results, making the tests more maintainable and readable.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)

51-51: Good consistency improvement.

Renaming to ARCHIVES_STORAGE_DIRECTORY_BASE maintains consistency with the plural naming pattern used elsewhere.


123-123: Well-implemented timestamp metadata support.

The changes properly extend the archive table schema to include begin_timestamp and end_timestamp columns, updating both the table creation and data insertion logic to handle the new ArchivesTableRow structure.

Also applies to: 140-147, 151-162


231-258: Well-designed ArchivesTableRow class.

The new class properly encapsulates split metadata with immutable fields and appropriate getter methods. The design supports the timestamp-based filtering requirements effectively.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (7)

17-18: Import additions support new functionality well.

The new imports for SemanticException, ImmutableSet, Set, assertNull, and assertThrows are appropriate for the metadata SQL testing functionality being added.

Also applies to: 22-22, 26-27


35-56: Method refactoring improves maintainability.

The consistent update of all test methods to use the new tryPushDown helper method with metadata filter columns parameter is a good refactoring that centralizes the conversion logic and makes the tests more maintainable.


95-111: Comprehensive BETWEEN operator testing.

The new test cases for the BETWEEN operator cover both normal cases and edge cases (NULL values), and properly test semantic exceptions for invalid operations using assertThrows. This provides excellent coverage for the new BETWEEN support.


275-309: Thorough metadata SQL generation testing.

The new test method comprehensively tests metadata SQL generation across various scenarios including AND/OR combinations, BETWEEN operations, and cases where metadata SQL should not be generated. The test cases properly verify both KQL and metadata SQL outputs.


311-321: Well-implemented helper method.

The tryPushDown method effectively centralizes the conversion logic by creating a ClpFilterToKqlConverter with the appropriate metadata filter columns and returning a ClpExpression. This reduces code duplication and improves test maintainability.


323-352: Flexible test method design.

The modified testFilter method now accepts a ClpExpression directly and includes a remainingToRowExpression parameter to handle both RowExpression and string comparisons. This flexible design accommodates different test scenarios while maintaining clear assertions.


354-367: Effective metadata SQL testing helper.

The testFilterWithMetadataSql method provides a clean way to test both KQL expression generation and metadata SQL generation in a single call, with proper null handling for cases where metadata SQL should not be generated.

@wraymo
Copy link

wraymo commented Jun 26, 2025

Please also run mvn to fix format issues.

@anlowee anlowee force-pushed the xwei/metadata-filter-v2 branch from 6ef78f9 to 616e889 Compare June 27, 2025 08:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)

30-30: Remove unused logger declaration.

The logger is declared but never used in the implementation.

-    private static final Logger log = Logger.get(ClpUtils.class);

44-52: Add null check for input validation.

The method doesn't handle null input which could cause a NullPointerException.

 public static String escapeKqlSpecialCharsForStringValue(String literalString)
 {
+    if (literalString == null) {
+        return null;
+    }
     String escaped = literalString;
     escaped = escaped.replace("\\", "\\\\");
     escaped = escaped.replace("\"", "\\\"");
     escaped = escaped.replace("?", "\\?");
     escaped = escaped.replace("*", "\\*");
     return escaped;
 }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

92-97: Optimize mandatory filter validation logic.

The checkContainsAllFilters method is called even when metadataSql is empty, which could throw an exception unnecessarily.

-            // Check metadata filters before the KQL query
-            metadataFilterProvider.checkContainsAllFilters(clpTableHandle.getSchemaTableName(), metadataSql.orElse(""));
             if (metadataSql.isPresent()) {
+                metadataFilterProvider.checkContainsAllFilters(clpTableHandle.getSchemaTableName(), metadataSql.get());
                 metadataSql = Optional.of(metadataFilterProvider.remapFilterSql(scope, metadataSql.get()));
                 log.debug("Metadata filter SQL query: %s", metadataSql.get());
             }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)

75-89: Improve exception handling and add null check.

The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.

 @Inject
 public ClpMetadataFilterProvider(ClpConfig config)
 {
     requireNonNull(config, "config is null");
+    requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");

     ObjectMapper mapper = new ObjectMapper();
     try {
         filterMap = mapper.readValue(
                 new File(config.getMetadataFilterConfig()),
                 new TypeReference<Map<String, TableConfig>>() {});
     }
     catch (IOException e) {
-        throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+        throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e);
     }
 }

91-106: Improve filter validation logic.

The current implementation uses string contains to check for filter columns, which is fragile and could match substrings incorrectly. For example, if checking for column "id", it would incorrectly match "user_id" or "identifier".

Consider parsing the metadata SQL properly or using more precise matching:

-        for (String columnName : getFilterNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) {
-            if (!metadataFilterSql.contains(columnName)) {
+        Set<String> requiredFilters = getFilterNames(format("%s.%s", CONNECTOR_NAME, schemaTableName));
+        for (String columnName : requiredFilters) {
+            // Use word boundaries to ensure exact column name match
+            String pattern = "\\b\"?" + Pattern.quote(columnName) + "\"?\\b";
+            if (!Pattern.compile(pattern).matcher(metadataFilterSql).find()) {
                 hasAllMetadataFilterColumns = false;
                 notFoundFilterColumnNameListBuilder.add(columnName);
             }
         }

133-162: Security concern: SQL injection risk in remapFilterSql.

The remapFilterSql method uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.

Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:

         for (Map.Entry<String, RangeMapping> entry : mappings.entrySet()) {
-            String key = entry.getKey();
+            String key = Pattern.quote(entry.getKey());
             RangeMapping value = entry.getValue();
             remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(>=?)\\s([0-9]*)", key),
+                    format("\\b\"%s\"\\s+(>=?)\\s+([0-9]+)\\b", key),
                     format("%s $1 $2", value.upperBound));

201-243: Add proper access modifiers and validation to nested classes.

The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.

 private static class TableConfig
 {
-    public List<Filter> filters;
+    @JsonProperty("filters")
+    private List<Filter> filters;
+    
+    public List<Filter> getFilters()
+    {
+        return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of();
+    }
 }

 private static class RangeMapping
 {
     @JsonProperty("lowerBound")
-    public String lowerBound;
+    private String lowerBound;

     @JsonProperty("upperBound")
-    public String upperBound;
+    private String upperBound;
+    
+    public String getLowerBound()
+    {
+        return lowerBound;
+    }
+    
+    public String getUpperBound()
+    {
+        return upperBound;
+    }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef78f9 and 616e889.

📒 Files selected for processing (6)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (9 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
  • ClpUtils (28-76)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: test (:presto-docs)
🔇 Additional comments (12)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)

61-75: LGTM!

The numeric type checking logic correctly handles both CLP-specific types (BIGINT, DOUBLE) and broader numeric types based on the flag.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (2)

30-40: LGTM!

The addition of the metadataSql field and updated constructor properly extends the class to support metadata SQL alongside KQL expressions.


60-69: LGTM!

The new constructor for creating ClpExpression with both pushDownExpression and metadataSql is well-designed and follows the existing pattern.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)

131-184: LGTM!

The handleBetween method correctly:

  • Validates argument count and types
  • Ensures all arguments are numeric (with appropriate handling for CLP vs general numeric types)
  • Generates both KQL and metadata SQL expressions

269-277: LGTM!

The NOT handler correctly propagates metadata SQL by prefixing it with "NOT" when present.


655-700: LGTM!

The logical operator handlers correctly propagate metadata SQL:

  • AND includes metadata SQL when at least one child has it (appropriate for conjunction)
  • OR includes metadata SQL only when all children have it (appropriate for disjunction)

Also applies to: 714-740

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)

17-17: Import changes look appropriate.

The new imports (ImmutableSet, Set, Pattern) are necessary for the enhanced functionality and align with the learning preference for Guava's Immutable collections.

Also applies to: 21-22


33-55: Method naming is consistent and clear.

The standardized naming with "PushDown" suffix improves clarity and consistency across all test methods.


241-284: Comprehensive metadata SQL generation testing.

The new test method covers various scenarios including normal cases, BETWEEN operations, and complex expressions. The test logic correctly validates both the presence/absence of metadata SQL and its expected content.


286-309: Well-designed helper method overloads.

The multiple testPushDown overloads provide clean interfaces for different testing scenarios while maintaining backward compatibility. The design allows for flexible assertion of remaining expressions and metadata SQL validation.


311-321: Enhanced tryPushDown method supports new functionality.

The addition of the metadataFilterColumns parameter enables testing of the new metadata filter capability while maintaining a clean interface.


323-352: Flexible assertion logic handles different test scenarios.

The remainingToRowExpression parameter allows the method to handle both RowExpression comparison and string comparison, which is necessary for cases where expressions cannot be converted back to RowExpression (like the BETWEEN test with pattern matching).

public void testMetadataSqlGeneration()
{
SessionHolder sessionHolder = new SessionHolder();
Set<String> testMetadataFilterColumns = ImmutableSet.of("fare");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Extract repeated constant to improve maintainability.

The ImmutableSet.of("fare") is repeated multiple times throughout the test method. Consider extracting it to a class-level constant for better maintainability.

+    private static final Set<String> TEST_METADATA_FILTER_COLUMNS = ImmutableSet.of("fare");
+
     @Test
     public void testMetadataSqlGeneration()
     {
         SessionHolder sessionHolder = new SessionHolder();
-        Set<String> testMetadataFilterColumns = ImmutableSet.of("fare");
 
         // Normal case
         testPushDown(
                 sessionHolder,
                 "(fare > 0 AND city.Name like 'b%')",
                 "(fare > 0 AND city.Name: \"b*\")",
                 "(\"fare\" > 0)",
-                testMetadataFilterColumns);
+                TEST_METADATA_FILTER_COLUMNS);

Also applies to: 253-253, 261-261, 269-269, 275-275, 283-283

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
around lines 245, 253, 261, 269, 275, and 283, the ImmutableSet.of("fare") is
repeatedly used. Extract this repeated ImmutableSet.of("fare") into a private
static final class-level constant and replace all occurrences with this constant
to improve maintainability and reduce duplication.

@anlowee anlowee requested a review from wraymo June 27, 2025 11:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)

30-30: Remove unused logger declaration.

The logger is declared but never used in the current implementation.


44-52: Consider adding input validation.

The method doesn't handle null input which could cause a NullPointerException. Consider adding a null check or documenting the precondition.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

92-97: Optimize mandatory filter validation logic.

The checkContainsAllFilters method is called even when metadataSql is empty, which could throw an exception unnecessarily. Consider checking only when metadata SQL is present.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

226-226: Consider extracting metadata filter columns to a constant.

The ImmutableSet.of("fare") is used multiple times in this test method. Consider extracting it to improve maintainability:

+    private static final Set<String> FARE_METADATA_FILTER_COLUMNS = ImmutableSet.of("fare");
+
     @Test
     public void testMetadataSqlGeneration()
     {
         SessionHolder sessionHolder = new SessionHolder();
-        Set<String> testMetadataFilterColumns = ImmutableSet.of("fare");

         // Normal case
         testPushDown(
                 sessionHolder,
                 "(fare > 0 AND city.Name like 'b%')",
                 "(fare > 0 AND city.Name: \"b*\")",
                 "(\"fare\" > 0)",
-                testMetadataFilterColumns);
+                FARE_METADATA_FILTER_COLUMNS);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 616e889 and ed8e2b4.

📒 Files selected for processing (7)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (9 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (3 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-docs/src/main/sphinx/connector/clp.rst (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
  • ClpUtils (28-71)
🪛 GitHub Actions: maven checks
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java

[error] 109-109: Checkstyle violation: WhitespaceAfter: ',' is not followed by whitespace.

⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (:presto-docs)
🔇 Additional comments (13)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)

61-70: LGTM!

The numeric type checking is comprehensive and correctly covers all numeric types supported by the CLP connector.

presto-docs/src/main/sphinx/connector/clp.rst (1)

99-189: Well-documented metadata filter configuration.

The documentation clearly explains the hierarchical metadata filtering feature with a comprehensive example. The note about all filters being required and pushdown-eligible is particularly important for users to understand.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)

30-89: Clean integration of metadata SQL support.

The new metadataSql field is properly integrated with appropriate constructors and getter method. The backward compatibility is maintained through constructor overloading.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)

61-107: Comprehensive timestamp-based metadata filtering tests.

The test enhancements effectively validate split filtering with various timestamp conditions. The synthetic timestamp generation is well-documented and the test cases cover important boundary conditions including edge cases.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (7)

90-101: Constructor changes look good!

The new metadataFilterColumns parameter is properly validated and stored.


123-125: BETWEEN operator handling properly integrated.

The check and delegation to handleBetween is correctly placed.


131-177: Well-implemented BETWEEN handler!

The method properly validates all inputs, correctly generates KQL expressions, and conditionally creates metadata SQL with proper quoting.


263-269: Metadata SQL propagation through NOT is correct.

Good use of a variable to avoid duplicating the "NOT " prefix.


416-443: Metadata SQL generation logic is well-structured.

Correctly handles different operators and properly quotes variable names in metadata SQL while only escaping special characters in KQL expressions.


644-687: AND operator correctly propagates metadata SQL.

The logic to include metadata SQL when at least one child has it is appropriate for AND operations.


703-727: OR operator metadata SQL logic is correct.

Requiring all children to have metadata SQL for OR operations is the right approach, contrasting appropriately with the AND logic.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)

32-221: Test method naming is now consistent.

Good refactoring to use the testPushDown helper method throughout.


267-321: Helper methods are well-designed.

The overloaded testPushDown methods and refactored testFilter method provide good separation of concerns.

@anlowee anlowee requested a review from kirkrodrigues June 30, 2025 14:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (7)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (2)

77-103: Consider adding edge case tests.

While the current test coverage is good, consider adding tests for:

  • Invalid SQL expressions
  • Filters without range mapping
  • Empty or malformed configuration files

77-103: Add edge case tests for SQL remapping.

The current SQL remapping tests cover basic comparison operators well, but consider adding tests for edge cases as mentioned in previous reviews.

Add the following test cases to improve coverage:

@Test
public void remapSqlEdgeCases()
{
    ClpConfig config = new ClpConfig();
    config.setMetadataFilterConfig(filterConfigPath);
    ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config);

    // Test filters without range mapping
    String metadataFilterSql = "(\"level\" >= 1 AND \"level\" <= 3)";
    String remappedSql = filterProvider.remapFilterSql("clp.default.table_1", metadataFilterSql);
    assertEquals(remappedSql, metadataFilterSql); // Should remain unchanged

    // Test invalid SQL expressions
    assertThrows(Exception.class, () -> filterProvider.remapFilterSql("clp.default.table_1", "INVALID SQL"));
    
    // Test empty filter expression
    String emptyFilter = "";
    String remappedEmpty = filterProvider.remapFilterSql("clp.default.table_1", emptyFilter);
    assertEquals(remappedEmpty, emptyFilter);
}

@Test
public void testMalformedConfig()
{
    // Test with non-existent config file
    ClpConfig config = new ClpConfig();
    config.setMetadataFilterConfig("/non/existent/path.json");
    assertThrows(Exception.class, () -> new ClpMetadataFilterProvider(config));
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

92-97: Optimize required filter validation logic.

The checkContainsRequiredFilters method is called even when metadataSqlQuery is empty, which could throw an exception unnecessarily. Consider checking only when metadata SQL is present.

Apply this diff to optimize the logic:

-            // Check required metadata filters before the KQL query
-            metadataFilterProvider.checkContainsRequiredFilters(clpTableHandle.getSchemaTableName(), metadataSqlQuery.orElse(""));
             if (metadataSqlQuery.isPresent()) {
+                metadataFilterProvider.checkContainsRequiredFilters(clpTableHandle.getSchemaTableName(), metadataSqlQuery.get());
                 metadataSqlQuery = Optional.of(metadataFilterProvider.remapFilterSql(scope, metadataSqlQuery.get()));
                 log.debug("Metadata SQL query: %s", metadataSqlQuery);
             }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)

81-95: Improve exception handling and add null check.

The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.

Apply this diff:

     @Inject
     public ClpMetadataFilterProvider(ClpConfig config)
     {
         requireNonNull(config, "config is null");
+        requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");
 
         ObjectMapper mapper = new ObjectMapper();
         try {
             filterMap = mapper.readValue(
                     new File(config.getMetadataFilterConfig()),
                     new TypeReference<Map<String, List<Filter>>>() {});
         }
         catch (IOException e) {
-            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+            throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e);
         }
     }

97-112: Improve filter validation logic.

The current implementation uses string contains to check for filter columns, which is fragile and could match substrings incorrectly. For example, if checking for column "id", it would incorrectly match "user_id" or "identifier".

Consider parsing the metadata SQL properly or using more precise matching:

+        Set<String> requiredFilters = getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName));
         for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) {
-            if (!metadataFilterSql.contains(columnName)) {
+            // Use word boundaries to ensure exact column name match
+            String pattern = "\\b\"?" + Pattern.quote(columnName) + "\"?\\b";
+            if (!Pattern.compile(pattern).matcher(metadataFilterSql).find()) {
                 hasRequiredMetadataFilterColumns = false;
                 notFoundListBuilder.add(columnName);
             }
         }

139-168: Security concern: SQL injection risk in remapFilterSql.

The method uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.

Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:

         for (Map.Entry<String, RangeMapping> entry : mappings.entrySet()) {
-            String key = entry.getKey();
+            String key = Pattern.quote(entry.getKey());
             RangeMapping value = entry.getValue();
             remappedSql = remappedSql.replaceAll(
-                    format("\"(%s)\"\\s(>=?)\\s([0-9]*)", key),
+                    format("\\b\"%s\"\\s+(>=?)\\s+([0-9]+)\\b", key),
                     format("%s $2 $3", value.upperBound));

224-251: Add proper access modifiers to RangeMapping class.

The nested class uses package-private fields which could lead to unintended modifications. Consider making them private with proper accessors.

Apply this diff:

     private static class RangeMapping
     {
         @JsonProperty("lowerBound")
-        public String lowerBound;
+        private String lowerBound;
 
         @JsonProperty("upperBound")
-        public String upperBound;
+        private String upperBound;
+        
+        public String getLowerBound()
+        {
+            return lowerBound;
+        }
+        
+        public String getUpperBound()
+        {
+            return upperBound;
+        }

Also update the remapFilterSql method to use the getters:

-                    format("%s $2 $3", value.upperBound));
+                    format("%s $2 $3", value.getUpperBound()));
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f716bf and 08cf54e.

📒 Files selected for processing (3)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (2)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (33-72)
⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: test (:presto-docs)
🔇 Additional comments (11)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (5)

37-46: LGTM!

The setUp method correctly loads the test resource file from the classpath with proper null checking and exception handling.


48-61: Good test coverage for required filter validation.

The test properly validates both the failure case (missing required filter) and success case (required filter present).


63-75: Well-structured test for hierarchical filter retrieval.

The test effectively validates filter name aggregation across catalog, schema, and table scopes, properly using ImmutableSet for assertions.


32-46: LGTM: Proper resource loading with exception handling.

The setup method correctly handles resource loading with appropriate exception handling for missing files and URI conversion issues.


63-75: LGTM: Comprehensive hierarchical filter name validation.

The test properly validates filter name retrieval at catalog, schema, and table levels with expected hierarchical inheritance.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3)

35-35: Appropriate additions for metadata filter support.

The static import and new field are correctly added to support metadata filtering functionality.

Also applies to: 45-45


47-52: Constructor properly updated with null check.

The constructor correctly accepts the new dependency and validates it with requireNonNull, following the existing pattern.


81-90: Clean integration of metadata filter columns.

The code properly constructs the scope and passes metadata filter column names to the converter, enabling metadata SQL generation alongside KQL.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (3)

43-77: Excellent documentation!

The class-level JavaDoc provides comprehensive information about metadata filter configuration, including clear examples and field descriptions.


180-196: Well-designed abstraction for scope-based collection.

The collectColumnNamesFromScopes method effectively reduces code duplication and properly uses ImmutableSet.Builder as preferred in the codebase.


198-222: Clean implementation of filter processing helpers.

All helper methods properly handle null cases and correctly use Guava's immutable collections as preferred in the codebase.

Comment on lines +253 to +263
private static class Filter
{
@JsonProperty("columnName")
public String columnName;

@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;

@JsonProperty("required")
public boolean required;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve encapsulation for Filter class.

Similar to RangeMapping, the Filter class should use private fields with proper accessors.

Apply this diff:

     private static class Filter
     {
         @JsonProperty("columnName")
-        public String columnName;
+        private String columnName;
 
         @JsonProperty("rangeMapping")
-        public RangeMapping rangeMapping;
+        private RangeMapping rangeMapping;
 
         @JsonProperty("required")
-        public boolean required;
+        private boolean required;
+        
+        public String getColumnName()
+        {
+            return columnName;
+        }
+        
+        public RangeMapping getRangeMapping()
+        {
+            return rangeMapping;
+        }
+        
+        public boolean isRequired()
+        {
+            return required;
+        }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private static class Filter
{
@JsonProperty("columnName")
public String columnName;
@JsonProperty("rangeMapping")
public RangeMapping rangeMapping;
@JsonProperty("required")
public boolean required;
}
private static class Filter
{
@JsonProperty("columnName")
private String columnName;
@JsonProperty("rangeMapping")
private RangeMapping rangeMapping;
@JsonProperty("required")
private boolean required;
public String getColumnName()
{
return columnName;
}
public RangeMapping getRangeMapping()
{
return rangeMapping;
}
public boolean isRequired()
{
return required;
}
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
around lines 253 to 263, the Filter class currently uses public fields which
breaks encapsulation. Change the fields columnName, rangeMapping, and required
to private and add corresponding public getter and setter methods for each field
to properly encapsulate the data, similar to the RangeMapping class.

Comment on lines +48 to +61
@Test
public void checkRequiredFilters()
{
ClpConfig config = new ClpConfig();
config.setMetadataFilterConfig(filterConfigPath);
ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config);
SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1");
assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
testTableSchemaTableName,
"(\"level\" >= 1 AND \"level\" <= 3)"));
filterProvider.checkContainsRequiredFilters(
testTableSchemaTableName,
"(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Enhance exception validation specificity.

The test correctly validates required filter enforcement, but consider making the exception assertion more specific by checking the exception message or cause.

-        assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
-                testTableSchemaTableName,
-                "(\"level\" >= 1 AND \"level\" <= 3)"));
+        PrestoException exception = assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
+                testTableSchemaTableName,
+                "(\"level\" >= 1 AND \"level\" <= 3)"));
+        // Optionally verify exception message contains expected content
+        // assertTrue(exception.getMessage().contains("required filter"));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Test
public void checkRequiredFilters()
{
ClpConfig config = new ClpConfig();
config.setMetadataFilterConfig(filterConfigPath);
ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config);
SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1");
assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
testTableSchemaTableName,
"(\"level\" >= 1 AND \"level\" <= 3)"));
filterProvider.checkContainsRequiredFilters(
testTableSchemaTableName,
"(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)");
}
@Test
public void checkRequiredFilters()
{
ClpConfig config = new ClpConfig();
config.setMetadataFilterConfig(filterConfigPath);
ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config);
SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1");
- assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
- testTableSchemaTableName,
- "(\"level\" >= 1 AND \"level\" <= 3)"));
+ PrestoException exception = assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
+ testTableSchemaTableName,
+ "(\"level\" >= 1 AND \"level\" <= 3)"));
+ // Optionally verify exception message contains expected content
+ // assertTrue(exception.getMessage().contains("required filter"));
filterProvider.checkContainsRequiredFilters(
testTableSchemaTableName,
"(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)");
}
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java
around lines 48 to 61, the test asserts that a PrestoException is thrown but
does not verify the exception message or cause. Modify the assertThrows call to
capture the thrown exception and add assertions to check that the exception
message or cause matches the expected error details, ensuring more precise
validation of the failure condition.

wraymo
wraymo previously approved these changes Jul 4, 2025
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of clp.rst.

Comment on lines 67 to 74
* A translator to translate Presto RowExpressions into KQL (Kibana Query Language) filters used as
* CLP queries. This is used primarily for pushing down supported filters to the CLP engine. This
* class implements the RowExpressionVisitor interface and recursively walks Presto filter
* expressions, attempting to convert supported expressions into corresponding KQL filter strings.
* Any part of the expression that cannot be translated is preserved as a "remaining expression" for
* potential fallback processing.
* CLP queries and SQL filters used for filtering splits by metadata database. This is used
* primarily for pushing down supported filters to the CLP engine. This class implements the
* {@link RowExpressionVisitor} interface and recursively walks Presto filter expressions,
* attempting to convert supported expressions into corresponding KQL filter strings and SQL
* filter strings for metadata filtering. Any part of the expression that cannot be translated to
* KQL is preserved as a "remaining expression" for potential fallback processing.
* <p></p>
Copy link
Member

@kirkrodrigues kirkrodrigues Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't suggest but:

 * A translator to translate Presto RowExpressions into:
 * <ul>
 *     <li>KQL (Kibana Query Language) filters used to push down supported filters to the CLP
 *     engine.</li>
 *     <li>SQL filters used for filtering splits in CLP's metadata database.</li>
 * </ul>
 * This class implements the {@link RowExpressionVisitor} interface and recursively walks Presto
 * filter expressions, attempting to convert supported expressions into corresponding KQL filter
 * strings and SQL filter strings for metadata filtering. Any part of the expression that cannot be
 * translated to KQL is preserved as a "remaining expression" for potential fallback processing.
 * <p></p>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the first line I changed to:

A translator to translate Presto {@link RowExpression}s into:

Comment on lines 212 to 214
* <ul>All arguments must have numeric types.</ul>
* <ul>The first argument is a variable reference expression.</ul>
* <ul>The second and third arguments are constant expressions.</ul>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think you meant to do <li>?
  • If the first sentence before the list is incomplete, then each bullet should start with a small letter as if it's continuing the sentence before the list.

Optional<String> metadataSqlQuery = clpExpression.getMetadataSqlQuery();
Optional<RowExpression> remainingPredicate = clpExpression.getRemainingExpression();

// Check required metadata filters before the KQL query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becuase the kqlQuery.isPresent() might be false, and if that case it will directly return the node (see L99~101) and won't execute the rest code. So we have to put the required fields check before it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some of that detail to the comment? Otherwise it won't be clear to a future developer unless they read the code in detail, in which case the current comment wouldn't help.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

List<ClpSplit> splits = clpSplitProvider.listSplits(layoutHandle);
assertEquals(splits.size(), expectedSplits.size());
for (Map.Entry<String, List<ArchivesTableRow>> entry : tableSplits.entrySet()) {
// Without metadata filters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have an empty line before each comment in this method.

@anlowee anlowee force-pushed the xwei/metadata-filter-v2 branch from a75d694 to a6341a3 Compare July 7, 2025 13:10
@anlowee anlowee force-pushed the xwei/metadata-filter-v2 branch from a6341a3 to 3d3eda4 Compare July 7, 2025 13:27
@anlowee anlowee requested a review from kirkrodrigues July 7, 2025 19:25
* <p></p>
* The configuration file is specified by the {@code clp.metadata-filter-config} property
* and defines metadata filters used to optimize query execution through split pruning.
* <p></p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* <p></p>
* <p></p>
* Each filter config indicates how a data column--a column in the Presto table--should be mapped to
* a metadata column--a column in CLPs metadata database.
* <p></p>

Comment on lines 55 to 56
* <li><b>{@code columnName}</b>: the name of a column in the table's logical schema. Currently,
* only numeric-type columns can be used as metadata filters.</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* <li><b>{@code columnName}</b>: the name of a column in the table's logical schema. Currently,
* only numeric-type columns can be used as metadata filters.</li>
* <li><b>{@code columnName}</b>: the data column's name. Currently, only numeric-type columns can
* be used as metadata filters.</li>

Comment on lines 70 to 87
* <p>
* For example, a condition such as:
* </p>
* <pre>{@code
* "msg.timestamp" > 1234 AND "msg.timestamp" < 5678
* }</pre>
*
* <p>
* will be rewritten as:
* </p>
* <pre>{@code
* "end_timestamp" > 1234 AND "begin_timestamp" < 5678
* }</pre>
*
* <p>
* This ensures the filter applies to a superset of the actual result set, enabling safe
* pruning.
* </p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do we need this example?
  • I think the last sentence might confuse readers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, removed the example and the last sentence

Comment on lines 92 to 94
// Perform required metadata filter checks before handling the KQL query. If kqlQuery
// is not present, the method returns early skipping all subsequent logic. Therefore,
// the check must happen before that return.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Perform required metadata filter checks before handling the KQL query. If kqlQuery
// is not present, the method returns early skipping all subsequent logic. Therefore,
// the check must happen before that return.
// Perform required metadata filter checks before handling the KQL query (if kqlQuery
// isn't present, we'll return early, skipping subsequent checks).

@anlowee anlowee requested a review from kirkrodrigues July 7, 2025 22:26
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat: Add support for defining and using specific columns for metadata filtering.

@anlowee anlowee changed the title feat: Add metadata filtering based on archive-level timestamp. feat: Add support for defining and using specific columns for metadata filtering. Jul 7, 2025
@anlowee anlowee merged commit 9c86114 into y-scope:release-0.293-clp-connector Jul 7, 2025
29 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Oct 27, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet