Skip to content

Conversation

@wraymo
Copy link

@wraymo wraymo commented Jun 20, 2025

Description

This PR adds ClpConnectorPlanOptimizer and ClpFilterToKqlConverter and related unit tests to allow the CLP connector to push down relevant filters to CLP for improved query performance. More details can be found in RFC.

There are three main steps

  1. Transforming filter predicates into KQL queries
  2. Adding the generated KQL query to ClpTableLayoutHandle and constructing a new TableScanNode
  3. Reconstructing a FilterNode with any remaining predicates and the new TableScanNode

ClpFilterToKqlConverter implements RowExpressionVisitor<ClpExpression, Void> and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:

  • String exact match
  • LIKE predicate (the "^%[^%_]*%$" pattern is not pushed down)
  • Numeric comparisons
  • Logical operators (AND, OR, NOT)
  • IS NULL
  • substr(a, start, length) = 'abc' and substr(a, start) = 'abc' forms

For comparison or match expressions, one side must contain a VariableReferenceExpression, and the other must be a ConstantExpression--specifically a string or numeric literal. Expressions like a > c or substr(a, 2) = lower(c) are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.

A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.

Consider the following query:

SELECT * from clp_table WHERE regexp_like(a, '\d+b') OR b = 2

Since CLP doesn’t currently support regexp_like, if we simply push down b = 2 to CLP, Presto will only receive results where b = 2, effectively changing the query semantics to regexp_like(a, '\d+b') AND b = 2.

To prevent such issues, the pushdown logic follows these rules:

  • OR pushdown: An OR condition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.
  • AND pushdown: An AND condition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with the AND operator, while non-pushable expressions remain in the original plan.
  • General Pushdown Condition: An expression is considered pushable if it’s a CallExpression and can be transformed into KQL syntax.

Example transformations:

  1. SQL WHERE Clause: a = 2 AND b = '3'

    • KQL: a: 2 AND b: "3"
    • Effect: The FilterNode is removed.
  2. SQL WHERE Clause: a.b LIKE '%string another_string' OR "c.d" = TRUE

    • KQL: a.b: "*string another_string" OR c.d: true
    • Effect: The FilterNode is removed.
  3. SQL WHERE Clause: a > 2 OR regexp_like(b, '\d+b')

    • KQL: "*"
    • Effect: a > 2 OR regexp_like(b, '\d+b') remains in the FilterNode.
  4. SQL WHERE Clause: a > 2 AND regexp_like(b, '\d+b')

    • KQL: a > 2
    • Effect: regexp_like(b, '\d+b') remains in the FilterNode.
  5. SQL WHERE Clause: NOT substr(a, 2, 5) = 'Hello' and b IS NULL

    • KQL: NOT a: "?Hello*" AND NOT b: *
    • Effect: The FilterNode is removed.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • New Features
    • Added support for pushing down Presto filter expressions to CLP using KQL queries, improving query performance by reducing data scanned.
    • Introduced advanced filter translation, including support for logical operators, comparison operators, LIKE, IN, IS NULL, and substring matching.
    • Implemented a plan optimizer to rewrite query plans by embedding KQL filters into table scans for efficient pushdown.
    • Provided a plan optimizer provider to integrate the new optimizer into the query planning process.
  • Tests
    • Added comprehensive tests to verify correct translation of SQL filters to KQL and proper handling of various filter scenarios.
  • Chores
    • Introduced utility classes to support test infrastructure and expression handling for the new features.
    • Corrected naming inconsistencies related to archive table suffixes in MySQL split provider and test setup.

@coderabbitai
Copy link

coderabbitai bot commented Jun 20, 2025

Walkthrough

This change introduces new classes to enable filter pushdown in the CLP connector by translating Presto filter expressions into KQL queries. It adds a converter for expression translation, a plan optimizer to rewrite query plans with pushdown, a provider for optimizer registration, and comprehensive tests plus utilities for validation. Additionally, it fixes naming inconsistencies related to archive table suffix constants.

Changes

File(s) Change Summary
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java Added ClpExpression class encapsulating the result of converting a Presto RowExpression into a KQL query, supporting fully, partially, or non-translatable expressions with multiple constructors and getters.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java Introduced ClpFilterToKqlConverter implementing RowExpressionVisitor to translate Presto filter expressions into KQL strings, supporting logical, comparison, LIKE, IN, IS NULL, dereference, and substring expressions for pushdown.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java Added ClpPlanOptimizer implementing ConnectorPlanOptimizer to rewrite query plans by pushing down filter predicates into table scans using KQL, handling remaining predicates as needed via a nested rewriter class.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java Added ClpPlanOptimizerProvider implementing ConnectorPlanOptimizerProvider to supply the physical plan optimizer for the CLP connector, constructed with required dependencies.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java Renamed constant ARCHIVE_TABLE_SUFFIX to ARCHIVES_TABLE_SUFFIX and updated SQL query template string accordingly to maintain consistent plural naming.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java Fixed naming inconsistencies by replacing singular ARCHIVE_TABLE_SUFFIX and ARCHIVE_TABLE_COLUMN_PAGINATION_ID with plural forms imported from ClpMySqlSplitProvider, aligning table name and column name constants.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java Added TestClpFilterToKql test class with multiple test methods validating SQL filter expression conversion to KQL, covering string matching, substring, numeric comparisons, logical operators, IN, IS NULL, and complex pushdown scenarios.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java Added TestClpQueryBase utility class providing metadata setup, expression parsing, translation utilities, and session management to support filter-to-KQL conversion tests.

Sequence Diagram(s)

sequenceDiagram
    participant PrestoPlanner
    participant ClpPlanOptimizer
    participant ClpFilterToKqlConverter
    participant ClpExpression
    participant TableScanNode

    PrestoPlanner->>ClpPlanOptimizer: optimize(plan, session, ...)
    ClpPlanOptimizer->>ClpFilterToKqlConverter: convert(filter predicate)
    ClpFilterToKqlConverter->>ClpExpression: return (KQL string, remaining expr)
    ClpPlanOptimizer-->>TableScanNode: create new TableScanNode with KQL (if possible)
    alt Remaining predicate exists
        ClpPlanOptimizer-->>PrestoPlanner: create new FilterNode on top of TableScanNode
    else No remaining predicate
        ClpPlanOptimizer-->>PrestoPlanner: replace FilterNode with TableScanNode
    end
Loading
sequenceDiagram
    participant TestClass
    participant TestClpQueryBase
    participant ClpFilterToKqlConverter
    participant ClpExpression

    TestClass->>TestClpQueryBase: getRowExpression(SQL, session)
    TestClass->>ClpFilterToKqlConverter: convert(RowExpression)
    ClpFilterToKqlConverter-->>ClpExpression: return (KQL string, remaining expr)
    TestClass->>TestClass: assert KQL and remaining as expected
Loading

Suggested reviewers

  • anlowee
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f33875 and e94e218.

📒 Files selected for processing (6)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (16)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

26-51: Clean and well-structured optimizer provider implementation.

The implementation correctly follows the provider pattern with proper dependency injection and clear separation between logical and physical plan optimizers. The code is concise and follows Java best practices.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)

20-28: Excellent class documentation.

The class-level documentation clearly explains the three conversion scenarios and how the fields are used in each case. This makes the API intent very clear to users.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3)

72-88: Proper pattern matching and early exit.

The implementation correctly identifies the target optimization pattern (FilterNode over TableScanNode) and exits early when the pattern doesn't match. The type casting is safe after the instanceof check.


90-113: Correct plan reconstruction logic.

The implementation properly handles both scenarios:

  1. Fully pushed down predicates: returns only the new TableScanNode
  2. Partially pushed down predicates: creates a new FilterNode with remaining predicates

The TableScanNode construction preserves all necessary properties while embedding the KQL query in the layout handle.


82-88: We should inspect the surrounding code in ClpPlanOptimizer to confirm whether exceptions from the converter are caught. Please run:

#!/bin/bash
# Show entire optimize() in ClpPlanOptimizer to check exception handling
sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)

77-93: Well-designed test schema with realistic complexity.

The test schema includes complex nested types (RowType with Region containing Id and Name fields) which will help validate that the converter handles complex column references correctly. The variable-to-column mapping is properly constructed.


95-122: Comprehensive expression conversion utilities.

The utility methods provide a complete pipeline from SQL strings to RowExpressions, including proper type analysis and session context. This matches how expressions are processed in the actual Presto engine.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)

29-48: Well-structured test helper method.

The testFilter method provides a clean interface for testing conversion scenarios, properly handling both the KQL output and remaining expression validation. The assertion logic correctly handles optional values.


50-73: Comprehensive string matching test coverage.

The tests cover various LIKE predicate patterns and their KQL translations, including edge cases like middle wildcards that cannot be pushed down. The test cases demonstrate understanding of KQL limitations.


85-87: Good edge case testing for invalid substring operations.

The tests correctly verify that invalid substring operations (like requesting more characters than the target string length) are not pushed down, maintaining query correctness.


112-134: Excellent logical operator testing with mixed scenarios.

The OR tests properly validate that when any child expression cannot be pushed down, the entire OR expression remains in the filter. This is crucial for maintaining query semantics.


136-159: Correct AND operator partial pushdown logic.

The AND tests demonstrate that pushable expressions are correctly separated from non-pushable ones, allowing partial optimization while preserving correctness. This is a sophisticated optimization that the tests validate well.


204-214: Complex scenario testing validates real-world usage.

The complex pushdown tests combine multiple logical operators and expression types, ensuring the converter handles realistic query patterns correctly. These tests provide confidence in the implementation's robustness.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)

52-679: Well-structured implementation with comprehensive KQL conversion support.

The implementation provides thorough support for converting various SQL constructs to KQL, with good documentation and error handling. The visitor pattern is well-utilized, and the code handles complex cases like substring matching elegantly.


289-303: Verify KQL syntax for IS NULL predicate.

The implementation converts IS NULL to NOT col: *. Please verify this is the correct KQL syntax for NULL checks in your CLP implementation, as KQL syntax can vary between systems.

What is the correct KQL syntax to check if a field is NULL or does not exist?

487-500: ```shell
#!/bin/bash
set -euo pipefail

Inspect the definition and invocation sites for parseLengthLiteralOrFunction

rg -n "parseLengthLiteralOrFunction" -A5 -B5 presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java


</details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines 37 to 59
public ClpExpression(String definition, RowExpression remainingExpression)
{
this.definition = Optional.ofNullable(definition);
this.remainingExpression = Optional.ofNullable(remainingExpression);
}

// Creates an empty ClpExpression (no KQL definition, no remaining expression).
public ClpExpression()
{
this (null, null);
}

// Creates a ClpExpression from a fully translatable KQL string.
public ClpExpression(String definition)
{
this(definition, null);
}

// Creates a ClpExpression from a non-translatable RowExpression.
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider improving constructor parameter validation.

The constructors currently accept null parameters and wrap them in Optional, which could lead to confusion. Consider making the intent clearer by either:

  1. Using explicit null checks with meaningful parameter names
  2. Providing static factory methods with clearer semantics
// Alternative approach with static factory methods
+public static ClpExpression fullyTranslatable(String definition) {
+    return new ClpExpression(requireNonNull(definition), null);
+}
+
+public static ClpExpression partiallyTranslatable(String definition, RowExpression remaining) {
+    return new ClpExpression(requireNonNull(definition), requireNonNull(remaining));
+}
+
+public static ClpExpression nonTranslatable(RowExpression remaining) {
+    return new ClpExpression(null, requireNonNull(remaining));
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public ClpExpression(String definition, RowExpression remainingExpression)
{
this.definition = Optional.ofNullable(definition);
this.remainingExpression = Optional.ofNullable(remainingExpression);
}
// Creates an empty ClpExpression (no KQL definition, no remaining expression).
public ClpExpression()
{
this (null, null);
}
// Creates a ClpExpression from a fully translatable KQL string.
public ClpExpression(String definition)
{
this(definition, null);
}
// Creates a ClpExpression from a non-translatable RowExpression.
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
public ClpExpression(String definition, RowExpression remainingExpression)
{
this.definition = Optional.ofNullable(definition);
this.remainingExpression = Optional.ofNullable(remainingExpression);
}
// Creates an empty ClpExpression (no KQL definition, no remaining expression).
public ClpExpression()
{
this(null, null);
}
// Creates a ClpExpression from a fully translatable KQL string.
public ClpExpression(String definition)
{
this(definition, null);
}
// Creates a ClpExpression from a non-translatable RowExpression.
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
// Alternative approach with static factory methods
public static ClpExpression fullyTranslatable(String definition) {
return new ClpExpression(requireNonNull(definition), null);
}
public static ClpExpression partiallyTranslatable(String definition, RowExpression remaining) {
return new ClpExpression(requireNonNull(definition), requireNonNull(remaining));
}
public static ClpExpression nonTranslatable(RowExpression remaining) {
return new ClpExpression(null, requireNonNull(remaining));
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 37 to 59, the constructors accept null parameters and wrap them in
Optional, which can be confusing. To fix this, add explicit null checks in the
constructors to validate parameters and throw IllegalArgumentException if
invalid. Alternatively, refactor by making constructors private and provide
static factory methods with descriptive names for creating instances with
definition, remainingExpression, or empty, to clarify intent and avoid null
usage.

Comment on lines +124 to +144
protected static class SessionHolder
{
private final ConnectorSession connectorSession;
private final Session session;

public SessionHolder()
{
connectorSession = SESSION;
session = TestingSession.testSessionBuilder(createTestingSessionPropertyManager(new SystemSessionProperties().getSessionProperties())).build();
}

public ConnectorSession getConnectorSession()
{
return connectorSession;
}

public Session getSession()
{
return session;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider extracting session creation to reduce duplication.

The SessionHolder class creates sessions in each test, which could lead to duplication if many test classes extend this base.

Consider providing a static factory method or singleton pattern for commonly used session configurations:

+protected static final SessionHolder DEFAULT_SESSION = new SessionHolder();
+
+protected static SessionHolder getDefaultSession() {
+    return DEFAULT_SESSION;
+}
🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
around lines 124 to 144, the SessionHolder class creates new session instances
in its constructor, which may cause duplication across tests. Refactor by
extracting the session creation logic into a static factory method or implement
a singleton pattern to provide commonly used session instances, so tests can
reuse these sessions instead of creating new ones each time.

Comment on lines 149 to 155
private static String getLiteralString(ConstantExpression literal)
{
if (literal.getValue() instanceof Slice) {
return ((Slice) literal.getValue()).toStringUtf8();
}
return literal.toString();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect literal string extraction for non-Slice types.

The toString() method on ConstantExpression returns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.

Apply this fix to properly extract the literal value:

 private static String getLiteralString(ConstantExpression literal)
 {
     if (literal.getValue() instanceof Slice) {
         return ((Slice) literal.getValue()).toStringUtf8();
     }
-    return literal.toString();
+    return String.valueOf(literal.getValue());
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private static String getLiteralString(ConstantExpression literal)
{
if (literal.getValue() instanceof Slice) {
return ((Slice) literal.getValue()).toStringUtf8();
}
return literal.toString();
}
private static String getLiteralString(ConstantExpression literal)
{
if (literal.getValue() instanceof Slice) {
return ((Slice) literal.getValue()).toStringUtf8();
}
return String.valueOf(literal.getValue());
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
around lines 149 to 155, the method getLiteralString incorrectly uses
literal.toString() for non-Slice values, which returns the full object
representation instead of just the value. To fix this, extract the actual value
from the ConstantExpression for non-Slice types by accessing literal.getValue()
and converting it to a string appropriately, ensuring the returned string
represents only the literal value.

Comment on lines 458 to 481
private Optional<Integer> parseIntValue(RowExpression expression)
{
if (expression instanceof ConstantExpression) {
try {
return Optional.of(Integer.parseInt(getLiteralString((ConstantExpression) expression)));
}
catch (NumberFormatException ignored) { }
}
else if (expression instanceof CallExpression) {
CallExpression call = (CallExpression) expression;
FunctionMetadata functionMetadata = functionMetadataManager.getFunctionMetadata(call.getFunctionHandle());
Optional<OperatorType> operatorTypeOptional = functionMetadata.getOperatorType();
if (operatorTypeOptional.isPresent() && operatorTypeOptional.get().equals(OperatorType.NEGATION)) {
RowExpression arg0 = call.getArguments().get(0);
if (arg0 instanceof ConstantExpression) {
try {
return Optional.of(-Integer.parseInt(getLiteralString((ConstantExpression) arg0)));
}
catch (NumberFormatException ignored) { }
}
}
}
return Optional.empty();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider logging parsing failures for debugging.

The empty catch blocks at lines 464 and 476 silently swallow NumberFormatException. While returning Optional.empty() is appropriate, consider adding debug logging to help troubleshoot parsing issues.

Example improvement:

 catch (NumberFormatException ignored) { }
+// Could add: log.debug("Failed to parse integer from expression: {}", expression);
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
between lines 458 and 481, the catch blocks for NumberFormatException are empty,
which silently swallows parsing errors. To fix this, add debug-level logging
inside both catch blocks to record the exception details when parsing fails,
while still returning Optional.empty(). This will aid in troubleshooting without
changing the method's behavior.

}
}

return new ClpExpression();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use explicit constructor for clarity.

The empty ClpExpression() constructor usage is unclear. Consider using a more explicit constructor to indicate "no pushdown possible" for better code clarity.

-        return new ClpExpression();
+        return new ClpExpression(null);  // Explicitly indicate no pushdown possible
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return new ClpExpression();
return new ClpExpression(null); // Explicitly indicate no pushdown possible
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
at line 553, replace the usage of the empty ClpExpression() constructor with a
more explicit constructor or factory method that clearly indicates "no pushdown
possible." This will improve code clarity by explicitly conveying the intent
rather than relying on an ambiguous empty constructor.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the inline comments, there are still some files that get reformatted when I run Intellij's formatter. Can you check all of them and ensure any differences from what the formatter does and what's in the PR are intentional?

Comment on lines 57 to 72
/**
* ClpFilterToKqlConverter translates Presto RowExpressions into KQL (Kibana Query Language) filters
* used as CLP queries. This is used primarily for pushing down supported filters to the CLP engine.
* This class implements the RowExpressionVisitor interface and recursively walks Presto filter expressions,
* attempting to convert supported expressions (e.g., comparisons, logical AND/OR, LIKE, IN, IS NULL,
* and SUBSTR-based expressions) into corresponding KQL filter strings. Any part of the expression that
* cannot be translated is preserved as a "remaining expression" for potential fallback processing.
* Supported translations include:
* - Variable-to-literal comparisons (e.g., =, !=, <, >, <=, >=)
* - String pattern matches using LIKE
* - Membership checks using IN
* - NULL checks via IS NULL
* - Substring comparisons (e.g., SUBSTR(x, start, len) = "val") mapped to wildcard KQL queries
* - Dereferencing fields from row-typed variables
* - Logical operators AND, OR, and NOT
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* ClpFilterToKqlConverter translates Presto RowExpressions into KQL (Kibana Query Language) filters
* used as CLP queries. This is used primarily for pushing down supported filters to the CLP engine.
* This class implements the RowExpressionVisitor interface and recursively walks Presto filter expressions,
* attempting to convert supported expressions (e.g., comparisons, logical AND/OR, LIKE, IN, IS NULL,
* and SUBSTR-based expressions) into corresponding KQL filter strings. Any part of the expression that
* cannot be translated is preserved as a "remaining expression" for potential fallback processing.
* Supported translations include:
* - Variable-to-literal comparisons (e.g., =, !=, <, >, <=, >=)
* - String pattern matches using LIKE
* - Membership checks using IN
* - NULL checks via IS NULL
* - Substring comparisons (e.g., SUBSTR(x, start, len) = "val") mapped to wildcard KQL queries
* - Dereferencing fields from row-typed variables
* - Logical operators AND, OR, and NOT
*/
/**
* A translator to translate Presto RowExpressions into KQL (Kibana Query Language) filters used as
* CLP queries. This is used primarily for pushing down supported filters to the CLP engine. This
* class implements the RowExpressionVisitor interface and recursively walks Presto filter
* expressions, attempting to convert supported expressions into corresponding KQL filter strings.
* Any part of the expression that cannot be translated is preserved as a "remaining expression" for
* potential fallback processing.
* <p></p>
* Supported translations include:
* <ul>
* <li>Comparisons between variables and literals (e.g., =, !=, <, >, <=, >=)</li>
* <li>String pattern matches using LIKE (some patterns are unsupported)</li>
* <li>Membership checks using IN</li>
* <li>NULL checks via IS NULL</li>
* <li>Substring comparisons (e.g., SUBSTR(x, start, len) = "val")</li>
* <li>Dereferencing fields from row-typed variables</li>
* <li>Logical operators AND, OR, and NOT</li>
* </ul>
*/

Can we also mention we can only do comparisons between variable & constant but not variable & variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change Comparisons between variables and literals to Comparisons between variables and constants.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Comment on lines 180 to 187
/**
* Handles the logical NOT expression.
* Example:
* - NOT (col1 = 5) → NOT col1: 5
*
* @param node the NOT call expression
* @return a ClpExpression with the translated KQL query, or the original expression if unsupported
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* Handles the logical NOT expression.
* Example:
* - NOT (col1 = 5) → NOT col1: 5
*
* @param node the NOT call expression
* @return a ClpExpression with the translated KQL query, or the original expression if unsupported
*/
/**
* Handles the logical NOT expression.
* <p></p>
* Example: <code>NOT (col1 = 5)</code> → <code>NOT col1: 5</code>
*
* @param node the NOT call expression
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
*/

Can you apply this same style to all the other docstrings? Can we also use the format of the return doc suggested above across the other docstrings?

else {
return new ClpExpression(node);
}
pattern = pattern.replace("%", "*").replace("_", "?");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we handle escapes? (E.g., col LIKE 'abc#%def' ESCAPE '#' --> col: "abc%def"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It's Likepattern predicate, not like

Comment on lines 320 to 322
* - col = 'abc' → col: "abc"
* - col != 42 → NOT col: 42
* - 5 < col → col > 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use HTML lists so that the javadocs will render correctly?

import static com.facebook.presto.transaction.InMemoryTransactionManager.createTestTransactionManager;
import static java.util.stream.Collectors.toMap;

public class TestClpQueryBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about ClpQueryTestUtils?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But I'm following the naming convention of other connectors (Test...QueryBase)

Comment on lines 205 to 206
* Converts SQL LIKE patterns into equivalent KQL queries using * (for %) and ? (for _).
* Only supports constant patterns or constant cast patterns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also mention what pattern isn't supported?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will mention it in the class docstring

StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("(");
List<RowExpression> remainingExpressions = new ArrayList<>();
boolean hasDefinition = false;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe hasKqlQuery

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (10)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)

33-65: Consider using static factory methods for clearer intent.

The constructors accept null parameters which could be confusing. Static factory methods would make the intent clearer and avoid ambiguity about null handling.

Consider this alternative approach:

-    public ClpExpression(String pushDownExpression, RowExpression remainingExpression)
+    private ClpExpression(String pushDownExpression, RowExpression remainingExpression)
     {
         this.pushDownExpression = Optional.ofNullable(pushDownExpression);
         this.remainingExpression = Optional.ofNullable(remainingExpression);
     }

-    /**
-     * Creates an empty ClpExpression with neither pushdown nor remaining expressions.
-     */
-    public ClpExpression()
-    {
-        this(null, null);
-    }
+    public static ClpExpression empty()
+    {
+        return new ClpExpression(null, null);
+    }

-    /**
-     * Creates a ClpExpression from a fully translatable KQL query or column name.
-     *
-     * @param pushDownExpression the KQL query string or column name
-     */
-    public ClpExpression(String pushDownExpression)
-    {
-        this(pushDownExpression, null);
-    }
+    public static ClpExpression fullyTranslatable(String pushDownExpression)
+    {
+        return new ClpExpression(requireNonNull(pushDownExpression, "pushDownExpression is null"), null);
+    }

-    /**
-     * Creates a ClpExpression from a non-translatable RowExpression.
-     *
-     * @param remainingExpression the non-translatable RowExpression
-     */
-    public ClpExpression(RowExpression remainingExpression)
-    {
-        this(null, remainingExpression);
-    }
+    public static ClpExpression nonTranslatable(RowExpression remainingExpression)
+    {
+        return new ClpExpression(null, requireNonNull(remainingExpression, "remainingExpression is null"));
+    }
+
+    public static ClpExpression partiallyTranslatable(String pushDownExpression, RowExpression remainingExpression)
+    {
+        return new ClpExpression(
+                requireNonNull(pushDownExpression, "pushDownExpression is null"),
+                requireNonNull(remainingExpression, "remainingExpression is null"));
+    }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

195-213: Move private method after public methods.

According to code style, private methods should be placed after all public methods.

Move the testFilter method to the end of the class, after all the public test methods.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (8)

166-172: Fix incorrect literal string extraction for non-Slice types.

The toString() method on ConstantExpression returns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.

Apply this fix:

 private String getLiteralString(ConstantExpression literal)
 {
     if (literal.getValue() instanceof Slice) {
         return ((Slice) literal.getValue()).toStringUtf8();
     }
-    return literal.toString();
+    return String.valueOf(literal.getValue());
 }

255-255: Document wildcard conversion logic.

The LIKE pattern conversion should be documented to clarify the wildcard mapping.

+        // Convert SQL LIKE wildcards to KQL wildcards: % -> * (zero or more chars), _ -> ? (exactly one char)
         pattern = pattern.replace("%", "*").replace("_", "?");

503-528: Consider adding debug logging for parsing failures.

The empty catch blocks silently swallow NumberFormatException. While returning Optional.empty() is appropriate, debug logging could help troubleshoot parsing issues.


412-439: Move method after its caller.

The parseSubstringCall method should be placed after tryInterpretSubstringEquality which calls it, to respect the code style ordering.


455-495: Move method after its caller.

The interpretSubstringEquality method should be placed after tryInterpretSubstringEquality which calls it.


537-551: Add JavaDoc for parseLengthLiteral method.

This method is missing documentation.


748-760: SubstrInfo class should be defined at the end.

Inner classes should be defined at the end of the enclosing class according to the code style.


494-494: Use explicit constructor for clarity.

Using the empty constructor is unclear about intent.

-        return new ClpExpression();
+        return new ClpExpression(null, null);  // No pushdown possible
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6ee2ea and 7dcc7fc.

📒 Files selected for processing (5)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)

26-50: LGTM!

The implementation is clean and follows the standard pattern for optimizer providers.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

38-111: LGTM!

The optimizer implementation correctly handles filter pushdown with proper null checks and error handling.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

26-194: Comprehensive test coverage!

The test methods thoroughly cover various filter pushdown scenarios including string matching, numeric comparisons, logical operators, and complex combinations.

Comment on lines +57 to +65
/**
* Creates a ClpExpression from a non-translatable RowExpression.
*
* @param remainingExpression
*/
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add description to @param tag.

The JavaDoc has an empty @param tag without description.

     /**
      * Creates a ClpExpression from a non-translatable RowExpression.
      *
-     * @param remainingExpression
+     * @param remainingExpression the non-translatable RowExpression
      */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Creates a ClpExpression from a non-translatable RowExpression.
*
* @param remainingExpression
*/
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
/**
* Creates a ClpExpression from a non-translatable RowExpression.
*
* @param remainingExpression the non-translatable RowExpression
*/
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 57 to 65, the JavaDoc for the constructor has an empty @param tag
for remainingExpression. Add a clear description to the @param tag explaining
what remainingExpression represents, such as "the non-translatable RowExpression
to create the ClpExpression from."

Comment on lines +47 to +55
/**
* Creates a ClpExpression from a fully translatable KQL query or column name.
*
* @param pushDownExpression
*/
public ClpExpression(String pushDownExpression)
{
this(pushDownExpression, null);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add description to @param tag.

The JavaDoc has an empty @param tag without description.

     /**
      * Creates a ClpExpression from a fully translatable KQL query or column name.
      *
-     * @param pushDownExpression
+     * @param pushDownExpression the KQL query string or column name
      */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Creates a ClpExpression from a fully translatable KQL query or column name.
*
* @param pushDownExpression
*/
public ClpExpression(String pushDownExpression)
{
this(pushDownExpression, null);
}
/**
* Creates a ClpExpression from a fully translatable KQL query or column name.
*
* @param pushDownExpression the KQL query string or column name
*/
public ClpExpression(String pushDownExpression)
{
this(pushDownExpression, null);
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 47 to 55, the JavaDoc for the constructor has an empty @param tag
for pushDownExpression. Add a clear description explaining what
pushDownExpression represents, such as "the fully translatable KQL query or
column name to create the ClpExpression from."

@wraymo wraymo requested a review from kirkrodrigues June 23, 2025 21:28
anlowee
anlowee previously approved these changes Jun 24, 2025
Comment on lines 68 to 75
* <li>String pattern matches using LIKE with constant patterns only. <code>"^%[^%_]*%$"</code>
* is not supported.</li>
* <li>Membership checks using IN with a list of constants only</li>
* <li>NULL checks via IS NULL</li>
* <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) are supported
* only when compared against a constant.</li>
* <li>Dereferencing fields from row-typed variables</li>
* <li>Logical operators AND, OR, and NOT</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* <li>String pattern matches using LIKE with constant patterns only. <code>"^%[^%_]*%$"</code>
* is not supported.</li>
* <li>Membership checks using IN with a list of constants only</li>
* <li>NULL checks via IS NULL</li>
* <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) are supported
* only when compared against a constant.</li>
* <li>Dereferencing fields from row-typed variables</li>
* <li>Logical operators AND, OR, and NOT</li>
* <li>String pattern matches using LIKE with constant patterns only. Patterns that begin and
* end with <code>%</code> (i.e., <code>"^%[^%_]*%$"</code>) are not supported.</li>
* <li>Membership checks using IN with a list of constants only.</li>
* <li>NULL checks via IS NULL.</li>
* <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) against a
* constant.</li>
* <li>Dereferencing fields from row-typed variables.</li>
* <li>Logical operators AND, OR, and NOT.</li>

Comment on lines 191 to 192
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
* @return a ClpExpression containing either the equivalent KQL query, or the original
* expression if it couldn't be translated

Comment on lines 212 to 213
* Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for <code>%</code>)
* and <code>?</code> (for <code>_</code>). Only supports constant or casted constant patterns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for <code>%</code>)
* and <code>?</code> (for <code>_</code>). Only supports constant or casted constant patterns.
* Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for
* <code>%</code>) and <code>?</code> (for <code>_</code>). Only supports constant or casted
* constant patterns.

Comment on lines 218 to 219
* @return a ClpExpression containing the equivalent KQL query, or the original expression if it
* couldn't be translated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a ClpExpression containing the equivalent KQL query, or the original expression if it
* couldn't be translated
* @return a ClpExpression containing either the equivalent KQL query, or the original
* expression if it couldn't be translated

Comment on lines 262 to 263
* Supports constant values on either side and flips the operator if necessary. Also delegates to a
* substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Supports constant values on either side and flips the operator if necessary. Also delegates to a
* substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns.
* Supports constant values on either side and flips the operator if necessary. Also delegates
* to a substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns.

queryBuilder.append(expression.getPushDownExpression().get());
queryBuilder.append(" OR ");
}
// Remove the last " OR " from the query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Remove the last " OR " from the query
// Remove the last " OR " from the query

Comment on lines 631 to 632
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
* @param node the <code>IN</code> special form expression
* @return a ClpExpression containing either the equivalent KQL query, or the original
* expression if it couldn't be translated

Comment on lines 668 to 669
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a ClpExpression containing the equivalent KQL query or the original expression if it
* couldn't be translated
* @return a ClpExpression containing either the equivalent KQL query, or the original
* expression if it couldn't be translated

/**
* Handles dereference expressions on RowTypes (e.g., <code>col.row_field</code>).
* <p></p>
* Converts nested row field access into dot-separated KQL-compatible field names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Converts nested row field access into dot-separated KQL-compatible field names.
* Converts nested row field accesses into dot-separated KQL-compatible field names.

Comment on lines 694 to 696
* @param expression the dereference expression (SpecialFormExpression or VariableReferenceExpression)
* @return a ClpExpression containing the dot-separated field name or the original expression if it
* couldn't be translated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param expression the dereference expression (SpecialFormExpression or VariableReferenceExpression)
* @return a ClpExpression containing the dot-separated field name or the original expression if it
* couldn't be translated
* @param expression the dereference expression ({@link SpecialFormExpression} or
* {@link VariableReferenceExpression})
* @return a ClpExpression containing either the dot-separated field name, or the original
* expression if it couldn't be translated

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)

166-172: Fix incorrect literal string extraction for non-Slice types.

The toString() method on ConstantExpression returns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.

Apply this fix to properly extract the literal value:

 private String getLiteralString(ConstantExpression literal)
 {
     if (literal.getValue() instanceof Slice) {
         return ((Slice) literal.getValue()).toStringUtf8();
     }
-    return literal.toString();
+    return String.valueOf(literal.getValue());
 }

508-533: Consider logging parsing failures for debugging.

The empty catch blocks at lines 514 and 527 silently swallow NumberFormatException. While returning Optional.empty() is appropriate, consider adding debug logging to help troubleshoot parsing issues.

Example improvement:

 catch (NumberFormatException ignored) { }
+// Could add: log.debug("Failed to parse integer from expression: {}", expression);

499-499: Use explicit constructor for clarity.

The empty ClpExpression() constructor usage is unclear. Consider using a more explicit constructor to indicate "no pushdown possible" for better code clarity.

-        return new ClpExpression();
+        return new ClpExpression(null);  // Explicitly indicate no pushdown possible
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcc7fc and 636d35e.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)

Comment on lines +542 to +556
private Optional<Integer> parseLengthLiteral(RowExpression lengthExpression, String targetString)
{
if (lengthExpression instanceof ConstantExpression) {
String val = getLiteralString((ConstantExpression) lengthExpression);
try {
int parsed = parseInt(val);
if (parsed == targetString.length()) {
return Optional.of(parsed);
}
}
catch (NumberFormatException ignored) {
}
}
return Optional.empty();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add debug logging for parsing failures.

The empty catch block at line 552 silently swallows NumberFormatException. Consider adding debug logging here as well.

 catch (NumberFormatException ignored) { }
+// Could add: log.debug("Failed to parse length literal: {}", val);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
between lines 542 and 556, the catch block for NumberFormatException is empty,
which silently ignores parsing errors. Modify the catch block to include
debug-level logging that records the exception details and the value that failed
to parse, helping with troubleshooting without interrupting the flow.

@kirkrodrigues kirkrodrigues changed the title feat: Add a plan optimizer for CLP connector. feat(presto-clp): Add a plan optimizer for the CLP connector. Jun 24, 2025
@kirkrodrigues kirkrodrigues changed the title feat(presto-clp): Add a plan optimizer for the CLP connector. feat(presto-clp): Add a plan optimizer to push down supported filters to CLP. Jun 24, 2025
kirkrodrigues
kirkrodrigues previously approved these changes Jun 24, 2025
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the docstrings. Deferring to @anlowee's review for everything else.

Edited the PR title directly.


/**
* Represents the result of converting a Presto RowExpression into a CLP-compatible KQL query. In
* every case, `kqlQuery` represents the part of the RowExpression that could be converted to a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update this docstring? We have renamed the kqlQuery to pushDownExpression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (3)

51-51: Add description to @param tag.

The JavaDoc has an empty @param tag without a description.

-     * @param pushDownExpression
+     * @param pushDownExpression the KQL query string or column name

61-61: Add description to @param tag.

The JavaDoc has an empty @param tag without a description.

-     * @param remainingExpression
+     * @param remainingExpression the non-translatable RowExpression

34-38: Consider improving constructor parameter validation.

The constructor currently accepts null parameters and wraps them in Optional, which could lead to confusion about the intended usage patterns.

Consider adding explicit validation or providing clearer static factory methods as previously suggested to improve code clarity and prevent misuse.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 636d35e and 537c424.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)

Comment on lines +68 to +76
public Optional<String> getPushDownExpression()
{
return pushDownExpression;
}

public Optional<RowExpression> getRemainingExpression()
{
return remainingExpression;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding JavaDoc for getter methods.

While the getter methods are self-explanatory, adding JavaDoc would improve consistency with the rest of the class and provide clarity about the Optional return types.

+    /**
+     * Returns the KQL query or column name that can be pushed down, if any.
+     *
+     * @return Optional containing the pushdown expression, or empty if none exists
+     */
     public Optional<String> getPushDownExpression()
     {
         return pushDownExpression;
     }

+    /**
+     * Returns the remaining non-translatable RowExpression, if any.
+     *
+     * @return Optional containing the remaining expression, or empty if fully translatable
+     */
     public Optional<RowExpression> getRemainingExpression()
     {
         return remainingExpression;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public Optional<String> getPushDownExpression()
{
return pushDownExpression;
}
public Optional<RowExpression> getRemainingExpression()
{
return remainingExpression;
}
/**
* Returns the KQL query or column name that can be pushed down, if any.
*
* @return Optional containing the pushdown expression, or empty if none exists
*/
public Optional<String> getPushDownExpression()
{
return pushDownExpression;
}
/**
* Returns the remaining non-translatable RowExpression, if any.
*
* @return Optional containing the remaining expression, or empty if fully translatable
*/
public Optional<RowExpression> getRemainingExpression()
{
return remainingExpression;
}
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 68 to 76, the getter methods getPushDownExpression and
getRemainingExpression lack JavaDoc comments. Add concise JavaDoc comments above
each method explaining what the method returns, emphasizing that the return type
is Optional and what it may contain or represent, to maintain consistency and
improve clarity.

Comment on lines +26 to +77
public class ClpExpression
{
// Optional KQL query or column name representing the fully or partially translatable part of the expression.
private final Optional<String> pushDownExpression;

// The remaining (non-translatable) portion of the RowExpression, if any.
private final Optional<RowExpression> remainingExpression;

public ClpExpression(String pushDownExpression, RowExpression remainingExpression)
{
this.pushDownExpression = Optional.ofNullable(pushDownExpression);
this.remainingExpression = Optional.ofNullable(remainingExpression);
}

/**
* Creates an empty ClpExpression with neither pushdown nor remaining expressions.
*/
public ClpExpression()
{
this(null, null);
}

/**
* Creates a ClpExpression from a fully translatable KQL query or column name.
*
* @param pushDownExpression
*/
public ClpExpression(String pushDownExpression)
{
this(pushDownExpression, null);
}

/**
* Creates a ClpExpression from a non-translatable RowExpression.
*
* @param remainingExpression
*/
public ClpExpression(RowExpression remainingExpression)
{
this(null, remainingExpression);
}

public Optional<String> getPushDownExpression()
{
return pushDownExpression;
}

public Optional<RowExpression> getRemainingExpression()
{
return remainingExpression;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider adding standard object methods for better usability.

The class lacks equals(), hashCode(), and toString() methods, which are typically expected for data classes and would improve testability and debugging capabilities.

+    @Override
+    public boolean equals(Object obj)
+    {
+        if (this == obj) {
+            return true;
+        }
+        if (obj == null || getClass() != obj.getClass()) {
+            return false;
+        }
+        ClpExpression that = (ClpExpression) obj;
+        return pushDownExpression.equals(that.pushDownExpression) &&
+               remainingExpression.equals(that.remainingExpression);
+    }
+
+    @Override
+    public int hashCode()
+    {
+        return java.util.Objects.hash(pushDownExpression, remainingExpression);
+    }
+
+    @Override
+    public String toString()
+    {
+        return "ClpExpression{" +
+               "pushDownExpression=" + pushDownExpression +
+               ", remainingExpression=" + remainingExpression +
+               '}';
+    }
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
between lines 26 and 77, the class ClpExpression lacks the standard methods
equals(), hashCode(), and toString(). To fix this, implement these methods to
compare instances based on pushDownExpression and remainingExpression fields,
generate consistent hash codes from these fields, and provide a clear string
representation of the object including its field values. This will improve
usability, testability, and debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants