-
Notifications
You must be signed in to change notification settings - Fork 3
feat(presto-clp): Add a plan optimizer to push down supported filters to CLP. #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(presto-clp): Add a plan optimizer to push down supported filters to CLP. #16
Conversation
WalkthroughThis change introduces new classes to enable filter pushdown in the CLP connector by translating Presto filter expressions into KQL queries. It adds a converter for expression translation, a plan optimizer to rewrite query plans with pushdown, a provider for optimizer registration, and comprehensive tests plus utilities for validation. Additionally, it fixes naming inconsistencies related to archive table suffix constants. Changes
Sequence Diagram(s)sequenceDiagram
participant PrestoPlanner
participant ClpPlanOptimizer
participant ClpFilterToKqlConverter
participant ClpExpression
participant TableScanNode
PrestoPlanner->>ClpPlanOptimizer: optimize(plan, session, ...)
ClpPlanOptimizer->>ClpFilterToKqlConverter: convert(filter predicate)
ClpFilterToKqlConverter->>ClpExpression: return (KQL string, remaining expr)
ClpPlanOptimizer-->>TableScanNode: create new TableScanNode with KQL (if possible)
alt Remaining predicate exists
ClpPlanOptimizer-->>PrestoPlanner: create new FilterNode on top of TableScanNode
else No remaining predicate
ClpPlanOptimizer-->>PrestoPlanner: replace FilterNode with TableScanNode
end
sequenceDiagram
participant TestClass
participant TestClpQueryBase
participant ClpFilterToKqlConverter
participant ClpExpression
TestClass->>TestClpQueryBase: getRowExpression(SQL, session)
TestClass->>ClpFilterToKqlConverter: convert(RowExpression)
ClpFilterToKqlConverter-->>ClpExpression: return (KQL string, remaining expr)
TestClass->>TestClass: assert KQL and remaining as expected
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (16)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
26-51: Clean and well-structured optimizer provider implementation.The implementation correctly follows the provider pattern with proper dependency injection and clear separation between logical and physical plan optimizers. The code is concise and follows Java best practices.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
20-28: Excellent class documentation.The class-level documentation clearly explains the three conversion scenarios and how the fields are used in each case. This makes the API intent very clear to users.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3)
72-88: Proper pattern matching and early exit.The implementation correctly identifies the target optimization pattern (FilterNode over TableScanNode) and exits early when the pattern doesn't match. The type casting is safe after the instanceof check.
90-113: Correct plan reconstruction logic.The implementation properly handles both scenarios:
- Fully pushed down predicates: returns only the new TableScanNode
- Partially pushed down predicates: creates a new FilterNode with remaining predicates
The TableScanNode construction preserves all necessary properties while embedding the KQL query in the layout handle.
82-88: We should inspect the surrounding code in ClpPlanOptimizer to confirm whether exceptions from the converter are caught. Please run:#!/bin/bash # Show entire optimize() in ClpPlanOptimizer to check exception handling sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)
77-93: Well-designed test schema with realistic complexity.The test schema includes complex nested types (RowType with Region containing Id and Name fields) which will help validate that the converter handles complex column references correctly. The variable-to-column mapping is properly constructed.
95-122: Comprehensive expression conversion utilities.The utility methods provide a complete pipeline from SQL strings to RowExpressions, including proper type analysis and session context. This matches how expressions are processed in the actual Presto engine.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)
29-48: Well-structured test helper method.The
testFiltermethod provides a clean interface for testing conversion scenarios, properly handling both the KQL output and remaining expression validation. The assertion logic correctly handles optional values.
50-73: Comprehensive string matching test coverage.The tests cover various LIKE predicate patterns and their KQL translations, including edge cases like middle wildcards that cannot be pushed down. The test cases demonstrate understanding of KQL limitations.
85-87: Good edge case testing for invalid substring operations.The tests correctly verify that invalid substring operations (like requesting more characters than the target string length) are not pushed down, maintaining query correctness.
112-134: Excellent logical operator testing with mixed scenarios.The OR tests properly validate that when any child expression cannot be pushed down, the entire OR expression remains in the filter. This is crucial for maintaining query semantics.
136-159: Correct AND operator partial pushdown logic.The AND tests demonstrate that pushable expressions are correctly separated from non-pushable ones, allowing partial optimization while preserving correctness. This is a sophisticated optimization that the tests validate well.
204-214: Complex scenario testing validates real-world usage.The complex pushdown tests combine multiple logical operators and expression types, ensuring the converter handles realistic query patterns correctly. These tests provide confidence in the implementation's robustness.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
52-679: Well-structured implementation with comprehensive KQL conversion support.The implementation provides thorough support for converting various SQL constructs to KQL, with good documentation and error handling. The visitor pattern is well-utilized, and the code handles complex cases like substring matching elegantly.
289-303: Verify KQL syntax for IS NULL predicate.The implementation converts
IS NULLtoNOT col: *. Please verify this is the correct KQL syntax for NULL checks in your CLP implementation, as KQL syntax can vary between systems.What is the correct KQL syntax to check if a field is NULL or does not exist?
487-500: ```shell
#!/bin/bash
set -euo pipefailInspect the definition and invocation sites for parseLengthLiteralOrFunction
rg -n "parseLengthLiteralOrFunction" -A5 -B5 presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
</details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
| public ClpExpression(String definition, RowExpression remainingExpression) | ||
| { | ||
| this.definition = Optional.ofNullable(definition); | ||
| this.remainingExpression = Optional.ofNullable(remainingExpression); | ||
| } | ||
|
|
||
| // Creates an empty ClpExpression (no KQL definition, no remaining expression). | ||
| public ClpExpression() | ||
| { | ||
| this (null, null); | ||
| } | ||
|
|
||
| // Creates a ClpExpression from a fully translatable KQL string. | ||
| public ClpExpression(String definition) | ||
| { | ||
| this(definition, null); | ||
| } | ||
|
|
||
| // Creates a ClpExpression from a non-translatable RowExpression. | ||
| public ClpExpression(RowExpression remainingExpression) | ||
| { | ||
| this(null, remainingExpression); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider improving constructor parameter validation.
The constructors currently accept null parameters and wrap them in Optional, which could lead to confusion. Consider making the intent clearer by either:
- Using explicit null checks with meaningful parameter names
- Providing static factory methods with clearer semantics
// Alternative approach with static factory methods
+public static ClpExpression fullyTranslatable(String definition) {
+ return new ClpExpression(requireNonNull(definition), null);
+}
+
+public static ClpExpression partiallyTranslatable(String definition, RowExpression remaining) {
+ return new ClpExpression(requireNonNull(definition), requireNonNull(remaining));
+}
+
+public static ClpExpression nonTranslatable(RowExpression remaining) {
+ return new ClpExpression(null, requireNonNull(remaining));
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public ClpExpression(String definition, RowExpression remainingExpression) | |
| { | |
| this.definition = Optional.ofNullable(definition); | |
| this.remainingExpression = Optional.ofNullable(remainingExpression); | |
| } | |
| // Creates an empty ClpExpression (no KQL definition, no remaining expression). | |
| public ClpExpression() | |
| { | |
| this (null, null); | |
| } | |
| // Creates a ClpExpression from a fully translatable KQL string. | |
| public ClpExpression(String definition) | |
| { | |
| this(definition, null); | |
| } | |
| // Creates a ClpExpression from a non-translatable RowExpression. | |
| public ClpExpression(RowExpression remainingExpression) | |
| { | |
| this(null, remainingExpression); | |
| } | |
| public ClpExpression(String definition, RowExpression remainingExpression) | |
| { | |
| this.definition = Optional.ofNullable(definition); | |
| this.remainingExpression = Optional.ofNullable(remainingExpression); | |
| } | |
| // Creates an empty ClpExpression (no KQL definition, no remaining expression). | |
| public ClpExpression() | |
| { | |
| this(null, null); | |
| } | |
| // Creates a ClpExpression from a fully translatable KQL string. | |
| public ClpExpression(String definition) | |
| { | |
| this(definition, null); | |
| } | |
| // Creates a ClpExpression from a non-translatable RowExpression. | |
| public ClpExpression(RowExpression remainingExpression) | |
| { | |
| this(null, remainingExpression); | |
| } | |
| // Alternative approach with static factory methods | |
| public static ClpExpression fullyTranslatable(String definition) { | |
| return new ClpExpression(requireNonNull(definition), null); | |
| } | |
| public static ClpExpression partiallyTranslatable(String definition, RowExpression remaining) { | |
| return new ClpExpression(requireNonNull(definition), requireNonNull(remaining)); | |
| } | |
| public static ClpExpression nonTranslatable(RowExpression remaining) { | |
| return new ClpExpression(null, requireNonNull(remaining)); | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 37 to 59, the constructors accept null parameters and wrap them in
Optional, which can be confusing. To fix this, add explicit null checks in the
constructors to validate parameters and throw IllegalArgumentException if
invalid. Alternatively, refactor by making constructors private and provide
static factory methods with descriptive names for creating instances with
definition, remainingExpression, or empty, to clarify intent and avoid null
usage.
| protected static class SessionHolder | ||
| { | ||
| private final ConnectorSession connectorSession; | ||
| private final Session session; | ||
|
|
||
| public SessionHolder() | ||
| { | ||
| connectorSession = SESSION; | ||
| session = TestingSession.testSessionBuilder(createTestingSessionPropertyManager(new SystemSessionProperties().getSessionProperties())).build(); | ||
| } | ||
|
|
||
| public ConnectorSession getConnectorSession() | ||
| { | ||
| return connectorSession; | ||
| } | ||
|
|
||
| public Session getSession() | ||
| { | ||
| return session; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider extracting session creation to reduce duplication.
The SessionHolder class creates sessions in each test, which could lead to duplication if many test classes extend this base.
Consider providing a static factory method or singleton pattern for commonly used session configurations:
+protected static final SessionHolder DEFAULT_SESSION = new SessionHolder();
+
+protected static SessionHolder getDefaultSession() {
+ return DEFAULT_SESSION;
+}🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
around lines 124 to 144, the SessionHolder class creates new session instances
in its constructor, which may cause duplication across tests. Refactor by
extracting the session creation logic into a static factory method or implement
a singleton pattern to provide commonly used session instances, so tests can
reuse these sessions instead of creating new ones each time.
| private static String getLiteralString(ConstantExpression literal) | ||
| { | ||
| if (literal.getValue() instanceof Slice) { | ||
| return ((Slice) literal.getValue()).toStringUtf8(); | ||
| } | ||
| return literal.toString(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect literal string extraction for non-Slice types.
The toString() method on ConstantExpression returns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.
Apply this fix to properly extract the literal value:
private static String getLiteralString(ConstantExpression literal)
{
if (literal.getValue() instanceof Slice) {
return ((Slice) literal.getValue()).toStringUtf8();
}
- return literal.toString();
+ return String.valueOf(literal.getValue());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static String getLiteralString(ConstantExpression literal) | |
| { | |
| if (literal.getValue() instanceof Slice) { | |
| return ((Slice) literal.getValue()).toStringUtf8(); | |
| } | |
| return literal.toString(); | |
| } | |
| private static String getLiteralString(ConstantExpression literal) | |
| { | |
| if (literal.getValue() instanceof Slice) { | |
| return ((Slice) literal.getValue()).toStringUtf8(); | |
| } | |
| return String.valueOf(literal.getValue()); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
around lines 149 to 155, the method getLiteralString incorrectly uses
literal.toString() for non-Slice values, which returns the full object
representation instead of just the value. To fix this, extract the actual value
from the ConstantExpression for non-Slice types by accessing literal.getValue()
and converting it to a string appropriately, ensuring the returned string
represents only the literal value.
| private Optional<Integer> parseIntValue(RowExpression expression) | ||
| { | ||
| if (expression instanceof ConstantExpression) { | ||
| try { | ||
| return Optional.of(Integer.parseInt(getLiteralString((ConstantExpression) expression))); | ||
| } | ||
| catch (NumberFormatException ignored) { } | ||
| } | ||
| else if (expression instanceof CallExpression) { | ||
| CallExpression call = (CallExpression) expression; | ||
| FunctionMetadata functionMetadata = functionMetadataManager.getFunctionMetadata(call.getFunctionHandle()); | ||
| Optional<OperatorType> operatorTypeOptional = functionMetadata.getOperatorType(); | ||
| if (operatorTypeOptional.isPresent() && operatorTypeOptional.get().equals(OperatorType.NEGATION)) { | ||
| RowExpression arg0 = call.getArguments().get(0); | ||
| if (arg0 instanceof ConstantExpression) { | ||
| try { | ||
| return Optional.of(-Integer.parseInt(getLiteralString((ConstantExpression) arg0))); | ||
| } | ||
| catch (NumberFormatException ignored) { } | ||
| } | ||
| } | ||
| } | ||
| return Optional.empty(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider logging parsing failures for debugging.
The empty catch blocks at lines 464 and 476 silently swallow NumberFormatException. While returning Optional.empty() is appropriate, consider adding debug logging to help troubleshoot parsing issues.
Example improvement:
catch (NumberFormatException ignored) { }
+// Could add: log.debug("Failed to parse integer from expression: {}", expression);🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
between lines 458 and 481, the catch blocks for NumberFormatException are empty,
which silently swallows parsing errors. To fix this, add debug-level logging
inside both catch blocks to record the exception details when parsing fails,
while still returning Optional.empty(). This will aid in troubleshooting without
changing the method's behavior.
| } | ||
| } | ||
|
|
||
| return new ClpExpression(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Use explicit constructor for clarity.
The empty ClpExpression() constructor usage is unclear. Consider using a more explicit constructor to indicate "no pushdown possible" for better code clarity.
- return new ClpExpression();
+ return new ClpExpression(null); // Explicitly indicate no pushdown possible📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return new ClpExpression(); | |
| return new ClpExpression(null); // Explicitly indicate no pushdown possible |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
at line 553, replace the usage of the empty ClpExpression() constructor with a
more explicit constructor or factory method that clearly indicates "no pushdown
possible." This will improve code clarity by explicitly conveying the intent
rather than relying on an ambiguous empty constructor.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the inline comments, there are still some files that get reformatted when I run Intellij's formatter. Can you check all of them and ensure any differences from what the formatter does and what's in the PR are intentional?
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Show resolved
Hide resolved
| /** | ||
| * ClpFilterToKqlConverter translates Presto RowExpressions into KQL (Kibana Query Language) filters | ||
| * used as CLP queries. This is used primarily for pushing down supported filters to the CLP engine. | ||
| * This class implements the RowExpressionVisitor interface and recursively walks Presto filter expressions, | ||
| * attempting to convert supported expressions (e.g., comparisons, logical AND/OR, LIKE, IN, IS NULL, | ||
| * and SUBSTR-based expressions) into corresponding KQL filter strings. Any part of the expression that | ||
| * cannot be translated is preserved as a "remaining expression" for potential fallback processing. | ||
| * Supported translations include: | ||
| * - Variable-to-literal comparisons (e.g., =, !=, <, >, <=, >=) | ||
| * - String pattern matches using LIKE | ||
| * - Membership checks using IN | ||
| * - NULL checks via IS NULL | ||
| * - Substring comparisons (e.g., SUBSTR(x, start, len) = "val") mapped to wildcard KQL queries | ||
| * - Dereferencing fields from row-typed variables | ||
| * - Logical operators AND, OR, and NOT | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** | |
| * ClpFilterToKqlConverter translates Presto RowExpressions into KQL (Kibana Query Language) filters | |
| * used as CLP queries. This is used primarily for pushing down supported filters to the CLP engine. | |
| * This class implements the RowExpressionVisitor interface and recursively walks Presto filter expressions, | |
| * attempting to convert supported expressions (e.g., comparisons, logical AND/OR, LIKE, IN, IS NULL, | |
| * and SUBSTR-based expressions) into corresponding KQL filter strings. Any part of the expression that | |
| * cannot be translated is preserved as a "remaining expression" for potential fallback processing. | |
| * Supported translations include: | |
| * - Variable-to-literal comparisons (e.g., =, !=, <, >, <=, >=) | |
| * - String pattern matches using LIKE | |
| * - Membership checks using IN | |
| * - NULL checks via IS NULL | |
| * - Substring comparisons (e.g., SUBSTR(x, start, len) = "val") mapped to wildcard KQL queries | |
| * - Dereferencing fields from row-typed variables | |
| * - Logical operators AND, OR, and NOT | |
| */ | |
| /** | |
| * A translator to translate Presto RowExpressions into KQL (Kibana Query Language) filters used as | |
| * CLP queries. This is used primarily for pushing down supported filters to the CLP engine. This | |
| * class implements the RowExpressionVisitor interface and recursively walks Presto filter | |
| * expressions, attempting to convert supported expressions into corresponding KQL filter strings. | |
| * Any part of the expression that cannot be translated is preserved as a "remaining expression" for | |
| * potential fallback processing. | |
| * <p></p> | |
| * Supported translations include: | |
| * <ul> | |
| * <li>Comparisons between variables and literals (e.g., =, !=, <, >, <=, >=)</li> | |
| * <li>String pattern matches using LIKE (some patterns are unsupported)</li> | |
| * <li>Membership checks using IN</li> | |
| * <li>NULL checks via IS NULL</li> | |
| * <li>Substring comparisons (e.g., SUBSTR(x, start, len) = "val")</li> | |
| * <li>Dereferencing fields from row-typed variables</li> | |
| * <li>Logical operators AND, OR, and NOT</li> | |
| * </ul> | |
| */ |
Can we also mention we can only do comparisons between variable & constant but not variable & variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can change Comparisons between variables and literals to Comparisons between variables and constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
| /** | ||
| * Handles the logical NOT expression. | ||
| * Example: | ||
| * - NOT (col1 = 5) → NOT col1: 5 | ||
| * | ||
| * @param node the NOT call expression | ||
| * @return a ClpExpression with the translated KQL query, or the original expression if unsupported | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** | |
| * Handles the logical NOT expression. | |
| * Example: | |
| * - NOT (col1 = 5) → NOT col1: 5 | |
| * | |
| * @param node the NOT call expression | |
| * @return a ClpExpression with the translated KQL query, or the original expression if unsupported | |
| */ | |
| /** | |
| * Handles the logical NOT expression. | |
| * <p></p> | |
| * Example: <code>NOT (col1 = 5)</code> → <code>NOT col1: 5</code> | |
| * | |
| * @param node the NOT call expression | |
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | |
| * couldn't be translated | |
| */ |
Can you apply this same style to all the other docstrings? Can we also use the format of the return doc suggested above across the other docstrings?
| else { | ||
| return new ClpExpression(node); | ||
| } | ||
| pattern = pattern.replace("%", "*").replace("_", "?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we handle escapes? (E.g., col LIKE 'abc#%def' ESCAPE '#' --> col: "abc%def"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's Likepattern predicate, not like
| * - col = 'abc' → col: "abc" | ||
| * - col != 42 → NOT col: 42 | ||
| * - 5 < col → col > 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use HTML lists so that the javadocs will render correctly?
| import static com.facebook.presto.transaction.InMemoryTransactionManager.createTestTransactionManager; | ||
| import static java.util.stream.Collectors.toMap; | ||
|
|
||
| public class TestClpQueryBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ClpQueryTestUtils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But I'm following the naming convention of other connectors (Test...QueryBase)
| * Converts SQL LIKE patterns into equivalent KQL queries using * (for %) and ? (for _). | ||
| * Only supports constant patterns or constant cast patterns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also mention what pattern isn't supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will mention it in the class docstring
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
| StringBuilder queryBuilder = new StringBuilder(); | ||
| queryBuilder.append("("); | ||
| List<RowExpression> remainingExpressions = new ArrayList<>(); | ||
| boolean hasDefinition = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe hasKqlQuery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (10)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
33-65: Consider using static factory methods for clearer intent.The constructors accept null parameters which could be confusing. Static factory methods would make the intent clearer and avoid ambiguity about null handling.
Consider this alternative approach:
- public ClpExpression(String pushDownExpression, RowExpression remainingExpression) + private ClpExpression(String pushDownExpression, RowExpression remainingExpression) { this.pushDownExpression = Optional.ofNullable(pushDownExpression); this.remainingExpression = Optional.ofNullable(remainingExpression); } - /** - * Creates an empty ClpExpression with neither pushdown nor remaining expressions. - */ - public ClpExpression() - { - this(null, null); - } + public static ClpExpression empty() + { + return new ClpExpression(null, null); + } - /** - * Creates a ClpExpression from a fully translatable KQL query or column name. - * - * @param pushDownExpression the KQL query string or column name - */ - public ClpExpression(String pushDownExpression) - { - this(pushDownExpression, null); - } + public static ClpExpression fullyTranslatable(String pushDownExpression) + { + return new ClpExpression(requireNonNull(pushDownExpression, "pushDownExpression is null"), null); + } - /** - * Creates a ClpExpression from a non-translatable RowExpression. - * - * @param remainingExpression the non-translatable RowExpression - */ - public ClpExpression(RowExpression remainingExpression) - { - this(null, remainingExpression); - } + public static ClpExpression nonTranslatable(RowExpression remainingExpression) + { + return new ClpExpression(null, requireNonNull(remainingExpression, "remainingExpression is null")); + } + + public static ClpExpression partiallyTranslatable(String pushDownExpression, RowExpression remainingExpression) + { + return new ClpExpression( + requireNonNull(pushDownExpression, "pushDownExpression is null"), + requireNonNull(remainingExpression, "remainingExpression is null")); + }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
195-213: Move private method after public methods.According to code style, private methods should be placed after all public methods.
Move the
testFiltermethod to the end of the class, after all the public test methods.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (8)
166-172: Fix incorrect literal string extraction for non-Slice types.The
toString()method onConstantExpressionreturns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.Apply this fix:
private String getLiteralString(ConstantExpression literal) { if (literal.getValue() instanceof Slice) { return ((Slice) literal.getValue()).toStringUtf8(); } - return literal.toString(); + return String.valueOf(literal.getValue()); }
255-255: Document wildcard conversion logic.The LIKE pattern conversion should be documented to clarify the wildcard mapping.
+ // Convert SQL LIKE wildcards to KQL wildcards: % -> * (zero or more chars), _ -> ? (exactly one char) pattern = pattern.replace("%", "*").replace("_", "?");
503-528: Consider adding debug logging for parsing failures.The empty catch blocks silently swallow NumberFormatException. While returning Optional.empty() is appropriate, debug logging could help troubleshoot parsing issues.
412-439: Move method after its caller.The
parseSubstringCallmethod should be placed aftertryInterpretSubstringEqualitywhich calls it, to respect the code style ordering.
455-495: Move method after its caller.The
interpretSubstringEqualitymethod should be placed aftertryInterpretSubstringEqualitywhich calls it.
537-551: Add JavaDoc for parseLengthLiteral method.This method is missing documentation.
748-760: SubstrInfo class should be defined at the end.Inner classes should be defined at the end of the enclosing class according to the code style.
494-494: Use explicit constructor for clarity.Using the empty constructor is unclear about intent.
- return new ClpExpression(); + return new ClpExpression(null, null); // No pushdown possible
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
26-50: LGTM!The implementation is clean and follows the standard pattern for optimizer providers.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
38-111: LGTM!The optimizer implementation correctly handles filter pushdown with proper null checks and error handling.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
26-194: Comprehensive test coverage!The test methods thoroughly cover various filter pushdown scenarios including string matching, numeric comparisons, logical operators, and complex combinations.
| /** | ||
| * Creates a ClpExpression from a non-translatable RowExpression. | ||
| * | ||
| * @param remainingExpression | ||
| */ | ||
| public ClpExpression(RowExpression remainingExpression) | ||
| { | ||
| this(null, remainingExpression); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add description to @param tag.
The JavaDoc has an empty @param tag without description.
/**
* Creates a ClpExpression from a non-translatable RowExpression.
*
- * @param remainingExpression
+ * @param remainingExpression the non-translatable RowExpression
*/📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Creates a ClpExpression from a non-translatable RowExpression. | |
| * | |
| * @param remainingExpression | |
| */ | |
| public ClpExpression(RowExpression remainingExpression) | |
| { | |
| this(null, remainingExpression); | |
| } | |
| /** | |
| * Creates a ClpExpression from a non-translatable RowExpression. | |
| * | |
| * @param remainingExpression the non-translatable RowExpression | |
| */ | |
| public ClpExpression(RowExpression remainingExpression) | |
| { | |
| this(null, remainingExpression); | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 57 to 65, the JavaDoc for the constructor has an empty @param tag
for remainingExpression. Add a clear description to the @param tag explaining
what remainingExpression represents, such as "the non-translatable RowExpression
to create the ClpExpression from."
| /** | ||
| * Creates a ClpExpression from a fully translatable KQL query or column name. | ||
| * | ||
| * @param pushDownExpression | ||
| */ | ||
| public ClpExpression(String pushDownExpression) | ||
| { | ||
| this(pushDownExpression, null); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add description to @param tag.
The JavaDoc has an empty @param tag without description.
/**
* Creates a ClpExpression from a fully translatable KQL query or column name.
*
- * @param pushDownExpression
+ * @param pushDownExpression the KQL query string or column name
*/📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Creates a ClpExpression from a fully translatable KQL query or column name. | |
| * | |
| * @param pushDownExpression | |
| */ | |
| public ClpExpression(String pushDownExpression) | |
| { | |
| this(pushDownExpression, null); | |
| } | |
| /** | |
| * Creates a ClpExpression from a fully translatable KQL query or column name. | |
| * | |
| * @param pushDownExpression the KQL query string or column name | |
| */ | |
| public ClpExpression(String pushDownExpression) | |
| { | |
| this(pushDownExpression, null); | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 47 to 55, the JavaDoc for the constructor has an empty @param tag
for pushDownExpression. Add a clear description explaining what
pushDownExpression represents, such as "the fully translatable KQL query or
column name to create the ClpExpression from."
| * <li>String pattern matches using LIKE with constant patterns only. <code>"^%[^%_]*%$"</code> | ||
| * is not supported.</li> | ||
| * <li>Membership checks using IN with a list of constants only</li> | ||
| * <li>NULL checks via IS NULL</li> | ||
| * <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) are supported | ||
| * only when compared against a constant.</li> | ||
| * <li>Dereferencing fields from row-typed variables</li> | ||
| * <li>Logical operators AND, OR, and NOT</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * <li>String pattern matches using LIKE with constant patterns only. <code>"^%[^%_]*%$"</code> | |
| * is not supported.</li> | |
| * <li>Membership checks using IN with a list of constants only</li> | |
| * <li>NULL checks via IS NULL</li> | |
| * <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) are supported | |
| * only when compared against a constant.</li> | |
| * <li>Dereferencing fields from row-typed variables</li> | |
| * <li>Logical operators AND, OR, and NOT</li> | |
| * <li>String pattern matches using LIKE with constant patterns only. Patterns that begin and | |
| * end with <code>%</code> (i.e., <code>"^%[^%_]*%$"</code>) are not supported.</li> | |
| * <li>Membership checks using IN with a list of constants only.</li> | |
| * <li>NULL checks via IS NULL.</li> | |
| * <li>Substring comparisons (e.g., <code>SUBSTR(x, start, len) = "val"</code>) against a | |
| * constant.</li> | |
| * <li>Dereferencing fields from row-typed variables.</li> | |
| * <li>Logical operators AND, OR, and NOT.</li> |
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | ||
| * couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | |
| * couldn't be translated | |
| * @return a ClpExpression containing either the equivalent KQL query, or the original | |
| * expression if it couldn't be translated |
| * Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for <code>%</code>) | ||
| * and <code>?</code> (for <code>_</code>). Only supports constant or casted constant patterns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for <code>%</code>) | |
| * and <code>?</code> (for <code>_</code>). Only supports constant or casted constant patterns. | |
| * Converts SQL LIKE patterns into equivalent KQL queries using <code>*</code> (for | |
| * <code>%</code>) and <code>?</code> (for <code>_</code>). Only supports constant or casted | |
| * constant patterns. |
| * @return a ClpExpression containing the equivalent KQL query, or the original expression if it | ||
| * couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @return a ClpExpression containing the equivalent KQL query, or the original expression if it | |
| * couldn't be translated | |
| * @return a ClpExpression containing either the equivalent KQL query, or the original | |
| * expression if it couldn't be translated |
| * Supports constant values on either side and flips the operator if necessary. Also delegates to a | ||
| * substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Supports constant values on either side and flips the operator if necessary. Also delegates to a | |
| * substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns. | |
| * Supports constant values on either side and flips the operator if necessary. Also delegates | |
| * to a substring handler for <code>SUBSTR(x, ...) = 'value'</code> patterns. |
| queryBuilder.append(expression.getPushDownExpression().get()); | ||
| queryBuilder.append(" OR "); | ||
| } | ||
| // Remove the last " OR " from the query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Remove the last " OR " from the query | |
| // Remove the last " OR " from the query |
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | ||
| * couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | |
| * couldn't be translated | |
| * @param node the <code>IN</code> special form expression | |
| * @return a ClpExpression containing either the equivalent KQL query, or the original | |
| * expression if it couldn't be translated |
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | ||
| * couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @return a ClpExpression containing the equivalent KQL query or the original expression if it | |
| * couldn't be translated | |
| * @return a ClpExpression containing either the equivalent KQL query, or the original | |
| * expression if it couldn't be translated |
| /** | ||
| * Handles dereference expressions on RowTypes (e.g., <code>col.row_field</code>). | ||
| * <p></p> | ||
| * Converts nested row field access into dot-separated KQL-compatible field names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Converts nested row field access into dot-separated KQL-compatible field names. | |
| * Converts nested row field accesses into dot-separated KQL-compatible field names. |
| * @param expression the dereference expression (SpecialFormExpression or VariableReferenceExpression) | ||
| * @return a ClpExpression containing the dot-separated field name or the original expression if it | ||
| * couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @param expression the dereference expression (SpecialFormExpression or VariableReferenceExpression) | |
| * @return a ClpExpression containing the dot-separated field name or the original expression if it | |
| * couldn't be translated | |
| * @param expression the dereference expression ({@link SpecialFormExpression} or | |
| * {@link VariableReferenceExpression}) | |
| * @return a ClpExpression containing either the dot-separated field name, or the original | |
| * expression if it couldn't be translated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
166-172: Fix incorrect literal string extraction for non-Slice types.The
toString()method onConstantExpressionreturns the full object representation (e.g., "ConstantExpression{value=5, type=bigint}") rather than just the value. This will generate incorrect KQL queries.Apply this fix to properly extract the literal value:
private String getLiteralString(ConstantExpression literal) { if (literal.getValue() instanceof Slice) { return ((Slice) literal.getValue()).toStringUtf8(); } - return literal.toString(); + return String.valueOf(literal.getValue()); }
508-533: Consider logging parsing failures for debugging.The empty catch blocks at lines 514 and 527 silently swallow
NumberFormatException. While returningOptional.empty()is appropriate, consider adding debug logging to help troubleshoot parsing issues.Example improvement:
catch (NumberFormatException ignored) { } +// Could add: log.debug("Failed to parse integer from expression: {}", expression);
499-499: Use explicit constructor for clarity.The empty
ClpExpression()constructor usage is unclear. Consider using a more explicit constructor to indicate "no pushdown possible" for better code clarity.- return new ClpExpression(); + return new ClpExpression(null); // Explicitly indicate no pushdown possible
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
| private Optional<Integer> parseLengthLiteral(RowExpression lengthExpression, String targetString) | ||
| { | ||
| if (lengthExpression instanceof ConstantExpression) { | ||
| String val = getLiteralString((ConstantExpression) lengthExpression); | ||
| try { | ||
| int parsed = parseInt(val); | ||
| if (parsed == targetString.length()) { | ||
| return Optional.of(parsed); | ||
| } | ||
| } | ||
| catch (NumberFormatException ignored) { | ||
| } | ||
| } | ||
| return Optional.empty(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add debug logging for parsing failures.
The empty catch block at line 552 silently swallows NumberFormatException. Consider adding debug logging here as well.
catch (NumberFormatException ignored) { }
+// Could add: log.debug("Failed to parse length literal: {}", val);Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
between lines 542 and 556, the catch block for NumberFormatException is empty,
which silently ignores parsing errors. Modify the catch block to include
debug-level logging that records the exception details and the value that failed
to parse, helping with troubleshooting without interrupting the flow.
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the docstrings. Deferring to @anlowee's review for everything else.
Edited the PR title directly.
|
|
||
| /** | ||
| * Represents the result of converting a Presto RowExpression into a CLP-compatible KQL query. In | ||
| * every case, `kqlQuery` represents the part of the RowExpression that could be converted to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update this docstring? We have renamed the kqlQuery to pushDownExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (3)
51-51: Add description to @param tag.The JavaDoc has an empty @param tag without a description.
- * @param pushDownExpression + * @param pushDownExpression the KQL query string or column name
61-61: Add description to @param tag.The JavaDoc has an empty @param tag without a description.
- * @param remainingExpression + * @param remainingExpression the non-translatable RowExpression
34-38: Consider improving constructor parameter validation.The constructor currently accepts null parameters and wraps them in Optional, which could lead to confusion about the intended usage patterns.
Consider adding explicit validation or providing clearer static factory methods as previously suggested to improve code clarity and prevent misuse.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
| public Optional<String> getPushDownExpression() | ||
| { | ||
| return pushDownExpression; | ||
| } | ||
|
|
||
| public Optional<RowExpression> getRemainingExpression() | ||
| { | ||
| return remainingExpression; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding JavaDoc for getter methods.
While the getter methods are self-explanatory, adding JavaDoc would improve consistency with the rest of the class and provide clarity about the Optional return types.
+ /**
+ * Returns the KQL query or column name that can be pushed down, if any.
+ *
+ * @return Optional containing the pushdown expression, or empty if none exists
+ */
public Optional<String> getPushDownExpression()
{
return pushDownExpression;
}
+ /**
+ * Returns the remaining non-translatable RowExpression, if any.
+ *
+ * @return Optional containing the remaining expression, or empty if fully translatable
+ */
public Optional<RowExpression> getRemainingExpression()
{
return remainingExpression;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public Optional<String> getPushDownExpression() | |
| { | |
| return pushDownExpression; | |
| } | |
| public Optional<RowExpression> getRemainingExpression() | |
| { | |
| return remainingExpression; | |
| } | |
| /** | |
| * Returns the KQL query or column name that can be pushed down, if any. | |
| * | |
| * @return Optional containing the pushdown expression, or empty if none exists | |
| */ | |
| public Optional<String> getPushDownExpression() | |
| { | |
| return pushDownExpression; | |
| } | |
| /** | |
| * Returns the remaining non-translatable RowExpression, if any. | |
| * | |
| * @return Optional containing the remaining expression, or empty if fully translatable | |
| */ | |
| public Optional<RowExpression> getRemainingExpression() | |
| { | |
| return remainingExpression; | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
around lines 68 to 76, the getter methods getPushDownExpression and
getRemainingExpression lack JavaDoc comments. Add concise JavaDoc comments above
each method explaining what the method returns, emphasizing that the return type
is Optional and what it may contain or represent, to maintain consistency and
improve clarity.
| public class ClpExpression | ||
| { | ||
| // Optional KQL query or column name representing the fully or partially translatable part of the expression. | ||
| private final Optional<String> pushDownExpression; | ||
|
|
||
| // The remaining (non-translatable) portion of the RowExpression, if any. | ||
| private final Optional<RowExpression> remainingExpression; | ||
|
|
||
| public ClpExpression(String pushDownExpression, RowExpression remainingExpression) | ||
| { | ||
| this.pushDownExpression = Optional.ofNullable(pushDownExpression); | ||
| this.remainingExpression = Optional.ofNullable(remainingExpression); | ||
| } | ||
|
|
||
| /** | ||
| * Creates an empty ClpExpression with neither pushdown nor remaining expressions. | ||
| */ | ||
| public ClpExpression() | ||
| { | ||
| this(null, null); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a ClpExpression from a fully translatable KQL query or column name. | ||
| * | ||
| * @param pushDownExpression | ||
| */ | ||
| public ClpExpression(String pushDownExpression) | ||
| { | ||
| this(pushDownExpression, null); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a ClpExpression from a non-translatable RowExpression. | ||
| * | ||
| * @param remainingExpression | ||
| */ | ||
| public ClpExpression(RowExpression remainingExpression) | ||
| { | ||
| this(null, remainingExpression); | ||
| } | ||
|
|
||
| public Optional<String> getPushDownExpression() | ||
| { | ||
| return pushDownExpression; | ||
| } | ||
|
|
||
| public Optional<RowExpression> getRemainingExpression() | ||
| { | ||
| return remainingExpression; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding standard object methods for better usability.
The class lacks equals(), hashCode(), and toString() methods, which are typically expected for data classes and would improve testability and debugging capabilities.
+ @Override
+ public boolean equals(Object obj)
+ {
+ if (this == obj) {
+ return true;
+ }
+ if (obj == null || getClass() != obj.getClass()) {
+ return false;
+ }
+ ClpExpression that = (ClpExpression) obj;
+ return pushDownExpression.equals(that.pushDownExpression) &&
+ remainingExpression.equals(that.remainingExpression);
+ }
+
+ @Override
+ public int hashCode()
+ {
+ return java.util.Objects.hash(pushDownExpression, remainingExpression);
+ }
+
+ @Override
+ public String toString()
+ {
+ return "ClpExpression{" +
+ "pushDownExpression=" + pushDownExpression +
+ ", remainingExpression=" + remainingExpression +
+ '}';
+ }🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
between lines 26 and 77, the class ClpExpression lacks the standard methods
equals(), hashCode(), and toString(). To fix this, implement these methods to
compare instances based on pushDownExpression and remainingExpression fields,
generate consistent hash codes from these fields, and provide a clear string
representation of the object including its field values. This will improve
usability, testability, and debugging.
Description
This PR adds
ClpConnectorPlanOptimizerandClpFilterToKqlConverterand related unit tests to allow the CLP connector to push down relevant filters to CLP for improved query performance. More details can be found in RFC.There are three main steps
ClpTableLayoutHandleand constructing a newTableScanNodeFilterNodewith any remaining predicates and the newTableScanNodeClpFilterToKqlConverterimplementsRowExpressionVisitor<ClpExpression, Void>and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:LIKEpredicate (the"^%[^%_]*%$"pattern is not pushed down)AND,OR,NOT)IS NULLsubstr(a, start, length) = 'abc'andsubstr(a, start) = 'abc'formsFor comparison or match expressions, one side must contain a
VariableReferenceExpression, and the other must be aConstantExpression--specifically a string or numeric literal. Expressions likea > corsubstr(a, 2) = lower(c)are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.
Consider the following query:
Since CLP doesn’t currently support
regexp_like, if we simply push downb = 2to CLP, Presto will only receive results whereb = 2, effectively changing the query semantics toregexp_like(a, '\d+b') AND b = 2.To prevent such issues, the pushdown logic follows these rules:
ORpushdown: AnORcondition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.ANDpushdown: AnANDcondition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with theANDoperator, while non-pushable expressions remain in the original plan.CallExpressionand can be transformed into KQL syntax.Example transformations:
SQL WHERE Clause:
a = 2 AND b = '3'a: 2 AND b: "3"FilterNodeis removed.SQL WHERE Clause:
a.b LIKE '%string another_string' OR "c.d" = TRUEa.b: "*string another_string" OR c.d: trueFilterNodeis removed.SQL WHERE Clause:
a > 2 OR regexp_like(b, '\d+b')"*"a > 2 OR regexp_like(b, '\d+b')remains in theFilterNode.SQL WHERE Clause:
a > 2 AND regexp_like(b, '\d+b')a > 2regexp_like(b, '\d+b')remains in theFilterNode.SQL WHERE Clause:
NOT substr(a, 2, 5) = 'Hello' and b IS NULLNOT a: "?Hello*" AND NOT b: *FilterNodeis removed.Checklist
breaking change.
Validation performed
Summary by CodeRabbit