Skip to content

Conversation

@wraymo
Copy link

@wraymo wraymo commented Jul 14, 2025

Description

This is a split of #31. When implementing CLP UDFs, we need to transform each CLP_GET_* function into a VariableReferenceExpression. During this transformation, it's important to track all variables derived from CLP_GET_* so that they can later be included as outputs in the corresponding TableScanNode.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

All unit tests passed

Summary by CodeRabbit

Summary by CodeRabbit

  • Refactor

    • Enhanced filter conversion by consistently passing context information during expression processing, improving accuracy and maintainability.
    • Improved internal handling of variable-to-column mappings to ensure reliable expression translation.
  • Tests

    • Adjusted test cases to accommodate context tracking updates, maintaining validation of filter conversion behaviour.

@coderabbitai
Copy link

coderabbitai bot commented Jul 14, 2025

Walkthrough

The changes update the ClpFilterToKqlConverter and related code to consistently pass a Map<VariableReferenceExpression, ColumnHandle> context parameter through all visitor and helper methods, replacing the previous use of Void. The plan optimizer creates a copy of the assignments map and passes it as context. Test code changes include creating and passing a local copy of the assignments map to the converter.

Changes

File(s) Change Summary
.../ClpFilterToKqlConverter.java Changed all visitor and helper method signatures to use Map<VariableReferenceExpression, ColumnHandle> as context; updated recursive calls; substring equality methods now return Optional<ClpExpression>; logic in handleOr updated to check full pushdown; getVariableName updated to use context.
.../ClpPlanOptimizer.java In visitFilter, created a new HashMap copy of assignments and passed it as context to ClpFilterToKqlConverter.
.../TestClpFilterToKql.java Updated tryPushDown to create and pass a local copy of the assignments map to ClpFilterToKqlConverter.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Method
    participant Converter as ClpFilterToKqlConverter
    participant Expr as RowExpression

    Test->>Converter: Create local assignments map copy
    Test->>Expr: Prepare RowExpression
    Test->>Converter: Expr.accept(Converter, assignmentsMap)
    loop Visitor Traversal
        Converter->>Expr: accept(Converter, assignmentsMap)
        Expr-->>Converter: visitX(..., assignmentsMap)
        Converter->>assignmentsMap: (use or propagate context)
    end
Loading
sequenceDiagram
    participant PlanOpt as ClpPlanOptimizer
    participant ContextMap as Map<VariableReferenceExpression, ColumnHandle>
    participant Converter as ClpFilterToKqlConverter
    participant Predicate as RowExpression

    PlanOpt->>ContextMap: Create HashMap copy of assignments
    PlanOpt->>Converter: new ClpFilterToKqlConverter(...)
    PlanOpt->>Predicate: getPredicate()
    PlanOpt->>Converter: Predicate.accept(Converter, ContextMap)
Loading

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42028de and 61f828c.

📒 Files selected for processing (1)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🔇 Additional comments (3)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (3)

16-16: New imports look appropriate for the functionality.

The added imports for ColumnHandle, VariableReferenceExpression, HashMap, and Map are necessary to support the new context parameter functionality in the ClpFilterToKqlConverter.

Also applies to: 18-18, 22-23


301-301: Correct implementation of context parameter passing.

The assignments map is properly passed as the context parameter to the ClpFilterToKqlConverter, which aligns with the PR objective of tracking VariableReferenceExpression objects.


304-304: Style formatting is consistent.

The method signature reformatting improves readability without changing functionality.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

273-332: Add test coverage for CLP UDF variable collection

The tests create and pass the clpUdfVariables set but never verify its contents. Since the PR's main objective is to track variables from CLP_GET_* functions, consider adding test cases that:

  1. Include CLP_GET_* functions in the filter expressions
  2. Assert that the expected variables are collected in the set
  3. Verify the set remains empty when no CLP UDF functions are present

This would ensure the variable tracking functionality works as intended.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)

98-183: Context parameter is passed but never populated

The Set<VariableReferenceExpression> context parameter is now passed through all visitor methods, but there's no code that actually adds variables to this set. According to the PR objectives, this set should track variables from CLP_GET_* functions.

Should there be logic in visitCall or other methods to detect CLP UDF function calls and add their output variables to the context set?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20d8f9c and 9edad66.

📒 Files selected for processing (3)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (23 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)

84-91: Missing usage of collected CLP UDF variables

The clpUdfVariablesInFilterNode set is created and passed to the converter, but it's never used afterwards. According to the PR objectives, these variables should be tracked so they can be included as outputs in the corresponding TableScanNode. However, I don't see any code that adds these collected variables to the table scan node outputs.

Should there be additional logic after line 91 to handle the collected variables?

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)

365-373: Good refactoring to use Optional for substring methods

The change to return Optional<ClpExpression> in substring-related methods is a good improvement. It makes the API clearer by explicitly representing the absence of a valid interpretation rather than returning empty ClpExpression instances.

Also applies to: 481-503, 556-596

Comment on lines 120 to 145
@Override
public ClpExpression visitCall(CallExpression node, Void context)
public ClpExpression visitCall(CallExpression node, Set<VariableReferenceExpression> context)
{
FunctionHandle functionHandle = node.getFunctionHandle();
if (standardFunctionResolution.isNotFunction(functionHandle)) {
return handleNot(node);
return handleNot(node, context);
}

if (standardFunctionResolution.isLikeFunction(functionHandle)) {
return handleLike(node);
return handleLike(node, context);
}

FunctionMetadata functionMetadata = functionMetadataManager.getFunctionMetadata(node.getFunctionHandle());
Optional<OperatorType> operatorTypeOptional = functionMetadata.getOperatorType();
if (operatorTypeOptional.isPresent()) {
OperatorType operatorType = operatorTypeOptional.get();
if (operatorType.isComparisonOperator() && operatorType != IS_DISTINCT_FROM) {
return handleLogicalBinary(operatorType, node);
return handleLogicalBinary(operatorType, node, context);
}
if (BETWEEN == operatorType) {
return handleBetween(node);
return handleBetween(node, context);
}
}

return new ClpExpression(node);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Missing CLP UDF function detection logic

The visitCall method should detect CLP_GET_* functions and track their output variables in the context set, but there's no such logic implemented. Without this, the context parameter serves no purpose.

Consider adding logic to check if the function name starts with "CLP_GET_" and add the corresponding variable to the context set.


🏁 Script executed:

#!/bin/bash
# Search for CLP_GET functions or references in the codebase
echo "Searching for CLP_GET function references..."
rg -i "clp_get" --type java -A 3 -B 3

# Search for any UDF-related code that might give hints about expected implementation
echo -e "\nSearching for UDF-related code..."
rg -i "udf|user.defined.function" --type java -g "*Clp*.java" -A 2 -B 2

Length of output: 5707


Add CLP_GET_ UDF detection to ClpFilterToKqlConverter*

The visitCall method never checks for CLP_GET_* functions, so the clpUdfVariables set remains empty and your push-down logic cannot track UDF outputs. We need to detect any function whose name starts with CLP_GET_ and add its output variable to the context.

Locations to update:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java, method visitCall(CallExpression node, Set<VariableReferenceExpression> context)

Suggested diff:

     if (BETWEEN == operatorType) {
         return handleBetween(node, context);
     }
 }

+    // ≫ Detect CLP_GET_* UDFs and collect their output variable
+    String functionName = node.getFunctionHandle()
+                                .getSignature()
+                                .getName()
+                                .toUpperCase(Locale.ROOT);
+    if (functionName.startsWith("CLP_GET_")) {
+        // Assuming the UDF returns a single VariableReferenceExpression argument
+        Expression arg0 = node.getArguments().get(0);
+        if (arg0 instanceof VariableReferenceExpression) {
+            context.add((VariableReferenceExpression) arg0);
+        }
+        return new ClpExpression(node);
+    }
+
     return new ClpExpression(node);

Don’t forget to add:

import java.util.Locale;

With this in place, clpUdfVariables will be populated and TestClpFilterToKql will pass as intended.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
around lines 120 to 145, the visitCall method does not detect functions with
names starting with "CLP_GET_", causing the clpUdfVariables set to remain empty
and preventing tracking of UDF outputs. To fix this, add a check in visitCall to
detect if the function name starts with "CLP_GET_" (case-insensitive), and if
so, add the output variable of the node to the context set. Also, add the import
statement for java.util.Locale to support case-insensitive checks.

@wraymo wraymo changed the title feat: Set context to Set<VariableReferenceExpression> in ClpFilterToKqlConverter feat: Set context in ClpFilterToKqlConverter to track VariableReferenceExpression Jul 14, 2025
@kirkrodrigues kirkrodrigues requested a review from anlowee July 15, 2025 13:47
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

299-299: Consider using ImmutableMap for better immutability.

Based on the retrieved learnings and past review comments, prefer using Guava's Immutable collections when possible. Since this is a copy operation that doesn't appear to require mutation within the test context, consider using ImmutableMap.copyOf().

-Map<VariableReferenceExpression, ColumnHandle> assignments = new HashMap<>(variableToColumnHandleMap);
+Map<VariableReferenceExpression, ColumnHandle> assignments = ImmutableMap.copyOf(variableToColumnHandleMap);

Note: You'll need to add the import for ImmutableMap:

+import com.google.common.collect.ImmutableMap;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 346476f and 42028de.

📒 Files selected for processing (1)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)

293-296: LGTM! Method signature updated correctly.

The method signature properly includes the metadataFilterColumns parameter, which aligns with the PR objectives.


305-305: LGTM! Correctly passes assignments to converter.

The change properly passes the assignments map as context to the ClpFilterToKqlConverter, which aligns with the PR objective to track VariableReferenceExpression through the conversion process.

@wraymo wraymo requested a review from anlowee July 16, 2025 14:08
@kirkrodrigues kirkrodrigues changed the title feat: Set context in ClpFilterToKqlConverter to track VariableReferenceExpression feat: Set context in ClpFilterToKqlConverter to track VariableReferenceExpression. Jul 17, 2025
@kirkrodrigues kirkrodrigues merged commit 787783b into y-scope:release-0.293-clp-connector Jul 17, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants