Skip to content

Conversation

@wraymo
Copy link

@wraymo wraymo commented Jul 17, 2025

Description

This PR introduces new user-defined functions (UDFs) for the CLP connector to improve querying of semi-structured logs:

CLP_GET_* functions for retrieving values from JSON paths, targeting specific CLP column types (ClpString, Integer, Float, etc.). These UDFs enable direct access to dynamic fields and return corresponding Presto native types.
They are integrated with the query rewriting layer. During query optimization, calls to these functions are rewritten into normal column references or KQL query column symbols, ensuring efficient execution with no additional parsing overhead.

The PR doesn't update the documentation and will leave it for a future PR.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

All unit tests passed. End-to-end testing worked.

Summary by CodeRabbit

  • New Features

    • Added placeholder scalar CLP functions for typed JSON-path extraction and registered them with the CLP plugin.
    • Added a plan rewriter to translate CLP UDFs into query variables to enable optimization and pushdown.
  • Refactor

    • Reorganized optimizer and converter components; centralized filter pushdown and extended it to cover metadata filters.
    • Simplified converter APIs by encapsulating variable-to-column mappings.
  • Tests

    • Added integration tests for UDF rewriting and compute-pushdown; improved test setup.
  • Chores

    • Expanded test-scoped dependencies and updated native-execution submodule pointer.

@coderabbitai
Copy link

coderabbitai bot commented Jul 17, 2025

Walkthrough

Adds CLP scalar function stubs and registers them; introduces a CLP UDF rewriter and a compute-pushdown optimizer (renamed and relocated); refactors filter-to-KQL conversion to store assignments as a field; wires optimizers into the provider; expands tests and test dependencies; updates Velox submodule pointer.

Changes

Cohort / File(s) Change Summary
Build
presto-clp/pom.xml
Added test-scoped dependencies: presto-tests and test-jar for presto-main-base.
CLP scalar functions
presto-clp/src/main/java/.../ClpFunctions.java
New ClpFunctions class adding placeholder scalar functions: CLP_GET_BIGINT, CLP_GET_DOUBLE, CLP_GET_BOOL, CLP_GET_STRING, CLP_GET_STRING_ARRAY. Each is a SQL-annotated stub that currently throws UnsupportedOperationException.
Plugin registration
presto-clp/src/main/java/.../ClpPlugin.java
Added getFunctions() returning ClpFunctions to register the new scalar functions.
Connector glue
presto-clp/src/main/java/.../ClpConnector.java
Added import for ClpPlanOptimizerProvider (moved to optimization package); no behavioral changes.
Plan optimizers — rename/relocate
presto-clp/src/main/java/.../optimization/ClpComputePushDown.java
presto-clp/src/main/java/.../optimization/ClpPlanOptimizerProvider.java
Renamed/moved ClpPlanOptimizerClpComputePushDown; extracted filter handling into processFilter; pushdown now considers KQL and metadata filters; provider registers ClpUdfRewriter (logical) and constructs ClpComputePushDown (physical).
Filter-to-KQL converter (refactor & move)
presto-clp/src/main/java/.../optimization/ClpFilterToKqlConverter.java
Moved package; removed visitor context parameter and made the variable-to-column assignments map a final instance field passed via constructor; visitor signatures and helpers updated to use stored assignments (calls now pass null context).
UDF rewriter (new optimizer)
presto-clp/src/main/java/.../optimization/ClpUdfRewriter.java
New connector plan optimizer that rewrites CLP_GET_* UDF calls into VariableReferenceExpressions backed by ClpColumnHandles, injects those columns into TableScanNodes, and rewrites Project/Filter nodes; enforces constant JSON-path argument and encodes JSON path into variable names.
Tests — metadata / utils
presto-clp/src/test/java/.../ClpMetadataDbSetUp.java
Added getDbPath() getter to DbHandle.
Tests — filter-to-KQL
presto-clp/src/test/java/.../TestClpFilterToKql.java
Minor formatting; construct ClpFilterToKqlConverter with an assignments map (constructor change); method signatures reformatted only.
Tests — query base
presto-clp/src/test/java/.../TestClpQueryBase.java
Static init registers ClpFunctions with the test function manager; modified city row type to add top-level Name field; removed one overloaded getRowExpression variant.
Tests — UDF rewriter (new)
presto-clp/src/test/java/.../TestClpUdfRewriter.java
New integration tests for UDF rewriting and compute-pushdown; includes setup/teardown, three test cases, and ClpTableScanMatcher to assert table-scan layout and columns.
Native execution submodule
presto-native-execution/velox
Updated Velox submodule pointer (metadata-only commit bump).

Sequence Diagram(s)

sequenceDiagram
    participant Planner as QueryPlanner
    participant UDF as ClpUdfRewriter
    participant Compute as ClpComputePushDown
    participant Converter as ClpFilterToKqlConverter
    participant Scan as TableScanNode

    Planner->>UDF: optimize(plan)
    Note right of UDF #E9F5FF: Rewrite CLP_GET_* calls\ncreate ClpColumnHandles & variables
    UDF-->>Planner: plan with injected variables/columns

    Planner->>Compute: optimize(plan)
    Compute->>Converter: convert filter (uses assignments field)
    Note right of Converter #FFF4E6: derive kqlQuery and metadataFilter
    alt kqlQuery or metadataFilter present
        Compute->>Scan: replace TableScan with updated layout & assignments
        Compute-->>Planner: plan with pushed-down KQL/metadata and updated TableScan
        alt remaining predicate exists
            Planner->>Planner: add FilterNode above updated TableScan
        end
    else
        Compute-->>Planner: return original plan
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

throw new UnsupportedOperationException("CLP_GET_INT is a placeholder function without implementation.");
}

@ScalarFunction(value = "CLP_GET_FLOAT", deterministic = false)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just find that in our RFC:

we propose adding a handful of CLP_GET_<TYPE>(<json-path>) UDFs, where <TYPE> refers to the Presto type of the column and <json_path> refers to the column's JSON path.

So the should be a Presto type. Shall we rename this UDF to CLP_GET_DOUBLE then?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe we should update the RFC

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a table in RFC

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we have the table. I mean the description in the RFC that I quoted is not precise what it means.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If that's the case, we also need to rename CLP_GET_INT to CLP_GET_BIGINT

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to rename this CLP_GET_INT to CLP_GET_BIGINT?

@anlowee
Copy link

anlowee commented Sep 4, 2025

Can you use the Velox commit: 7c55762bc37edc3f93cb8f5f1ff625e06cc10deb (which is used by current main branch)?
Yours is one commit behind of this.

@wraymo
Copy link
Author

wraymo commented Sep 5, 2025

Can you use the Velox commit: 7c55762bc37edc3f93cb8f5f1ff625e06cc10deb (which is used by current main branch)? Yours is one commit behind of this.

Need to update the commit hash to support duplicate columns in projection, otherwise you will run into the bugs you mentioned before.

@anlowee
Copy link

anlowee commented Sep 5, 2025

resolve review comments

I think 7c55762bc37edc3f93cb8f5f1ff625e06cc10deb should already have included your changes?

@wraymo
Copy link
Author

wraymo commented Sep 5, 2025

No, there's also a change in Velox, not just CLP

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)

68-71: Per-table mandatory-filter check is incorrect (false negatives on joins)

Using a single boolean hasVisitedFilter skips the required-filter enforcement for other tables when any one Filter was processed. Track and validate per table scope instead.

Apply:

@@
-        private final Set<String> tableScopeSet;
-        private boolean hasVisitedFilter;
+        private final Set<String> tableScopeSet;
+        // Tables for which we actually processed a Filter directly above the scan
+        private final Set<String> tablesWithFilter;
@@
-            hasVisitedFilter = false;
-            tableScopeSet = new HashSet<>();
+            tableScopeSet = new HashSet<>();
+            tablesWithFilter = new HashSet<>();
@@
-        // Throw exception if any required split filters are missing
-        if (!rewriter.tableScopeSet.isEmpty() && !rewriter.hasVisitedFilter) {
-            splitFilterProvider.checkContainsRequiredFilters(rewriter.tableScopeSet, "");
-        }
+        // Enforce required split filters on tables that had no qualifying Filter processed
+        if (!rewriter.tableScopeSet.isEmpty()) {
+            Set<String> tablesMissingFilter = new HashSet<>(rewriter.tableScopeSet);
+            tablesMissingFilter.removeAll(rewriter.tablesWithFilter);
+            if (!tablesMissingFilter.isEmpty()) {
+                splitFilterProvider.checkContainsRequiredFilters(tablesMissingFilter, "");
+            }
+        }

And mark a table as handled in processFilter (see separate diff below).

♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4)

101-106: Don’t short‑circuit traversal; still walk children when Filter isn’t directly on TableScan

Returning node prevents visiting its source (e.g., Filter -> Project -> TableScan), causing missed scopes and pushdown opportunities.

-            if (!(node.getSource() instanceof TableScanNode)) {
-                return node;
-            }
+            if (!(node.getSource() instanceof TableScanNode)) {
+                return context.defaultRewrite(node);
+            }

Consider also supporting the intervening Project (or ensure optimizer ordering guarantees Filter is adjacent to TableScan).


129-136: Avoid unnecessary validation when no metadata filter is present

Only call checkContainsRequiredFilters when metadataSqlQuery is present and non-empty. Keeps behaviour aligned with metadata‑only requirements and saves work.

-            // Perform required metadata filter checks before handling the KQL query (if kqlQuery
-            // isn't present, we'll return early, skipping subsequent checks).
-            splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), metadataSqlQuery.orElse(""));
-            boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty();
-            if (hasMetadataFilter) {
+            boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty();
+            if (hasMetadataFilter) {
+                // Perform required metadata filter checks only when we actually have a metadata filter
+                splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), metadataSqlQuery.get());
                 metadataSqlQuery = Optional.of(splitFilterProvider.remapSplitFilterPushDownExpression(tableScope, metadataSqlQuery.get()));
                 log.debug("Metadata SQL query: %s", metadataSqlQuery.get());
             }

115-116: Nit: prefer format over string concatenation and avoid redundant toString()

Minor readability and consistency improvement.

-            String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString();
+            String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());

110-116: Mark table as “handled” after deriving its scope

Move the “handled” marker to a point where tableScope is available.

-            hasVisitedFilter = true;
-
             TableHandle tableHandle = tableScanNode.getTable();
             ClpTableHandle clpTableHandle = (ClpTableHandle) tableHandle.getConnectorHandle();
 
-            String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString();
+            String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());
+            tablesWithFilter.add(tableScope);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)

74-90: Fix Javadoc typo and improve type references

Correct “correh hsponding” and format referenced types.

-     * @return a map from column handle to its correh hsponding scan-level variable
+     * @return a map from {@code ColumnHandle} to its corresponding scan-level variable
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)

181-229: Add tests for additional UDFs and alias-agnostic assertions

Cover CLP_GET_FLOAT, CLP_GET_BOOL, CLP_GET_STRING_ARRAY in filters and projects, and prefer alias-agnostic assertions to reduce brittleness.


142-179: Add error/edge-case tests for UDFs and special characters

Strengthen coverage:

  • Non-constant and multi-argument CLP_GET_* → expect PrestoException.
  • Special characters in JSON paths ($, @, -, spaces) → ensure encoded variable names don’t break planning.
  • Duplicate paths across different CLP_GET_* calls → dedupe behaviour.

I can provide concrete test methods mirroring the existing patterns.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a6ea16e and 1175af3.

📒 Files selected for processing (3)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (3)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h (1)
  • RowExpression (394-416)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • Rewriter (75-174)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (47-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
  • ClpUdfRewriter (58-306)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (33-155)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (50-171)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
  • ClpMetadataDbSetUp (45-264)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
  • ClpExpression (29-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +102 to +117
private class Rewriter
extends ConnectorPlanRewriter<Void>
{
private final PlanNodeIdAllocator idAllocator;
private final VariableAllocator variableAllocator;
private final Map<ColumnHandle, VariableReferenceExpression> globalColumnVarMap;

public Rewriter(
PlanNodeIdAllocator idAllocator,
VariableAllocator variableAllocator,
Map<ColumnHandle, VariableReferenceExpression> globalColumnVarMap)
{
this.idAllocator = idAllocator;
this.variableAllocator = variableAllocator;
this.globalColumnVarMap = globalColumnVarMap;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Global column→variable map can produce symbol collisions across multiple scans

globalColumnVarMap is shared for the whole plan; identical CLP paths/types on different scans will reuse the same VariableReferenceExpression, risking duplicate producers and ambiguous symbols when branches merge.

Recommended directions:

  • Scope mappings per TableScan (e.g., Map<PlanNodeId, Map<ColumnHandle, VariableReferenceExpression>>) and consult the mapping of the current subtree only.
  • Or ensure per-scan uniqueness by suffixing variable names with a scan-specific prefix/suffix.
  • Add a join test: SELECT … FROM t1 JOIN t2 ON … WHERE CLP_GET_INT('user_id') on both sides; verify unique symbols and correct scan assignments.

I can draft a scoped rewrite that tracks “current scan” via a stack during traversal if you want.


🏁 Script executed:

#!/bin/bash
set -e

FILE=$(fd ClpUdfRewriter.java | head -n 1)
echo "Using file: $FILE"
rg -n 'class Rewriter' -C5 "$FILE"
rg -n 'globalColumnVarMap' -C3 "$FILE"
rg -n 'visitTableScan' -C3 "$FILE"
rg -n 'new Rewriter' -C3 .

Length of output: 2873


🏁 Script executed:

#!/bin/bash
# Locate ClpColumnHandle.java, print class definition and equals/hashCode
file=$(fd ClpColumnHandle.java | head -n1)
echo "File: $file"
# Show class declaration and relevant fields
sed -n '1,200p' "$file"
# Search for equals() and hashCode() implementations
grep -R "boolean equals" -n "$file" || echo "No equals() found"
grep -R "hashCode" -n "$file" || echo "No hashCode() found"

Length of output: 3633


Scope column→variable mapping per scan
globalColumnVarMap is shared across all TableScans; identical ClpColumnHandle(jsonPath,type) on different scans reuses the same VariableReferenceExpression, leading to duplicate producers and ambiguous symbols in joins. Scope mappings per scan (e.g. Map<PlanNodeId,Map<ColumnHandle,VariableReferenceExpression>>) or append scan-specific prefixes, and add a join test (e.g. SELECT … FROM t1 JOIN t2 … WHERE CLP_GET_INT('user_id')) to verify unique symbols.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
around lines 102–117, the Rewriter currently uses a single globalColumnVarMap
shared across all TableScans which causes identical
ClpColumnHandle(jsonPath,type) instances from different scans to reuse the same
VariableReferenceExpression (producing duplicate producers and ambiguous symbols
in joins); change the mapping to be scoped per scan (for example use
Map<PlanNodeId, Map<ColumnHandle, VariableReferenceExpression>> or include a
scan-specific prefix when creating variable names keyed by PlanNodeId) so that
each TableScan gets its own column→variable map; update all places that
read/write globalColumnVarMap to use the per-scan map (create the inner map at
scan entry, lookup by PlanNodeId when allocating/rewriting variables), ensure
generated VariableReferenceExpression names include the scan id to avoid
collisions, and add a join integration test (e.g. SELECT … FROM t1 JOIN t2 …
WHERE CLP_GET_INT('user_id')) to verify symbols are unique across scans.

Comment on lines +160 to +171
if (expression instanceof CallExpression) {
CallExpression call = (CallExpression) expression;
String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle()).getName().getObjectName().toUpperCase();

if (functionName.startsWith("CLP_GET_")) {
if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
"CLP_GET_* UDF must have a single constant string argument");
}

ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
String jsonPath = ((Slice) constant.getValue()).toStringUtf8();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate argument type is VARCHAR and improve error message

Be explicit on expected type; produce actionable error.

-                if (functionName.startsWith("CLP_GET_")) {
-                    if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
-                        throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
-                                "CLP_GET_* UDF must have a single constant string argument");
-                    }
+                if (functionName.startsWith("CLP_GET_")) {
+                    if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
+                        throw new PrestoException(
+                                CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+                                String.format("CLP_GET_* must have exactly one constant string argument; got %d",
+                                        call.getArguments().size()));
+                    }
 
-                    ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
+                    ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
+                    if (!constant.getType().equals(com.facebook.presto.common.type.VarcharType.VARCHAR)) {
+                        throw new PrestoException(
+                                CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+                                "CLP_GET_* argument must be a VARCHAR literal");
+                    }
                     String jsonPath = ((Slice) constant.getValue()).toStringUtf8();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (expression instanceof CallExpression) {
CallExpression call = (CallExpression) expression;
String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle()).getName().getObjectName().toUpperCase();
if (functionName.startsWith("CLP_GET_")) {
if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
"CLP_GET_* UDF must have a single constant string argument");
}
ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
String jsonPath = ((Slice) constant.getValue()).toStringUtf8();
if (expression instanceof CallExpression) {
CallExpression call = (CallExpression) expression;
String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle())
.getName()
.getObjectName()
.toUpperCase();
if (functionName.startsWith("CLP_GET_")) {
if (call.getArguments().size() != 1
|| !(call.getArguments().get(0) instanceof ConstantExpression)) {
throw new PrestoException(
CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
String.format(
"CLP_GET_* must have exactly one constant string argument; got %d",
call.getArguments().size()));
}
ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
if (!constant.getType().equals(com.facebook.presto.common.type.VarcharType.VARCHAR)) {
throw new PrestoException(
CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
"CLP_GET_* argument must be a VARCHAR literal");
}
String jsonPath = ((Slice) constant.getValue()).toStringUtf8();

Comment on lines +243 to +261
private String encodeJsonPath(String jsonPath)
{
StringBuilder sb = new StringBuilder();
for (char c : jsonPath.toCharArray()) {
if (Character.isUpperCase(c)) {
sb.append("_ux").append(Character.toLowerCase(c));
}
else if (c == '.') {
sb.append("_dot_");
}
else if (c == '_') {
sb.append("_und_");
}
else {
sb.append(c);
}
}
return sb.toString();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Robustly encode variable names: handle special characters, leading digits, and length

Current encoding leaves $, @, -, spaces, etc., which can break downstream parsing. Also no length bound.

-        private String encodeJsonPath(String jsonPath)
-        {
-            StringBuilder sb = new StringBuilder();
-            for (char c : jsonPath.toCharArray()) {
-                if (Character.isUpperCase(c)) {
-                    sb.append("_ux").append(Character.toLowerCase(c));
-                }
-                else if (c == '.') {
-                    sb.append("_dot_");
-                }
-                else if (c == '_') {
-                    sb.append("_und_");
-                }
-                else {
-                    sb.append(c);
-                }
-            }
-            return sb.toString();
-        }
+        private String encodeJsonPath(String jsonPath)
+        {
+            // Allow only [a-z0-9_]; encode everything else as _xHH_ (hex), and normalise case.
+            final int MAX_LEN = 255;
+            StringBuilder sb = new StringBuilder(jsonPath.length() * 2);
+            for (char c : jsonPath.toCharArray()) {
+                char lc = Character.toLowerCase(c);
+                if ((lc >= 'a' && lc <= 'z') || (lc >= '0' && lc <= '9') || lc == '_') {
+                    sb.append(lc);
+                }
+                else if (c == '.') {
+                    sb.append("_dot_");
+                }
+                else if (c == '_') {
+                    sb.append("_und_");
+                }
+                else if (Character.isUpperCase(c)) {
+                    sb.append("_ux").append(Character.toLowerCase(c));
+                }
+                else {
+                    sb.append(String.format("_x%02x_", (int) c));
+                }
+            }
+            // Ensure leading char is not a digit
+            if (sb.length() > 0 && Character.isDigit(sb.charAt(0))) {
+                sb.insert(0, "v_");
+            }
+            String encoded = sb.toString();
+            if (encoded.length() > MAX_LEN) {
+                // Truncate and add a stable hash suffix to avoid collisions
+                int keep = MAX_LEN - 9; // room for _hxxxxxx
+                int hash = Math.abs(encoded.hashCode());
+                encoded = encoded.substring(0, keep) + String.format("_h%06x", hash & 0xFFFFFF);
+            }
+            return encoded;
+        }

Consider also encoding characters like @ and $ explicitly in tests to validate this path.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +270 to +291
private TableScanNode buildNewTableScanNode(TableScanNode node)
{
Set<VariableReferenceExpression> outputVars = new LinkedHashSet<>(node.getOutputVariables());
Map<VariableReferenceExpression, ColumnHandle> newAssignments = new HashMap<>(node.getAssignments());

// Add any missing variables for known handles
globalColumnVarMap.forEach((handle, var) -> {
outputVars.add(var);
newAssignments.put(var, handle);
});

return new TableScanNode(
node.getSourceLocation(),
idAllocator.getNextId(),
node.getTable(),
new ArrayList<>(outputVars),
newAssignments,
node.getTableConstraints(),
node.getCurrentConstraint(),
node.getEnforcedConstraint(),
node.getCteMaterializationInfo());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Limit added assignments to only variables used in the current scan subtree

Adding all entries from globalColumnVarMap to every scan inflates outputs and I/O. After scoping the map per scan (previous comment), add only those handles required by this scan.

As an interim guard, filter by connector/table identity to avoid cross-table pollution.


🏁 Script executed:

#!/bin/bash
# Locate the ClpUdfRewriter.java file
FILE=$(find . -type f -name 'ClpUdfRewriter.java' | head -n1)
echo "Found file: $FILE"

# Show declaration and usages of globalColumnVarMap
grep -n "globalColumnVarMap" "$FILE"
grep -n "private.*globalColumnVarMap" "$FILE"

Length of output: 1190


Filter assignments to only columns used by this scan
In buildNewTableScanNode (lines 270–291), don’t unconditionally add every entry from globalColumnVarMap. Instead, replace the globalColumnVarMap.forEach(…) block with a filtered loop that only includes handles belonging to node.getTable() (e.g.

globalColumnVarMap.entrySet().stream()
    .filter(e -> e.getKey().getTableHandle().equals(node.getTable()))
    .forEach(e -> {
        outputVars.add(e.getValue());
        newAssignments.put(e.getValue(), e.getKey());
    });

) to avoid inflating outputs and cross-table pollution.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
around lines 270 to 291, the current globalColumnVarMap.forEach unconditionally
adds every global column variable to the new TableScanNode causing cross-table
pollution; replace that block with a filtered iteration that only adds entries
whose ColumnHandle belongs to the same table as node.getTable() (i.e. test
handle.getTableHandle().equals(node.getTable()) before adding), adding only the
matching var->handle pairs to outputVars and newAssignments so the scan only
exposes columns for its own table.

Comment on lines 231 to 266
@Test
public void testScanProjectFilter()
{
TransactionId transactionId = localQueryRunner.getTransactionManager().beginTransaction(false);
Session session = testSessionBuilder().setCatalog("clp").setSchema("default").setTransactionId(transactionId).build();

Plan plan = localQueryRunner.createPlan(
session,
"SELECT LOWER(city.Name), LOWER(CLP_GET_STRING('user')) from test WHERE CLP_GET_INT('user_id') = 0 AND LOWER(city.Name) = 'BEIJING'",
WarningCollector.NOOP);
ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);

PlanAssert.assertPlan(
session,
localQueryRunner.getMetadata(),
(node, sourceStats, lookup, s, types) -> PlanNodeStatsEstimate.unknown(),
new Plan(optimizedPlan, plan.getTypes(), StatsAndCosts.empty()),
anyTree(
project(
ImmutableMap.of(
"lower",
PlanMatchPattern.expression("lower(city.Name)"),
"lower_0",
PlanMatchPattern.expression("lower(user)")),
filter(
expression("lower(city.Name) = 'BEIJING'"),
ClpTableScanMatcher.clpTableScanPattern(
new ClpTableLayoutHandle(table, Optional.of("(user_id: 0)"), Optional.empty()),
ImmutableSet.of(
new ClpColumnHandle("user", VARCHAR),
new ClpColumnHandle("user_id", BIGINT),
city))))));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Add tests for Filter→Project→TableScan and multi-table join scenarios

  • Validate pushdown when Filter is not directly on TableScan (or confirm optimizer ordering).
  • Validate per-table mandatory filter enforcement and symbol uniqueness when two scans use the same JSON path.

I can draft these plans using LocalQueryRunner if desired.


🏁 Script executed:

#!/bin/bash
set -e

# Search for SQL JOIN usage in TestClpUdfRewriter.java
rg -n "SELECT .*JOIN" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No SQL join queries found"

# Search for any occurrences of 'join' (case-insensitive) in TestClpUdfRewriter.java
rg -n -i "join" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No 'join' occurrences found"

# Search for filter→project→tableScan sequence in test matchers
rg -nP "filter\\s*\\(.*project" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No filter→project pattern found"

Length of output: 537


Add tests for Filter→Project→TableScan ordering and multi-table joins

  • Cover a case where a Filter sits above a Project above a TableScan to confirm pushdown applies when Filter isn’t directly on the scan.
  • Add a join test with two TableScans over the same JSON path to verify per-table mandatory filters and unique symbol generation.
🤖 Prompt for AI Agents
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 231 to 266, add two tests: one for Filter→Project→TableScan
ordering and one for multi-table join with same JSON path. For the
Filter→Project→TableScan test, construct a query where a FILTER sits above a
PROJECT above the CLP table scan (e.g. SELECT LOWER(col) FROM test WHERE
CLP_GET_INT('id') = 0 AND LOWER(col) = 'X'), run the same plan creation,
ClpUdfRewriter and ClpComputePushDown passes, and assert the optimized plan
contains a Project over a Filter over a ClpTableScan with the pushdown predicate
represented in ClpTableLayoutHandle (Optional.of("(id: 0)")) and both projected
symbols present. For the multi-table join test, create a query joining the same
JSON path table twice (e.g. FROM test t1 JOIN test t2 ON ... with
CLP_GET_INT('id') filters on each side), run the same rewrite/optimize flow, and
assert the plan shows two ClpTableScan matchers each with their own
ClpTableLayoutHandle containing the correct per-table mandatory filter and that
symbols/column handles are distinct (unique names) to ensure per-table pushdown
isolation.

Comment on lines +268 to +323
private static final class ClpTableScanMatcher
implements Matcher
{
private final ClpTableLayoutHandle expectedLayoutHandle;
private final Set<ColumnHandle> expectedColumns;

private ClpTableScanMatcher(ClpTableLayoutHandle expectedLayoutHandle, Set<ColumnHandle> expectedColumns)
{
this.expectedLayoutHandle = expectedLayoutHandle;
this.expectedColumns = expectedColumns;
}

static PlanMatchPattern clpTableScanPattern(ClpTableLayoutHandle layoutHandle, Set<ColumnHandle> columns)
{
return node(TableScanNode.class).with(new ClpTableScanMatcher(layoutHandle, columns));
}

@Override
public boolean shapeMatches(PlanNode node)
{
return node instanceof TableScanNode;
}

@Override
public MatchResult detailMatches(
PlanNode node,
StatsProvider stats,
Session session,
Metadata metadata,
SymbolAliases symbolAliases)
{
checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false");
TableScanNode tableScanNode = (TableScanNode) node;
ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();

// Check layout handle
if (!expectedLayoutHandle.equals(actualLayoutHandle)) {
return NO_MATCH;
}

// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());

if (!expectedColumns.equals(actualColumns)) {
return NO_MATCH;
}

SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder();
for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) {
aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName()));
}

return match(aliasesBuilder.build());
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Matcher: Optional.get without presence check

If a layout isn’t present, get() will throw. Not an issue in current tests but easy to harden.

-            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
+            Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
+            if (!layout.isPresent()) {
+                return NO_MATCH;
+            }
+            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private static final class ClpTableScanMatcher
implements Matcher
{
private final ClpTableLayoutHandle expectedLayoutHandle;
private final Set<ColumnHandle> expectedColumns;
private ClpTableScanMatcher(ClpTableLayoutHandle expectedLayoutHandle, Set<ColumnHandle> expectedColumns)
{
this.expectedLayoutHandle = expectedLayoutHandle;
this.expectedColumns = expectedColumns;
}
static PlanMatchPattern clpTableScanPattern(ClpTableLayoutHandle layoutHandle, Set<ColumnHandle> columns)
{
return node(TableScanNode.class).with(new ClpTableScanMatcher(layoutHandle, columns));
}
@Override
public boolean shapeMatches(PlanNode node)
{
return node instanceof TableScanNode;
}
@Override
public MatchResult detailMatches(
PlanNode node,
StatsProvider stats,
Session session,
Metadata metadata,
SymbolAliases symbolAliases)
{
checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false");
TableScanNode tableScanNode = (TableScanNode) node;
ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
// Check layout handle
if (!expectedLayoutHandle.equals(actualLayoutHandle)) {
return NO_MATCH;
}
// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());
if (!expectedColumns.equals(actualColumns)) {
return NO_MATCH;
}
SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder();
for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) {
aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName()));
}
return match(aliasesBuilder.build());
}
}
@Override
public MatchResult detailMatches(
PlanNode node,
StatsProvider stats,
Session session,
Metadata metadata,
SymbolAliases symbolAliases)
{
checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false");
TableScanNode tableScanNode = (TableScanNode) node;
Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
if (!layout.isPresent()) {
return NO_MATCH;
}
ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();
// Check layout handle
if (!expectedLayoutHandle.equals(actualLayoutHandle)) {
return NO_MATCH;
}
// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());
if (!expectedColumns.equals(actualColumns)) {
return NO_MATCH;
}
SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder();
for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) {
aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName()));
}
return match(aliasesBuilder.build());
}
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 268 to 323, the matcher calls get() on the optional layout without
checking presence which can throw; modify the matcher to guard the optional
before calling get() — e.g., if the table's layout Optional is empty return
NO_MATCH (or assert presence with a clear failure message), and only then
retrieve the layout; do the same if any other Optional may be absent so the
matcher fails gracefully instead of throwing.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)

197-216: Make project assertions alias-agnostic to decouple from encoded variable names

Asserting "user_und_id", "is_uxholiday", etc., ties tests to the current encoding scheme. Prefer alias-agnostic matchers (e.g., anySymbol) or assert that projected expressions reference the expected scan variables/ColumnHandles rather than exact names.


268-323: Guard Optional layout before get() to avoid NPE in matcher

tableScanNode.getTable().getLayout().get() can throw when no layout is set. Make the matcher fail gracefully.

Apply this diff:

@@
-            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
+            Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
+            if (!layout.isPresent()) {
+                return NO_MATCH;
+            }
+            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();

142-179: Add negative and special-character path tests

Cover error paths for: non-constant args (e.g., CLP_GET_STRING(city.Name)), wrong arity, and JSON paths with special characters like '$' or '@'. Also test duplicate-path dedup semantics across multiple CLP_GET_* calls.

I can add minimal tests expecting PrestoException for invalid usages and a behavioural test pinning current handling of special characters. Want me to draft them?


103-133: Add join and per-table scope tests (join isolation for pushdown)

Leveraging the retrieved learning about determining scope from each Filter’s immediate TableScan, add a join test with two scans and independent CLP_GET_* filters to ensure per-table mandatory filter enforcement and symbol isolation.

Would you like me to sketch a LocalQueryRunner-based join test asserting two distinct ClpTableScan matchers each with its own mandatory filter?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1175af3 and 644f77b.

📒 Files selected for processing (1)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (47-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
  • ClpUdfRewriter (58-306)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (33-155)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (50-171)
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
  • LocalQueryRunner (304-1248)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
  • ClpMetadataDbSetUp (45-264)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)

148-157: Solid end-to-end coverage of UDF rewrite + pushdown path

This validates rewriting for INT/FLOAT/STRING/BOOL/STRING_ARRAY and confirms residuals remain above the scan. Good.


231-266: Good: covers Project-over-Filter-over-Scan ordering

This ensures pushdown applies even when a Project sits between Filter and TableScan and validates residual predicate handling.

Comment on lines +167 to +178
new ClpTableLayoutHandle(
table,
Optional.of(
"(((user_id: 0 AND fare < 50.0) AND (city: \"SF\" AND isHoliday: true)))"),
Optional.empty()),
ImmutableSet.of(
city,
fare,
isHoliday,
new ClpColumnHandle("user_id", BIGINT),
new ClpColumnHandle("city", VARCHAR),
new ClpColumnHandle("tags", new ArrayType(VARCHAR)))))));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Optionally loosen layout equality to semantic checks

If ClpTableLayoutHandle exposes getters, compare table handle equality and assert kqlQuery Optional presence plus substring membership of required conjuncts instead of strict object equality. This will reduce future churn from benign formatting changes.

Also applies to: 303-306

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 167-178 (and similarly at 303-306), the test currently asserts
strict equality on ClpTableLayoutHandle which is brittle to formatting; instead,
use the handle's getters to assert the table handle equals the expected table,
assert that the kqlQuery Optional is present, and verify the kqlQuery string
contains the required conjunct substrings (e.g., "user_id: 0", "fare < 50.0",
"city: \"SF\"", "isHoliday: true") rather than comparing the whole object
string; update the assertions accordingly so they check semantic membership and
presence instead of exact object equality.

🧹 Nitpick (assertive)

Reduce brittleness: avoid asserting exact KQL string formatting

The equality on a fully materialized KQL string makes the test fragile to harmless reformatting/parenthesis changes. Prefer asserting presence of required conjuncts (e.g., contains "user_id: 0", "fare < 50.0", 'city: "SF"', "isHoliday: true") and that Optional is present, rather than full-string equality. Consider enhancing ClpTableScanMatcher to compare by parts.


Plan plan = localQueryRunner.createPlan(
session,
"SELECT LOWER(city.Name), LOWER(CLP_GET_STRING('user')) from test WHERE CLP_GET_INT('user_id') = 0 AND LOWER(city.Name) = 'BEIJING'",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace the LOWER(city.Name) in the SELECT with LOWER(CLP_GET_STRING(city.Name)) to test the case when user uses both UDF and accesses the field directly?

Also does the LOWER(city.Name) = 'BEIJING' condition make sense? Do you mean UPPER or beijing? Actually there are other places above also have this issue.

it will be better to also check the rows of the returned result.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*the number of rows of the returned result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. But this is just the rewriter test case for query plan check. We can add e2e result check in presto-native-execution in a future PR.

@wraymo wraymo requested a review from anlowee September 5, 2025 19:12
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)

35-40: Add null-safety for injected dependencies

Guard constructor params with requireNonNull to fail fast on DI misconfiguration.

+import static java.util.Objects.requireNonNull;
@@
-    public ClpPlanOptimizerProvider(FunctionMetadataManager functionManager, StandardFunctionResolution functionResolution, ClpSplitFilterProvider splitFilterProvider)
+    public ClpPlanOptimizerProvider(FunctionMetadataManager functionManager, StandardFunctionResolution functionResolution, ClpSplitFilterProvider splitFilterProvider)
     {
-        this.functionManager = functionManager;
-        this.functionResolution = functionResolution;
-        this.splitFilterProvider = splitFilterProvider;
+        this.functionManager = requireNonNull(functionManager, "functionManager is null");
+        this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
+        this.splitFilterProvider = requireNonNull(splitFilterProvider, "splitFilterProvider is null");
     }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (3)

272-291: Bug: metadata SQL uses Optional.toString() and misses parentheses.

"NOT " + expression.getMetadataSqlQuery() yields NOT Optional[...]. Also, wrap in parentheses to preserve precedence.

Apply:

-        if (expression.getMetadataSqlQuery().isPresent()) {
-            return new ClpExpression(notPushDownExpression, "NOT " + expression.getMetadataSqlQuery());
-        }
+        if (expression.getMetadataSqlQuery().isPresent()) {
+            return new ClpExpression(notPushDownExpression, "NOT (" + expression.getMetadataSqlQuery().get() + ")");
+        }

764-791: IN: escape string literals.

String values are emitted without escaping, which breaks when they contain ", \, *, ?.

Apply:

-            if (literal.getType() instanceof VarcharType) {
-                queryBuilder.append("\"").append(literalString).append("\"");
-            }
+            if (literal.getType() instanceof VarcharType) {
+                queryBuilder.append("\"")
+                        .append(escapeKqlSpecialCharsForStringValue(literalString))
+                        .append("\"");
+            }

306-343: Escape quotes and backslashes in LIKE patterns

Refactor handleLike so that after translating %→* and _→?, you escape only backslashes and double-quotes (preserving * and ? as wildcards):

-        pattern = pattern.replace("%", "*").replace("_", "?");
-        return new ClpExpression(format("%s: \"%s\"", variableName, pattern));
+        pattern = pattern.replace("%", "*").replace("_", "?");
+        // Only escape characters that break quoted KQL patterns; keep * and ? as wildcards
+        String escapedPattern = pattern.replace("\\", "\\\\").replace("\"", "\\\"");
+        return new ClpExpression(format("%s: \"%s\"", variableName, escapedPattern));
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 644f77b and 080ab18.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (22 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
  • ClpExpression (29-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)

14-14: Wiring confirmed — no old package references remain
All imports and usages updated to com.facebook.presto.plugin.clp.optimization.ClpPlanOptimizerProvider; ClpUdfRewriter and ClpComputePushDown resolve correctly.

Comment on lines +90 to 91
* <li><code>CLP_GET_*</code> UDFs.</li>
* </ul>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Docs claim CLP_GET_ support, but no handling is present here.*

This visitor doesn’t recognise CLP_GET_* calls; it will fall back to new ClpExpression(node) for such CallExpressions. If pushdown relies entirely on an upstream UDF rewriter, clarify that here (and in tests), or add minimal handling to recognise residual CLP_GET_* calls.

Would you like me to add a small handler that treats already-rewritten CLP_GET_* as variables and otherwise bails out to remaining expression?

Comment on lines 111 to 121
public ClpFilterToKqlConverter(
StandardFunctionResolution standardFunctionResolution,
FunctionMetadataManager functionMetadataManager,
Map<VariableReferenceExpression, ColumnHandle> assignments,
Set<String> metadataFilterColumns)
{
this.standardFunctionResolution = requireNonNull(standardFunctionResolution, "standardFunctionResolution is null");
this.functionMetadataManager = requireNonNull(functionMetadataManager, "function metadata manager is null");
this.assignments = requireNonNull(assignments, "assignments is null");
this.metadataFilterColumns = requireNonNull(metadataFilterColumns, "metadataFilterColumns is null");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Constructor signature changed—ensure all call sites updated.

New args: (StandardFunctionResolution, FunctionMetadataManager, Map assignments, Set metadataFilterColumns). Please verify all instantiations were updated.

Run:


🏁 Script executed:

#!/bin/bash
# Find all usages and show arguments context
rg -n -C2 -P --type=java '\bnew\s+ClpFilterToKqlConverter\s*\('

Length of output: 1490


Update all ClpFilterToKqlConverter instantiations to include the new parameters

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java:119 – add the assignments and metadataFilterColumns arguments.
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:301 – pass both assignments and metadataFilterColumns to the constructor.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 111-121, the constructor signature now requires two additional
parameters (assignments and metadataFilterColumns); update all instantiations
accordingly: in
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
at the constructor call around line 119, pass the current variable holding
assignments and a Set<String> for metadataFilterColumns (or an existing metadata
set), and in
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
at the test instantiation around line 301, supply appropriate test values (e.g.,
an empty map and empty set or the test’s existing assignments/metadata
collection) so the constructor receives both
Map<VariableReferenceExpression,ColumnHandle> assignments and Set<String>
metadataFilterColumns.

Comment on lines 156 to 161
@Override
public ClpExpression visitVariableReference(VariableReferenceExpression node, Map<VariableReferenceExpression, ColumnHandle> context)
public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
{
return new ClpExpression(getVariableName(node, context));
return new ClpExpression(getVariableName(node));
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid NPE when a variable has no CLP assignment.

assignments.get(node) can be null or a non-CLP handle. Safely fall back to “remaining expression” to prevent accidental pushdown with an invalid field name.

Apply:

 @Override
-    public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
-    {
-        return new ClpExpression(getVariableName(node));
-    }
+    public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
+    {
+        ColumnHandle handle = assignments.get(node);
+        if (handle instanceof ClpColumnHandle) {
+            return new ClpExpression(((ClpColumnHandle) handle).getOriginalColumnName());
+        }
+        // No CLP mapping: not pushdownable
+        return new ClpExpression(node);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public ClpExpression visitVariableReference(VariableReferenceExpression node, Map<VariableReferenceExpression, ColumnHandle> context)
public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
{
return new ClpExpression(getVariableName(node, context));
return new ClpExpression(getVariableName(node));
}
@Override
public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
{
ColumnHandle handle = assignments.get(node);
if (handle instanceof ClpColumnHandle) {
return new ClpExpression(((ClpColumnHandle) handle).getOriginalColumnName());
}
// No CLP mapping: not pushdownable
return new ClpExpression(node);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 156-161, the visitVariableReference currently always returns a
ClpExpression created from the variable name; instead, first look up
assignments.get(node) and if it is null or not the expected CLP-assignment type,
return the "remaining expression" fallback to avoid NPEs and accidental pushdown
with an invalid field name; if the assignment is present and is the expected CLP
handle, return that mapped CLP expression.

Comment on lines +208 to 211
private String getVariableName(VariableReferenceExpression variable)
{
return ((ClpColumnHandle) context.get(variable)).getOriginalColumnName();
return ((ClpColumnHandle) assignments.get(variable)).getOriginalColumnName();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

getVariableName now redundant (and unsafe).

Given the change above, this method is unused and dereferences without null/type checks. Remove it or harden it if you prefer to keep the call in visitVariableReference.

I can push a follow-up diff to delete it and tidy imports.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 208-211, the getVariableName(VariableReferenceExpression) method is
now redundant and unsafe because it dereferences assignments.get(variable)
without null/type checks and is unused; remove this method and clean up imports,
or if you prefer to keep it, replace it with a safe implementation that checks
assignments.get(variable) for null and that the result is an instance of
ClpColumnHandle, returning a sensible fallback or throwing a clear, documented
exception; also update any caller (e.g., visitVariableReference) to use the safe
retrieval or inline the safe logic so no unchecked cast or NPE can occur.

Comment on lines +45 to 46
return ImmutableSet.of(new ClpUdfRewriter(functionManager));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Avoid per-call allocations by caching optimizers

These are stateless; build once and return cached sets.

 public class ClpPlanOptimizerProvider
         implements ConnectorPlanOptimizerProvider
 {
     private final FunctionMetadataManager functionManager;
     private final StandardFunctionResolution functionResolution;
     private final ClpSplitFilterProvider splitFilterProvider;
+    private final Set<ConnectorPlanOptimizer> logicalOptimizers;
+    private final Set<ConnectorPlanOptimizer> physicalOptimizers;
@@
-        this.splitFilterProvider = splitFilterProvider;
+        this.splitFilterProvider = splitFilterProvider;
+        this.logicalOptimizers = ImmutableSet.of(new ClpUdfRewriter(this.functionManager));
+        this.physicalOptimizers = ImmutableSet.of(new ClpComputePushDown(this.functionManager, this.functionResolution, this.splitFilterProvider));
@@
     @Override
     public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers()
     {
-        return ImmutableSet.of(new ClpUdfRewriter(functionManager));
+        return logicalOptimizers;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return ImmutableSet.of(new ClpUdfRewriter(functionManager));
}
public class ClpPlanOptimizerProvider
implements ConnectorPlanOptimizerProvider
{
private final FunctionMetadataManager functionManager;
private final StandardFunctionResolution functionResolution;
private final ClpSplitFilterProvider splitFilterProvider;
private final Set<ConnectorPlanOptimizer> logicalOptimizers;
private final Set<ConnectorPlanOptimizer> physicalOptimizers;
public ClpPlanOptimizerProvider(
FunctionMetadataManager functionManager,
StandardFunctionResolution functionResolution,
ClpSplitFilterProvider splitFilterProvider)
{
this.functionManager = functionManager;
this.functionResolution = functionResolution;
this.splitFilterProvider = splitFilterProvider;
this.logicalOptimizers = ImmutableSet.of(
new ClpUdfRewriter(this.functionManager)
);
this.physicalOptimizers = ImmutableSet.of(
new ClpComputePushDown(
this.functionManager,
this.functionResolution,
this.splitFilterProvider
)
);
}
@Override
public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers()
{
return logicalOptimizers;
}
// … you can similarly update getPhysicalPlanOptimizers() to return physicalOptimizers …
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
around lines 45-46, avoid allocating a new ImmutableSet on every call by
creating and caching the optimizer set once: add a private final ImmutableSet
field initialized in the provider constructor (e.g. ImmutableSet.of(new
ClpUdfRewriter(functionManager))) and then have the getter simply return that
cached field; ensure initialization uses the same functionManager passed to the
constructor and keep the field final for thread-safety.

Comment on lines 49 to 52
public Set<ConnectorPlanOptimizer> getPhysicalPlanOptimizers()
{
return ImmutableSet.of(new ClpPlanOptimizer(functionManager, functionResolution, splitFilterProvider));
return ImmutableSet.of(new ClpComputePushDown(functionManager, functionResolution, splitFilterProvider));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Planner phase ordering and symbol constraints — please verify

UDF rewrite must occur before pushdown (logical vs physical phases should ensure this). Given the open concern about special characters in column symbols, verify that ClpUdfRewriter emits names acceptable to Velox (or properly escapes them) so rewritten symbols won’t be rejected downstream. Consider a focused test covering $, @, and other non-underscore characters.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
around lines 49-52, the ClpComputePushDown is currently registered at the
physical plan optimizer phase which risks running before the UDF rewrite and
producing symbol names Velox will reject; update registration so the
ClpUdfRewriter runs in a logical-plan optimizer phase prior to any pushdown (or
move the pushdown registration to run after logical rewrites), and update
ClpUdfRewriter to produce Velox-safe symbol names by either escaping/quoting
special characters or normalizing to alphanumeric plus underscores (follow Velox
naming/escaping conventions used elsewhere), then add focused unit tests
asserting rewrites for column names containing $, @ and other non-underscore
characters are emitted and accepted (or properly escaped) and that optimizer
ordering ensures rewrite happens before pushdown.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (8)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (8)

153-156: DRY the rewrite + pushdown pipeline

The same 4-step optimise sequence is repeated in three tests. Extract a helper to reduce duplication and ease future changes.

Apply:

@@
     public void tearDown()
     {
         localQueryRunner.close();
         ClpMetadataDbSetUp.tearDown(dbHandle);
     }
 
+    private PlanNode optimizeThroughUdfAndPushdown(Plan plan, Session session)
+    {
+        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
+        PlanNode rewritten = udfRewriter.optimize(
+                plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        ClpComputePushDown pushDown = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
+        return pushDown.optimize(rewritten, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+    }
@@
-        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);
@@
-        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);
@@
-        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);

Also applies to: 192-196, 241-245


167-171: Avoid asserting exact KQL string formatting

Exact KQL string equality is brittle to harmless formatting/parentheses reordering. Prefer semantic checks (presence of required conjuncts) via a matcher that validates substrings rather than the full string literal.


203-216: Make projection assertions alias-agnostic

The assertions depend on specific encoded symbol names (e.g., user_und_id, is_uxholiday). If encoding or de-duping changes, these will break. Capture symbols via matcher aliases or use anySymbol()/anyExpression-style matchers instead of hard-coded names.


253-257: Same: avoid relying on encoded symbol names in expressions

Referencing city_dot__uxname ties the test to the current encoding. Use alias captures (via the TableScan matcher’s produced aliases) or alias-agnostic matchers to assert LOWER is applied to the symbol that corresponds to CLP_GET_STRING('city.Name').


268-306: Guard Optional.get() and fail gracefully

detailMatches calls get() on TableHandle.getLayout() without checking presence; this will throw instead of returning NO_MATCH.

Apply:

@@
-            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
+            Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
+            if (!layout.isPresent()) {
+                return NO_MATCH;
+            }
+            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();
@@
-            // Check layout handle
-            if (!expectedLayoutHandle.equals(actualLayoutHandle)) {
-                return NO_MATCH;
-            }
+            // Check layout handle (consider relaxing to semantic checks; see separate comment)
+            if (!expectedLayoutHandle.equals(actualLayoutHandle)) {
+                return NO_MATCH;
+            }

142-179: Add negative and edge-case coverage for UDF usage and paths

Missing tests for:

  • Non-constant UDF arg (e.g., CLP_GET_STRING(city.Name)) → expect failure or no rewrite.
  • Multiple args to CLP_GET_* → expect failure.
  • Special characters in JSON paths ($, @, quotes, brackets, unicode). Today’s encoder doesn’t normalise these; Presto symbols may reject them.
  • Duplicate path across differing CLP_GET_* types (e.g., INT vs STRING) and dedup semantics.

I can draft targeted tests that run the rewrite+pushdown and assert exception or plan shape. Want me to add them here?


167-178: Loosen layout equality to semantic checks (optional but recommended)

Instead of object equality on ClpTableLayoutHandle, compare:

  • Table handle equality.
  • Optional presence of KQL/metadata SQL.
  • Each required conjunct present in the actual KQL string (e.g., user_id: 0, fare < 50.0, city: "SF", isHoliday: true).

This avoids churn from harmless formatting changes.

Also applies to: 301-306


231-266: Add join + Filter→Project→TableScan scenarios

  • Multi-table JOIN with per-scan CLP filters to validate independent table scopes and required filter enforcement.
  • A case where Filter sits above a Project above the scan, to ensure pushdown still triggers (or document ordering assumptions). This aligns with prior guidance to compute scope per immediate TableScan.

I can add two concise tests exercising both shapes using the existing harness.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 080ab18 and 13bf115.

📒 Files selected for processing (1)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (47-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
  • ClpUdfRewriter (58-306)
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
  • LocalQueryRunner (304-1248)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
  • ClpMetadataDbSetUp (45-264)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1)

121-133: Test harness initialization looks solid

LocalQueryRunner/catalog wiring + CLP functions registration is correct and isolated via setUp/tearDown. Nothing to change here.

Comment on lines +308 to +314
// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());

if (!expectedColumns.equals(actualColumns)) {
return NO_MATCH;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Reduce brittleness: check expected ⊆ actual columns

Planner may legally project extra columns. Require that actual contains at least the expected set instead of strict equality.

-            if (!expectedColumns.equals(actualColumns)) {
+            if (!actualColumns.containsAll(expectedColumns)) {
                 return NO_MATCH;
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());
if (!expectedColumns.equals(actualColumns)) {
return NO_MATCH;
}
// Check assignments contain expected columns
Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments();
Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values());
if (!actualColumns.containsAll(expectedColumns)) {
return NO_MATCH;
}
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 308 to 314, the test currently asserts exact equality between
expectedColumns and actualColumns which is brittle because the planner may add
extra projected columns; change the check to ensure expectedColumns is a subset
of actualColumns (i.e., verify actualColumns.containsAll(expectedColumns)) so
the test passes when additional legal columns are present while still enforcing
required columns are present.

anlowee
anlowee previously approved these changes Sep 5, 2025
@anlowee
Copy link

anlowee commented Sep 5, 2025

For the PR title how about:

feat: Add `CLP_GET_*` UDFs with rewrites for schemaless querying.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java (1)

37-43: Prefer “double” over “floating point” in the description

Aligns with StandardTypes.DOUBLE and avoids ambiguity.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (5)

148-156: DRY: factor the rewrite+pushdown pipeline into a helper

Reduce duplication and make tests easier to evolve alongside optimiser changes.

@@
         ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);
@@
-        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);
@@
-        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
-        PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
-        ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
-        optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);
+
+    private PlanNode optimizeThroughUdfAndPushdown(Plan plan, Session session)
+    {
+        ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager);
+        PlanNode rewritten = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+        ClpComputePushDown pushDown = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider);
+        return pushDown.optimize(rewritten, session.toConnectorSession(), variableAllocator, planNodeIdAllocator);
+    }

Also applies to: 187-195, 241-244


301-302: Guard Optional before get() to avoid brittle NPEs

Matcher should fail gracefully when layout is absent.

-            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
+            Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
+            if (!layout.isPresent()) {
+                return NO_MATCH;
+            }
+            ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();

312-314: Allow planner to project extra columns (subset check instead of equality)

Exact set equality is brittle; assert expected ⊆ actual.

-            if (!expectedColumns.equals(actualColumns)) {
+            if (!actualColumns.containsAll(expectedColumns)) {
                 return NO_MATCH;
             }

303-306: Loosen layout comparison to semantic checks (table + required conjuncts)

Equality on the full ClpTableLayoutHandle (and exact KQL string/parentheses) is fragile. Prefer: assert same table handle, Optional present, and that the KQL contains required conjunct substrings; do similar for metadata SQL if present. If getters are available on ClpTableLayoutHandle, switch to them; otherwise, minimally compare by parts rather than whole-object equality.

Would you like me to draft a matcher variant that compares by table handle and verifies KQL/metadata fragments via substring membership?


142-179: Add coverage for special characters and invalid UDF usages

Given encodeJsonPath currently only normalises '.', '_' and uppercase, add tests covering paths with $, @, spaces, brackets, unicode; also add negative tests for non-constant args and wrong arity to assert the rewriter throws CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION (or leaves calls unrewritten and fails predictably).

I can add targeted tests modelled on these three methods to cover special characters and invalid usages. Want a patch?

Also applies to: 231-266

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 13bf115 and d501fb5.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java (1)
presto-common/src/main/java/com/facebook/presto/common/type/StandardTypes.java (1)
  • StandardTypes (22-80)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (47-175)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
  • ClpUdfRewriter (58-306)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (33-155)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (50-171)
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
  • LocalQueryRunner (304-1248)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
  • ClpMetadataDbSetUp (45-264)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +16 to +22
import com.facebook.presto.common.block.Block;
import com.facebook.presto.common.type.StandardTypes;
import com.facebook.presto.spi.function.Description;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import io.airlift.slice.Slice;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use PrestoException with NOT_SUPPORTED for clearer user-facing errors

Throwing UnsupportedOperationException surfaces as an internal error to users. Prefer a PrestoException with NOT_SUPPORTED and a clear action message.

 import com.facebook.presto.common.block.Block;
 import com.facebook.presto.common.type.StandardTypes;
 import com.facebook.presto.spi.function.Description;
 import com.facebook.presto.spi.function.ScalarFunction;
 import com.facebook.presto.spi.function.SqlType;
 import io.airlift.slice.Slice;
+import com.facebook.presto.spi.PrestoException;
+import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import com.facebook.presto.common.block.Block;
import com.facebook.presto.common.type.StandardTypes;
import com.facebook.presto.spi.function.Description;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import io.airlift.slice.Slice;
import com.facebook.presto.common.block.Block;
import com.facebook.presto.common.type.StandardTypes;
import com.facebook.presto.spi.function.Description;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import io.airlift.slice.Slice;
import com.facebook.presto.spi.PrestoException;
import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 16 to 22, the code currently throws UnsupportedOperationException
which surfaces as an internal error; replace those throws with throwing a
PrestoException using StandardErrorCode.NOT_SUPPORTED (NOT_SUPPORTED) and a
clear, actionable message for the user. Add the necessary imports for
PrestoException and StandardErrorCode.NOT_SUPPORTED, and when a function is not
implemented return throw new PrestoException(NOT_SUPPORTED, "Feature X is not
supported; please use Y or file a ticket"), substituting a concise description
of the unsupported operation and suggested action.

Comment on lines +23 to +28
public final class ClpFunctions
{
private ClpFunctions()
{
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Centralise placeholder throwing and improve messages

Consolidate the placeholder exception and use consistent, actionable messages so accidental execution fails clearly.

 public final class ClpFunctions
 {
     private ClpFunctions()
     {
     }
 
+    private static RuntimeException placeholderInvocation(String name)
+    {
+        return new PrestoException(
+                NOT_SUPPORTED,
+                name + " must be rewritten by the CLP query rewriter and should not be executed at runtime.");
+    }
+
     @ScalarFunction(value = "CLP_GET_BIGINT", deterministic = false)
     @Description("Retrieves an integer value corresponding to the given JSON path.")
     @SqlType(StandardTypes.BIGINT)
     public static long clpGetBigint(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
     {
-        throw new UnsupportedOperationException("CLP_GET_BIGINT is a placeholder function without implementation.");
+        throw placeholderInvocation("CLP_GET_BIGINT");
     }
 
     @ScalarFunction(value = "CLP_GET_DOUBLE", deterministic = false)
-    @Description("Retrieves a floating point value corresponding to the given JSON path.")
+    @Description("Retrieves a double value corresponding to the given JSON path.")
     @SqlType(StandardTypes.DOUBLE)
     public static double clpGetDouble(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
     {
-        throw new UnsupportedOperationException("CLP_GET_DOUBLE is a placeholder function without implementation.");
+        throw placeholderInvocation("CLP_GET_DOUBLE");
     }
 
     @ScalarFunction(value = "CLP_GET_BOOL", deterministic = false)
     @Description("Retrieves a boolean value corresponding to the given JSON path.")
     @SqlType(StandardTypes.BOOLEAN)
     public static boolean clpGetBool(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
     {
-        throw new UnsupportedOperationException("CLP_GET_BOOL is a placeholder function without implementation.");
+        throw placeholderInvocation("CLP_GET_BOOL");
     }
 
     @ScalarFunction(value = "CLP_GET_STRING", deterministic = false)
     @Description("Retrieves a string value corresponding to the given JSON path.")
     @SqlType(StandardTypes.VARCHAR)
     public static Slice clpGetString(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
     {
-        throw new UnsupportedOperationException("CLP_GET_STRING is a placeholder function without implementation.");
+        throw placeholderInvocation("CLP_GET_STRING");
     }
 
     @ScalarFunction(value = "CLP_GET_STRING_ARRAY", deterministic = false)
     @Description("Retrieves an array value corresponding to the given JSON path and converts each element into a string.")
-    @SqlType("ARRAY(VARCHAR)")
+    @SqlType("array(varchar)")
     public static Block clpGetStringArray(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
     {
-        throw new UnsupportedOperationException("CLP_GET_STRING_ARRAY is a placeholder function without implementation.");
+        throw placeholderInvocation("CLP_GET_STRING_ARRAY");
     }
 }

Also applies to: 29-36, 37-44, 45-52, 53-60, 61-67

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 23-28 (and similarly for 29-36, 37-44, 45-52, 53-60, 61-67),
replace ad-hoc placeholder throws with a single centralized helper that always
throws a clear, consistent exception message; add a private static method (e.g.,
failPlaceholderExecution()) that throws an UnsupportedOperationException with a
descriptive actionable message like "CLP function placeholder - should not be
executed at runtime; ensure function wiring or implementation is provided" and
call that helper from each placeholder location so all placeholders fail
uniformly and with an actionable message.

Comment on lines +61 to +64
@ScalarFunction(value = "CLP_GET_STRING_ARRAY", deterministic = false)
@Description("Retrieves an array value corresponding to the given JSON path and converts each element into a string.")
@SqlType("ARRAY(VARCHAR)")
public static Block clpGetStringArray(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Normalise type literal casing in @SqlType

Presto annotations typically use lower-case type literals (array(varchar)). The parser is case-insensitive, but lower-case improves consistency.

🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 61 to 64, the @SqlType annotation uses an upper-case type literal
"ARRAY(VARCHAR)"; change it to the normalized lower-case form "array(varchar)"
to match Presto annotation style and improve consistency.

Comment on lines +121 to +133
localQueryRunner = new LocalQueryRunner(defaultSession);
localQueryRunner.createCatalog("clp", new ClpConnectorFactory(), ImmutableMap.of(
"clp.metadata-db-url", format(METADATA_DB_URL_TEMPLATE, dbHandle.getDbPath()),
"clp.metadata-db-user", METADATA_DB_USER,
"clp.metadata-db-password", METADATA_DB_PASSWORD,
"clp.metadata-table-prefix", METADATA_DB_TABLE_PREFIX));
localQueryRunner.getMetadata().registerBuiltInFunctions(extractFunctions(new ClpPlugin().getFunctions()));
functionAndTypeManager = localQueryRunner.getMetadata().getFunctionAndTypeManager();
functionResolution = new FunctionResolution(functionAndTypeManager.getFunctionAndTypeResolver());
splitFilterProvider = new ClpMySqlSplitFilterProvider(new ClpConfig());
planNodeIdAllocator = new PlanNodeIdAllocator();
variableAllocator = new VariableAllocator();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Note on dependency alignment for reviewers

Tests rely on planner behaviours that may differ with Velox/submodule bumps. Ensure the PR branch points to the Velox commit that includes duplicate-projection fixes mentioned in the PR discussion to avoid false negatives during review.

If helpful, I can open a follow-up task to sync the Velox pointer and re-run this suite.

Comment on lines +203 to +217
project(
ImmutableMap.of(
"clp_get_bigint",
PlanMatchPattern.expression("user_und_id"),
"clp_get_double",
PlanMatchPattern.expression("fare"),
"clp_get_string",
PlanMatchPattern.expression("user"),
"clp_get_bool",
PlanMatchPattern.expression("is_uxholiday"),
"clp_get_string_array",
PlanMatchPattern.expression("tags"),
"expr",
PlanMatchPattern.expression("city.Name")),
ClpTableScanMatcher.clpTableScanPattern(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Avoid coupling tests to internal variable-name encoding

Assertions that reference encoded names like user_und_id and is_uxholiday will break if encoding rules change (e.g., to handle $, @). Prefer alias-agnostic matchers (e.g., anySymbol/anyExpression) or assert presence by column handle set only; bind symbols via the TableScan’s output ordering rather than exact names.

I can rewrite these project assertions to be alias-agnostic while still verifying the expected symbols flow through. Proceed?

🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 203-217, the test is coupling assertions to encoded variable names
like user_und_id and is_uxholiday which can break if encoding rules change;
replace those literal name checks with alias-agnostic matchers (e.g.,
anySymbol()/anyExpression()) or match by the TableScan’s output ordering/column
handle set instead: bind the expected expressions to positions rather than
specific symbol names, use PlanMatchPattern.anySymbol()/anyExpression() where
the actual alias is irrelevant, and/or assert the presence and order of column
handles from ClpTableScanMatcher so the test verifies data flow without
depending on name encoding.

@anlowee
Copy link

anlowee commented Sep 6, 2025

For the PR title how about:

feat: Add `CLP_GET_*` UDFs with rewrites for schemaless querying.

Don't forget modify the PR title before merging

@wraymo wraymo changed the title feat: Add CLP_GET_* functions with query rewriting support feat: Add CLP_GET_* UDFs with rewrites for schemaless querying. Sep 6, 2025
@wraymo wraymo merged commit 0fc04bd into y-scope:release-0.293-clp-connector Sep 6, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants