-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add CLP_GET_* UDFs with rewrites for schemaless querying.
#42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add CLP_GET_* UDFs with rewrites for schemaless querying.
#42
Conversation
WalkthroughAdds CLP scalar function stubs and registers them; introduces a CLP UDF rewriter and a compute-pushdown optimizer (renamed and relocated); refactors filter-to-KQL conversion to store assignments as a field; wires optimizers into the provider; expands tests and test dependencies; updates Velox submodule pointer. Changes
Sequence Diagram(s)sequenceDiagram
participant Planner as QueryPlanner
participant UDF as ClpUdfRewriter
participant Compute as ClpComputePushDown
participant Converter as ClpFilterToKqlConverter
participant Scan as TableScanNode
Planner->>UDF: optimize(plan)
Note right of UDF #E9F5FF: Rewrite CLP_GET_* calls\ncreate ClpColumnHandles & variables
UDF-->>Planner: plan with injected variables/columns
Planner->>Compute: optimize(plan)
Compute->>Converter: convert filter (uses assignments field)
Note right of Converter #FFF4E6: derive kqlQuery and metadataFilter
alt kqlQuery or metadataFilter present
Compute->>Scan: replace TableScan with updated layout & assignments
Compute-->>Planner: plan with pushed-down KQL/metadata and updated TableScan
alt remaining predicate exists
Planner->>Planner: add FilterNode above updated TableScan
end
else
Compute-->>Planner: return original plan
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
| throw new UnsupportedOperationException("CLP_GET_INT is a placeholder function without implementation."); | ||
| } | ||
|
|
||
| @ScalarFunction(value = "CLP_GET_FLOAT", deterministic = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just find that in our RFC:
we propose adding a handful of
CLP_GET_<TYPE>(<json-path>)UDFs, where<TYPE>refers to the Presto type of the column and <json_path> refers to the column's JSON path.
So the should be a Presto type. Shall we rename this UDF to CLP_GET_DOUBLE then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe we should update the RFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a table in RFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we have the table. I mean the description in the RFC that I quoted is not precise what it means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If that's the case, we also need to rename CLP_GET_INT to CLP_GET_BIGINT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to rename this CLP_GET_INT to CLP_GET_BIGINT?
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
Show resolved
Hide resolved
|
Can you use the Velox commit: 7c55762bc37edc3f93cb8f5f1ff625e06cc10deb (which is used by current main branch)? |
Need to update the commit hash to support duplicate columns in projection, otherwise you will run into the bugs you mentioned before. |
I think |
|
No, there's also a change in Velox, not just CLP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
68-71: Per-table mandatory-filter check is incorrect (false negatives on joins)Using a single boolean
hasVisitedFilterskips the required-filter enforcement for other tables when any one Filter was processed. Track and validate per table scope instead.Apply:
@@ - private final Set<String> tableScopeSet; - private boolean hasVisitedFilter; + private final Set<String> tableScopeSet; + // Tables for which we actually processed a Filter directly above the scan + private final Set<String> tablesWithFilter; @@ - hasVisitedFilter = false; - tableScopeSet = new HashSet<>(); + tableScopeSet = new HashSet<>(); + tablesWithFilter = new HashSet<>(); @@ - // Throw exception if any required split filters are missing - if (!rewriter.tableScopeSet.isEmpty() && !rewriter.hasVisitedFilter) { - splitFilterProvider.checkContainsRequiredFilters(rewriter.tableScopeSet, ""); - } + // Enforce required split filters on tables that had no qualifying Filter processed + if (!rewriter.tableScopeSet.isEmpty()) { + Set<String> tablesMissingFilter = new HashSet<>(rewriter.tableScopeSet); + tablesMissingFilter.removeAll(rewriter.tablesWithFilter); + if (!tablesMissingFilter.isEmpty()) { + splitFilterProvider.checkContainsRequiredFilters(tablesMissingFilter, ""); + } + }And mark a table as handled in processFilter (see separate diff below).
♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4)
101-106: Don’t short‑circuit traversal; still walk children when Filter isn’t directly on TableScanReturning
nodeprevents visiting its source (e.g., Filter -> Project -> TableScan), causing missed scopes and pushdown opportunities.- if (!(node.getSource() instanceof TableScanNode)) { - return node; - } + if (!(node.getSource() instanceof TableScanNode)) { + return context.defaultRewrite(node); + }Consider also supporting the intervening Project (or ensure optimizer ordering guarantees Filter is adjacent to TableScan).
129-136: Avoid unnecessary validation when no metadata filter is presentOnly call
checkContainsRequiredFilterswhenmetadataSqlQueryis present and non-empty. Keeps behaviour aligned with metadata‑only requirements and saves work.- // Perform required metadata filter checks before handling the KQL query (if kqlQuery - // isn't present, we'll return early, skipping subsequent checks). - splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), metadataSqlQuery.orElse("")); - boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty(); - if (hasMetadataFilter) { + boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty(); + if (hasMetadataFilter) { + // Perform required metadata filter checks only when we actually have a metadata filter + splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), metadataSqlQuery.get()); metadataSqlQuery = Optional.of(splitFilterProvider.remapSplitFilterPushDownExpression(tableScope, metadataSqlQuery.get())); log.debug("Metadata SQL query: %s", metadataSqlQuery.get()); }
115-116: Nit: prefer format over string concatenation and avoid redundant toString()Minor readability and consistency improvement.
- String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString(); + String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName());
110-116: Mark table as “handled” after deriving its scopeMove the “handled” marker to a point where
tableScopeis available.- hasVisitedFilter = true; - TableHandle tableHandle = tableScanNode.getTable(); ClpTableHandle clpTableHandle = (ClpTableHandle) tableHandle.getConnectorHandle(); - String tableScope = CONNECTOR_NAME + "." + clpTableHandle.getSchemaTableName().toString(); + String tableScope = format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName()); + tablesWithFilter.add(tableScope);presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
74-90: Fix Javadoc typo and improve type referencesCorrect “correh hsponding” and format referenced types.
- * @return a map from column handle to its correh hsponding scan-level variable + * @return a map from {@code ColumnHandle} to its corresponding scan-level variablepresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)
181-229: Add tests for additional UDFs and alias-agnostic assertionsCover CLP_GET_FLOAT, CLP_GET_BOOL, CLP_GET_STRING_ARRAY in filters and projects, and prefer alias-agnostic assertions to reduce brittleness.
142-179: Add error/edge-case tests for UDFs and special charactersStrengthen coverage:
- Non-constant and multi-argument CLP_GET_* → expect PrestoException.
- Special characters in JSON paths (
$,@,-, spaces) → ensure encoded variable names don’t break planning.- Duplicate paths across different CLP_GET_* calls → dedupe behaviour.
I can provide concrete test methods mirroring the existing patterns.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (3)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h (1)
RowExpression(394-416)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
Rewriter(75-174)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
ClpComputePushDown(47-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
ClpUdfRewriter(58-306)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(33-155)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(50-171)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
ClpMetadataDbSetUp(45-264)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
ClpExpression(29-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: prestissimo-worker-images-build
| private class Rewriter | ||
| extends ConnectorPlanRewriter<Void> | ||
| { | ||
| private final PlanNodeIdAllocator idAllocator; | ||
| private final VariableAllocator variableAllocator; | ||
| private final Map<ColumnHandle, VariableReferenceExpression> globalColumnVarMap; | ||
|
|
||
| public Rewriter( | ||
| PlanNodeIdAllocator idAllocator, | ||
| VariableAllocator variableAllocator, | ||
| Map<ColumnHandle, VariableReferenceExpression> globalColumnVarMap) | ||
| { | ||
| this.idAllocator = idAllocator; | ||
| this.variableAllocator = variableAllocator; | ||
| this.globalColumnVarMap = globalColumnVarMap; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Global column→variable map can produce symbol collisions across multiple scans
globalColumnVarMap is shared for the whole plan; identical CLP paths/types on different scans will reuse the same VariableReferenceExpression, risking duplicate producers and ambiguous symbols when branches merge.
Recommended directions:
- Scope mappings per TableScan (e.g.,
Map<PlanNodeId, Map<ColumnHandle, VariableReferenceExpression>>) and consult the mapping of the current subtree only. - Or ensure per-scan uniqueness by suffixing variable names with a scan-specific prefix/suffix.
- Add a join test: SELECT … FROM t1 JOIN t2 ON … WHERE CLP_GET_INT('user_id') on both sides; verify unique symbols and correct scan assignments.
I can draft a scoped rewrite that tracks “current scan” via a stack during traversal if you want.
🏁 Script executed:
#!/bin/bash
set -e
FILE=$(fd ClpUdfRewriter.java | head -n 1)
echo "Using file: $FILE"
rg -n 'class Rewriter' -C5 "$FILE"
rg -n 'globalColumnVarMap' -C3 "$FILE"
rg -n 'visitTableScan' -C3 "$FILE"
rg -n 'new Rewriter' -C3 .Length of output: 2873
🏁 Script executed:
#!/bin/bash
# Locate ClpColumnHandle.java, print class definition and equals/hashCode
file=$(fd ClpColumnHandle.java | head -n1)
echo "File: $file"
# Show class declaration and relevant fields
sed -n '1,200p' "$file"
# Search for equals() and hashCode() implementations
grep -R "boolean equals" -n "$file" || echo "No equals() found"
grep -R "hashCode" -n "$file" || echo "No hashCode() found"Length of output: 3633
Scope column→variable mapping per scan
globalColumnVarMap is shared across all TableScans; identical ClpColumnHandle(jsonPath,type) on different scans reuses the same VariableReferenceExpression, leading to duplicate producers and ambiguous symbols in joins. Scope mappings per scan (e.g. Map<PlanNodeId,Map<ColumnHandle,VariableReferenceExpression>>) or append scan-specific prefixes, and add a join test (e.g. SELECT … FROM t1 JOIN t2 … WHERE CLP_GET_INT('user_id')) to verify unique symbols.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
around lines 102–117, the Rewriter currently uses a single globalColumnVarMap
shared across all TableScans which causes identical
ClpColumnHandle(jsonPath,type) instances from different scans to reuse the same
VariableReferenceExpression (producing duplicate producers and ambiguous symbols
in joins); change the mapping to be scoped per scan (for example use
Map<PlanNodeId, Map<ColumnHandle, VariableReferenceExpression>> or include a
scan-specific prefix when creating variable names keyed by PlanNodeId) so that
each TableScan gets its own column→variable map; update all places that
read/write globalColumnVarMap to use the per-scan map (create the inner map at
scan entry, lookup by PlanNodeId when allocating/rewriting variables), ensure
generated VariableReferenceExpression names include the scan id to avoid
collisions, and add a join integration test (e.g. SELECT … FROM t1 JOIN t2 …
WHERE CLP_GET_INT('user_id')) to verify symbols are unique across scans.
| if (expression instanceof CallExpression) { | ||
| CallExpression call = (CallExpression) expression; | ||
| String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle()).getName().getObjectName().toUpperCase(); | ||
|
|
||
| if (functionName.startsWith("CLP_GET_")) { | ||
| if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) { | ||
| throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION, | ||
| "CLP_GET_* UDF must have a single constant string argument"); | ||
| } | ||
|
|
||
| ConstantExpression constant = (ConstantExpression) call.getArguments().get(0); | ||
| String jsonPath = ((Slice) constant.getValue()).toStringUtf8(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validate argument type is VARCHAR and improve error message
Be explicit on expected type; produce actionable error.
- if (functionName.startsWith("CLP_GET_")) {
- if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
- throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
- "CLP_GET_* UDF must have a single constant string argument");
- }
+ if (functionName.startsWith("CLP_GET_")) {
+ if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) {
+ throw new PrestoException(
+ CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+ String.format("CLP_GET_* must have exactly one constant string argument; got %d",
+ call.getArguments().size()));
+ }
- ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
+ ConstantExpression constant = (ConstantExpression) call.getArguments().get(0);
+ if (!constant.getType().equals(com.facebook.presto.common.type.VarcharType.VARCHAR)) {
+ throw new PrestoException(
+ CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+ "CLP_GET_* argument must be a VARCHAR literal");
+ }
String jsonPath = ((Slice) constant.getValue()).toStringUtf8();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (expression instanceof CallExpression) { | |
| CallExpression call = (CallExpression) expression; | |
| String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle()).getName().getObjectName().toUpperCase(); | |
| if (functionName.startsWith("CLP_GET_")) { | |
| if (call.getArguments().size() != 1 || !(call.getArguments().get(0) instanceof ConstantExpression)) { | |
| throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION, | |
| "CLP_GET_* UDF must have a single constant string argument"); | |
| } | |
| ConstantExpression constant = (ConstantExpression) call.getArguments().get(0); | |
| String jsonPath = ((Slice) constant.getValue()).toStringUtf8(); | |
| if (expression instanceof CallExpression) { | |
| CallExpression call = (CallExpression) expression; | |
| String functionName = functionManager.getFunctionMetadata(call.getFunctionHandle()) | |
| .getName() | |
| .getObjectName() | |
| .toUpperCase(); | |
| if (functionName.startsWith("CLP_GET_")) { | |
| if (call.getArguments().size() != 1 | |
| || !(call.getArguments().get(0) instanceof ConstantExpression)) { | |
| throw new PrestoException( | |
| CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION, | |
| String.format( | |
| "CLP_GET_* must have exactly one constant string argument; got %d", | |
| call.getArguments().size())); | |
| } | |
| ConstantExpression constant = (ConstantExpression) call.getArguments().get(0); | |
| if (!constant.getType().equals(com.facebook.presto.common.type.VarcharType.VARCHAR)) { | |
| throw new PrestoException( | |
| CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION, | |
| "CLP_GET_* argument must be a VARCHAR literal"); | |
| } | |
| String jsonPath = ((Slice) constant.getValue()).toStringUtf8(); |
| private String encodeJsonPath(String jsonPath) | ||
| { | ||
| StringBuilder sb = new StringBuilder(); | ||
| for (char c : jsonPath.toCharArray()) { | ||
| if (Character.isUpperCase(c)) { | ||
| sb.append("_ux").append(Character.toLowerCase(c)); | ||
| } | ||
| else if (c == '.') { | ||
| sb.append("_dot_"); | ||
| } | ||
| else if (c == '_') { | ||
| sb.append("_und_"); | ||
| } | ||
| else { | ||
| sb.append(c); | ||
| } | ||
| } | ||
| return sb.toString(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Robustly encode variable names: handle special characters, leading digits, and length
Current encoding leaves $, @, -, spaces, etc., which can break downstream parsing. Also no length bound.
- private String encodeJsonPath(String jsonPath)
- {
- StringBuilder sb = new StringBuilder();
- for (char c : jsonPath.toCharArray()) {
- if (Character.isUpperCase(c)) {
- sb.append("_ux").append(Character.toLowerCase(c));
- }
- else if (c == '.') {
- sb.append("_dot_");
- }
- else if (c == '_') {
- sb.append("_und_");
- }
- else {
- sb.append(c);
- }
- }
- return sb.toString();
- }
+ private String encodeJsonPath(String jsonPath)
+ {
+ // Allow only [a-z0-9_]; encode everything else as _xHH_ (hex), and normalise case.
+ final int MAX_LEN = 255;
+ StringBuilder sb = new StringBuilder(jsonPath.length() * 2);
+ for (char c : jsonPath.toCharArray()) {
+ char lc = Character.toLowerCase(c);
+ if ((lc >= 'a' && lc <= 'z') || (lc >= '0' && lc <= '9') || lc == '_') {
+ sb.append(lc);
+ }
+ else if (c == '.') {
+ sb.append("_dot_");
+ }
+ else if (c == '_') {
+ sb.append("_und_");
+ }
+ else if (Character.isUpperCase(c)) {
+ sb.append("_ux").append(Character.toLowerCase(c));
+ }
+ else {
+ sb.append(String.format("_x%02x_", (int) c));
+ }
+ }
+ // Ensure leading char is not a digit
+ if (sb.length() > 0 && Character.isDigit(sb.charAt(0))) {
+ sb.insert(0, "v_");
+ }
+ String encoded = sb.toString();
+ if (encoded.length() > MAX_LEN) {
+ // Truncate and add a stable hash suffix to avoid collisions
+ int keep = MAX_LEN - 9; // room for _hxxxxxx
+ int hash = Math.abs(encoded.hashCode());
+ encoded = encoded.substring(0, keep) + String.format("_h%06x", hash & 0xFFFFFF);
+ }
+ return encoded;
+ }Consider also encoding characters like @ and $ explicitly in tests to validate this path.
Committable suggestion skipped: line range outside the PR's diff.
| private TableScanNode buildNewTableScanNode(TableScanNode node) | ||
| { | ||
| Set<VariableReferenceExpression> outputVars = new LinkedHashSet<>(node.getOutputVariables()); | ||
| Map<VariableReferenceExpression, ColumnHandle> newAssignments = new HashMap<>(node.getAssignments()); | ||
|
|
||
| // Add any missing variables for known handles | ||
| globalColumnVarMap.forEach((handle, var) -> { | ||
| outputVars.add(var); | ||
| newAssignments.put(var, handle); | ||
| }); | ||
|
|
||
| return new TableScanNode( | ||
| node.getSourceLocation(), | ||
| idAllocator.getNextId(), | ||
| node.getTable(), | ||
| new ArrayList<>(outputVars), | ||
| newAssignments, | ||
| node.getTableConstraints(), | ||
| node.getCurrentConstraint(), | ||
| node.getEnforcedConstraint(), | ||
| node.getCteMaterializationInfo()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Limit added assignments to only variables used in the current scan subtree
Adding all entries from globalColumnVarMap to every scan inflates outputs and I/O. After scoping the map per scan (previous comment), add only those handles required by this scan.
As an interim guard, filter by connector/table identity to avoid cross-table pollution.
🏁 Script executed:
#!/bin/bash
# Locate the ClpUdfRewriter.java file
FILE=$(find . -type f -name 'ClpUdfRewriter.java' | head -n1)
echo "Found file: $FILE"
# Show declaration and usages of globalColumnVarMap
grep -n "globalColumnVarMap" "$FILE"
grep -n "private.*globalColumnVarMap" "$FILE"Length of output: 1190
Filter assignments to only columns used by this scan
In buildNewTableScanNode (lines 270–291), don’t unconditionally add every entry from globalColumnVarMap. Instead, replace the globalColumnVarMap.forEach(…) block with a filtered loop that only includes handles belonging to node.getTable() (e.g.
globalColumnVarMap.entrySet().stream()
.filter(e -> e.getKey().getTableHandle().equals(node.getTable()))
.forEach(e -> {
outputVars.add(e.getValue());
newAssignments.put(e.getValue(), e.getKey());
});) to avoid inflating outputs and cross-table pollution.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
around lines 270 to 291, the current globalColumnVarMap.forEach unconditionally
adds every global column variable to the new TableScanNode causing cross-table
pollution; replace that block with a filtered iteration that only adds entries
whose ColumnHandle belongs to the same table as node.getTable() (i.e. test
handle.getTableHandle().equals(node.getTable()) before adding), adding only the
matching var->handle pairs to outputVars and newAssignments so the scan only
exposes columns for its own table.
| @Test | ||
| public void testScanProjectFilter() | ||
| { | ||
| TransactionId transactionId = localQueryRunner.getTransactionManager().beginTransaction(false); | ||
| Session session = testSessionBuilder().setCatalog("clp").setSchema("default").setTransactionId(transactionId).build(); | ||
|
|
||
| Plan plan = localQueryRunner.createPlan( | ||
| session, | ||
| "SELECT LOWER(city.Name), LOWER(CLP_GET_STRING('user')) from test WHERE CLP_GET_INT('user_id') = 0 AND LOWER(city.Name) = 'BEIJING'", | ||
| WarningCollector.NOOP); | ||
| ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); | ||
| PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); | ||
| ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); | ||
| optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); | ||
|
|
||
| PlanAssert.assertPlan( | ||
| session, | ||
| localQueryRunner.getMetadata(), | ||
| (node, sourceStats, lookup, s, types) -> PlanNodeStatsEstimate.unknown(), | ||
| new Plan(optimizedPlan, plan.getTypes(), StatsAndCosts.empty()), | ||
| anyTree( | ||
| project( | ||
| ImmutableMap.of( | ||
| "lower", | ||
| PlanMatchPattern.expression("lower(city.Name)"), | ||
| "lower_0", | ||
| PlanMatchPattern.expression("lower(user)")), | ||
| filter( | ||
| expression("lower(city.Name) = 'BEIJING'"), | ||
| ClpTableScanMatcher.clpTableScanPattern( | ||
| new ClpTableLayoutHandle(table, Optional.of("(user_id: 0)"), Optional.empty()), | ||
| ImmutableSet.of( | ||
| new ClpColumnHandle("user", VARCHAR), | ||
| new ClpColumnHandle("user_id", BIGINT), | ||
| city)))))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Add tests for Filter→Project→TableScan and multi-table join scenarios
- Validate pushdown when Filter is not directly on TableScan (or confirm optimizer ordering).
- Validate per-table mandatory filter enforcement and symbol uniqueness when two scans use the same JSON path.
I can draft these plans using LocalQueryRunner if desired.
🏁 Script executed:
#!/bin/bash
set -e
# Search for SQL JOIN usage in TestClpUdfRewriter.java
rg -n "SELECT .*JOIN" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No SQL join queries found"
# Search for any occurrences of 'join' (case-insensitive) in TestClpUdfRewriter.java
rg -n -i "join" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No 'join' occurrences found"
# Search for filter→project→tableScan sequence in test matchers
rg -nP "filter\\s*\\(.*project" -C3 presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java || echo "No filter→project pattern found"Length of output: 537
Add tests for Filter→Project→TableScan ordering and multi-table joins
- Cover a case where a Filter sits above a Project above a TableScan to confirm pushdown applies when Filter isn’t directly on the scan.
- Add a join test with two TableScans over the same JSON path to verify per-table mandatory filters and unique symbol generation.
🤖 Prompt for AI Agents
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 231 to 266, add two tests: one for Filter→Project→TableScan
ordering and one for multi-table join with same JSON path. For the
Filter→Project→TableScan test, construct a query where a FILTER sits above a
PROJECT above the CLP table scan (e.g. SELECT LOWER(col) FROM test WHERE
CLP_GET_INT('id') = 0 AND LOWER(col) = 'X'), run the same plan creation,
ClpUdfRewriter and ClpComputePushDown passes, and assert the optimized plan
contains a Project over a Filter over a ClpTableScan with the pushdown predicate
represented in ClpTableLayoutHandle (Optional.of("(id: 0)")) and both projected
symbols present. For the multi-table join test, create a query joining the same
JSON path table twice (e.g. FROM test t1 JOIN test t2 ON ... with
CLP_GET_INT('id') filters on each side), run the same rewrite/optimize flow, and
assert the plan shows two ClpTableScan matchers each with their own
ClpTableLayoutHandle containing the correct per-table mandatory filter and that
symbols/column handles are distinct (unique names) to ensure per-table pushdown
isolation.
| private static final class ClpTableScanMatcher | ||
| implements Matcher | ||
| { | ||
| private final ClpTableLayoutHandle expectedLayoutHandle; | ||
| private final Set<ColumnHandle> expectedColumns; | ||
|
|
||
| private ClpTableScanMatcher(ClpTableLayoutHandle expectedLayoutHandle, Set<ColumnHandle> expectedColumns) | ||
| { | ||
| this.expectedLayoutHandle = expectedLayoutHandle; | ||
| this.expectedColumns = expectedColumns; | ||
| } | ||
|
|
||
| static PlanMatchPattern clpTableScanPattern(ClpTableLayoutHandle layoutHandle, Set<ColumnHandle> columns) | ||
| { | ||
| return node(TableScanNode.class).with(new ClpTableScanMatcher(layoutHandle, columns)); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean shapeMatches(PlanNode node) | ||
| { | ||
| return node instanceof TableScanNode; | ||
| } | ||
|
|
||
| @Override | ||
| public MatchResult detailMatches( | ||
| PlanNode node, | ||
| StatsProvider stats, | ||
| Session session, | ||
| Metadata metadata, | ||
| SymbolAliases symbolAliases) | ||
| { | ||
| checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false"); | ||
| TableScanNode tableScanNode = (TableScanNode) node; | ||
| ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get(); | ||
|
|
||
| // Check layout handle | ||
| if (!expectedLayoutHandle.equals(actualLayoutHandle)) { | ||
| return NO_MATCH; | ||
| } | ||
|
|
||
| // Check assignments contain expected columns | ||
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | ||
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | ||
|
|
||
| if (!expectedColumns.equals(actualColumns)) { | ||
| return NO_MATCH; | ||
| } | ||
|
|
||
| SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder(); | ||
| for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) { | ||
| aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName())); | ||
| } | ||
|
|
||
| return match(aliasesBuilder.build()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Matcher: Optional.get without presence check
If a layout isn’t present, get() will throw. Not an issue in current tests but easy to harden.
- ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get();
+ Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout();
+ if (!layout.isPresent()) {
+ return NO_MATCH;
+ }
+ ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static final class ClpTableScanMatcher | |
| implements Matcher | |
| { | |
| private final ClpTableLayoutHandle expectedLayoutHandle; | |
| private final Set<ColumnHandle> expectedColumns; | |
| private ClpTableScanMatcher(ClpTableLayoutHandle expectedLayoutHandle, Set<ColumnHandle> expectedColumns) | |
| { | |
| this.expectedLayoutHandle = expectedLayoutHandle; | |
| this.expectedColumns = expectedColumns; | |
| } | |
| static PlanMatchPattern clpTableScanPattern(ClpTableLayoutHandle layoutHandle, Set<ColumnHandle> columns) | |
| { | |
| return node(TableScanNode.class).with(new ClpTableScanMatcher(layoutHandle, columns)); | |
| } | |
| @Override | |
| public boolean shapeMatches(PlanNode node) | |
| { | |
| return node instanceof TableScanNode; | |
| } | |
| @Override | |
| public MatchResult detailMatches( | |
| PlanNode node, | |
| StatsProvider stats, | |
| Session session, | |
| Metadata metadata, | |
| SymbolAliases symbolAliases) | |
| { | |
| checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false"); | |
| TableScanNode tableScanNode = (TableScanNode) node; | |
| ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get(); | |
| // Check layout handle | |
| if (!expectedLayoutHandle.equals(actualLayoutHandle)) { | |
| return NO_MATCH; | |
| } | |
| // Check assignments contain expected columns | |
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | |
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | |
| if (!expectedColumns.equals(actualColumns)) { | |
| return NO_MATCH; | |
| } | |
| SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder(); | |
| for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) { | |
| aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName())); | |
| } | |
| return match(aliasesBuilder.build()); | |
| } | |
| } | |
| @Override | |
| public MatchResult detailMatches( | |
| PlanNode node, | |
| StatsProvider stats, | |
| Session session, | |
| Metadata metadata, | |
| SymbolAliases symbolAliases) | |
| { | |
| checkState(shapeMatches(node), "Plan testing framework error: shapeMatches returned false"); | |
| TableScanNode tableScanNode = (TableScanNode) node; | |
| Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout(); | |
| if (!layout.isPresent()) { | |
| return NO_MATCH; | |
| } | |
| ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get(); | |
| // Check layout handle | |
| if (!expectedLayoutHandle.equals(actualLayoutHandle)) { | |
| return NO_MATCH; | |
| } | |
| // Check assignments contain expected columns | |
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | |
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | |
| if (!expectedColumns.equals(actualColumns)) { | |
| return NO_MATCH; | |
| } | |
| SymbolAliases.Builder aliasesBuilder = SymbolAliases.builder(); | |
| for (VariableReferenceExpression variable : tableScanNode.getOutputVariables()) { | |
| aliasesBuilder.put(variable.getName(), new SymbolReference(variable.getName())); | |
| } | |
| return match(aliasesBuilder.build()); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 268 to 323, the matcher calls get() on the optional layout without
checking presence which can throw; modify the matcher to guard the optional
before calling get() — e.g., if the table's layout Optional is empty return
NO_MATCH (or assert presence with a clear failure message), and only then
retrieve the layout; do the same if any other Optional may be absent so the
matcher fails gracefully instead of throwing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (4)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)
197-216: Make project assertions alias-agnostic to decouple from encoded variable namesAsserting "user_und_id", "is_uxholiday", etc., ties tests to the current encoding scheme. Prefer alias-agnostic matchers (e.g., anySymbol) or assert that projected expressions reference the expected scan variables/ColumnHandles rather than exact names.
268-323: Guard Optional layout before get() to avoid NPE in matcher
tableScanNode.getTable().getLayout().get()can throw when no layout is set. Make the matcher fail gracefully.Apply this diff:
@@ - ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get(); + Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout(); + if (!layout.isPresent()) { + return NO_MATCH; + } + ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();
142-179: Add negative and special-character path testsCover error paths for: non-constant args (e.g., CLP_GET_STRING(city.Name)), wrong arity, and JSON paths with special characters like '$' or '@'. Also test duplicate-path dedup semantics across multiple CLP_GET_* calls.
I can add minimal tests expecting PrestoException for invalid usages and a behavioural test pinning current handling of special characters. Want me to draft them?
103-133: Add join and per-table scope tests (join isolation for pushdown)Leveraging the retrieved learning about determining scope from each Filter’s immediate TableScan, add a join test with two scans and independent CLP_GET_* filters to ensure per-table mandatory filter enforcement and symbol isolation.
Would you like me to sketch a LocalQueryRunner-based join test asserting two distinct ClpTableScan matchers each with its own mandatory filter?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
ClpComputePushDown(47-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
ClpUdfRewriter(58-306)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(33-155)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(50-171)presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
LocalQueryRunner(304-1248)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
ClpMetadataDbSetUp(45-264)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)
148-157: Solid end-to-end coverage of UDF rewrite + pushdown pathThis validates rewriting for INT/FLOAT/STRING/BOOL/STRING_ARRAY and confirms residuals remain above the scan. Good.
231-266: Good: covers Project-over-Filter-over-Scan orderingThis ensures pushdown applies even when a Project sits between Filter and TableScan and validates residual predicate handling.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
Show resolved
Hide resolved
| new ClpTableLayoutHandle( | ||
| table, | ||
| Optional.of( | ||
| "(((user_id: 0 AND fare < 50.0) AND (city: \"SF\" AND isHoliday: true)))"), | ||
| Optional.empty()), | ||
| ImmutableSet.of( | ||
| city, | ||
| fare, | ||
| isHoliday, | ||
| new ClpColumnHandle("user_id", BIGINT), | ||
| new ClpColumnHandle("city", VARCHAR), | ||
| new ClpColumnHandle("tags", new ArrayType(VARCHAR))))))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Optionally loosen layout equality to semantic checks
If ClpTableLayoutHandle exposes getters, compare table handle equality and assert kqlQuery Optional presence plus substring membership of required conjuncts instead of strict object equality. This will reduce future churn from benign formatting changes.
Also applies to: 303-306
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 167-178 (and similarly at 303-306), the test currently asserts
strict equality on ClpTableLayoutHandle which is brittle to formatting; instead,
use the handle's getters to assert the table handle equals the expected table,
assert that the kqlQuery Optional is present, and verify the kqlQuery string
contains the required conjunct substrings (e.g., "user_id: 0", "fare < 50.0",
"city: \"SF\"", "isHoliday: true") rather than comparing the whole object
string; update the assertions accordingly so they check semantic membership and
presence instead of exact object equality.
🧹 Nitpick (assertive)
Reduce brittleness: avoid asserting exact KQL string formatting
The equality on a fully materialized KQL string makes the test fragile to harmless reformatting/parenthesis changes. Prefer asserting presence of required conjuncts (e.g., contains "user_id: 0", "fare < 50.0", 'city: "SF"', "isHoliday: true") and that Optional is present, rather than full-string equality. Consider enhancing ClpTableScanMatcher to compare by parts.
|
|
||
| Plan plan = localQueryRunner.createPlan( | ||
| session, | ||
| "SELECT LOWER(city.Name), LOWER(CLP_GET_STRING('user')) from test WHERE CLP_GET_INT('user_id') = 0 AND LOWER(city.Name) = 'BEIJING'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace the LOWER(city.Name) in the SELECT with LOWER(CLP_GET_STRING(city.Name)) to test the case when user uses both UDF and accesses the field directly?
Also does the LOWER(city.Name) = 'BEIJING' condition make sense? Do you mean UPPER or beijing? Actually there are other places above also have this issue.
it will be better to also check the rows of the returned result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*the number of rows of the returned result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. But this is just the rewriter test case for query plan check. We can add e2e result check in presto-native-execution in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)
35-40: Add null-safety for injected dependenciesGuard constructor params with requireNonNull to fail fast on DI misconfiguration.
+import static java.util.Objects.requireNonNull; @@ - public ClpPlanOptimizerProvider(FunctionMetadataManager functionManager, StandardFunctionResolution functionResolution, ClpSplitFilterProvider splitFilterProvider) + public ClpPlanOptimizerProvider(FunctionMetadataManager functionManager, StandardFunctionResolution functionResolution, ClpSplitFilterProvider splitFilterProvider) { - this.functionManager = functionManager; - this.functionResolution = functionResolution; - this.splitFilterProvider = splitFilterProvider; + this.functionManager = requireNonNull(functionManager, "functionManager is null"); + this.functionResolution = requireNonNull(functionResolution, "functionResolution is null"); + this.splitFilterProvider = requireNonNull(splitFilterProvider, "splitFilterProvider is null"); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (3)
272-291: Bug: metadata SQL uses Optional.toString() and misses parentheses.
"NOT " + expression.getMetadataSqlQuery()yieldsNOT Optional[...]. Also, wrap in parentheses to preserve precedence.Apply:
- if (expression.getMetadataSqlQuery().isPresent()) { - return new ClpExpression(notPushDownExpression, "NOT " + expression.getMetadataSqlQuery()); - } + if (expression.getMetadataSqlQuery().isPresent()) { + return new ClpExpression(notPushDownExpression, "NOT (" + expression.getMetadataSqlQuery().get() + ")"); + }
764-791: IN: escape string literals.String values are emitted without escaping, which breaks when they contain
",\,*,?.Apply:
- if (literal.getType() instanceof VarcharType) { - queryBuilder.append("\"").append(literalString).append("\""); - } + if (literal.getType() instanceof VarcharType) { + queryBuilder.append("\"") + .append(escapeKqlSpecialCharsForStringValue(literalString)) + .append("\""); + }
306-343: Escape quotes and backslashes in LIKE patternsRefactor
handleLikeso that after translating%→*and_→?, you escape only backslashes and double-quotes (preserving*and?as wildcards):- pattern = pattern.replace("%", "*").replace("_", "?"); - return new ClpExpression(format("%s: \"%s\"", variableName, pattern)); + pattern = pattern.replace("%", "*").replace("_", "?"); + // Only escape characters that break quoted KQL patterns; keep * and ? as wildcards + String escapedPattern = pattern.replace("\\", "\\\\").replace("\"", "\\\""); + return new ClpExpression(format("%s: \"%s\"", variableName, escapedPattern));
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java(22 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
ClpExpression(29-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)
14-14: Wiring confirmed — no old package references remain
All imports and usages updated to com.facebook.presto.plugin.clp.optimization.ClpPlanOptimizerProvider; ClpUdfRewriter and ClpComputePushDown resolve correctly.
| * <li><code>CLP_GET_*</code> UDFs.</li> | ||
| * </ul> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Docs claim CLP_GET_ support, but no handling is present here.*
This visitor doesn’t recognise CLP_GET_* calls; it will fall back to new ClpExpression(node) for such CallExpressions. If pushdown relies entirely on an upstream UDF rewriter, clarify that here (and in tests), or add minimal handling to recognise residual CLP_GET_* calls.
Would you like me to add a small handler that treats already-rewritten CLP_GET_* as variables and otherwise bails out to remaining expression?
| public ClpFilterToKqlConverter( | ||
| StandardFunctionResolution standardFunctionResolution, | ||
| FunctionMetadataManager functionMetadataManager, | ||
| Map<VariableReferenceExpression, ColumnHandle> assignments, | ||
| Set<String> metadataFilterColumns) | ||
| { | ||
| this.standardFunctionResolution = requireNonNull(standardFunctionResolution, "standardFunctionResolution is null"); | ||
| this.functionMetadataManager = requireNonNull(functionMetadataManager, "function metadata manager is null"); | ||
| this.assignments = requireNonNull(assignments, "assignments is null"); | ||
| this.metadataFilterColumns = requireNonNull(metadataFilterColumns, "metadataFilterColumns is null"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Constructor signature changed—ensure all call sites updated.
New args: (StandardFunctionResolution, FunctionMetadataManager, Map assignments, Set metadataFilterColumns). Please verify all instantiations were updated.
Run:
🏁 Script executed:
#!/bin/bash
# Find all usages and show arguments context
rg -n -C2 -P --type=java '\bnew\s+ClpFilterToKqlConverter\s*\('Length of output: 1490
Update all ClpFilterToKqlConverter instantiations to include the new parameters
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java:119 – add the
assignmentsandmetadataFilterColumnsarguments. - presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:301 – pass both
assignmentsandmetadataFilterColumnsto the constructor.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 111-121, the constructor signature now requires two additional
parameters (assignments and metadataFilterColumns); update all instantiations
accordingly: in
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
at the constructor call around line 119, pass the current variable holding
assignments and a Set<String> for metadataFilterColumns (or an existing metadata
set), and in
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
at the test instantiation around line 301, supply appropriate test values (e.g.,
an empty map and empty set or the test’s existing assignments/metadata
collection) so the constructor receives both
Map<VariableReferenceExpression,ColumnHandle> assignments and Set<String>
metadataFilterColumns.
| @Override | ||
| public ClpExpression visitVariableReference(VariableReferenceExpression node, Map<VariableReferenceExpression, ColumnHandle> context) | ||
| public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context) | ||
| { | ||
| return new ClpExpression(getVariableName(node, context)); | ||
| return new ClpExpression(getVariableName(node)); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid NPE when a variable has no CLP assignment.
assignments.get(node) can be null or a non-CLP handle. Safely fall back to “remaining expression” to prevent accidental pushdown with an invalid field name.
Apply:
@Override
- public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
- {
- return new ClpExpression(getVariableName(node));
- }
+ public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context)
+ {
+ ColumnHandle handle = assignments.get(node);
+ if (handle instanceof ClpColumnHandle) {
+ return new ClpExpression(((ClpColumnHandle) handle).getOriginalColumnName());
+ }
+ // No CLP mapping: not pushdownable
+ return new ClpExpression(node);
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public ClpExpression visitVariableReference(VariableReferenceExpression node, Map<VariableReferenceExpression, ColumnHandle> context) | |
| public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context) | |
| { | |
| return new ClpExpression(getVariableName(node, context)); | |
| return new ClpExpression(getVariableName(node)); | |
| } | |
| @Override | |
| public ClpExpression visitVariableReference(VariableReferenceExpression node, Void context) | |
| { | |
| ColumnHandle handle = assignments.get(node); | |
| if (handle instanceof ClpColumnHandle) { | |
| return new ClpExpression(((ClpColumnHandle) handle).getOriginalColumnName()); | |
| } | |
| // No CLP mapping: not pushdownable | |
| return new ClpExpression(node); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 156-161, the visitVariableReference currently always returns a
ClpExpression created from the variable name; instead, first look up
assignments.get(node) and if it is null or not the expected CLP-assignment type,
return the "remaining expression" fallback to avoid NPEs and accidental pushdown
with an invalid field name; if the assignment is present and is the expected CLP
handle, return that mapped CLP expression.
| private String getVariableName(VariableReferenceExpression variable) | ||
| { | ||
| return ((ClpColumnHandle) context.get(variable)).getOriginalColumnName(); | ||
| return ((ClpColumnHandle) assignments.get(variable)).getOriginalColumnName(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
getVariableName now redundant (and unsafe).
Given the change above, this method is unused and dereferences without null/type checks. Remove it or harden it if you prefer to keep the call in visitVariableReference.
I can push a follow-up diff to delete it and tidy imports.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 208-211, the getVariableName(VariableReferenceExpression) method is
now redundant and unsafe because it dereferences assignments.get(variable)
without null/type checks and is unused; remove this method and clean up imports,
or if you prefer to keep it, replace it with a safe implementation that checks
assignments.get(variable) for null and that the result is an instance of
ClpColumnHandle, returning a sensible fallback or throwing a clear, documented
exception; also update any caller (e.g., visitVariableReference) to use the safe
retrieval or inline the safe logic so no unchecked cast or NPE can occur.
| return ImmutableSet.of(new ClpUdfRewriter(functionManager)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Avoid per-call allocations by caching optimizers
These are stateless; build once and return cached sets.
public class ClpPlanOptimizerProvider
implements ConnectorPlanOptimizerProvider
{
private final FunctionMetadataManager functionManager;
private final StandardFunctionResolution functionResolution;
private final ClpSplitFilterProvider splitFilterProvider;
+ private final Set<ConnectorPlanOptimizer> logicalOptimizers;
+ private final Set<ConnectorPlanOptimizer> physicalOptimizers;
@@
- this.splitFilterProvider = splitFilterProvider;
+ this.splitFilterProvider = splitFilterProvider;
+ this.logicalOptimizers = ImmutableSet.of(new ClpUdfRewriter(this.functionManager));
+ this.physicalOptimizers = ImmutableSet.of(new ClpComputePushDown(this.functionManager, this.functionResolution, this.splitFilterProvider));
@@
@Override
public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers()
{
- return ImmutableSet.of(new ClpUdfRewriter(functionManager));
+ return logicalOptimizers;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return ImmutableSet.of(new ClpUdfRewriter(functionManager)); | |
| } | |
| public class ClpPlanOptimizerProvider | |
| implements ConnectorPlanOptimizerProvider | |
| { | |
| private final FunctionMetadataManager functionManager; | |
| private final StandardFunctionResolution functionResolution; | |
| private final ClpSplitFilterProvider splitFilterProvider; | |
| private final Set<ConnectorPlanOptimizer> logicalOptimizers; | |
| private final Set<ConnectorPlanOptimizer> physicalOptimizers; | |
| public ClpPlanOptimizerProvider( | |
| FunctionMetadataManager functionManager, | |
| StandardFunctionResolution functionResolution, | |
| ClpSplitFilterProvider splitFilterProvider) | |
| { | |
| this.functionManager = functionManager; | |
| this.functionResolution = functionResolution; | |
| this.splitFilterProvider = splitFilterProvider; | |
| this.logicalOptimizers = ImmutableSet.of( | |
| new ClpUdfRewriter(this.functionManager) | |
| ); | |
| this.physicalOptimizers = ImmutableSet.of( | |
| new ClpComputePushDown( | |
| this.functionManager, | |
| this.functionResolution, | |
| this.splitFilterProvider | |
| ) | |
| ); | |
| } | |
| @Override | |
| public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers() | |
| { | |
| return logicalOptimizers; | |
| } | |
| // … you can similarly update getPhysicalPlanOptimizers() to return physicalOptimizers … | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
around lines 45-46, avoid allocating a new ImmutableSet on every call by
creating and caching the optimizer set once: add a private final ImmutableSet
field initialized in the provider constructor (e.g. ImmutableSet.of(new
ClpUdfRewriter(functionManager))) and then have the getter simply return that
cached field; ensure initialization uses the same functionManager passed to the
constructor and keep the field final for thread-safety.
| public Set<ConnectorPlanOptimizer> getPhysicalPlanOptimizers() | ||
| { | ||
| return ImmutableSet.of(new ClpPlanOptimizer(functionManager, functionResolution, splitFilterProvider)); | ||
| return ImmutableSet.of(new ClpComputePushDown(functionManager, functionResolution, splitFilterProvider)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Planner phase ordering and symbol constraints — please verify
UDF rewrite must occur before pushdown (logical vs physical phases should ensure this). Given the open concern about special characters in column symbols, verify that ClpUdfRewriter emits names acceptable to Velox (or properly escapes them) so rewritten symbols won’t be rejected downstream. Consider a focused test covering $, @, and other non-underscore characters.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
around lines 49-52, the ClpComputePushDown is currently registered at the
physical plan optimizer phase which risks running before the UDF rewrite and
producing symbol names Velox will reject; update registration so the
ClpUdfRewriter runs in a logical-plan optimizer phase prior to any pushdown (or
move the pushdown registration to run after logical rewrites), and update
ClpUdfRewriter to produce Velox-safe symbol names by either escaping/quoting
special characters or normalizing to alphanumeric plus underscores (follow Velox
naming/escaping conventions used elsewhere), then add focused unit tests
asserting rewrites for column names containing $, @ and other non-underscore
characters are emitted and accepted (or properly escaped) and that optimizer
ordering ensures rewrite happens before pushdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (8)
153-156: DRY the rewrite + pushdown pipelineThe same 4-step optimise sequence is repeated in three tests. Extract a helper to reduce duplication and ease future changes.
Apply:
@@ public void tearDown() { localQueryRunner.close(); ClpMetadataDbSetUp.tearDown(dbHandle); } + private PlanNode optimizeThroughUdfAndPushdown(Plan plan, Session session) + { + ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); + PlanNode rewritten = udfRewriter.optimize( + plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + ClpComputePushDown pushDown = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); + return pushDown.optimize(rewritten, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + } @@ - ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session); @@ - ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session); @@ - ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session);Also applies to: 192-196, 241-245
167-171: Avoid asserting exact KQL string formattingExact KQL string equality is brittle to harmless formatting/parentheses reordering. Prefer semantic checks (presence of required conjuncts) via a matcher that validates substrings rather than the full string literal.
203-216: Make projection assertions alias-agnosticThe assertions depend on specific encoded symbol names (e.g., user_und_id, is_uxholiday). If encoding or de-duping changes, these will break. Capture symbols via matcher aliases or use anySymbol()/anyExpression-style matchers instead of hard-coded names.
253-257: Same: avoid relying on encoded symbol names in expressionsReferencing city_dot__uxname ties the test to the current encoding. Use alias captures (via the TableScan matcher’s produced aliases) or alias-agnostic matchers to assert LOWER is applied to the symbol that corresponds to CLP_GET_STRING('city.Name').
268-306: Guard Optional.get() and fail gracefullydetailMatches calls get() on TableHandle.getLayout() without checking presence; this will throw instead of returning NO_MATCH.
Apply:
@@ - ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get(); + Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout(); + if (!layout.isPresent()) { + return NO_MATCH; + } + ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get(); @@ - // Check layout handle - if (!expectedLayoutHandle.equals(actualLayoutHandle)) { - return NO_MATCH; - } + // Check layout handle (consider relaxing to semantic checks; see separate comment) + if (!expectedLayoutHandle.equals(actualLayoutHandle)) { + return NO_MATCH; + }
142-179: Add negative and edge-case coverage for UDF usage and pathsMissing tests for:
- Non-constant UDF arg (e.g., CLP_GET_STRING(city.Name)) → expect failure or no rewrite.
- Multiple args to CLP_GET_* → expect failure.
- Special characters in JSON paths ($, @, quotes, brackets, unicode). Today’s encoder doesn’t normalise these; Presto symbols may reject them.
- Duplicate path across differing CLP_GET_* types (e.g., INT vs STRING) and dedup semantics.
I can draft targeted tests that run the rewrite+pushdown and assert exception or plan shape. Want me to add them here?
167-178: Loosen layout equality to semantic checks (optional but recommended)Instead of object equality on ClpTableLayoutHandle, compare:
- Table handle equality.
- Optional presence of KQL/metadata SQL.
- Each required conjunct present in the actual KQL string (e.g., user_id: 0, fare < 50.0, city: "SF", isHoliday: true).
This avoids churn from harmless formatting changes.
Also applies to: 301-306
231-266: Add join + Filter→Project→TableScan scenarios
- Multi-table JOIN with per-scan CLP filters to validate independent table scopes and required filter enforcement.
- A case where Filter sits above a Project above the scan, to ensure pushdown still triggers (or document ordering assumptions). This aligns with prior guidance to compute scope per immediate TableScan.
I can add two concise tests exercising both shapes using the existing harness.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
ClpComputePushDown(47-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
ClpUdfRewriter(58-306)presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
LocalQueryRunner(304-1248)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
ClpMetadataDbSetUp(45-264)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (1)
121-133: Test harness initialization looks solidLocalQueryRunner/catalog wiring + CLP functions registration is correct and isolated via setUp/tearDown. Nothing to change here.
| // Check assignments contain expected columns | ||
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | ||
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | ||
|
|
||
| if (!expectedColumns.equals(actualColumns)) { | ||
| return NO_MATCH; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Reduce brittleness: check expected ⊆ actual columns
Planner may legally project extra columns. Require that actual contains at least the expected set instead of strict equality.
- if (!expectedColumns.equals(actualColumns)) {
+ if (!actualColumns.containsAll(expectedColumns)) {
return NO_MATCH;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Check assignments contain expected columns | |
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | |
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | |
| if (!expectedColumns.equals(actualColumns)) { | |
| return NO_MATCH; | |
| } | |
| // Check assignments contain expected columns | |
| Map<VariableReferenceExpression, ColumnHandle> actualAssignments = tableScanNode.getAssignments(); | |
| Set<ColumnHandle> actualColumns = new HashSet<>(actualAssignments.values()); | |
| if (!actualColumns.containsAll(expectedColumns)) { | |
| return NO_MATCH; | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 308 to 314, the test currently asserts exact equality between
expectedColumns and actualColumns which is brittle because the planner may add
extra projected columns; change the check to ensure expectedColumns is a subset
of actualColumns (i.e., verify actualColumns.containsAll(expectedColumns)) so
the test passes when additional legal columns are present while still enforcing
required columns are present.
|
For the PR title how about: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java (1)
37-43: Prefer “double” over “floating point” in the descriptionAligns with StandardTypes.DOUBLE and avoids ambiguity.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (5)
148-156: DRY: factor the rewrite+pushdown pipeline into a helperReduce duplication and make tests easier to evolve alongside optimiser changes.
@@ ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session); @@ - ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session); @@ - ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); - PlanNode optimizedPlan = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); - ClpComputePushDown optimizer = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); - optimizedPlan = optimizer.optimize(optimizedPlan, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + PlanNode optimizedPlan = optimizeThroughUdfAndPushdown(plan, session); + + private PlanNode optimizeThroughUdfAndPushdown(Plan plan, Session session) + { + ClpUdfRewriter udfRewriter = new ClpUdfRewriter(functionAndTypeManager); + PlanNode rewritten = udfRewriter.optimize(plan.getRoot(), session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + ClpComputePushDown pushDown = new ClpComputePushDown(functionAndTypeManager, functionResolution, splitFilterProvider); + return pushDown.optimize(rewritten, session.toConnectorSession(), variableAllocator, planNodeIdAllocator); + }Also applies to: 187-195, 241-244
301-302: Guard Optional before get() to avoid brittle NPEsMatcher should fail gracefully when layout is absent.
- ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) tableScanNode.getTable().getLayout().get(); + Optional<com.facebook.presto.spi.TableLayoutHandle> layout = tableScanNode.getTable().getLayout(); + if (!layout.isPresent()) { + return NO_MATCH; + } + ClpTableLayoutHandle actualLayoutHandle = (ClpTableLayoutHandle) layout.get();
312-314: Allow planner to project extra columns (subset check instead of equality)Exact set equality is brittle; assert expected ⊆ actual.
- if (!expectedColumns.equals(actualColumns)) { + if (!actualColumns.containsAll(expectedColumns)) { return NO_MATCH; }
303-306: Loosen layout comparison to semantic checks (table + required conjuncts)Equality on the full ClpTableLayoutHandle (and exact KQL string/parentheses) is fragile. Prefer: assert same table handle, Optional present, and that the KQL contains required conjunct substrings; do similar for metadata SQL if present. If getters are available on ClpTableLayoutHandle, switch to them; otherwise, minimally compare by parts rather than whole-object equality.
Would you like me to draft a matcher variant that compares by table handle and verifies KQL/metadata fragments via substring membership?
142-179: Add coverage for special characters and invalid UDF usagesGiven encodeJsonPath currently only normalises '.', '_' and uppercase, add tests covering paths with $, @, spaces, brackets, unicode; also add negative tests for non-constant args and wrong arity to assert the rewriter throws CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION (or leaves calls unrewritten and fails predictably).
I can add targeted tests modelled on these three methods to cover special characters and invalid usages. Want a patch?
Also applies to: 231-266
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
PR: y-scope/presto#46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
🧬 Code graph analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java (1)
presto-common/src/main/java/com/facebook/presto/common/type/StandardTypes.java (1)
StandardTypes(22-80)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
ClpComputePushDown(47-175)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
ClpUdfRewriter(58-306)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider(33-155)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
ClpSplitFilterProvider(50-171)presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java (1)
LocalQueryRunner(304-1248)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
ClpMetadataDbSetUp(45-264)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
ClpMetadata(57-206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
| import com.facebook.presto.common.block.Block; | ||
| import com.facebook.presto.common.type.StandardTypes; | ||
| import com.facebook.presto.spi.function.Description; | ||
| import com.facebook.presto.spi.function.ScalarFunction; | ||
| import com.facebook.presto.spi.function.SqlType; | ||
| import io.airlift.slice.Slice; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use PrestoException with NOT_SUPPORTED for clearer user-facing errors
Throwing UnsupportedOperationException surfaces as an internal error to users. Prefer a PrestoException with NOT_SUPPORTED and a clear action message.
import com.facebook.presto.common.block.Block;
import com.facebook.presto.common.type.StandardTypes;
import com.facebook.presto.spi.function.Description;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import io.airlift.slice.Slice;
+import com.facebook.presto.spi.PrestoException;
+import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import com.facebook.presto.common.block.Block; | |
| import com.facebook.presto.common.type.StandardTypes; | |
| import com.facebook.presto.spi.function.Description; | |
| import com.facebook.presto.spi.function.ScalarFunction; | |
| import com.facebook.presto.spi.function.SqlType; | |
| import io.airlift.slice.Slice; | |
| import com.facebook.presto.common.block.Block; | |
| import com.facebook.presto.common.type.StandardTypes; | |
| import com.facebook.presto.spi.function.Description; | |
| import com.facebook.presto.spi.function.ScalarFunction; | |
| import com.facebook.presto.spi.function.SqlType; | |
| import io.airlift.slice.Slice; | |
| import com.facebook.presto.spi.PrestoException; | |
| import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED; |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 16 to 22, the code currently throws UnsupportedOperationException
which surfaces as an internal error; replace those throws with throwing a
PrestoException using StandardErrorCode.NOT_SUPPORTED (NOT_SUPPORTED) and a
clear, actionable message for the user. Add the necessary imports for
PrestoException and StandardErrorCode.NOT_SUPPORTED, and when a function is not
implemented return throw new PrestoException(NOT_SUPPORTED, "Feature X is not
supported; please use Y or file a ticket"), substituting a concise description
of the unsupported operation and suggested action.
| public final class ClpFunctions | ||
| { | ||
| private ClpFunctions() | ||
| { | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Centralise placeholder throwing and improve messages
Consolidate the placeholder exception and use consistent, actionable messages so accidental execution fails clearly.
public final class ClpFunctions
{
private ClpFunctions()
{
}
+ private static RuntimeException placeholderInvocation(String name)
+ {
+ return new PrestoException(
+ NOT_SUPPORTED,
+ name + " must be rewritten by the CLP query rewriter and should not be executed at runtime.");
+ }
+
@ScalarFunction(value = "CLP_GET_BIGINT", deterministic = false)
@Description("Retrieves an integer value corresponding to the given JSON path.")
@SqlType(StandardTypes.BIGINT)
public static long clpGetBigint(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
{
- throw new UnsupportedOperationException("CLP_GET_BIGINT is a placeholder function without implementation.");
+ throw placeholderInvocation("CLP_GET_BIGINT");
}
@ScalarFunction(value = "CLP_GET_DOUBLE", deterministic = false)
- @Description("Retrieves a floating point value corresponding to the given JSON path.")
+ @Description("Retrieves a double value corresponding to the given JSON path.")
@SqlType(StandardTypes.DOUBLE)
public static double clpGetDouble(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
{
- throw new UnsupportedOperationException("CLP_GET_DOUBLE is a placeholder function without implementation.");
+ throw placeholderInvocation("CLP_GET_DOUBLE");
}
@ScalarFunction(value = "CLP_GET_BOOL", deterministic = false)
@Description("Retrieves a boolean value corresponding to the given JSON path.")
@SqlType(StandardTypes.BOOLEAN)
public static boolean clpGetBool(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
{
- throw new UnsupportedOperationException("CLP_GET_BOOL is a placeholder function without implementation.");
+ throw placeholderInvocation("CLP_GET_BOOL");
}
@ScalarFunction(value = "CLP_GET_STRING", deterministic = false)
@Description("Retrieves a string value corresponding to the given JSON path.")
@SqlType(StandardTypes.VARCHAR)
public static Slice clpGetString(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
{
- throw new UnsupportedOperationException("CLP_GET_STRING is a placeholder function without implementation.");
+ throw placeholderInvocation("CLP_GET_STRING");
}
@ScalarFunction(value = "CLP_GET_STRING_ARRAY", deterministic = false)
@Description("Retrieves an array value corresponding to the given JSON path and converts each element into a string.")
- @SqlType("ARRAY(VARCHAR)")
+ @SqlType("array(varchar)")
public static Block clpGetStringArray(@SqlType(StandardTypes.VARCHAR) Slice jsonPath)
{
- throw new UnsupportedOperationException("CLP_GET_STRING_ARRAY is a placeholder function without implementation.");
+ throw placeholderInvocation("CLP_GET_STRING_ARRAY");
}
}Also applies to: 29-36, 37-44, 45-52, 53-60, 61-67
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 23-28 (and similarly for 29-36, 37-44, 45-52, 53-60, 61-67),
replace ad-hoc placeholder throws with a single centralized helper that always
throws a clear, consistent exception message; add a private static method (e.g.,
failPlaceholderExecution()) that throws an UnsupportedOperationException with a
descriptive actionable message like "CLP function placeholder - should not be
executed at runtime; ensure function wiring or implementation is provided" and
call that helper from each placeholder location so all placeholders fail
uniformly and with an actionable message.
| @ScalarFunction(value = "CLP_GET_STRING_ARRAY", deterministic = false) | ||
| @Description("Retrieves an array value corresponding to the given JSON path and converts each element into a string.") | ||
| @SqlType("ARRAY(VARCHAR)") | ||
| public static Block clpGetStringArray(@SqlType(StandardTypes.VARCHAR) Slice jsonPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Normalise type literal casing in @SqlType
Presto annotations typically use lower-case type literals (array(varchar)). The parser is case-insensitive, but lower-case improves consistency.
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFunctions.java
around lines 61 to 64, the @SqlType annotation uses an upper-case type literal
"ARRAY(VARCHAR)"; change it to the normalized lower-case form "array(varchar)"
to match Presto annotation style and improve consistency.
| localQueryRunner = new LocalQueryRunner(defaultSession); | ||
| localQueryRunner.createCatalog("clp", new ClpConnectorFactory(), ImmutableMap.of( | ||
| "clp.metadata-db-url", format(METADATA_DB_URL_TEMPLATE, dbHandle.getDbPath()), | ||
| "clp.metadata-db-user", METADATA_DB_USER, | ||
| "clp.metadata-db-password", METADATA_DB_PASSWORD, | ||
| "clp.metadata-table-prefix", METADATA_DB_TABLE_PREFIX)); | ||
| localQueryRunner.getMetadata().registerBuiltInFunctions(extractFunctions(new ClpPlugin().getFunctions())); | ||
| functionAndTypeManager = localQueryRunner.getMetadata().getFunctionAndTypeManager(); | ||
| functionResolution = new FunctionResolution(functionAndTypeManager.getFunctionAndTypeResolver()); | ||
| splitFilterProvider = new ClpMySqlSplitFilterProvider(new ClpConfig()); | ||
| planNodeIdAllocator = new PlanNodeIdAllocator(); | ||
| variableAllocator = new VariableAllocator(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Note on dependency alignment for reviewers
Tests rely on planner behaviours that may differ with Velox/submodule bumps. Ensure the PR branch points to the Velox commit that includes duplicate-projection fixes mentioned in the PR discussion to avoid false negatives during review.
If helpful, I can open a follow-up task to sync the Velox pointer and re-run this suite.
| project( | ||
| ImmutableMap.of( | ||
| "clp_get_bigint", | ||
| PlanMatchPattern.expression("user_und_id"), | ||
| "clp_get_double", | ||
| PlanMatchPattern.expression("fare"), | ||
| "clp_get_string", | ||
| PlanMatchPattern.expression("user"), | ||
| "clp_get_bool", | ||
| PlanMatchPattern.expression("is_uxholiday"), | ||
| "clp_get_string_array", | ||
| PlanMatchPattern.expression("tags"), | ||
| "expr", | ||
| PlanMatchPattern.expression("city.Name")), | ||
| ClpTableScanMatcher.clpTableScanPattern( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Avoid coupling tests to internal variable-name encoding
Assertions that reference encoded names like user_und_id and is_uxholiday will break if encoding rules change (e.g., to handle $, @). Prefer alias-agnostic matchers (e.g., anySymbol/anyExpression) or assert presence by column handle set only; bind symbols via the TableScan’s output ordering rather than exact names.
I can rewrite these project assertions to be alias-agnostic while still verifying the expected symbols flow through. Proceed?
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
around lines 203-217, the test is coupling assertions to encoded variable names
like user_und_id and is_uxholiday which can break if encoding rules change;
replace those literal name checks with alias-agnostic matchers (e.g.,
anySymbol()/anyExpression()) or match by the TableScan’s output ordering/column
handle set instead: bind the expected expressions to positions rather than
specific symbol names, use PlanMatchPattern.anySymbol()/anyExpression() where
the actual alias is irrelevant, and/or assert the presence and order of column
handles from ClpTableScanMatcher so the test verifies data flow without
depending on name encoding.
Don't forget modify the PR title before merging |
CLP_GET_* UDFs with rewrites for schemaless querying.
Description
This PR introduces new user-defined functions (UDFs) for the CLP connector to improve querying of semi-structured logs:
CLP_GET_*functions for retrieving values from JSON paths, targeting specific CLP column types (ClpString,Integer,Float, etc.). These UDFs enable direct access to dynamic fields and return corresponding Presto native types.They are integrated with the query rewriting layer. During query optimization, calls to these functions are rewritten into normal column references or KQL query column symbols, ensuring efficient execution with no additional parsing overhead.
The PR doesn't update the documentation and will leave it for a future PR.
Checklist
breaking change.
Validation performed
All unit tests passed. End-to-end testing worked.
Summary by CodeRabbit
New Features
Refactor
Tests
Chores