fix(native): Replace lambda body with optimized expression in NativeExpressionOptimizer#27143
fix(native): Replace lambda body with optimized expression in NativeExpressionOptimizer#27143pramodsatya wants to merge 2 commits intoprestodb:masterfrom
Conversation
Reviewer's GuideAdjusts NativeExpressionOptimizer's lambda handling to replace and re-optimize the entire LambdaDefinitionExpression returned by the sidecar, and adds an integration-style test harness plus tests verifying constant folding in lambda bodies, along with a test dependency update. Sequence diagram for updated lambda optimization in NativeExpressionOptimizersequenceDiagram
participant Planner
participant NativeExpressionOptimizer
participant ReplacingVisitor
participant Resolver
participant Sidecar
Planner->>NativeExpressionOptimizer: optimize(rowExpression)
NativeExpressionOptimizer->>ReplacingVisitor: visitExpression(originalExpression)
ReplacingVisitor->>ReplacingVisitor: visitLambda(lambda)
ReplacingVisitor->>ReplacingVisitor: canBeReplaced(lambda)
alt lambda_can_be_replaced
ReplacingVisitor->>Resolver: apply(lambda)
Resolver->>Sidecar: optimize(lambda)
Sidecar-->>Resolver: optimizedLambda
Resolver-->>ReplacingVisitor: optimizedLambda
ReplacingVisitor->>ReplacingVisitor: optimizedBody = optimizedLambda.getBody()
ReplacingVisitor->>ReplacingVisitor: optimizedBody.accept(this)
ReplacingVisitor->>ReplacingVisitor: toRowExpression(lambda.sourceLocation, optimizedBody, optimizedBody.type)
ReplacingVisitor-->>NativeExpressionOptimizer: new LambdaDefinitionExpression(..., optimizedBody)
else lambda_not_replaced
ReplacingVisitor-->>NativeExpressionOptimizer: lambda
end
NativeExpressionOptimizer-->>Planner: optimizedRowExpression
Class diagram for updated lambda handling in NativeExpressionOptimizerclassDiagram
class NativeExpressionOptimizer {
+RowExpression optimize(RowExpression originalExpression)
-ReplacingVisitor replacingVisitor
}
class ReplacingVisitor {
+RowExpression visitExpression(RowExpression originalExpression, Void context)
+RowExpression visitLambda(LambdaDefinitionExpression lambda, Void context)
-boolean canBeReplaced(RowExpression expression)
-RowExpression toRowExpression(OptionalSourceLocation sourceLocation, RowExpression body, Type type)
-Function<RowExpression, RowExpression> resolver
}
class RowExpression {
+Type getType()
}
class LambdaDefinitionExpression {
+OptionalSourceLocation getSourceLocation()
+List<Type> getArgumentTypes()
+List<VariableReferenceExpression> getArguments()
+RowExpression getBody()
}
class VariableReferenceExpression {
+String getName()
+Type getType()
}
class Type {
}
class OptionalSourceLocation {
}
class Function_RowExpression_RowExpression_ {
+RowExpression apply(RowExpression expression)
}
NativeExpressionOptimizer --> ReplacingVisitor : uses
ReplacingVisitor ..|> RowExpressionVisitor : implements
ReplacingVisitor --> Function_RowExpression_RowExpression_ : has_resolver
ReplacingVisitor --> RowExpression : operates_on
ReplacingVisitor --> LambdaDefinitionExpression : visits
LambdaDefinitionExpression --> RowExpression : has_body
LambdaDefinitionExpression --> VariableReferenceExpression : has_arguments
LambdaDefinitionExpression --> Type : has_argumentTypes
RowExpression --> Type : has_type
LambdaDefinitionExpression --> OptionalSourceLocation : has_sourceLocation
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
ff22724 to
fbd4872
Compare
|
@aditi-pandit, @tdcmeehan, @pdabre12 could you please help review this fix? |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
visitLambda, theresolver.apply(lambda)result is blindly cast toLambdaDefinitionExpression; consider validating the type or failing fast with a clear error message if the resolver ever returns a non-lambda to avoid unexpectedClassCastExceptions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `visitLambda`, the `resolver.apply(lambda)` result is blindly cast to `LambdaDefinitionExpression`; consider validating the type or failing fast with a clear error message if the resolver ever returns a non-lambda to avoid unexpected `ClassCastException`s.
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionOptimizer.java:86-94` </location>
<code_context>
+ closeAllRuntimeException(queryRunner);
+ }
+
+ @Test
+ public void testLambdaBodyConstantFolding()
+ {
+ assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> 1 + 1)",
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for non-optimizable and partially optimizable lambda bodies to ensure the visitor remains a no-op when appropriate
Currently `testLambdaBodyConstantFolding` only covers fully constant-foldable lambda bodies. Please add tests for:
- A non-foldable lambda body (e.g., `x -> x + unbound_long`), asserting the optimized expression remains structurally equivalent to the input.
- A partially foldable body (e.g., `x -> x + (1 + 1)`), asserting only the constant part is folded and the lambda/argument wiring is preserved.
This will help catch regressions where the replacement logic incorrectly rewrites lambdas or drops arguments.
```suggestion
@Test
public void testLambdaBodyConstantFolding()
{
// Fully constant-foldable lambda bodies
assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> 1 + 1)",
"transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> cast('123' AS integer))",
"transform(ARRAY[unbound_long, unbound_long2], x -> 123)");
assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> cast(json_parse('[1, 2]') AS ARRAY<INTEGER>)[1] + 1)",
"transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
// Non-foldable lambda body: ensure lambda wiring and arguments are preserved
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> x + unbound_long)",
"transform(ARRAY[unbound_long, unbound_long2], x -> x + unbound_long)");
// Partially foldable lambda body: only constant part should be folded
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> x + (1 + 1))",
"transform(ARRAY[unbound_long, unbound_long2], x -> x + 2)");
}
```
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionOptimizer.java:89-92` </location>
<code_context>
+ @Test
+ public void testLambdaBodyConstantFolding()
+ {
+ assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> 1 + 1)",
+ "transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
+ assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> cast('123' AS integer))", "transform(ARRAY[unbound_long, unbound_long2], x -> 123)");
+ assertOptimizedEquals("transform(ARRAY[unbound_long, unbound_long2], x -> cast(json_parse('[1, 2]') AS ARRAY<INTEGER>)[1] + 1)",
+ "transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
+ }
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for nested lambdas and multiple lambda occurrences in the same expression
Since the bug involved replacing entire `LambdaDefinitionExpression` nodes, it would be good to also cover:
- A lambda whose body contains another lambda (e.g. `transform(..., x -> transform(ARRAY[1,2], y -> 1 + 1))`).
- Multiple lambdas in a single expression (e.g. nested `transform` calls, or `transform` plus `filter` each with their own lambda).
This would verify that the visitor correctly traverses and normalizes all optimized lambda bodies, even when nested or repeated.
Suggested implementation:
```java
@Test
public void testLambdaBodyConstantFolding()
{
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> 1 + 1)",
"transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> cast('123' AS integer))",
"transform(ARRAY[unbound_long, unbound_long2], x -> 123)");
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> cast(json_parse('[1, 2]') AS ARRAY<INTEGER>)[1] + 1)",
"transform(ARRAY[unbound_long, unbound_long2], x -> 2)");
}
@Test
public void testNestedLambdaBodyConstantFolding()
{
assertOptimizedEquals(
"transform(ARRAY[unbound_long, unbound_long2], x -> transform(ARRAY[1, 2], y -> 1 + 1))",
"transform(ARRAY[unbound_long, unbound_long2], x -> transform(ARRAY[1, 2], y -> 2))");
}
@Test
public void testMultipleLambdaOccurrencesConstantFolding()
{
assertOptimizedEquals(
"filter(transform(ARRAY[unbound_long, unbound_long2], x -> 1 + 1), x -> x > 1 + 1)",
"filter(transform(ARRAY[unbound_long, unbound_long2], x -> 2), x -> x > 2)");
```
- Ensure the closing brace for `testMultipleLambdaOccurrencesConstantFolding` is correctly followed by the next method or class-level brace in the file (the snippet you provided is truncated, so adjust brace placement if needed).
- No additional imports should be necessary if `assertOptimizedEquals`, `transform`, and `filter` are already used elsewhere in this test class; otherwise, align with existing helper methods and function usage patterns in this file.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...gin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionOptimizer.java
Show resolved
Hide resolved
...gin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionOptimizer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in the NativeExpressionOptimizer where lambda expressions returned by the native sidecar were not being correctly optimized. The issue was that the visitor was checking replacement eligibility only on the lambda body, but the complete lambda expression is sent to the sidecar.
Changes:
- Modified
ReplacingVisitor#visitLambdato check and replace the entireLambdaDefinitionExpressioninstead of just the body - Added comprehensive test coverage for lambda body constant folding scenarios
- Added
presto-analyzertest dependency
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
NativeExpressionOptimizer.java |
Updated lambda visitor to handle complete lambda expression optimization and apply recursive rewrites to optimized body |
TestNativeExpressionOptimizer.java |
Added new test class with harness and tests for lambda constant folding optimization |
pom.xml |
Added presto-analyzer test dependency for test infrastructure |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...gin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionOptimizer.java
Outdated
Show resolved
Hide resolved
|
@pramodsatya Why do we need the new test file? Can we just add the new test cases to |
|
@pdabre12, the |
aditi-pandit
left a comment
There was a problem hiding this comment.
@pramodsatya : Have one main question about the test. Will look into more detail after that.
| } | ||
|
|
||
| @Test | ||
| public void testLambdaBodyConstantFolding() |
There was a problem hiding this comment.
We should have constant folding tests elsewhere in the code as well, right ? Am curious why you have this new test class.
There was a problem hiding this comment.
@aditi-pandit, currently we have tests for constant folding with sidecar in TestNativeExpressionInterpreter and in TestNativeSidecarPlugin.testNativeExpressionOptimizer(). This bug is in NativeExpressionOptimizer so it cannot be validated with TestNativeExpressionInterpreter.
Using a test assertion such as
protected void assertQueryWithSameQueryRunner(Session session, @Language("SQL") String actual, @Language("SQL") String expected)
in the testcase TestNativeSidecarPlugin.testNativeExpressionOptimizer(), only verifies that the result of evaluating both SQL queries is the same, and it cannot be used to verify the expression tree is constant folded per expectations.
So the new test class is needed here. Please let me know if I am missing something.
There was a problem hiding this comment.
Thanks @pramodsatya
@pdabre12 : Can you do a first round of review, especially for the java test code ? The C++ code looks fine.
pdabre12
left a comment
There was a problem hiding this comment.
Thanks @pramodsatya.
I am fine with the extra test file % one change.
| return translator.translate(parsedExpression, SYMBOL_TYPES); | ||
| } | ||
|
|
||
| private NativeExpressionOptimizer getNativeExpressionOptimizer(FunctionAndTypeManager functionAndTypeManager, NodeManager nodeManager) |
There was a problem hiding this comment.
I see most of this code is repeated in TestNativeExpressionInterpreter#getRowExpressionInterpreter.
Can we extract the common code in a function?
There was a problem hiding this comment.
You can create a common function for the injector and then get the needed instance accordingly.
|
@pramodsatya You need to exclude the new test class file under this profile to address the CI failure. https://github.com/prestodb/presto/blob/master/presto-native-sidecar-plugin/pom.xml#L308 |
8642c17 to
c9f0ed0
Compare
Thanks @pdabre12, updated accordingly, could you PTAL? |
Description
Fixes a bug in
NativeExpressionOptimizer.ReplacingVisitor#visitLambdawhere an optimized lambda returned by the native sidecar is not handled correctly. The visitor currently checks replacement-eligibility only on the lambda body and then applies a body-level replacement. However, the complete lambda expression is sent to the sidecar for optimization and not just the body of lambda expression. This causes the optimizer to skip rewriting lambdas with optimized body, leaving unoptimized subtrees.This fix makes the visitor check and replace the entire
LambdaDefinitionExpressionwith a newLambdaDefinitionExpressioncontaining the replacement's body. This allows for any further rewrites to be applied to the optimized body before constructing the finalLambdaDefinitionExpression(See #27122).Motivation and Context
The native sidecar optimizer returns optimized lambda expressions where only the lambda body is optimized. The
ReplacingVisitorlogic only checks if lambda body can be replaced and then applies the replacement. TheCollectingVisitorlogic however gathers the completeLambdaDefinitionExpressionfor optimization via the sidecar and not just the lambda's body.Impact
Not user-facing, correctness fix in
NativeExpressionOptimizerthat applies at planning stage for C++ clusters.Test Plan
Added e2e tests.
Summary by Sourcery
Improve native sidecar expression optimization for lambda expressions by correctly optimizing the lambda body after applying the resolver, and add coverage for lambda constant folding.
Enhancements:
Tests:
Summary by Sourcery
Fix native sidecar lambda optimization to operate on the full lambda expression returned by the resolver and add targeted test coverage for lambda constant folding in transform expressions.
Bug Fixes:
Build:
Tests: