-
Notifications
You must be signed in to change notification settings - Fork 178
Pushdown sort by complex expressions to scan #4750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Pushdown sort by complex expressions to scan #4750
Conversation
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteSortCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteSortCommandIT.java
Outdated
Show resolved
Hide resolved
| project.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE); | ||
|
|
||
| // Branch 1: Check if complex expressions are already sorted by scan | ||
| if (handleComplexExpressionsSortedByScan(call, project, toTraits, toCollation)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some IT and ExplainIT for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of sort expression pushdown ITs are for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: which one for example?
and can this predicate move to definition of ExpandCollationOnProjectExprRule.Config (can reduce node copy)
...search/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Outdated
Show resolved
Hide resolved
...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/SortExpressionInfo.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/util/OpenSearchRelOptUtil.java
Show resolved
Hide resolved
Signed-off-by: Songkan Tang <[email protected]>
| b2.operand(CalciteLogicalIndexScan.class) | ||
| .predicate( | ||
| Predicate.not( | ||
| CalciteLogicalIndexScan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems duplicated with the below predication of noAggregatePushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use noAggregatePushed now
|
|
||
| // Check if the scan has sort expressions pushed down | ||
| if (scan.getPushDownContext().stream() | ||
| .noneMatch(operation -> operation.type() == PushDownType.SORT_EXPR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the pushed SORT(i.e. SORT_EXPR with all digests are simple expression) can satisfy sort collation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible. The cost computation has coarse results by multiplying field count with 1.1. So VolcanoPlanner will prefer regular field sort pushdown.
Another option is to calculate the accurate complex expression count and multiply this count with 1.1. Put a check in the rule to not pushdown if all of expressions are simple expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, in this query ... | sort a | eval b = a + 1 | sort b, when pushing the sort b, this method will always return false. But I expect it return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it returns false in this case but doesn't matter. Now, if all of expressions are simple, it will go through regular pushDownSort. Field sort exists in regular traitset and is propagated well. It will be handled by handleSimpleExpressionFieldSorts method in ExpandCollationOnProjectExprRule.
| Integer offsetValue = LimitIndexScanRule.extractOffsetValue(sort.offset); | ||
| newScan = (CalciteLogicalIndexScan) scan.pushDownLimit(sort, limitValue, offsetValue); | ||
| } else { | ||
| // Attempt to push down sort expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still push down sort although scanProvidesRequiredCollation is true? I think we could just remove the sort directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking if first sort by [a, b] expressions is already in scan. The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.
For topK pushdown, it could be also applied. If sort by [a, b], limit 10 is pushed, say the second sort by [a], limit 2 comes in, we should allow it pushdown as well. This may need replace existing SORT_EXPR and LIMIT pushdown context.
But right now, it has the problem that multiple different sort complex expressions doesn't work well yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.
Shall we do the same thing for the above branch then? It only push down limit when scanProvidesRequiredCollation is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisit the logic, for already pushed down topK and if the scan collation satisfies the output collation, we can still leave the old sort in the scan(remove the new sort in the meantime). It's kind of keeping semantics. For regular sort pushdown, we can override by the new sort.
| throw new UnsupportedScriptException( | ||
| "ScriptQueryExpression requires a valid current time from hook, but it is not set"); | ||
| } | ||
| Map<String, Object> mergedParams = new HashMap<>(params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use LinkedHashMap here or the final plan may be unstable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use LinkedHashMap now
| OSRequestBuilderAction action = | ||
| requestBuilder -> { | ||
| for (SortExprDigest info : sortExprDigests) { | ||
| requestBuilder.pushDownSortExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Will this method throw exception by any chance? We plan to make the request builder transformation process lazy. We should put any logic which may throw exception outside of the lambda function, so it can avoid failure on the final plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, move most of logic that could throw exception outside. And use a lazy pushdownSortSuppliers method instead.
| sortExprDigest.getDirection().name())); | ||
|
|
||
| Script script = scriptExpr.getScript(); | ||
| if (script != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if script is null? Seems nothing happens. We better add assertion or throw exception if that's never possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never happen. I neglect this check.
| case SORT_EXPR -> { | ||
| @SuppressWarnings("unchecked") | ||
| List<SortExprDigest> sortKeys = (List<SortExprDigest>) operation.digest(); | ||
| dCpu += NumberUtil.multiply(dRows, 1.1 * sortKeys.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should filter the simple expr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, it will only count complex expr. The calculation is fine grained.
Signed-off-by: Songkan Tang <[email protected]>
Description
Add sort performance enhancement of pushing down sort by complex expressions to scan.
Related Issues
Resolves #3912
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.