-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7122] Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP/CEIL/FLOOR #4488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /** | ||
| * Runs simplification unary function by eliminating idempotent. | ||
| */ | ||
| private RexCall simplifUnaryFunction(RexCall rexCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simplify lower(upper(lower(x))) to upper(lower(x)) or lower(upper(x))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit like a special case, because the two functions have opposite effects. Maybe can hand this case, and it will not affect the default behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, my example was too specific. What I meant is that there are two idempotent functions, f and g, which may have multiple repetitions and can be combined arbitrarily, but will ultimately simplify to either f(g(x)) or g(f(x)). Of course, special cases like the one I mentioned above are exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,currently this goal can be achieved based on this PR. For example, lower(lower(upper(uper('a')))) will be simplified to lower(upper('a')).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @julianhyde provided in the case of Jira, these two functions do not have opposite effects, so this case is not need to be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think simplifyIdempotentUnaryFunction is more appropriate.
| if (e.getClass() == RexCall.class) { | ||
| return simplifyGenericNode((RexCall) e); | ||
| RexCall rexCall = (RexCall) e; | ||
| rexCall = simplifUnaryFunction(rexCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you should keep the same format as before, using case upper/lower/abs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to do this at first, but it is handled according to SqlKind. The SqlKind of upper/lower/abs is OTHER_FUNCTION. If I need to add a new SqlKind and modify the SqlKind of these functions because of this, it doesn’t feel necessary.
For example, if a new XXX function is added in future, the corresponding SqlKind must also be added. The coupling feels a bit strong, but it is very convenient to put it directly in the List.
|
Can we take this discussion to Jira? We need to have a conversation about specification. Those conversations belong in Jira. Also please STOP logging a jira case just the moment before you push a PR. We know that you have been working on the PR for a few days. Give the rest of us chance to weigh in on the specification before you write the implementation. Please fix the reference to the jira case. Is it 7122 or 7112? |
OK, thanks for the reminder. I will list the JIRA in advance so that everyone can evaluate it together. @julianhyde |
| /** | ||
| * Runs simplification unary function by eliminating idempotent. | ||
| */ | ||
| private RexCall simplifUnaryFunction(RexCall rexCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think simplifyIdempotentUnaryFunction is more appropriate.
| private static final Strong STRONG = new Strong(); | ||
|
|
||
| private static final List<SqlOperator> IDEMOTENT_UNARY_FUNCTIONS = | ||
| Arrays.asList(SqlStdOperatorTable.UPPER, SqlStdOperatorTable.LOWER, SqlStdOperatorTable.ABS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableList is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your points and had changed them, thanks @silundong
| /** Test case for | ||
| * <a href="https://issues.apache.org/jira/browse/CALCITE-7122">[CALCITE-7122] | ||
| * Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP</a>. */ | ||
| @Test void testSimplifyIdempotentUnaryFunctions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you demonstrate this simplification capability using toSql? For example, if you write select upper(upper(lower(lower('x')))) it can be simplified to upper(lower('x')). This kind of test is not very easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to facilitate intuitive understanding of the case, I added SQL testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. RexImplicationCheckerTest is a unit test. If you drag in the functionality that goes to and from SQL, it is not longer a unit test.
To make the tests easier to understand, add comments. E.g. "Test that abs(abs(x)) is simplified to abs(x)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I added more detail comments for these tests and remove sql tests. @julianhyde
ILuffZhe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Let's see if there are new comments come in the next 2 days.
…UPPER/LOWER/ABS/INITCAP
|
Can idempotent functions such as ceil and floor also be considered together? |
The ceil and floor functions have already been separately processed before, and the logic is a little complex, so they should be no need to handle them here. |
I see, these two functions are binary functions. I don't have any other questions for now. |
Numeric Datetime |
I agree with @julianhyde 's idea. I also have a suggestion—though I’m not sure how feasible it is—to consolidate all the currently identified unary idempotent functions into a unified encapsulation while ensuring compatibility with the existing ones in |
|
Floor and ceil are not purely unary functions also support binary, and they were previously handled separately, so I did not include them in the logic of the unary function list. They were not accidentally omitted, and I think it is not a problem to handle them uniformly in unary way. But regarding the scalability mentioned,personally I believe that putting all idempotent functions on a unified list and identifying other types is consistent with the current approach. If there is no clear transformation plan, there is no need to abstract it, which may increase the understanding cost for subsequent developers or let someone with a deep understanding of the plan to come up with a clear plan to follow up separately. |
|
|
what is the status of this PR? It is marked as "changes requested". Have all change request been addressed? |
IMO it had been addressed all comments now. @mihaibudiu |
|
@xiedeyantu can you please confirm that you are satisfied with these changes? |
@mihaibudiu I'm just concerned that the current implementation might be too loose and not conducive to long-term maintenance. What I was trying to express is whether each function itself could have a common property to indicate whether it is an idempotent function. If there were a way to achieve this point, it would also benefit functional dependency analysis later, such as determining whether a function is an injective function. Or perhaps I'm being too critical - logically I don't think there should be any errors in the current approach. If you think the current implementation is fine, I have no objection. |
|
So your suggestion is to add a property to SqlFunction which says whether the function is idempotent? |
If I understand correctly, you also hope that all functions can have some necessary common properties, although not necessarily the idempotent property. If this PR can achieve minimal intrusive changes, I’m also fine with it. However, I also think that the optimization in the current PR is not very cost-effective. @mihaibudiu I'm considering exploring whether it's feasible to add some useful attributes to functions. At a certain stage, achieving function dependency capabilities requires handling specific functions, but I'm not sure if adding attributes is a worthwhile effort for Calcite. |
|
There are some properties like determinism, behavior with nulls, etc. which are very useful for optimizations. |
|
So let's review this once more and merge it if it works; we are spending a lot of time on an optimization which will probably be never applied in practice. |
|
What's the status of this PR? Does it fulfill the scope that was agreed upon? |
|
There is some concern that this is relatively intrusive, making code more complex and hard to maintain, and it does not really solve a practical problem. That's why it wasn't merged yet. |
|
I would be more confident if someone could supply a real example where this would be beneficial. The only case I can imagine is for machine-generated SQL. |
This should be an optimization for functions, similar to generating more efficient SQL. This idempotency elimination is a relatively obvious SQL optimization, but it is hard to say how much efficiency it can improve. For example, some enterprise SQL platforms will first optimize SQL, requiring the most concise and efficient SQL output, and then send it to the compute engine for execution. I think this optimization is profitable, although not much. @mihaibudiu |
|
Can you show a (non-artificial) program written by some person or machine which would benefit from this optimization? |
| if (values.size() == 1) { | ||
| if (values.get(0) instanceof BigDecimal) { | ||
| BigDecimal bigDecimal = (BigDecimal) values.get(0); | ||
| return bigDecimal.longValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong for example for UNSIGNED BIGINT.
You have to keep the value BigDecimal.
But I don't understand what this change has to do with the issue being solved.
|
|
||
| private static final Strong STRONG = new Strong(); | ||
|
|
||
| private static final ImmutableList<SqlOperator> IDEMOTENT_UNARY_FUNCTIONS = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in "idempotent"
| break; | ||
| } | ||
| } | ||
| return simplifyGenericNode(rexCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This action was not invoked in the original code.
| if (e.getOperands().size() != 2) { | ||
| // Bail out since we only simplify floor <date> | ||
| return e; | ||
| return simplifyIdempotentUnaryFunction(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call from this point isn't the operator always in the IDEMPOTENT_UNARY_FUNCTIONS?
Is that check still necessary for other paths?
| * <a href="https://issues.apache.org/jira/browse/CALCITE-7122">[CALCITE-7122] | ||
| * Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP/CEIL/FLOOR</a>. | ||
| * */ | ||
| @Test void testSimplifyIdempotentUnaryFunctions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are fine, but a test that goes through RelToSql may be easier to read and write.
|
I hadn't noticed this information before, but I see that it mentions other architectural design suggestions, so we might need to discuss it further. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |



jira: https://issues.apache.org/jira/browse/CALCITE-7122