Skip to content

Conversation

@xuzifu666
Copy link
Member

/**
* Runs simplification unary function by eliminating idempotent.
*/
private RexCall simplifUnaryFunction(RexCall rexCall) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can simplify lower(upper(lower(x))) to upper(lower(x)) or lower(upper(x))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit like a special case, because the two functions have opposite effects. Maybe can hand this case, and it will not affect the default behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, my example was too specific. What I meant is that there are two idempotent functions, f and g, which may have multiple repetitions and can be combined arbitrarily, but will ultimately simplify to either f(g(x)) or g(f(x)). Of course, special cases like the one I mentioned above are exceptions.

Copy link
Member Author

@xuzifu666 xuzifu666 Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,currently this goal can be achieved based on this PR. For example, lower(lower(upper(uper('a')))) will be simplified to lower(upper('a')).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @julianhyde provided in the case of Jira, these two functions do not have opposite effects, so this case is not need to be simplified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simplifyIdempotentUnaryFunction is more appropriate.

if (e.getClass() == RexCall.class) {
return simplifyGenericNode((RexCall) e);
RexCall rexCall = (RexCall) e;
rexCall = simplifUnaryFunction(rexCall);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should keep the same format as before, using case upper/lower/abs.

Copy link
Member Author

@xuzifu666 xuzifu666 Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to do this at first, but it is handled according to SqlKind. The SqlKind of upper/lower/abs is OTHER_FUNCTION. If I need to add a new SqlKind and modify the SqlKind of these functions because of this, it doesn’t feel necessary.
For example, if a new XXX function is added in future, the corresponding SqlKind must also be added. The coupling feels a bit strong, but it is very convenient to put it directly in the List.

@julianhyde
Copy link
Contributor

Can we take this discussion to Jira? We need to have a conversation about specification. Those conversations belong in Jira.

Also please STOP logging a jira case just the moment before you push a PR. We know that you have been working on the PR for a few days. Give the rest of us chance to weigh in on the specification before you write the implementation.

Please fix the reference to the jira case. Is it 7122 or 7112?

@xuzifu666 xuzifu666 changed the title [CALCITE-7112] Eliminate nested calls for idempotent unary functions upper/lower/abs [CALCITE-7122] Eliminate nested calls for idempotent unary functions upper/lower/abs Aug 4, 2025
@xuzifu666
Copy link
Member Author

Can we take this discussion to Jira? We need to have a conversation about specification. Those conversations belong in Jira.

Also please STOP logging a jira case just the moment before you push a PR. We know that you have been working on the PR for a few days. Give the rest of us chance to weigh in on the specification before you write the implementation.

Please fix the reference to the jira case. Is it 7122 or 7112?

OK, thanks for the reminder. I will list the JIRA in advance so that everyone can evaluate it together. @julianhyde

/**
* Runs simplification unary function by eliminating idempotent.
*/
private RexCall simplifUnaryFunction(RexCall rexCall) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simplifyIdempotentUnaryFunction is more appropriate.

private static final Strong STRONG = new Strong();

private static final List<SqlOperator> IDEMOTENT_UNARY_FUNCTIONS =
Arrays.asList(SqlStdOperatorTable.UPPER, SqlStdOperatorTable.LOWER, SqlStdOperatorTable.ABS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableList is better

Copy link
Member Author

@xuzifu666 xuzifu666 Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your points and had changed them, thanks @silundong

@xuzifu666 xuzifu666 changed the title [CALCITE-7122] Eliminate nested calls for idempotent unary functions upper/lower/abs [CALCITE-7122] Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP Aug 6, 2025
/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7122">[CALCITE-7122]
* Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP</a>. */
@Test void testSimplifyIdempotentUnaryFunctions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you demonstrate this simplification capability using toSql? For example, if you write select upper(upper(lower(lower('x')))) it can be simplified to upper(lower('x')). This kind of test is not very easy to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to facilitate intuitive understanding of the case, I added SQL testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. RexImplicationCheckerTest is a unit test. If you drag in the functionality that goes to and from SQL, it is not longer a unit test.

To make the tests easier to understand, add comments. E.g. "Test that abs(abs(x)) is simplified to abs(x)".

Copy link
Member Author

@xuzifu666 xuzifu666 Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I added more detail comments for these tests and remove sql tests. @julianhyde

Copy link
Contributor

@ILuffZhe ILuffZhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's see if there are new comments come in the next 2 days.

@xiedeyantu
Copy link
Member

Can idempotent functions such as ceil and floor also be considered together?

@xuzifu666
Copy link
Member Author

Can idempotent functions such as ceil and floor also be considered together?

The ceil and floor functions have already been separately processed before, and the logic is a little complex, so they should be no need to handle them here.

@xiedeyantu
Copy link
Member

Can idempotent functions such as ceil and floor also be considered together?

The ceil and floor functions have already been separately processed before, and the logic is a little complex, so they should be no need to handle them here.

I see, these two functions are binary functions. I don't have any other questions for now.

@julianhyde
Copy link
Contributor

I see, these two functions are binary functions.

Numeric CEIL and FLOOR are unary. E.g. CEIL(CEIL(3.14)) = CEIL(3.14) = 4.

Datetime CEIL and FLOOR are binary. E.g. FLOOR(FLOOR(DATE '2025-02-08' TO MONTH)) = FLOOR(DATE '2025-02-08' TO MONTH) = DATE '2025-02-01'. The second parameter is a time unit, basically a literal over an enum type, so we could regard the functions as a family of unary functions, e.g. FLOOR_MONTH, CEIL_MONTH, FLOOR_YEAR, etc.

@xiedeyantu
Copy link
Member

xiedeyantu commented Aug 10, 2025

I see, these two functions are binary functions.

Numeric CEIL and FLOOR are unary. E.g. CEIL(CEIL(3.14)) = CEIL(3.14) = 4.

Datetime CEIL and FLOOR are binary. E.g. FLOOR(FLOOR(DATE '2025-02-08' TO MONTH)) = FLOOR(DATE '2025-02-08' TO MONTH) = DATE '2025-02-01'. The second parameter is a time unit, basically a literal over an enum type, so we could regard the functions as a family of unary functions, e.g. FLOOR_MONTH, CEIL_MONTH, FLOOR_YEAR, etc.

I agree with @julianhyde 's idea. I also have a suggestion—though I’m not sure how feasible it is—to consolidate all the currently identified unary idempotent functions into a unified encapsulation while ensuring compatibility with the existing ones in  simplify . The goal would be to extract a shared logic layer to avoid scattering similar implementations across multiple places, which could deduce future maintenance costs. Currently, only four functions are handled uniformly, and without careful review, others might mistakenly treat  CEIL  and  FLOOR  as non-idempotent or assume they were overlooked. If modifying this proves too difficult, we could at least document all the relevant functions in the comments to prevent future developers from thinking the feature was implemented incompletely.
I’m not entirely sure if this suggestion is valid. If Julian has time, I’d appreciate it if you could take a look as well.

@xuzifu666
Copy link
Member Author

xuzifu666 commented Aug 10, 2025

Floor and ceil are not purely unary functions also support binary, and they were previously handled separately, so I did not include them in the logic of the unary function list. They were not accidentally omitted, and I think it is not a problem to handle them uniformly in unary way.

But regarding the scalability mentioned,personally I believe that putting all idempotent functions on a unified list and identifying other types is consistent with the current approach. If there is no clear transformation plan, there is no need to abstract it, which may increase the understanding cost for subsequent developers or let someone with a deep understanding of the plan to come up with a clear plan to follow up separately.

@xuzifu666 xuzifu666 changed the title [CALCITE-7122] Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP [CALCITE-7122] Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP/CEIL/FLOOR Aug 11, 2025
@sonarqubecloud
Copy link

@mihaibudiu
Copy link
Contributor

what is the status of this PR? It is marked as "changes requested". Have all change request been addressed?

@xuzifu666
Copy link
Member Author

what is the status of this PR? It is marked as "changes requested". Have all change request been addressed?

IMO it had been addressed all comments now. @mihaibudiu

@mihaibudiu
Copy link
Contributor

@xiedeyantu can you please confirm that you are satisfied with these changes?

@xiedeyantu
Copy link
Member

@xiedeyantu can you please confirm that you are satisfied with these changes?

@mihaibudiu I'm just concerned that the current implementation might be too loose and not conducive to long-term maintenance. What I was trying to express is whether each function itself could have a common property to indicate whether it is an idempotent function. If there were a way to achieve this point, it would also benefit functional dependency analysis later, such as determining whether a function is an injective function. Or perhaps I'm being too critical - logically I don't think there should be any errors in the current approach. If you think the current implementation is fine, I have no objection.

@mihaibudiu
Copy link
Contributor

So your suggestion is to add a property to SqlFunction which says whether the function is idempotent?
Frankly, I think there would be too many such properties to create for various interesting semantic properties of functions that could be used by the optimizer.
Moreover, I think that the kinds of programs optimized by this PR never occur in practice, because this is just not how people write programs. So I think this is a lot of work for very little benefit.

@xiedeyantu
Copy link
Member

xiedeyantu commented Sep 25, 2025

So your suggestion is to add a property to SqlFunction which says whether the function is idempotent? Frankly, I think there would be too many such properties to create for various interesting semantic properties of functions that could be used by the optimizer. Moreover, I think that the kinds of programs optimized by this PR never occur in practice, because this is just not how people write programs. So I think this is a lot of work for very little benefit.

If I understand correctly, you also hope that all functions can have some necessary common properties, although not necessarily the idempotent property. If this PR can achieve minimal intrusive changes, I’m also fine with it. However, I also think that the optimization in the current PR is not very cost-effective.

@mihaibudiu I'm considering exploring whether it's feasible to add some useful attributes to functions. At a certain stage, achieving function dependency capabilities requires handling specific functions, but I'm not sure if adding attributes is a worthwhile effort for Calcite.

@mihaibudiu
Copy link
Contributor

There are some properties like determinism, behavior with nulls, etc. which are very useful for optimizations.
Idempotence - perhaps less.
I think this should be judged on a case by case basis.

@mihaibudiu
Copy link
Contributor

So let's review this once more and merge it if it works; we are spending a lot of time on an optimization which will probably be never applied in practice.

@rubenada
Copy link
Contributor

rubenada commented Oct 8, 2025

What's the status of this PR? Does it fulfill the scope that was agreed upon?

@mihaibudiu
Copy link
Contributor

There is some concern that this is relatively intrusive, making code more complex and hard to maintain, and it does not really solve a practical problem. That's why it wasn't merged yet.

@mihaibudiu
Copy link
Contributor

I would be more confident if someone could supply a real example where this would be beneficial. The only case I can imagine is for machine-generated SQL.

@xuzifu666
Copy link
Member Author

I would be more confident if someone could supply a real example where this would be beneficial. The only case I can imagine is for machine-generated SQL.

This should be an optimization for functions, similar to generating more efficient SQL. This idempotency elimination is a relatively obvious SQL optimization, but it is hard to say how much efficiency it can improve. For example, some enterprise SQL platforms will first optimize SQL, requiring the most concise and efficient SQL output, and then send it to the compute engine for execution. I think this optimization is profitable, although not much. @mihaibudiu

@mihaibudiu
Copy link
Contributor

Can you show a (non-artificial) program written by some person or machine which would benefit from this optimization?

if (values.size() == 1) {
if (values.get(0) instanceof BigDecimal) {
BigDecimal bigDecimal = (BigDecimal) values.get(0);
return bigDecimal.longValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong for example for UNSIGNED BIGINT.
You have to keep the value BigDecimal.
But I don't understand what this change has to do with the issue being solved.


private static final Strong STRONG = new Strong();

private static final ImmutableList<SqlOperator> IDEMOTENT_UNARY_FUNCTIONS =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in "idempotent"

break;
}
}
return simplifyGenericNode(rexCall);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This action was not invoked in the original code.

if (e.getOperands().size() != 2) {
// Bail out since we only simplify floor <date>
return e;
return simplifyIdempotentUnaryFunction(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call from this point isn't the operator always in the IDEMPOTENT_UNARY_FUNCTIONS?
Is that check still necessary for other paths?

* <a href="https://issues.apache.org/jira/browse/CALCITE-7122">[CALCITE-7122]
* Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP/CEIL/FLOOR</a>.
* */
@Test void testSimplifyIdempotentUnaryFunctions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are fine, but a test that goes through RelToSql may be easier to read and write.

@xuzifu666
Copy link
Member Author

I hadn't noticed this information before, but I see that it mentions other architectural design suggestions, so we might need to discuss it further.

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2025
@github-actions github-actions bot removed the stale label Jan 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants