Conversation

@xiedeyantu
Member

See: CALCITE-7122

I've implemented a rough draft for this JIRA ticket based on my understanding. The idempotent-elimination logic itself is borrowed from the original author's PR. My refactoring does the following:

  1. Allow functions to provide their own idempotent property, instead of maintaining a list of which functions are idempotent (this addresses a point I've been emphasizing).
  2. Handle idempotent functions in an appropriate location, avoiding scattering the idempotent elimination logic across too many places (another point I previously mentioned).
  3. Since the current JIRA describes eliminating unary idempotent functions, I suggest temporarily excluding FLOOR and CEIL from consideration.

I haven't modified the test cases from the original PR #4488, as I want to use them to verify that the current code is functionally equivalent. If this implementation approach is approved, we can establish more specific requirements for test cases in the future.

This is just an idea for your reference.
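
For readers following the proposal, points 1 and 2 above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `Op`, `Expr`, and `simplify` are hypothetical stand-ins rather than Calcite classes, and the choice of UPPER and REVERSE as example functions is an assumption made for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for illustration; none of these are Calcite classes. */
final class IdempotentSketch {
  /** An operator that reports its own idempotency (the proposed per-function flag). */
  static final class Op {
    final String name;
    final boolean idempotent;

    Op(String name, boolean idempotent) {
      this.name = name;
      this.idempotent = idempotent;
    }
  }

  /** Expression node: a leaf when op is null, otherwise a call of op on args. */
  static final class Expr {
    final Op op;
    final String leafName;
    final List<Expr> args;

    private Expr(Op op, String leafName, List<Expr> args) {
      this.op = op;
      this.leafName = leafName;
      this.args = args;
    }

    static Expr leaf(String name) {
      return new Expr(null, name, new ArrayList<Expr>());
    }

    static Expr call(Op op, Expr... args) {
      return new Expr(op, null, Arrays.asList(args));
    }

    @Override public String toString() {
      if (op == null) {
        return leafName;
      }
      StringBuilder sb = new StringBuilder(op.name).append('(');
      for (int i = 0; i < args.size(); i++) {
        sb.append(i == 0 ? "" : ", ").append(args.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /** The single place that collapses f(f(x)) into f(x) for a unary idempotent f. */
  static Expr simplify(Expr e) {
    if (e.op == null) {
      return e;
    }
    // Simplify the operands first so that deeper nesting collapses bottom-up.
    Expr[] args = new Expr[e.args.size()];
    for (int i = 0; i < args.length; i++) {
      args[i] = simplify(e.args.get(i));
    }
    if (e.op.idempotent && args.length == 1
        && args[0].op == e.op && args[0].args.size() == 1) {
      return args[0]; // the outer call adds nothing
    }
    return Expr.call(e.op, args);
  }

  public static void main(String[] argv) {
    Op upper = new Op("UPPER", true);      // assumed idempotent for the example
    Op reverse = new Op("REVERSE", false); // an involution, not idempotent
    Expr x = Expr.leaf("x");
    System.out.println(simplify(Expr.call(upper, Expr.call(upper, Expr.call(upper, x)))));
    // prints "UPPER(x)"
    System.out.println(simplify(Expr.call(reverse, Expr.call(reverse, x))));
    // prints "REVERSE(REVERSE(x))"
  }
}
```

The rewrite only consults the flag carried by the operator, so no separate list of idempotent function names is needed in this sketch.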

@xiedeyantu xiedeyantu marked this pull request as draft October 12, 2025 15:52
@xiedeyantu
Member Author

This is not a ready PR, just a suggestion.

@rubenada
Contributor

Could be a valid approach. It's aligned with already existing aspects like "deterministic" and "dynamic".
Maybe in the long run we could even consider combining these fields (deterministic, dynamic, idempotent, and any other that may come) into a single flag field instead of N booleans.
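
One way to picture the "single flag field" idea is an `EnumSet` of properties in place of several booleans. The sketch below is only an assumption about what such a design could look like: `OperatorProperty` and `SketchOperator` are hypothetical names, not Calcite APIs, and the properties assigned to ABS and RAND are illustrative.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; OperatorProperty and SketchOperator are not Calcite APIs. */
final class PropertyFlagsSketch {
  /** One constant per property that would otherwise be a separate boolean field. */
  enum OperatorProperty { DETERMINISTIC, DYNAMIC, IDEMPOTENT }

  static final class SketchOperator {
    final String name;
    private final Set<OperatorProperty> properties;

    SketchOperator(String name, Set<OperatorProperty> properties) {
      this.name = name;
      // Defensive copy so later changes to the caller's set do not leak in.
      EnumSet<OperatorProperty> copy = EnumSet.noneOf(OperatorProperty.class);
      copy.addAll(properties);
      this.properties = copy;
    }

    boolean has(OperatorProperty property) {
      return properties.contains(property);
    }
  }

  public static void main(String[] argv) {
    SketchOperator abs = new SketchOperator("ABS",
        EnumSet.of(OperatorProperty.DETERMINISTIC, OperatorProperty.IDEMPOTENT));
    SketchOperator rand = new SketchOperator("RAND",
        EnumSet.noneOf(OperatorProperty.class));
    System.out.println(abs.name + " idempotent: " + abs.has(OperatorProperty.IDEMPOTENT));
    System.out.println(rand.name + " deterministic: " + rand.has(OperatorProperty.DETERMINISTIC));
  }
}
```

With this shape, adding another property means adding an enum constant rather than another constructor parameter.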

@xiedeyantu
Member Author

Could be a valid approach. It's aligned with already existing aspects like "deterministic" and "dynamic". Maybe in the long run we could even consider combining these fields (deterministic, dynamic, idempotent, and any other that may come) into a single flag field instead of N booleans.

Yes, I think we can continue to integrate or refactor this later. The current PR is mainly intended to share an initial idea. If you'd like me to further improve it, please let me know — I'm not sure if the original author has time to continue working on their PR.

@xiedeyantu
Member Author

I updated the test cases and force-pushed. The PR is now ready for review. If the original author continues his work, I will close this PR.

@xiedeyantu xiedeyantu marked this pull request as ready for review October 13, 2025 13:00
@xiedeyantu
Member Author

Could be a valid approach. It's aligned with already existing aspects like "deterministic" and "dynamic". Maybe in the long run we could even consider combining these fields (deterministic, dynamic, idempotent, and any other that may come) into a single flag field instead of N booleans.

I have filed a JIRA ticket, CALCITE-7224, to track this.

  getReturnTypeInference(), getOperandTypeInference(), operandHandler,
  getOperandTypeChecker(), callValidator,
- getFunctionType(), monotonicityInference, dynamic);
+ getFunctionType(), monotonicityInference, dynamic, idempotent);
Contributor

Why not keep the existing constructor version, with a default value of 'false' for this argument?
One alternative is to define a new enum { IDEMPOTENT, NONIDEMPOTENT } instead of using a boolean.

Member Author

I didn't quite understand your comment. This method works like a "copy", so it has to carry over the idempotent value that was set previously.

Contributor

I have several comments.
You have changed lots of constructor invocations.

Member Author

I added a new constructor and restored the previously modified create method. Is this the modification you meant?

Contributor

yes

Contributor

IMO a (new) boolean flag makes more sense considering the already existing flags (deterministic, dynamic). Modifying the constructors is the way to ensure backwards compatibility and avoid breaking consumers.
Whenever we tackle CALCITE-7224, we could deprecate all the constructors using the several flags (with the classic "to be removed before 2.0" comment), and create a single one with a unified flag field.
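
The backwards-compatibility point in this thread (keep the old constructor, default the new argument to `false`, and make copy-style methods carry the flag over) can be sketched as below. `SketchFunction` and `withDynamic` are hypothetical names used for illustration and do not reflect the signatures in the PR.

```java
/** Hypothetical sketch of adding a flag without breaking existing callers. */
final class ConstructorCompatSketch {
  static final class SketchFunction {
    private final String name;
    private final boolean dynamic;
    private final boolean idempotent;

    /** Pre-existing signature, kept so that current callers still compile. */
    SketchFunction(String name, boolean dynamic) {
      this(name, dynamic, false); // the new flag defaults to false
    }

    /** New signature that carries the additional flag. */
    SketchFunction(String name, boolean dynamic, boolean idempotent) {
      this.name = name;
      this.dynamic = dynamic;
      this.idempotent = idempotent;
    }

    /** Copy-style method: it must pass the previously set flag along. */
    SketchFunction withDynamic(boolean newDynamic) {
      return new SketchFunction(name, newDynamic, idempotent);
    }

    boolean isDynamic() {
      return dynamic;
    }

    boolean isIdempotent() {
      return idempotent;
    }
  }

  public static void main(String[] argv) {
    SketchFunction legacy = new SketchFunction("F", true);
    SketchFunction copy = new SketchFunction("G", false, true).withDynamic(true);
    System.out.println(legacy.isIdempotent()); // prints "false"
    System.out.println(copy.isIdempotent());   // prints "true"
  }
}
```

If a unified flag field is introduced later, the multi-boolean constructors in a design like this could be marked deprecated while remaining functional.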

Member Author

@rubenada Indeed, using a boolean type aligns with the current design, which is why I used a boolean type in the first version. Using an enum type does break this pattern, but it more clearly indicates whether it is idempotent or non-idempotent. Since I have created a Jira ticket to refactor this boolean value, both approaches seem less critical at this point. I have two solutions: one is to complete the refactoring work in that Jira ticket first and then finish this PR. The other is, if @mihaibudiu agrees to revert to using a boolean type, I can also roll back the code and complete this PR first.

Contributor

@mihaibudiu mihaibudiu left a comment

This is fine, but many Boolean flags will be hard to manage.
I think having them separate is still necessary, because they are independent.
The only thing I can think of is to give them different artificial enum types.

@xiedeyantu
Member Author

xiedeyantu commented Oct 13, 2025

I have filed a jira CALCITE-7224

@mihaibudiu I have filed a JIRA ticket, CALCITE-7224, so this can be handled together with the existing deterministic and dynamic follow-ups. The current implementation retains the same style for now.

@mihaibudiu
Contributor

Yes, I saw that, but that may require changing signatures for these functions...

@xiedeyantu
Member Author

Yes, I saw that, but that may require changing signatures for these functions...

I'd like to confirm: are you suggesting that an enum with two values, IDEMPOTENT and NONIDEMPOTENT, is the better approach for the current implementation, regardless of the existing implementation style for now? And are you considering a refactoring in the future?

@mihaibudiu
Contributor

Yes, I think that once you make them Boolean, you can't change them to something else.
Let's see if anyone else has a different suggestion.
People may not like a lot of tiny enum classes, but from my experience they are not too bad, and they provide very strong typing.
In general, if you can wrap an abstraction, no matter how small, in a class, I think it's worth wrapping (unless performance is a concern).
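
The "tiny enum classes" argument is about type safety: two boolean parameters can be swapped silently, while two distinct enum types cannot. A minimal sketch, assuming hypothetical names (`Idempotence`, `Determinism`, `SketchFunction`) that are not part of Calcite:

```java
/** Hypothetical sketch of small, dedicated enum types replacing boolean flags. */
final class TinyEnumSketch {
  enum Idempotence { IDEMPOTENT, NONIDEMPOTENT }

  enum Determinism { DETERMINISTIC, NONDETERMINISTIC }

  static final class SketchFunction {
    final String name;
    final Determinism determinism;
    final Idempotence idempotence;

    SketchFunction(String name, Determinism determinism, Idempotence idempotence) {
      this.name = name;
      this.determinism = determinism;
      this.idempotence = idempotence;
    }

    boolean isIdempotent() {
      return idempotence == Idempotence.IDEMPOTENT;
    }
  }

  public static void main(String[] argv) {
    SketchFunction abs = new SketchFunction(
        "ABS", Determinism.DETERMINISTIC, Idempotence.IDEMPOTENT);
    // Swapping the two arguments would be a compile-time error, unlike
    // new SketchFunction("ABS", true, false) with two booleans.
    System.out.println(abs.name + " idempotent: " + abs.isIdempotent());
  }
}
```

The call site also documents itself, since each argument names the property it sets.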

@xiedeyantu
Member Author

Yes, I think that once you make them Boolean, you can't change them to something else. Let's see if anyone else has a different suggestion. People may not like a lot of tiny enum classes, but from my experience they are not too bad, and they provide very strong typing. In general, if you can wrap an abstraction, no matter how small, in a class, I think it's worth wrapping (unless performance is a concern).

I agree with your suggestion. Since refactoring may not happen immediately, we can first implement the new feature to an ideal state. I will later change it to an enum type.

@mihaibudiu
Contributor

I have changed my mind. I don't think this is an ideal design, for two reasons:

  • it is not sustainable to add flags to all functions for every property that may be interesting
  • simplify is not the right place for all optimizations; this optimization in particular should be done only once, while simplify runs frequently.
    I think the right way to solve this is through a visitor pattern; the visitor will know which functions are idempotent.
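
The visitor alternative suggested here keeps the knowledge of which functions are idempotent inside the visitor, leaving the function objects unchanged. The following is a rough sketch under that reading; `Visitor`, `Expr`, `Ref`, `Call`, and `IdempotentEliminator` are hypothetical types written for this example, not Calcite's own visitor classes, and the function names are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the visitor alternative; these are not Calcite classes. */
final class IdempotentVisitorSketch {
  interface Visitor<R> {
    R visitRef(Ref ref);

    R visitCall(Call call);
  }

  interface Expr {
    <R> R accept(Visitor<R> visitor);
  }

  static final class Ref implements Expr {
    final String name;

    Ref(String name) {
      this.name = name;
    }

    @Override public <R> R accept(Visitor<R> visitor) {
      return visitor.visitRef(this);
    }

    @Override public String toString() {
      return name;
    }
  }

  static final class Call implements Expr {
    final String function;
    final List<Expr> args;

    Call(String function, Expr... args) {
      this.function = function;
      this.args = Arrays.asList(args);
    }

    @Override public <R> R accept(Visitor<R> visitor) {
      return visitor.visitCall(this);
    }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder(function).append('(');
      for (int i = 0; i < args.size(); i++) {
        sb.append(i == 0 ? "" : ", ").append(args.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /** The visitor, not the function object, knows which functions are idempotent. */
  static final class IdempotentEliminator implements Visitor<Expr> {
    private final Set<String> idempotentFunctions;

    IdempotentEliminator(String... names) {
      this.idempotentFunctions = new HashSet<>(Arrays.asList(names));
    }

    @Override public Expr visitRef(Ref ref) {
      return ref;
    }

    @Override public Expr visitCall(Call call) {
      // Rewrite operands first so deeper nesting collapses bottom-up.
      Expr[] args = new Expr[call.args.size()];
      for (int i = 0; i < args.length; i++) {
        args[i] = call.args.get(i).accept(this);
      }
      if (idempotentFunctions.contains(call.function)
          && args.length == 1 && args[0] instanceof Call) {
        Call inner = (Call) args[0];
        if (inner.function.equals(call.function) && inner.args.size() == 1) {
          return inner; // f(f(x)) becomes f(x)
        }
      }
      return new Call(call.function, args);
    }
  }

  public static void main(String[] argv) {
    Expr e = new Call("UPPER", new Call("UPPER", new Ref("x")));
    // A single explicit pass, run wherever the planner decides.
    System.out.println(e.accept(new IdempotentEliminator("UPPER", "ABS")));
    // prints "UPPER(x)"
  }
}
```

Because the pass is invoked explicitly, it can run once at a chosen point instead of on every simplification call; the trade-off is that the list of idempotent functions lives in the visitor.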

@xiedeyantu
Member Author

I have changed my mind. I don't think this is an ideal design, for two reasons:

  • it is not sustainable to add flags to all functions for every property that may be interesting
  • simplify is not the right place for all optimizations; this optimization in particular should be done only once, while simplify runs frequently.
    I think the right way to solve this is through a visitor pattern; the visitor will know which functions are idempotent.

I don't entirely agree with your perspective. On the contrary, I believe adding possible function properties to SqlOperator is very appropriate. Moreover, if the refactoring is done well, adding future properties will be straightforward. Although simplify may be called frequently, the actual processing might only occur once. However, this optimization point is indeed very niche. If it's deemed unnecessary to implement, I think we could also close this JIRA.

@xiedeyantu
Member Author

Hi @mihaibudiu, do you also think the function property of idempotency is too niche and not suitable for inclusion as a common property? If that is your point, I agree with your perspective. However, if we set aside its limited applicability, properties like determinism, dynamism, and monotonicity are already maintained as common attributes in this manner, so I believe this approach is reasonable.

@mihaibudiu
Contributor

It's not really about niche versus non-niche. In general, the number of algebraic properties you may look for when optimizing is unbounded. It is not sustainable to modify all objects to represent such properties; that's exactly what the visitor pattern is designed to solve.

@xiedeyantu
Member Author

It's not really about niche versus non-niche. In general, the number of algebraic properties you may look for when optimizing is unbounded. It is not sustainable to modify all objects to represent such properties; that's exactly what the visitor pattern is designed to solve.

I think we might be discussing how to refactor the function attribute system, which seems unrelated to the purpose of this PR. Since using boolean types as attribute identifiers is already an existing practice, should we accept extending it in the same way? If our final conclusion is not to agree, this PR could be temporarily closed, and the discussion on refactoring function attributes could continue in the new Jira ticket. What do you think?

@mihaibudiu
Contributor

You can leave the PR open and move the discussion to JIRA.

@xiedeyantu
Member Author

I have rolled back the code to the boolean-based implementation. If it gains approval, it can be merged. If it doesn’t receive consensus, I will still keep the PR open here, as it aligns with my design.

@xiedeyantu xiedeyantu added the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label Oct 22, 2025
@xiedeyantu
Copy link
Member Author

This proposed solution was not accepted, so I am closing this PR for now.

@xiedeyantu xiedeyantu closed this Oct 22, 2025

Labels

discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR

3 participants