-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
HHH-17002, HHH-18820, HHH-19391, HHH-18514 equals() and hashCode() for SQM nodes #10323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HHH-17002, HHH-18820, HHH-19391, HHH-18514 equals() and hashCode() for SQM nodes #10323
Conversation
|
Thanks for your pull request! This pull request does not follow the contribution rules. Could you have a look? ❌ All commit messages should start with a JIRA issue key matching pattern › This message was automatically generated. |
…r SQM nodes - finally enables efficient caching of criteria query plans - also reconsider how alias generation is done - aliases should only be unique to a given query, NOT globally unique, since that results in interpretation cache misses - ran into and fixed several other problems along the way - note that the previous solution based on translating to HQL was not working at all, partly because the translation to HQL is not very correct - but anyway this is more efficient, since hashCodes are in general more flexible from an efficiency perspective - there is still a remaining problem where NavigablePaths elements are assigned globally unique aliases resulting in cache misses
…oblem - just deprecate the method, since HQL doesn't allow anonymous CTEs - perhaps we need to do the same for the withRescursiveXxxxx() methods - I don't really see a better fix, since nested CTEs can be composed before composing them with the outer query (perhaps we could modify the aliases of all CTEs after the whole query is built)
Oracle disapproves of aliases of form "_1"
29fd7e3 to
ace0d87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the approach looks interesting and I think it has potential. The uses of toHqlString() might lead to problems, because it is only used for certain predicates/expressions instead of the whole query, which can ultimately lead to wrong results. Please check the inline comments for further details.
hibernate-core/src/main/java/org/hibernate/query/sqm/function/SelfRenderingSqmFunction.java
Show resolved
Hide resolved
hibernate-core/src/main/java/org/hibernate/query/sqm/tree/domain/AbstractSqmFrom.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public String generateAlias() { | ||
| return "t_" + (++aliasCounter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating the statement with such counters makes substitutability checks somewhat "impossible" AFAICT. If the order of operations is different, the generated aliases might be different even though the queries really are equivalent.
At first, I also thought that this might be a potential concurrency concern, but I guess it's fine since this all happens during CriteriaQuery construction.
Another "problem" of this approach is that people might add From nodes from other CriteriaQuery objects which coincidentally have the same generated alias, but do not represent the same node. Any use of toHqlString() for equality is then at danger of producing wrong results. Note that introducing From nodes from other queries appears to be a common programmer mistake, so this is a real problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the generated aliases might be different even though the queries really are equivalent.
Well, yeah, sure, in future we could come up with a more efficient implementation which is less sensitive to equivalent permutations of the query syntax. But for now we're more interested in avoiding false hits on the cache than we are in worrying about false misses.
Note that introducing From nodes from other queries appears to be a common programmer mistake, so this is a real problem.
Well then if we don't allow that, we should detect it and throw a meaningful error. It's not the responsibility of this code to deal with that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yeah, sure, in future we could come up with a more efficient implementation which is less sensitive to equivalent permutations of the query syntax. But for now we're more interested in avoiding false hits on the cache than we are in worrying about false misses.
I might be missing some context as I was gone for a while and am still catching up, so please be patient with me for a bit :)
Using an equals/hashCode based approach surely is an interesting solution if we can figure out the aliasing part, but you seem to imply that there is a bug currently where one CriteriaQuery hits the query plan cache of another that is in fact not equal. I checked the issues you linked but nothing in the descriptions/comments implied that a false cache hit is their root cause. Can you please clarify?
Well then if we don't allow that, we should detect it and throw a meaningful error. It's not the responsibility of this code to deal with that problem.
I'm trying to give you context so you understand my concerns better. Obviously generateAlias() is not the place to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify?
Well there are two problems at present:
- There are multiple of node types for where
toHqlString()has missing "bits". And so different queries generate the same HQL string, resulting in an incorrect cache hit whenhibernate.criteria.plan_cache_enabledis turned on. - On the other hand, when
hibernate.criteria.plan_cache_enabledis off, the query caches criteria queries based on the identity of the SQL tree, which is quite likely to result in almost no hits at all.
So whether hibernate.criteria.plan_cache_enabled is enabled or not, we don't get nice behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And of course there's also the fact that evaluating a hashCode() or equals() is by nature more efficient than rendering a string, since it doesn't evolve allocation, and can often be quickly short-circuited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple of node types for where toHqlString() has missing "bits"
We should fix these missing "bits" then, because the HQL representation is still useful to people and should be semantically correct, regardless what query cache key strategy we use.
the query caches criteria queries based on the identity of the SQL tree, which is quite likely to result in almost no hits at all.
I agree that this is debatable, but I think we discussed this in the past and decided to default plan caching to false, because the majority in the team thought criteria queries are too dynamic to benefit from caching. Though that flag being false makes the query cache key null i.e. not even do a lookup.
And of course there's also the fact that evaluating a
hashCode()orequals()is by nature more efficient than rendering a string, since it doesn't evolve allocation, and can often be quickly short-circuited.
I would hope so as well, but performance folks repeatedly surprised me in the past about my misconceptions, so I try to be defensive and prefer measuring before jumping to conclusions. The choice of the current implementation was obviously also biased by me trying to reuse existing code and wanting to avoid setting generated aliases on the From nodes.
hibernate-core/src/main/java/org/hibernate/query/sqm/tree/domain/AbstractSqmAttributeJoin.java
Outdated
Show resolved
Hide resolved
hibernate-core/src/main/java/org/hibernate/query/sqm/tree/domain/AbstractSqmJoin.java
Outdated
Show resolved
Hide resolved
...rnate-core/src/main/java/org/hibernate/query/sqm/tree/expression/AsWrapperSqmExpression.java
Outdated
Show resolved
Hide resolved
...e-core/src/main/java/org/hibernate/query/sqm/tree/expression/SqmSelfRenderingExpression.java
Outdated
Show resolved
Hide resolved
...nate-core/src/main/java/org/hibernate/query/sqm/tree/expression/SqmSetReturningFunction.java
Show resolved
Hide resolved
| return this == parameter ? 0 : 1; | ||
| } | ||
|
|
||
| // this is not really a parameter, it's really a literal value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your implementation surely is better that the existing one, but this thing actually really is a parameter, in the sense that for caching, the value doesn't matter. This is a point where using equals/hashCode for cache keys is kind of quirky. Naturally, the equality of this type probably should check for value equality, but for caching, the only thing that matters IMO is the fact that the "other" object to compare against is also a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that a correct implementation is of equals() is:
return object instanceof ValueBindJpaCriteriaParameterThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that a correct implementation is of
equals()is:return object instanceof ValueBindJpaCriteriaParameter
Well, no, that's no good. Lots and lots of test failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what tests fail as I haven't tried this myself yet, though wanted to mention that in theory, return object instanceof ValueBindJpaCriteriaParameter should in fact be a good enough implementation for cache key purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be that equals() is called on parameters in other contexts not related to caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's why I wrote this:
This is a point where using
equals/hashCodefor cache keys is kind of quirky.
Essentially, this boils down to the "custom equals method with context parameter" suggestion I made in a different discussion on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah well I tried that and it was a bloodbath.
We could probably make the change you suggest and then fix whatever other code is making use of equals(). It could use compareTo() instead, for example.
Not going to do that in this PR though, since this seems to be working well enough for now. We can open an issue.
Right, so, to clarify: the two places where I'm definitely not going to do that in this already-huge pull request, and I'm not going to let that minor issue stop me merging it. Calling |
|
Let my try to explain my main concerns and thinking with this approach to caching
On the other hand, the HQL string approach that we have right now traverses the SQM tree only once to generate the string. Hash code generation and equality checks don't do pointer chasing and benefit from vector instructions on CPUs. The only downside to this approach is IMO the additional allocation needed for the HQL string. To me, the HQL string approach seems more favorable, but maybe I'm wrong and someone from the performance team can prove my intuition about this topic wrong. HTH |
It's extremely unlikely that the whole SQM tree will be traversed on a hash collision. The overwhelmingly common case is that the comparison terminates after a couple of nodes. And hash collisions are also quite unlikely: there are almost 2^31 ~= 2E10 integers. There are way fewer criteria queries than that. It's true that for a cache hit the whole tree gets traversed (just once), but that's clearly a huge win compared to actually recompiling the criteria query from scratch. |
|
To be clear: computing a hashcode is a far more efficient operation than incrementally rendering a string, which usually involves multiple allocations; hashes can be cached just as easily as strings; when cached they occupy much less memory; and when we're done with them, they don't need to be garbage collected. |
|
Let's consider the whole process here. Suppose I have a newly-instantiated criteria query object, and I want to find if we have a cached plan for it. Under the current implementation:
Note that the cost of this is completely dominated by step 1. Under the new implementation:
In this implementation, steps 1 and 4 dominate, and are naively about equally expensive. But both are much less expensive than step 1 of the current implementation. |
|
Note also, that setting a generous load factor is completely realistic here, since there's only one such cache (or at most a handful) in the whole program. We're much more interested in not having those all those HQL strings floating around occupying memory than we are in scrimping on the load factor for a singleton map. |
|
Like I wrote already, the HQL string approach being faster is just my gut feeling, because I assume pointer chasing is more expensive than CPU vectorized computations + allocation. I might just as well be wrong, though it also felt easier to just reuse the HQL string implementation since we already had code around. If we want to know for sure, we probably will have to test the performance of both, though I think the implementations you added are useful regardless of the actual query cache key strategy we end up using. |
|
OK, so that’s a great approach and I 100% endorse taking on questions about performance with a whole lot of humility. But we can’t possibly do any meaningful performance comparisons while this is a pull request. Even if that’s possible in principle, we both know it’s not going to happen in practice. So here we’re just going to have to rely on some intuition. And I don’t feel like one can go very far wrong with the basic intuition that allocating heap memory is a relatively expensive operation in a language with automatic memory management. I don’t know how or why vectorization would change that. Can a garbage collector be efficiently vectorized? |
That's the thing, if the allocation can happen with the TLB memory, it might be super fast, but in general, "it depends" ;) |
As far as I understand, there's absolutely no way that you could possibly guarantee that this will happen. So even if it does happen in some sort of benchmark where you're hitting the query interpretation cache over and over again in a loop to evaluate which implementation is faster, there's absolutely no way you say for sure that it would happen in a real running program. |
… and reenable a check
|
Superseded by #10354 due to conflicts. |
Latest rebase of #10139.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.