
Conversation

@joegallo joegallo commented Aug 4, 2025

If you trace the org.elasticsearch.common.cache.Cache, you can see that it uses the hashCode of the key object pretty extensively (see some of the discussion on #96050). For example, in a cache.evictEntry(entry) call, we first call hashCode to get the right segment, then within the segment we call hashCode again to do a get, then finally we call hashCode once more in the remove. So that's three hashCode calls to remove an entry, and the latter two are done while holding the writeLock. This isn't the be-all-end-all of performance, but it's a tidy little speedup, and it's easy to write.
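A minimal sketch of the idea, with illustrative names (CacheKey and its fields are assumptions, not the actual PR code): compute the hash once at construction, so every later hashCode call — including the ones made while holding the cache's write lock — is just a field read.

```java
import java.util.Objects;

// Hypothetical key class; eagerly caches its hashCode at construction time.
final class CacheKey {
    private final String indexName;
    private final Object query; // e.g. a BooleanQuery in the bitset cache
    private final int hashCode; // computed once, outside any lock

    CacheKey(String indexName, Object query) {
        this.indexName = indexName;
        this.query = query;
        this.hashCode = Objects.hash(indexName, query);
    }

    @Override
    public int hashCode() {
        return hashCode; // constant-time, no recomputation under the writeLock
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o instanceof CacheKey other) {
            return indexName.equals(other.indexName) && query.equals(other.query);
        }
        return false;
    }
}
```

This matters even for keys like BooleanQuery whose hashCode is lazily cached: the first computation still has to happen somewhere, and eager computation moves it out of the locked sections.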

In the same area, there's a minor bug around the handling of the NULL_MARKER in the bitset cache. While the NULL_MARKER bitset is shared, and so it's fair to count the size of the object itself as zero, the entries in the cache still take up space because of the size of their keys. Since we're O(segments * queries), a great many entries in the cache could be pointing at the NULL_MARKER, so let's not consider them to be completely free in terms of memory consumption. (Note: I didn't bother to update the logic around logging a warning if a bitset is created that won't fit in the cache -- so technically we could be off by 24 bytes there. I can change that if you'd prefer we remain technically correct.)
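The accounting change can be sketched like this (NULL_MARKER, KEY_SHALLOW_SIZE, and the byte values are stand-ins for the real constants, not measurements from the PR): the old weigher charged NULL_MARKER entries nothing, while the fixed weigher charges every entry at least the shallow size of its key.

```java
import java.util.function.ToLongBiFunction;

class WeigherDemo {
    static final Object NULL_MARKER = new Object();
    static final long KEY_SHALLOW_SIZE = 24; // assumed per-key overhead in bytes

    // old behavior: NULL_MARKER entries counted as free
    static final ToLongBiFunction<Object, Object> OLD =
        (key, bitSet) -> bitSet == NULL_MARKER ? 0 : ramBytesUsed(bitSet);

    // fixed behavior: every entry pays for its key's shallow size
    static final ToLongBiFunction<Object, Object> FIXED =
        (key, bitSet) -> KEY_SHALLOW_SIZE + (bitSet == NULL_MARKER ? 0 : ramBytesUsed(bitSet));

    static long ramBytesUsed(Object bitSet) {
        return 1024; // placeholder for a real BitSet's ramBytesUsed()
    }
}
```

With the old weigher an unbounded number of NULL_MARKER entries could accumulate without ever counting against the cache's maximum weight; with the fix they are bounded by the weight budget.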

As with #132416, I've added @tvernum as a reviewer just as an FYI to him that this PR exists. I don't actually need his +1 specifically (anybody on the @elastic/es-security team is fine by me).

This actually matters, because while the hashCode for a BooleanQuery
is already cached, it's not eagerly computed. Pre-computing the
hashCode moves the initial calculation of the hashCode outside of
anywhere that we're holding a lock.
An entry in the cache can be small, but it's not fair to consider it
to be *zero* (in extremely bad luck scenarios, we can end up with a
lot of NULL_MARKER entries, so let's not treat them as *free*).
@joegallo joegallo added >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v9.2.0 v9.1.1 v8.19.1 v9.0.5 labels Aug 4, 2025
@joegallo joegallo requested review from a team and tvernum August 4, 2025 19:40
@elasticsearchmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@joegallo joegallo added the auto-backport Automatically create backport pull requests when merged label Aug 4, 2025
@slobodanadamovic slobodanadamovic requested review from slobodanadamovic and removed request for a team August 6, 2025 12:37
  .setExpireAfterAccess(ttl)
  .setMaximumWeight(maxWeightBytes)
- .weigher((key, bitSet) -> bitSet == NULL_MARKER ? 0 : bitSet.ramBytesUsed())
+ .weigher((key, bitSet) -> BitsetCacheKey.SHALLOW_SIZE + (bitSet == NULL_MARKER ? 0 : bitSet.ramBytesUsed()))
Contributor

I get the intention, but this concerns me a little. It means that the cache will now hold fewer items than it did before (for the same configuration). That's potentially going to mean someone's node ends up caching less than it used to, even though nothing has really changed.

Have we done any analysis on what it means for the overall cache size?
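A back-of-the-envelope sketch of that concern (all numbers here are assumptions for illustration, not measurements from the PR): with a fixed maximum weight, charging each entry for its key shrinks the number of entries the cache can admit under the same configuration.

```java
class CapacityEstimate {
    static long capacity(long budgetBytes, long perEntryBytes) {
        return budgetBytes / perEntryBytes;
    }

    public static void main(String[] args) {
        long budget = 50L * 1024 * 1024; // assumed 50 MiB cache budget
        long bitset = 1024;              // assumed average bitset size in bytes
        long key = 24;                   // assumed per-key shallow size

        System.out.println("old weigher: " + capacity(budget, bitset) + " entries");
        System.out.println("new weigher: " + capacity(budget, bitset + key) + " entries");
    }
}
```

For typically-sized bitsets the per-key overhead is a small fraction of the entry weight, so the reduction is modest; the bigger behavioral change is for NULL_MARKER entries, which previously never counted against the budget at all.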

@joegallo
Contributor Author

I'm going to close this one for now -- I'll split out the cached hashCode component into its own PR and revisit the issue of the zero size of NULL_MARKER later.


Labels

auto-backport Automatically create backport pull requests when merged >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v8.19.3 v9.0.6 v9.1.3 v9.2.0
