@joegallo joegallo commented Aug 27, 2025

For heavy users of Document Level Security (DLS), where the entire DLS bitset cache cannot be held in memory at once, we can see a high degree of cache churn. Because of the locking that keeps the DocumentSubsetBitsetCache's bitsetCache and keysByIndex structures in sync, we end up with an extreme level of lock contention when entries are evicted from the cache (which happens frequently when the cache is churning).

The purpose of the keysByIndex data structure is to allow us to proactively evict entries from the cache in the event that their associated segment becomes inaccessible (e.g. because of a segment merge, or if an index is closed or deleted).

By removing the locking around the updates to the bitsetCache and keysByIndex structures, we get significantly improved throughput, but it becomes possible (though the chances are quite small) that we will no longer have an entry in the keysByIndex structure for a segment that is still open and for which there is an entry in the bitsetCache. As a consequence, if that segment later becomes inaccessible, we will not proactively remove its entry from the cache. This is not a true memory leak, however, as the maximum size and TTL policies of the cache still apply, and the entry will eventually be removed from the cache.
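
To illustrate the shape of the change, here is a simplified, self-contained sketch of the pattern described above. It is not the actual DocumentSubsetBitsetCache code: only the bitsetCache and keysByIndex names come from this PR, and every other type and method name is a stand-in.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the pattern described above; not the real implementation.
// "IndexKey" stands in for the per-segment cache key, and "BitsetKey"/"CachedBitset"
// stand in for the real Lucene/Elasticsearch types.
class DualStructureCacheSketch<IndexKey, BitsetKey, CachedBitset> {

    private final Map<BitsetKey, CachedBitset> bitsetCache = new ConcurrentHashMap<>();
    private final Map<IndexKey, Set<BitsetKey>> keysByIndex = new ConcurrentHashMap<>();
    private final ReentrantLock cacheModificationLock = new ReentrantLock();

    // Before: a shared lock keeps the two structures perfectly in sync, but every
    // insertion and every eviction contends on the same lock.
    void putLocked(IndexKey indexKey, BitsetKey bitsetKey, CachedBitset bitset) {
        cacheModificationLock.lock();
        try {
            bitsetCache.put(bitsetKey, bitset);
            keysByIndex.computeIfAbsent(indexKey, k -> ConcurrentHashMap.newKeySet()).add(bitsetKey);
        } finally {
            cacheModificationLock.unlock();
        }
    }

    // After: each structure is individually thread-safe and is updated without a
    // shared lock. A rare interleaving with a concurrent eviction can leave a
    // bitsetCache entry that keysByIndex no longer references, so a later segment
    // close may not proactively evict it, but size/TTL eviction still reclaims it.
    void putUnlocked(IndexKey indexKey, BitsetKey bitsetKey, CachedBitset bitset) {
        keysByIndex.computeIfAbsent(indexKey, k -> ConcurrentHashMap.newKeySet()).add(bitsetKey);
        bitsetCache.put(bitsetKey, bitset);
    }
}
```

The "after" variant trades a perfectly consistent view of the two structures for uncontended updates, which is exactly the trade-off described above.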

I've created a small benchmark that indexes ~10 million documents across 8 indices and then runs a selection of searches and aggregations against those indices via 63 user accounts associated with different DLS role queries. If all of the resulting bitsets were in the cache at the same time, they would occupy approximately 64 MB, but the cache is limited to 48 MB during the benchmark run.
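
For reference, that 48 MB limit corresponds to the DLS bitset cache size setting. The snippet below is only a sketch of how such a limit can be expressed, and is an assumption about how the benchmark was configured rather than anything contained in this PR.

```java
import org.elasticsearch.common.settings.Settings;

public class DlsCacheSettingSketch {
    public static void main(String[] args) {
        // Assumed configuration for a 48 MB DLS bitset cache; the equivalent
        // elasticsearch.yml entry would be:
        //   xpack.security.dls.bitset.cache.size: 48mb
        Settings nodeSettings = Settings.builder()
            .put("xpack.security.dls.bitset.cache.size", "48mb")
            .build();
        System.out.println(nodeSettings.get("xpack.security.dls.bitset.cache.size"));
    }
}
```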

On main without these changes, I see the following results from the benchmark:

| Metric | Task | Value | Unit |
|---|---|---:|---|
| Mean Throughput | dls-search | 76.79 | ops/s |
| Median Throughput | dls-search | 76.84 | ops/s |
| 50th percentile latency | dls-search | 1604.59 | ms |
| 90th percentile latency | dls-search | 2624.26 | ms |
| 99th percentile latency | dls-search | 3547.23 | ms |

And with these changes (approximately 15x better throughput and latency):

| Metric | Task | Value | Unit |
|---|---|---:|---|
| Mean Throughput | dls-search | 1169.86 | ops/s |
| Median Throughput | dls-search | 1170.12 | ops/s |
| 50th percentile latency | dls-search | 93.6846 | ms |
| 90th percentile latency | dls-search | 167.648 | ms |
| 99th percentile latency | dls-search | 269.456 | ms |

Because sufficiently poor performance can be characterized as a bug, I'm labeling this PR as a >bug and I intend to backport it to all the relevant branches.

Note: because I've removed the bitsetCache.get call from the onCacheEviction method, this PR happens to fix #132842 (DocumentSubsetBitsetCache eviction increments misses statistic).
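
To see why, here is a hypothetical, self-contained model of the statistics problem (it does not use the real Elasticsearch cache API): calling get from inside the eviction callback is an unsuccessful lookup, since the entry is already gone, so it inflates the miss counter; using the value already carried by the removal notification avoids the lookup entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the statistics problem; not the actual Elasticsearch
// cache API. The point: a get() issued from the eviction callback always
// misses, because the entry has already been removed.
class EvictionStatsSketch {

    record RemovalNotification(String key, Object value) {}

    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    private long misses = 0;

    Object get(String key) {
        Object value = entries.get(key);
        if (value == null) {
            misses++; // every unsuccessful lookup is counted as a miss
        }
        return value;
    }

    long misses() {
        return misses;
    }

    void evict(String key) {
        Object removed = entries.remove(key);
        if (removed != null) {
            onCacheEviction(new RemovalNotification(key, removed));
        }
    }

    void onCacheEviction(RemovalNotification notification) {
        // Before: re-reading the evicted entry through get() always misses and
        // skews the statistics:
        //   Object value = get(notification.key());
        // After: use the value the notification already carries.
        Object value = notification.value();
        // ... use 'value' for eviction bookkeeping (e.g. tidying keysByIndex) ...
    }
}
```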

- This is a WIP commit, in that these locks will be going away entirely, but I want the 'ignored' name to be available.
- The code is simpler this way, and doesn't require an allocation.
- If the set has been emptied in an onCacheEviction call, then remove it from the map (see the sketch below).
- Since we're tidying the map as we go, it's harder to reason about whether the error is that the set is null versus whether the set doesn't contain one particular entry, so treat those conditions as being the same.
- This test hits the race condition described in `onClose` a little less than once in a thousand runs on my machine, so we can't check for the same level of strict internal consistency between the two data structures (it's possible for the cache to contain a bitset that isn't referenced by the keysByIndex structure, and that's okay).
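
A rough sketch of the tidy-as-we-go behavior described in the notes above (the names and types here are assumptions; the real DocumentSubsetBitsetCache code differs in detail):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of tidying keysByIndex as entries are evicted: drop a per-index set
// once it becomes empty. Names and types are assumptions, not the real code.
class KeysByIndexTidySketch {

    private final Map<String, Set<String>> keysByIndex = new ConcurrentHashMap<>();

    void onCacheEviction(String indexKey, String cacheKey) {
        // compute() runs atomically per map entry, so concurrent evictions for
        // the same index don't lose each other's updates.
        keysByIndex.compute(indexKey, (ignored, keys) -> {
            if (keys == null) {
                // Either the set was never created or it was already tidied
                // away; with tidy-as-we-go those two situations are
                // indistinguishable, so they're treated the same.
                return null;
            }
            keys.remove(cacheKey);
            return keys.isEmpty() ? null : keys; // returning null removes the map entry
        });
    }
}
```

Because compute runs atomically per key, a set is dropped from the map exactly when its last key is removed, and a missing set is indistinguishable from one that was just tidied away, which is why those two conditions are treated identically.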
@joegallo joegallo added labels: >bug, :Security/Authorization (Roles, Privileges, DLS/FLS, RBAC/ABAC), Team:Security (Meta label for security team), auto-backport (Automatically create backport pull requests when merged), v9.2.0, v9.1.4, v9.0.7, v8.18.7, v8.19.4 (Aug 27, 2025)
@elasticsearchmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@joegallo joegallo requested a review from tvernum August 27, 2025 19:39
@elasticsearchmachine
Collaborator

Hi @joegallo, I've updated the changelog YAML for you.

@tvernum tvernum left a comment

LGTM, thanks for all the work on this.

It's amazing when a substantial investment in time leads to such an improvement simply by removing code.

@szybia szybia left a comment

lgtm!

@joegallo joegallo merged commit 98a73ce into elastic:main Aug 28, 2025
39 checks passed
@joegallo joegallo deleted the remove-bitsetcache-locking branch August 28, 2025 10:18
@elasticsearchmachine
Collaborator

💔 Backport failed

| Branch | Result |
|---|---|
| 9.1 | Commit could not be cherrypicked due to conflicts |
| 9.0 | Commit could not be cherrypicked due to conflicts |
| 8.18 | Commit could not be cherrypicked due to conflicts |
| 8.19 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 133681`.

@joegallo
Contributor Author

#133707 will be autobackported to 8.18 and 8.19, so between that and #133705 we should be all set on the backports.

