Conversation

@jpountz jpountz commented Sep 2, 2025

Lucene introduced the Bits#applyMask API to speed up the evaluation of non-scoring queries (e.g. aggregations-only queries) and is considering making BitSet a sealed class that only allows FixedBitSet and SparseFixedBitSet as sub-classes to control the performance impact of virtual calls.

As a consequence, this change:

  • renames CombinedBitSet to CombinedBits,
  • makes it implement Bits instead of extending BitSet,
  • implements CombinedBits#applyMask,
  • refactors how the query and live docs are intersected in ContextIndexSearcher.

Closes #120627
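
For readers following along, here is a minimal sketch of the shape of such a wrapper. It assumes `Bits#applyMask` takes a `FixedBitSet` window plus a document offset (the exact Lucene signature may differ) and uses illustrative field names; it is not the actual implementation from this change.

```java
// Minimal sketch only: the real CombinedBits in this change may differ, and the
// Lucene signature of Bits#applyMask is assumed here to be
// applyMask(FixedBitSet bitSet, int offset), i.e. "clear the bits of bitSet
// whose document offset + i is not set in this Bits".
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

final class CombinedBits implements Bits {
    private final Bits first;   // e.g. the role query's matching docs
    private final Bits second;  // e.g. the segment's live docs

    CombinedBits(Bits first, Bits second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean get(int index) {
        // A doc is visible only if both wrapped Bits accept it.
        return first.get(index) && second.get(index);
    }

    @Override
    public int length() {
        return first.length();
    }

    @Override
    public void applyMask(FixedBitSet bitSet, int offset) {
        // Delegating to both wrapped instances yields the intersection, and lets
        // each of them use its own specialized implementation.
        first.applyMask(bitSet, offset);
        second.applyMask(bitSet, offset);
    }
}
```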

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label labels Sep 2, 2025
@jpountz jpountz added the :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC label Sep 2, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Sep 2, 2025
@elasticsearchmachine

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Sep 2, 2025
@tvernum tvernum commented Sep 12, 2025

Ping @joegallo, FYI.

@tvernum tvernum left a comment

LGTM (I think I understood it all)

    bitSet.clear(to - from, bitSet.length());
} else {
    to = from + WINDOW_SIZE;
}

@tvernum tvernum Sep 12, 2025

Is there a reason we use a mix of WINDOW_SIZE and bitSet.length()?

They should be the same, and it looks like the code relies on them being the same, but perhaps you had a deeper reason for mixing them.

@jpountz jpountz (Author)

No reason! Let me use WINDOW_SIZE everywhere.
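
For context, here is a hedged sketch of how the windowed loop could read with WINDOW_SIZE used consistently; the class, method, and window size below are illustrative, not the exact code in this PR.

```java
import org.apache.lucene.util.FixedBitSet;

class WindowedCollectionSketch {
    // Illustrative: the reusable window bitset is created with exactly this length,
    // so bitSet.length() == WINDOW_SIZE and either can be used below.
    private static final int WINDOW_SIZE = 4096;

    void nextWindow(FixedBitSet bitSet, int from, int maxDoc) {
        final int to;
        if (from + WINDOW_SIZE > maxDoc) {
            to = maxDoc;
            // Last, partial window: clear the bits past the end of the segment.
            bitSet.clear(to - from, WINDOW_SIZE);
        } else {
            to = from + WINDOW_SIZE;
        }
        // ... collect matching docs in [from, to) using bitSet ...
    }
}
```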

@joegallo joegallo left a comment

LGTM

@joegallo joegallo self-assigned this Sep 23, 2025
@joegallo joegallo added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Sep 23, 2025

@joegallo joegallo commented Sep 24, 2025

I updated my previous benchmark for testing DLS to account for these changes. Specifically, #120627 mentions that Lucene 10.2 is introducing a new Bits#applyMask API that is important for good query evaluation performance in the presence of deletes, so I added a step in my benchmark that deletes 5% of the docs at random from the eight metricbeat indices (each index has approximately 1 million docs, and I delete 50,000 from each of them).
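
The delete step is not part of this PR itself; purely as an illustration of how deletes can be introduced so that live docs come into play, here is one way to remove roughly 5% of documents at random at the Lucene level, assuming each document has a unique `id` keyword field (the actual benchmark ran against Elasticsearch indices and may have used a different mechanism).

```java
import java.io.IOException;
import java.util.Random;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

class RandomDeletesSketch {
    /** Deletes each document with ~5% probability, matching on a hypothetical "id" field. */
    static void deleteAboutFivePercent(Directory dir) throws IOException {
        Random random = new Random(42); // fixed seed, so benchmark runs stay comparable
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                StoredFields storedFields = reader.storedFields();
                for (int docId = 0; docId < reader.maxDoc(); docId++) {
                    if (random.nextDouble() < 0.05) {
                        String id = storedFields.document(docId).get("id");
                        if (id != null) {
                            writer.deleteDocuments(new Term("id", id));
                        }
                    }
                }
            }
            // Deletes are recorded as cleared live-docs bits rather than rewritten segments,
            // which is exactly the case Bits#applyMask is meant to speed up.
            writer.commit();
        }
    }
}
```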

Looking at the flamegraphs, the speedup is pretty clear:

[Two flamegraph screenshots, captured 2025-09-24.]

The time spent in any .*Bits.applyMask method drops from 5.12% of all profiled CPU time to 1.26%. Everything else stays more or less the same.

In terms of the overall speedup, here's what it means for the high-level operation of running a search via a DLS role that hits an index with deleted documents:

MAIN:
=====

(cache constrained to 48mb, lots of cache churn)
|                                                Mean Throughput |              dls-search |  1121.89        |  ops/s |
|                                              Median Throughput |              dls-search |  1123.23        |  ops/s |
|                                        50th percentile latency |              dls-search |    98.781       |     ms |
|                                        90th percentile latency |              dls-search |   170.912       |     ms |
|                                        99th percentile latency |              dls-search |   263.398       |     ms |

(unconstrained cache, lots of cache hits)
|                                                Mean Throughput |              dls-search |  2038.01        |  ops/s |
|                                              Median Throughput |              dls-search |  2042.57        |  ops/s |
|                                        50th percentile latency |              dls-search |    55.4865      |     ms |
|                                        90th percentile latency |              dls-search |    87.5875      |     ms |
|                                        99th percentile latency |              dls-search |   132.864       |     ms |

THIS PR:
========

(cache constrained to 48mb, lots of cache churn)
|                                                Mean Throughput |              dls-search |  1152.41        |  ops/s |
|                                              Median Throughput |              dls-search |  1152.34        |  ops/s |
|                                        50th percentile latency |              dls-search |    95.733       |     ms |
|                                        90th percentile latency |              dls-search |   167.011       |     ms |
|                                        99th percentile latency |              dls-search |   260.01        |     ms |

(unconstrained cache, lots of cache hits)
|                                                Mean Throughput |              dls-search |  2180.96        |  ops/s |
|                                              Median Throughput |              dls-search |  2190.19        |  ops/s |
|                                        50th percentile latency |              dls-search |    51.5064      |     ms |
|                                        90th percentile latency |              dls-search |    81.5829      |     ms |
|                                        99th percentile latency |              dls-search |   125.111       |     ms |

Comparing main against this PR, you'll notice a modest but real improvement (a few percent, in this benchmark) in both throughput and latency, for both the cache-constrained and unconstrained-cache cases.

@joegallo joegallo merged commit 8e46154 into elastic:main Sep 24, 2025
42 checks passed
Development

Successfully merging this pull request may close these issues:

  • Implement Bits#applyMask on Bits implementations that are used as live docs.