Conversation

@jpountz jpountz commented Sep 2, 2025

Lucene introduced the Bits#applyMask API to speed up the evaluation of non-scoring queries (e.g. aggregations-only queries) and is considering making BitSet a sealed class that only allows FixedBitSet and SparseFixedBitSet as sub-classes to control the performance impact of virtual calls.

As a consequence, this change:

  • renames CombinedBitSet to CombinedBits,
  • makes it implement Bits instead of extending BitSet,
  • implements CombinedBits#applyMask,
  • refactors how the query and live docs are intersected in ContextIndexSearcher.

Closes #120627
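
For readers following along, here is a minimal sketch of the shape of such a wrapper. It assumes `Bits#applyMask` takes a `FixedBitSet` window plus a document offset (the exact Lucene signature may differ) and uses illustrative field names; it is not the actual implementation from this change.

```java
// Minimal sketch only: the real CombinedBits in this change may differ, and the
// Lucene signature of Bits#applyMask is assumed here to be
// applyMask(FixedBitSet bitSet, int offset), i.e. "clear the bits of bitSet
// whose document offset + i is not set in this Bits".
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

final class CombinedBits implements Bits {
    private final Bits first;   // e.g. the role query's matching docs
    private final Bits second;  // e.g. the segment's live docs

    CombinedBits(Bits first, Bits second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean get(int index) {
        // A doc is visible only if both wrapped Bits accept it.
        return first.get(index) && second.get(index);
    }

    @Override
    public int length() {
        return first.length();
    }

    @Override
    public void applyMask(FixedBitSet bitSet, int offset) {
        // Delegating to both wrapped instances yields the intersection, and lets
        // each of them use its own specialized implementation.
        first.applyMask(bitSet, offset);
        second.applyMask(bitSet, offset);
    }
}
```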

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label labels Sep 2, 2025
@jpountz jpountz added the :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC label Sep 2, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Sep 2, 2025
@elasticsearchmachine

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Sep 2, 2025
@tvernum tvernum commented Sep 12, 2025

Ping @joegallo, FYI.

@tvernum tvernum left a comment

LGTM (I think I understood it all)

    bitSet.clear(to - from, bitSet.length());
} else {
    to = from + WINDOW_SIZE;
}

@tvernum tvernum Sep 12, 2025

Is there a reason we use a mix of WINDOW_SIZE and bitSet.length()?

They should be the same, and it looks like the code relies on them being the same, but perhaps you had a deeper reason for mixing them.

@jpountz jpountz (Author)

No reason! Let me use WINDOW_SIZE everywhere.
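
For context, here is a hedged sketch of how the windowed loop could read with WINDOW_SIZE used consistently; the class, method, and window size below are illustrative, not the exact code in this PR.

```java
import org.apache.lucene.util.FixedBitSet;

class WindowedCollectionSketch {
    // Illustrative: the reusable window bitset is created with exactly this length,
    // so bitSet.length() == WINDOW_SIZE and either can be used below.
    private static final int WINDOW_SIZE = 4096;

    void nextWindow(FixedBitSet bitSet, int from, int maxDoc) {
        final int to;
        if (from + WINDOW_SIZE > maxDoc) {
            to = maxDoc;
            // Last, partial window: clear the bits past the end of the segment.
            bitSet.clear(to - from, WINDOW_SIZE);
        } else {
            to = from + WINDOW_SIZE;
        }
        // ... collect matching docs in [from, to) using bitSet ...
    }
}
```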

@joegallo joegallo left a comment

LGTM

@joegallo joegallo self-assigned this Sep 23, 2025
@joegallo joegallo added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Sep 23, 2025

@joegallo joegallo commented Sep 24, 2025

I updated my previous benchmark for testing DLS to account for these changes. Specifically, #120627 mentions that Lucene 10.2 is introducing a new Bits#applyMask API that is important for good query evaluation performance in the presence of deletes, so I added a step in my benchmark that deletes 5% of the docs at random from the eight metricbeat indices (each index has approximately 1 million docs, and I delete 50,000 from each of them).
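
The delete step is not part of this PR itself; purely as an illustration of how deletes can be introduced so that live docs come into play, here is one way to remove roughly 5% of documents at random at the Lucene level, assuming each document has a unique `id` keyword field (the actual benchmark ran against Elasticsearch indices and may have used a different mechanism).

```java
import java.io.IOException;
import java.util.Random;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

class RandomDeletesSketch {
    /** Deletes each document with ~5% probability, matching on a hypothetical "id" field. */
    static void deleteAboutFivePercent(Directory dir) throws IOException {
        Random random = new Random(42); // fixed seed, so benchmark runs stay comparable
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                StoredFields storedFields = reader.storedFields();
                for (int docId = 0; docId < reader.maxDoc(); docId++) {
                    if (random.nextDouble() < 0.05) {
                        String id = storedFields.document(docId).get("id");
                        if (id != null) {
                            writer.deleteDocuments(new Term("id", id));
                        }
                    }
                }
            }
            // Deletes are recorded as cleared live-docs bits rather than rewritten segments,
            // which is exactly the case Bits#applyMask is meant to speed up.
            writer.commit();
        }
    }
}
```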

Looking at the flamegraphs, the speedup is pretty clear:

[Two flamegraph screenshots, captured 2025-09-24.]

The time spent in any .*Bits.applyMask method drops from 5.12% of all profiled CPU time to 1.26%. Everything else stays more or less the same.

In terms of the overall speedup, here's what it means for the high-level operation of running a search via a DLS role that hits an index with deleted documents:

MAIN:
=====

(cache constrained to 48mb, lots of cache churn)
|                                                Mean Throughput |              dls-search |  1121.89        |  ops/s |
|                                              Median Throughput |              dls-search |  1123.23        |  ops/s |
|                                        50th percentile latency |              dls-search |    98.781       |     ms |
|                                        90th percentile latency |              dls-search |   170.912       |     ms |
|                                        99th percentile latency |              dls-search |   263.398       |     ms |

(unconstrained cache, lots of cache hits)
|                                                Mean Throughput |              dls-search |  2038.01        |  ops/s |
|                                              Median Throughput |              dls-search |  2042.57        |  ops/s |
|                                        50th percentile latency |              dls-search |    55.4865      |     ms |
|                                        90th percentile latency |              dls-search |    87.5875      |     ms |
|                                        99th percentile latency |              dls-search |   132.864       |     ms |

THIS PR:
========

(cache constrained to 48mb, lots of cache churn)
|                                                Mean Throughput |              dls-search |  1152.41        |  ops/s |
|                                              Median Throughput |              dls-search |  1152.34        |  ops/s |
|                                        50th percentile latency |              dls-search |    95.733       |     ms |
|                                        90th percentile latency |              dls-search |   167.011       |     ms |
|                                        99th percentile latency |              dls-search |   260.01        |     ms |

(unconstrained cache, lots of cache hits)
|                                                Mean Throughput |              dls-search |  2180.96        |  ops/s |
|                                              Median Throughput |              dls-search |  2190.19        |  ops/s |
|                                        50th percentile latency |              dls-search |    51.5064      |     ms |
|                                        90th percentile latency |              dls-search |    81.5829      |     ms |
|                                        99th percentile latency |              dls-search |   125.111       |     ms |

Comparing main against this PR, you'll notice a modest but real improvement (a few percent, in this benchmark) in both throughput and latency, for both the cache-constrained and unconstrained-cache cases.

@joegallo joegallo merged commit 8e46154 into elastic:main Sep 24, 2025
42 checks passed
Development

Successfully merging this pull request may close these issues:

  • Implement Bits#applyMask on Bits implementations that are used as live docs.