Sometimes high memory usage of Lucene103BlockTreeTermsWriter during merging

We encountered an OOME while merging a large inverted index.

<img width="1224" height="653" alt="Image" src="https://github.com/user-attachments/assets/ee37edab-fcb2-4630-bb7e-b04f3e3c6e16" />

In this case the `_id` field is being merged; each value is ~15 bytes and the new segment being written has ~250M docs. The JVM heap is limited here, around 2GB, and the majority of it is spent on the `pending` array list in `Lucene103BlockTreeTermsWriter.TermsWriter`, as the screenshot indicates.
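
For a sense of scale, here is a rough back-of-the-envelope calculation from the numbers above. It is only a sketch: it assumes one unique `_id` term per doc, counts raw term bytes only (no per-object overhead), and treats ~2GB as 2 GiB. It says nothing about how much of this `pending` actually holds at any given moment. `TermBytesEstimate` is a hypothetical helper, not part of Lucene or Elasticsearch.

```java
public final class TermBytesEstimate {

    // Raw bytes of all term values, ignoring any per-object or per-entry overhead.
    static long rawTermBytes(long docCount, int bytesPerTerm) {
        return Math.multiplyExact(docCount, (long) bytesPerTerm);
    }

    public static void main(String[] args) {
        long docs = 250_000_000L;                  // ~250M docs in the merged segment (from the report)
        int bytesPerId = 15;                       // ~15 bytes per _id value (from the report)
        long heapBytes = 2L * 1024 * 1024 * 1024;  // ~2GB heap (from the report), assumed to be 2 GiB

        long raw = rawTermBytes(docs, bytesPerId);
        double gib = 1024.0 * 1024 * 1024;

        System.out.printf("raw _id term bytes: %d (%.2f GiB)%n", raw, raw / gib);
        System.out.printf("heap bytes:         %d (%.2f GiB)%n", heapBytes, heapBytes / gib);
        System.out.printf("raw / heap ratio:   %.2f%n", (double) raw / heapBytes);
    }
}
```

Under these assumptions the raw `_id` bytes alone are larger than the whole heap, so the merge can only succeed if a small fraction of the terms is retained on heap at any one time.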

Since 9.1.0, Elasticsearch has used the default Lucene postings format: #128509
Before that, Elasticsearch had its own fork of an early version of Lucene's postings format for space efficiency reasons. The older codec still exists and is used when `index.mode` is set to `logsdb` or `time_series`. However, in this case the `index.mode` was `standard`.

This looks like a regression, because OOMEs like this haven't been observed before.

Sometimes high memory usage of Lucene103BlockTreeTermsWriter during merging #137359
