Sometimes high memory usage of Lucene103BlockTreeTermsWriter during merging #137359

@martijnvg

We encountered an OOME while merging a large inverted index.

Image

In this case the _id field was being merged: each value is ~15 bytes, and the new segment being written has ~250M docs. The JVM heap is limited here, ~2GB, and the majority of it is taken up by the pending array list in Lucene103BlockTreeTermsWriter.TermsWriter, as the screenshot indicates.
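
To put these numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes one distinct _id term per doc and counts only raw term bytes, ignoring any per-object overhead on the JVM. The constant and function names are illustrative, not taken from Lucene or Elasticsearch. It does not measure what the pending list actually holds at any point in time; it only shows that the field's term data is larger than the heap, so buffering a large fraction of it would be enough to exhaust memory.

```python
# Approximate figures taken from the report above.
DOC_COUNT = 250_000_000       # ~250M docs in the segment being written
BYTES_PER_ID = 15             # ~15 bytes per _id value
HEAP_BYTES = 2 * 1024**3      # ~2GB JVM heap


def raw_term_bytes(doc_count: int, bytes_per_term: int) -> int:
    """Total bytes of term data, assuming one term per doc and no overhead."""
    return doc_count * bytes_per_term


total = raw_term_bytes(DOC_COUNT, BYTES_PER_ID)
ratio = total / HEAP_BYTES

print(f"raw _id term bytes: {total / 1024**3:.2f} GiB")
print(f"ratio to heap size: {ratio:.2f}x")
```

Under these assumptions the raw _id bytes alone come to roughly 3.75 GB, which is more than the available heap.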

Since 9.1.0, Elasticsearch has used the default Lucene postings format: #128509
Before this, Elasticsearch had its own fork of an early version of Lucene's postings format for space efficiency reasons. The older codec still exists and is used when index.mode is set to logsdb or time_series. However, in this case the index.mode was standard.

This looks like a regression, because OOMEs like this haven't been observed before.
