Elasticsearch 8.16.x Large Increase in MMAP Counts #119652

@Evesy

Description

Elasticsearch Version

8.16.1

Installed Plugins

No response

Java Version

bundled && Java 17

OS Version

Linux elasticsearch-data-hot-1 6.1.112+ #1 SMP PREEMPT_DYNAMIC Sat Oct 19 17:09:54 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

From 8.16.x onwards, Elasticsearch requires significantly more memory-mapped regions than prior versions.

https://discuss.elastic.co/t/heap-allocation-failures-on-8-17/372211/8
https://discuss.elastic.co/t/oom-since-8-16-1-with-openjdk23

In our case (first link), we started observing semi-frequent heap allocation failures across all of our hot nodes after upgrading from 8.15.x to 8.17.x. All of the hot nodes would restart because of these errors within a couple of hours of each other, and the same would happen again 12 to 24 hours later.

After some digging, we discovered that the max map count we had configured (vm.max_map_count, set according to the recommendations) was being reached, which caused these heap allocation failures.
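Since the number of memory regions was not previously being collected, here is a minimal sketch of how one might check it on Linux. It relies on two kernel interfaces: `/proc/<pid>/maps` lists one line per mapped region of a process, and `/proc/sys/vm/max_map_count` holds the per-process limit on those regions. The script and function names are hypothetical and not part of Elasticsearch; it assumes you pass the Elasticsearch process ID and have permission to read that process's maps (typically the same user or root).

```python
import sys
from pathlib import Path


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Return the kernel's per-process limit on memory map areas."""
    return int(Path(path).read_text().strip())


def count_map_regions(maps_path: str) -> int:
    """Count mapped regions; a /proc/<pid>/maps file has one line per region."""
    with open(maps_path) as maps:
        return sum(1 for _ in maps)


def usage_ratio(regions: int, limit: int) -> float:
    """Fraction of the vm.max_map_count limit currently in use."""
    return regions / limit


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: count_maps.py <elasticsearch-pid>")
    else:
        # The Elasticsearch PID is supplied by the caller (assumption).
        pid = int(sys.argv[1])
        regions = count_map_regions(f"/proc/{pid}/maps")
        limit = read_max_map_count()
        print(f"pid={pid} regions={regions} limit={limit} "
              f"used={usage_ratio(regions, limit):.1%}")
```

Run it as `python3 count_maps.py <pid>`. Sampling it periodically (for example from cron) and keeping the maximum would show where the region count tops out.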

We doubled the value to see where Elasticsearch would eventually top out. In our case it plateaued in the low 400k range, and we have not observed any failures since. We were not previously collecting the number of memory regions, but even at the most conservative estimate (assuming usage was just below the limit before the upgrade), the numbers we saw after upgrading represent a roughly 60% increase in the number of mmap regions. This does not feel like intended behaviour, or if it is, it should at least be documented.
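As a rough check of the 60% figure, assume the previously configured limit was 262,144 (the vm.max_map_count value recommended in the Elasticsearch 8.x virtual memory documentation) and take 420,000 as a representative "low 400k" peak. Both values are assumptions for illustration, not measurements from this report:

$$
\frac{\text{peak after upgrade}}{\text{previous limit}} \approx \frac{420{,}000}{262{,}144} \approx 1.60,
\qquad
\frac{420{,}000}{2 \times 262{,}144} = \frac{420{,}000}{524{,}288} \approx 0.80
$$

Under those assumptions, the peak is about 60% above the old limit and uses about 80% of the doubled limit.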

The second link above is another user reporting the same issue after upgrading to 8.16.x, which indicates that the change lies somewhere in the 8.16 series.

In our case we went from 8.15.1 to 8.17.0 without any JVM changes (using our own provided Java 21). In the other case, the upgrade was from 8.15.1 to 8.16.1 and included a change of JVM version (presumably the bundled JVM).

Steps to Reproduce

I have not been able to identify a specific behaviour that causes the increase between versions, and it is difficult to reproduce in small clusters because of their relatively low indexing and search activity.

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

Labels

:Performance, >bug, Team:Performance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests