Search responses with large size can cause OOMs #110962

@carlosdelest

Description

Elasticsearch Version

7.17.x - 8.x

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

Search responses that contain a large number of hits, or hits that are individually large, can cause a node to run out of memory (OOM) with the following stack trace:

elasticsearch[node_name][transport_worker][T#XXXX]
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(I)Lorg/elasticsearch/common/bytes/BytesReference; (StreamInput.java:161)
  at org.elasticsearch.common.io.stream.StreamInput.readBytesReference()Lorg/elasticsearch/common/bytes/BytesReference; (StreamInput.java:127)
  at org.elasticsearch.search.SearchHit.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (SearchHit.java:150)
  at org.elasticsearch.search.SearchHits.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (SearchHits.java:90)
  at org.elasticsearch.search.fetch.FetchSearchResult.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (FetchSearchResult.java:42)
  at org.elasticsearch.search.fetch.QueryFetchSearchResult.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (QueryFetchSearchResult.java:28)
  at org.elasticsearch.action.search.SearchTransportService$$Lambda$6076+0x0000000801b1cc88.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; ()
  at org.elasticsearch.action.ActionListenerResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (ActionListenerResponseHandler.java:58)
  at org.elasticsearch.action.ActionListenerResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (ActionListenerResponseHandler.java:25)
  at org.elasticsearch.transport.TransportService$4.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (TransportService.java:863)
  at org.elasticsearch.transport.TransportService$4.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (TransportService.java:843)
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (TransportService.java:1462)
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (TransportService.java:1449)
  at org.elasticsearch.transport.InboundHandler.handleResponse(Ljava/net/InetSocketAddress;Lorg/elasticsearch/common/io/stream/StreamInput;Lorg/elasticsearch/transport/TransportResponseHandler;)V (InboundHandler.java:311)
  at org.elasticsearch.transport.InboundHandler.messageReceived(Lorg/elasticsearch/transport/TcpChannel;Lorg/elasticsearch/transport/InboundMessage;J)V (InboundHandler.java:134)

Large search responses should not OOM a node; the request should be cancelled instead.
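
To illustrate the expected behaviour, here is a minimal, self-contained sketch of reading a length-prefixed payload with a byte budget that is checked before the buffer is allocated. This is not Elasticsearch code: `BoundedResponseReader`, `ResponseTooLargeException`, `readHitSource` and the 4-byte length framing are all hypothetical names and assumptions made for illustration, loosely mirroring the `readBytesReference` frame in the stack trace above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT Elasticsearch code: account for bytes before allocating them,
// so an oversized response fails with a catchable exception instead of OutOfMemoryError.
public class BoundedResponseReader {

    /** Raised when a payload would exceed the budget; the caller can cancel the request. */
    public static class ResponseTooLargeException extends IOException {
        public ResponseTooLargeException(String message) {
            super(message);
        }
    }

    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    public BoundedResponseReader(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes against the budget, or fails without allocating anything. */
    private void reserve(long bytes) throws ResponseTooLargeException {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes); // roll back the failed reservation
            throw new ResponseTooLargeException(
                "payload of " + bytes + " bytes would use " + newUsed
                    + " bytes, limit is " + limitBytes);
        }
    }

    /** Returns bytes to the budget once the response has been consumed. */
    public void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    public long usedBytes() {
        return usedBytes.get();
    }

    /** Reads one payload framed as a 4-byte length followed by that many bytes (assumed framing). */
    public byte[] readHitSource(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative length: " + length);
        }
        reserve(length); // check the budget BEFORE allocating the buffer
        try {
            byte[] buffer = new byte[length];
            in.readFully(buffer);
            return buffer;
        } catch (IOException | RuntimeException e) {
            release(length); // do not leak the reservation if the read fails
            throw e;
        }
    }

    /** Test helper: frames a payload the way readHitSource expects. */
    public static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }
}
```

In this sketch a caller would catch `ResponseTooLargeException` on the node receiving the response and cancel the search, and would call `release` once the hits have been serialized back to the client. Where exactly such accounting would belong in Elasticsearch is not stated in this report.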

Steps to Reproduce

This was observed in production, and we don't have a script that reproduces it.

Logs (if relevant)

No response
