Description
Today if a node-to-node connection drops we log this message:
`elasticsearch/server/src/main/java/org/elasticsearch/transport/ClusterConnectionManager.java`, lines 242 to 248 in a59c182:

```java
logger.info(
    """
        transport connection to [{}] closed by remote; \
        if unexpected, see [{}] for troubleshooting guidance""",
    node.descriptionWithoutAttributes(),
    ReferenceDocs.NETWORK_DISCONNECT_TROUBLESHOOTING
);
```
The "if unexpected" bit is tricksy: it's actually pretty hard to tell from the logs whether a disconnect was expected (e.g. the node shut down) or not (e.g. a network disruption). Yet we should be able to work out for ourselves whether a disconnect was unexpected, and log a message that unambiguously says we saw an unexpected disconnect.
In particular, if the `org.elasticsearch.cluster.NodeConnectionsService` finds it is disconnected from a peer and then successfully reconnects to that same peer (its `DiscoveryNode#ephemeralId` did not change), then the disconnect was definitely not due to the node shutting down. We should emit a `WARN` log in this case. Moreover, it'd be incredibly useful to capture the exception (if any) that `TcpTransport` reported as causing the disconnect, so we can repeat it in that log message.
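A rough sketch of the proposed check follows. Everything in it is hypothetical: `ReconnectTracker`, `Peer`, `onDisconnected` and `onReconnected` are illustrative names, not existing Elasticsearch APIs, and `Peer` is a minimal stand-in for `DiscoveryNode` carrying only a node ID and an ephemeral ID. The idea is to remember the ephemeral ID and the exception (if any) reported when a connection drops. If a later reconnect reaches the same ephemeral ID, the node did not restart, so the disconnect was unexpected and the caller could log the returned message at `WARN`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the real Elasticsearch API: remembers why each peer
// disconnected so a reconnect to the same node instance can be flagged.
final class ReconnectTracker {

    // Stand-in for DiscoveryNode: a persistent node ID plus an ephemeral ID
    // that changes whenever the node process restarts.
    record Peer(String nodeId, String ephemeralId) {}

    // What we recorded when the connection dropped; cause may be null.
    private record Disconnect(String ephemeralId, Exception cause) {}

    private final Map<String, Disconnect> disconnects = new ConcurrentHashMap<>();

    // Hypothetical hook: called when the transport reports a closed connection.
    void onDisconnected(Peer peer, Exception cause) {
        disconnects.put(peer.nodeId(), new Disconnect(peer.ephemeralId(), cause));
    }

    // Hypothetical hook: called after a successful reconnect. Returns a message
    // for the caller to log at WARN if the ephemeral ID is unchanged.
    Optional<String> onReconnected(Peer peer) {
        Disconnect previous = disconnects.remove(peer.nodeId());
        if (previous == null || !previous.ephemeralId().equals(peer.ephemeralId())) {
            // No disconnect recorded, or the node restarted (new ephemeral ID):
            // nothing unexpected to report.
            return Optional.empty();
        }
        String causeText = previous.cause() == null
                ? "no exception reported"
                : previous.cause().toString();
        return Optional.of("reconnected to node [" + peer.nodeId()
                + "] after an unexpected disconnect; cause: " + causeText);
    }
}
```

Keying on the persistent node ID while comparing ephemeral IDs is what separates "the peer restarted" (expected) from "the connection broke while the peer kept running" (unexpected). Storing the exception at disconnect time is what lets the later warning repeat it.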