Open
Description
We had a scenario where a bad (invalid JSON) cluster pipeline configuration was deployed. When the cluster restarted, the Akka cluster appeared to be unhealthy:
{"time":"2018-07-20T02:58:34.790Z","name":"log","level":"warn","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://Metrics@iad4d-rd41-38a:2551, status = Up)]. Node roles [dc-default]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-28","logger":"a.c.ClusterCoreDaemon"},"id":"ad8b61f7-4a18-479f-bf98-8ea3289b1f52","version":"0"}
Eventually each node would output:
{"time":"2018-07-20T02:58:20.809Z","name":"log","level":"warn","data":{"message":"Association with remote system [akka.tcp://Metrics@iad4c-rf14-36a:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated] "},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-33","logger":"a.r.ReliableDeliverySupervisor"},"id":"7b7baeff-df15-4846-90ba-891a162b3e51","version":"0"}
You would also see some of these:
{"time":"2018-07-20T02:58:27.961Z","name":"log","level":"warn","data":{"message":"heartbeat interval is growing too large: 2001 millis"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-67","logger":"a.r.PhiAccrualFailureDetector"},"id":"8c7597b8-3dda-432d-a26c-6eed5a26df1b","version":"0"}
And then there was this scary one:
{"time":"2018-07-20T02:58:58.790Z","name":"log","level":"info","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Leader can currently not perform its duties, reachability status: [akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (26), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (12), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (19), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (32), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (31), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (30), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Unreachable [Unreachable] (27), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (26), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (25)], member status: [akka.tcp://Metrics@iad4a-rl2-17c:2551 Up seen=false, akka.tcp://Metrics@iad4c-rf14-36a:2551 Up seen=false, akka.tcp://Metrics@iad4d-rd41-38a:2551 Up seen=false, akka.tcp://Metrics@iad4f-re22-2a:2551 Up seen=true]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-48","logger":"a.c.Cluster(akka://Metrics)"},"id":"220dc047-2a81-4507-802d-203cd7902b27","version":"0"}
So it seems that Akka cluster formation depends on the cluster pipeline configuration loading successfully. Intuitively, this should not be the case. At the very least, if this dependency must exist, cluster formation should not even be attempted when the cluster pipeline configuration cannot be loaded.
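To illustrate the second option, here is a minimal, language-neutral sketch in Python of a fail-fast startup order: the pipeline configuration is parsed first, and the node only joins the cluster if parsing succeeds. This is not the service's actual code. `load_pipeline_config`, `start_node`, `PipelineConfigError`, and the `join_cluster` callback are all hypothetical names, and the real service would do the equivalent on the JVM before starting its Akka actor system.

```python
import json
from typing import Callable


class PipelineConfigError(Exception):
    """Raised when the cluster pipeline configuration cannot be loaded."""


def load_pipeline_config(text: str) -> dict:
    """Parse the pipeline configuration, failing loudly on invalid JSON."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        raise PipelineConfigError(f"invalid pipeline configuration: {exc}") from exc
    # Assumption: the configuration's top level is a JSON object.
    if not isinstance(config, dict):
        raise PipelineConfigError("pipeline configuration must be a JSON object")
    return config


def start_node(config_text: str, join_cluster: Callable[[dict], None]) -> bool:
    """Join the cluster only after the configuration has loaded.

    `join_cluster` is a hypothetical stand-in for whatever starts the
    actor system and cluster membership in the real service.
    """
    try:
        config = load_pipeline_config(config_text)
    except PipelineConfigError as error:
        # Fail fast: never attempt cluster formation with a bad configuration.
        print(f"refusing to join cluster: {error}")
        return False
    join_cluster(config)
    return True
```

With this ordering, a node that receives truncated JSON such as `{"sinks": [` (where `sinks` is only an illustrative key) returns `False` without ever calling `join_cluster`, so it never becomes a cluster member that its peers later have to mark as unreachable.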