Skip to content

Cluster Aggregator Pipeline Blocks Akka Cluster Formation #89

@vjkoskela

Description

@vjkoskela

We had a scenario where a bad (invalid JSON) cluster pipeline configuration was deployed. When the cluster restarted it seemed that the Akka cluster was unhealthy.

{"time":"2018-07-20T02:58:34.790Z","name":"log","level":"warn","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://Metrics@iad4d-rd41-38a:2551, status = Up)]. Node roles [dc-default]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-28","logger":"a.c.ClusterCoreDaemon"},"id":"ad8b61f7-4a18-479f-bf98-8ea3289b1f52","version":"0"}

Eventually each node would output:

{"time":"2018-07-20T02:58:20.809Z","name":"log","level":"warn","data":{"message":"Association with remote system [akka.tcp://Metrics@iad4c-rf14-36a:2551] has failed, address is now gated for [5000] ms. Reason: [Disassociated] "},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-33","logger":"a.r.ReliableDeliverySupervisor"},"id":"7b7baeff-df15-4846-90ba-891a162b3e51","version":"0"}

You would also see some of these:

{"time":"2018-07-20T02:58:27.961Z","name":"log","level":"warn","data":{"message":"heartbeat interval is growing too large: 2001 millis"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-67","logger":"a.r.PhiAccrualFailureDetector"},"id":"8c7597b8-3dda-432d-a26c-6eed5a26df1b","version":"0"

And then there was the scary:

{"time":"2018-07-20T02:58:58.790Z","name":"log","level":"info","data":{"message":"Cluster Node [akka.tcp://Metrics@iad4f-re22-2a:2551] - Leader can currently not perform its duties, reachability status: [akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (26), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (12), akka.tcp://Metrics@iad4c-rf14-36a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (19), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Reachable [Unreachable] (32), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (31), akka.tcp://Metrics@iad4d-rd41-38a:2551 -> akka.tcp://Metrics@iad4f-re22-2a:2551: Unreachable [Unreachable] (30), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4a-rl2-17c:2551: Unreachable [Unreachable] (27), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4c-rf14-36a:2551: Unreachable [Unreachable] (26), akka.tcp://Metrics@iad4f-re22-2a:2551 -> akka.tcp://Metrics@iad4d-rd41-38a:2551: Unreachable [Unreachable] (25)], member status: [akka.tcp://Metrics@iad4a-rl2-17c:2551 Up seen=false, akka.tcp://Metrics@iad4c-rf14-36a:2551 Up seen=false, akka.tcp://Metrics@iad4d-rd41-38a:2551 Up seen=false, akka.tcp://Metrics@iad4f-re22-2a:2551 Up seen=true]"},"context":{"host":"iad4f-re22-2a.sjc.dropbox.com","processId":"7","threadId":"Metrics-akka.actor.default-dispatcher-48","logger":"a.c.Cluster(akka://Metrics)"},"id":"220dc047-2a81-4507-802d-203cd7902b27","version":"0"}

So it seems that Akka cluster formation is dependent on a successful loading of the cluster pipeline. However, intuitively it feels like this should not be the case; or at the very least if this dependency exists and must exist then the cluster formation should not even be attempted if the cluster pipeline configuration cannot be loaded.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions