Skip to content

Design and Implement active coordinator. #37

@satyamjay-iitd

Description

@satyamjay-iitd

Why?

Some (maybe all) distributed applications require some form of runtime coordination. Just starting the application is not enough, which is what generic_jctrl does.

Scenarios where coordination might be required

  1. Fault Tolerance:- An active coordinator(AC) can monitor failures, and in response take some actions to heal the cluster.
  2. Runtime Reconfiguration:- An AC is required to do this correctly.
  3. Implementation of distributed protocols.

What?

AC cannot be just another actor. Why?
An Actor doesn't have a view of the entire cluster.
AC will require few interfaces that the actor cannot have (without sacrificing the simplicity of the actor). Few that come to mind are:-

  1. Asking the node to start/stop an actor.
  2. Asking the node current status of the actor.
  3. Having the state of the entire cluster of the job that it is managing.

How?

Implementing AC requires careful considerations. Few designs that are at-least worth discussing

Design 1

AC is just another actor with extra privileges.
From a normal actor's POV it is receiving/sending messages from just another actor.

Modification Required for this Design

  1. Actor: None
  2. Node: None
  3. SpecialActor(Framework's Responsibilities):
    3.1. New special actor will have to be written.
    3.2. Logic to send start/stop actor can be provided by the framework itself. User can simply call start(ActorAddr)
    3.3. State of the Cluster ClusterState<GS, NS, AS>
struct ClusterState<GS, NS>{
      global_state: GS,
      node_state: Map<NodeId, NS>,
      actor_state: Map<ActorId, AS>
}
    3.4. Handle messages coming from other Actors. Framework will have to spawn a receiver server for the other actors to send message to AC.
    3.5. Manage timers which when expire trigger some user provided logic.
  1. SpecialActor (User's Responsibility)
    4.1 Decide what GS, NS, AS would be.
    4.2 Provide handler that will be called when the AC receives a message from other actors.
    4.3 Provide handler that will be called when the various timer in the AC expires.

Design 2

AC is a new kind of entity. Actor and nodes are now aware of this new type of entity

Design 3

Modify Actor such that they can now spawn other actors.

Modification Required for this Design

  1. Actors
    1. Actors can now be related. An actor that spawns another actor becomes parent of that actor.
    2. Parent of the actor should be notified of the children's failure.

Metadata

Metadata

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions