-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Why?
Some (maybe all) distributed applications require some form of runtime coordination. Just starting the application is not enough, which is what generic_jctrl does.
Scenarios where coordination might be required
- Fault Tolerance:- An active coordinator(AC) can
monitorfailures, and in response take someactionsto heal the cluster. - Runtime Reconfiguration:- An AC is required to do this correctly.
- Implementation of distributed protocols.
What?
AC cannot be just another actor. Why?
An Actor doesn't have a view of the entire cluster.
AC will require few interfaces that the actor cannot have (without sacrificing the simplicity of the actor). Few that come to mind are:-
- Asking the node to start/stop an actor.
- Asking the node current status of the actor.
- Having the state of the entire cluster of the job that it is managing.
How?
Implementing AC requires careful considerations. Few designs that are at-least worth discussing
Design 1
AC is just another actor with extra privileges.
From a normal actor's POV it is receiving/sending messages from just another actor.
Modification Required for this Design
Actor: NoneNode: NoneSpecialActor(Framework's Responsibilities):
3.1. New special actor will have to be written.
3.2. Logic to sendstart/stopactor can be provided by the framework itself. User can simply callstart(ActorAddr)
3.3. State of the ClusterClusterState<GS, NS, AS>
struct ClusterState<GS, NS>{
global_state: GS,
node_state: Map<NodeId, NS>,
actor_state: Map<ActorId, AS>
}
3.4. Handle messages coming from other Actors. Framework will have to spawn a receiver server for the other actors to send message to AC.
3.5. Manage timers which when expire trigger some user provided logic.
SpecialActor (User's Responsibility)
4.1 Decide whatGS,NS,ASwould be.
4.2 Provide handler that will be called when the AC receives a message from other actors.
4.3 Provide handler that will be called when the various timer in the AC expires.
Design 2
AC is a new kind of entity. Actor and nodes are now aware of this new type of entity
Design 3
Modify Actor such that they can now spawn other actors.
Modification Required for this Design
- Actors
- Actors can now be related. An actor that spawns another actor becomes parent of that actor.
- Parent of the actor should be notified of the children's failure.