Cluster Membership Gossip Flow

This document explains how cluster membership events are detected and how membership state is propagated through gossip.

For a runnable demonstration, see examples/ClusterGossip.

Architecture

The gossip subsystem is separated from the rest of the cluster code and communicates through a small set of types. The main participants are shown below:

```mermaid
flowchart LR
    Provider["Cluster Provider"] --> MemberList
    MemberList -->|topology updates| Gossiper
    Gossiper --> GossipActor
    GossipActor -->|build deltas| DeltaBuilder
    DeltaBuilder --> GossipSender
    GossipSender -->|IGossipTransport| RemoteGossipActor["Remote GossipActor"]
```
  • Gossiper – orchestrates the gossip loop and owns a Gossip instance that holds the replicated state.
  • GossipActor – receives commands from the Gossiper and merges incoming state from other members.
  • GossipSender – sends built deltas to remote nodes using an IGossipTransport implementation.
  • MemberStateDeltaBuilder – determines which portions of the gossip state should be sent to a specific target.
  • IGossipTransport – abstraction over the mechanics of sending a GossipRequest; the default GossipTransport simply forwards the request through IContext.RequestReenter.

Component relationships

```mermaid
classDiagram
    class Gossiper
    class GossipActor
    class Gossip
    class GossipState
    class MemberStateDeltaBuilder
    class GossipSender
    class GossipRequest
    class GossipResponse
    class IGossipTransport
    class GossipTransport
    Gossiper o-- Gossip
    Gossiper --> GossipActor : commands
    GossipActor --> Gossip : merges
    GossipActor --> MemberStateDeltaBuilder : builds deltas
    GossipActor --> GossipSender : uses
    GossipSender ..> IGossipTransport
    IGossipTransport <|-- GossipTransport
    GossipSender ..> GossipRequest : sends
    GossipActor --> GossipRequest : handles
    GossipActor --> GossipResponse : replies
```

Gossip maintains a GossipState and exchanges GossipRequest and GossipResponse messages via IGossipTransport to synchronize that state across nodes.

Gossip message flow

Sending a delta

```mermaid
sequenceDiagram
    participant Gossiper
    participant GossipActor
    participant DeltaBuilder
    participant GossipSender
    participant Transport
    participant RemoteActor

    Gossiper->>GossipActor: SendGossipStateRequest
    GossipActor->>DeltaBuilder: build per-target delta
    DeltaBuilder-->>GossipActor: MemberStateDelta
    GossipActor->>GossipSender: target, delta, request
    GossipSender->>Transport: Request(pid, GossipRequest)
    Transport->>RemoteActor: GossipRequest
    RemoteActor-->>Transport: GossipResponse
    Transport-->>GossipSender: callback
    GossipSender-->>DeltaBuilder: CommitOffsets
```

Gossiper periodically instructs the GossipActor to send state by issuing a SendGossipStateRequest (Gossiper.cs). The actor builds a GossipRequest for each target and uses GossipSender to transmit it through the configured IGossipTransport (GossipActor.cs, GossipSender.cs).
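The send path above can be sketched in a few lines. This is an illustrative Python model, not the actual Proto.Cluster API: the message shapes, the `FakeTransport` class, and the `commit` callback are stand-ins for the real C# types.

```python
# Minimal sketch of the send path: for each target, send its delta of
# unseen state, and commit the watermark only after the peer acknowledges
# with a GossipResponse.

class FakeTransport:
    """Stands in for IGossipTransport: acknowledges every request."""

    def request(self, target, message):
        return {"type": "GossipResponse"}

def send_gossip_state(deltas_by_target, transport, commit):
    for target, delta in deltas_by_target.items():
        if not delta:
            continue                      # nothing new for this peer
        resp = transport.request(target, {"type": "GossipRequest", "delta": delta})
        if resp and resp.get("type") == "GossipResponse":
            commit(target, delta)         # advance the per-target watermark

committed = []
send_gossip_state(
    {"member-b": {"cluster:heartbeat": 3}, "member-c": {}},
    FakeTransport(),
    lambda target, delta: committed.append(target),
)
print(committed)  # only member-b had new state to send
```

Deferring the commit to the acknowledgement callback is what lets an unacknowledged delta be resent on the next cycle.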

Receiving a request

```mermaid
sequenceDiagram
    participant Transport
    participant GossipActor
    participant Gossip
    participant EventStream

    Transport->>GossipActor: GossipRequest
    GossipActor->>Gossip: ReceiveState
    Gossip-->>GossipActor: updates
    GossipActor->>EventStream: publish updates
    GossipActor-->>Transport: GossipResponse
```

Incoming GossipRequest messages are handled by the GossipActor, which merges the state and publishes any resulting updates to the event stream before replying with GossipResponse (GossipActor.cs).
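The merge step can be sketched as follows. This is a simplified model with state as nested dicts of `(sequence_number, value)` pairs rather than the real protobuf types; an entry is overwritten only when the incoming sequence number is newer.

```python
# Sketch of the receive path: merge the incoming delta into local state,
# keep an entry only when its sequence number is newer, and collect the
# keys that actually changed so they can be published as update events.
def receive_state(local, incoming):
    updates = []
    for member, entries in incoming.items():
        mine = local.setdefault(member, {})
        for key, (seq, value) in entries.items():
            if key not in mine or seq > mine[key][0]:
                mine[key] = (seq, value)
                updates.append((member, key))
    return updates  # caller publishes these, then replies with GossipResponse

local = {"a": {"cluster:heartbeat": (1, "t0")}}
incoming = {"a": {"cluster:heartbeat": (2, "t1"), "cluster:topology": (1, "v1")}}
updates = receive_state(local, incoming)
print(updates)
```

Because stale entries are ignored, merging is idempotent: replaying the same delta produces no further updates.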

Detecting members

Cluster providers (e.g., Kubernetes) watch the environment for running nodes and invoke MemberList.UpdateClusterTopology with the current set of members (KubernetesClusterMonitor.cs). UpdateClusterTopology filters out blocked members, computes which members joined or left, and constructs a ClusterTopology that includes Joined, Left, and Blocked lists (MemberList.cs). The resulting topology is published to the node's local event stream via BroadcastTopologyChanges (MemberList.cs).
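The joined/left computation is a set difference. A minimal sketch, with plain member IDs standing in for the real Member records and a hypothetical `blocked` set:

```python
# Sketch of the topology diff performed by UpdateClusterTopology:
# filter out blocked members, then compute joined and left against
# the previously known set.
def diff_topology(previous_ids, current_members, blocked):
    active = {m for m in current_members if m not in blocked}
    joined = sorted(active - previous_ids)
    left = sorted(previous_ids - active)
    return joined, left

joined, left = diff_topology({"a", "b"}, {"b", "c", "d"}, blocked={"d"})
print(joined, left)  # ['c'] ['a']
```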

Propagating membership to gossip

The Gossiper subscribes to ClusterTopology events. Each update is cloned with its Joined and Left lists cleared before being forwarded to the GossipActor for inclusion in the gossip state (Gossiper.cs). The gossip implementation stores the full membership under the cluster:topology key and tracks active member IDs for later consensus checks (Gossip.cs).

Gossip state structure

```mermaid
classDiagram
    class GossipState {
        +members : map<string, GossipMemberState>
    }
    class GossipMemberState {
        +values : map<string, GossipKeyValue>
    }
    class GossipKeyValue {
        +sequence_number : long
        +value : Any
        +local_timestamp_unix_milliseconds : long
    }
    GossipState --> GossipMemberState : members
    GossipMemberState --> GossipKeyValue : values
```

Each cluster node keeps a GossipState containing a map of member IDs to GossipMemberState entries. A GossipMemberState holds a map of keys such as cluster:topology or cluster:heartbeat to GossipKeyValue records. Every GossipKeyValue carries a monotonically increasing sequence_number and a payload packed as google.protobuf.Any (GossipContracts.proto).
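As a rough model of this shape, the field names below follow the proto messages while the types are simplified (the payload is a plain `bytes` value rather than a packed Any):

```python
# Illustrative model of the gossip state: nested maps mirroring
# GossipState -> GossipMemberState -> GossipKeyValue.
from dataclasses import dataclass, field

@dataclass
class GossipKeyValue:
    sequence_number: int
    value: bytes                        # packed google.protobuf.Any in the real contract
    local_timestamp_unix_milliseconds: int = 0

@dataclass
class GossipMemberState:
    values: dict = field(default_factory=dict)   # key -> GossipKeyValue

@dataclass
class GossipState:
    members: dict = field(default_factory=dict)  # member id -> GossipMemberState

state = GossipState()
state.members["member-a"] = GossipMemberState(
    {"cluster:heartbeat": GossipKeyValue(sequence_number=1, value=b"...")}
)
print(state.members["member-a"].values["cluster:heartbeat"].sequence_number)  # 1
```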

When a node updates one of its keys, the sequence number is incremented before the value is stored, ensuring later updates supersede earlier ones (GossipStateManagement.cs). To avoid resending data, MemberStateDeltaBuilder tracks a per-target high-water mark ("watermark") for each {target}.{member} pair. Only values with sequence numbers above this watermark are included in a delta, and the watermark is advanced to the highest sent sequence number (MemberStateDeltaBuilder.cs). Gossip stores these watermarks as committed offsets and updates them when a peer acknowledges receipt, guaranteeing that each node receives monotonically ordered state without duplicates (Gossip.cs).
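A minimal sketch of the local write path, assuming a single node-wide sequence counter (the real bookkeeping lives in GossipStateManagement.cs; the class and method names here are illustrative):

```python
# Each write bumps the node-wide sequence counter before storing,
# so a later write always supersedes an earlier one.
class LocalState:
    def __init__(self):
        self.sequence = 0
        self.entries = {}               # key -> (sequence_number, value)

    def set_key(self, key, value):
        self.sequence += 1
        self.entries[key] = (self.sequence, value)
        return self.sequence

s = LocalState()
s.set_key("cluster:heartbeat", "t0")
s.set_key("cluster:topology", "v1")
print(s.entries["cluster:topology"])  # (2, 'v1')
```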

Gossip dissemination

The gossip loop periodically updates heartbeat information and sends the current state to randomly chosen peers (Gossiper.cs, Gossip.cs). Peers merge received updates, which allows membership changes to spread throughout the cluster.
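One round of that loop can be sketched as follows; the fanout value, peer list, and `send` callback are illustrative rather than the actual configuration surface:

```python
# Sketch of one gossip round: pick a few random peers (excluding self)
# and push the current state to each of them.
import random

def gossip_round(self_id, members, fanout, send):
    peers = [m for m in members if m != self_id]
    chosen = random.sample(peers, min(fanout, len(peers)))
    for peer in chosen:
        send(peer)
    return chosen

sent = gossip_round("a", ["a", "b", "c", "d"], fanout=2, send=lambda p: None)
print(len(sent))  # 2
```

Random peer selection is what gives gossip its probabilistic spread: each round, new members' state reaches a fresh subset of the cluster.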

Gossip transport

GossipSender performs the send using an IGossipTransport implementation. The default GossipTransport issues a GossipRequest via IContext.RequestReenter, invoking a continuation when a GossipResponse arrives (IGossipTransport.cs, GossipSender.cs). Custom transports can wrap this interface to add metrics, retries, or other behaviour without modifying cluster code.

Delta-based member state propagation

MemberStateDeltaBuilder constructs per-target deltas by tracking a watermark for each {target}.{member} pair. The watermark represents the highest sequence number previously sent to that target for a given member. During a build, values with a higher sequence number are included in the delta and the watermark is advanced:

```csharp
var watermarkKey = $"{targetMemberId}.{memberId}";
committedOffsets.TryGetValue(watermarkKey, out var watermark);
...
if (value.SequenceNumber <= watermark) continue;
if (value.SequenceNumber > newWatermark) newWatermark = value.SequenceNumber;
```

The builder stops once it has added updates for _gossipMaxSend members, ensuring that large clusters do not overwhelm the network. Members exceeding this limit are retried in later cycles:

```csharp
count++;
if (count >= _gossipMaxSend) break;
```

If no sequence numbers exceed the watermark, the member is omitted from the delta and its watermark remains unchanged. This prevents redundant transmissions but means updates may be delayed when the gossipMaxSend limit is hit.
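Putting the watermark filter and the send cap together, the build step can be sketched as one runnable function. Types are simplified (state as nested dicts of `(sequence_number, value)` pairs), and the returned offsets are only meant to be committed once the peer acknowledges:

```python
# Sketch of the delta build: include only entries whose sequence number
# exceeds the committed offset for this {target}.{member} pair, and cap
# the delta at gossip_max_send members.
def build_delta(state, target, committed_offsets, gossip_max_send):
    delta, new_offsets, count = {}, {}, 0
    for member, entries in state.items():
        watermark_key = f"{target}.{member}"
        watermark = committed_offsets.get(watermark_key, 0)
        fresh = {k: (seq, v) for k, (seq, v) in entries.items() if seq > watermark}
        if not fresh:
            continue                    # nothing new; watermark unchanged
        delta[member] = fresh
        new_offsets[watermark_key] = max(seq for seq, _ in fresh.values())
        count += 1
        if count >= gossip_max_send:
            break                       # remaining members retried next cycle
    return delta, new_offsets           # offsets committed on acknowledgement

state = {"a": {"k": (3, "x")}, "b": {"k": (1, "y")}}
delta, offsets = build_delta(state, "t", {"t.a": 2, "t.b": 1}, gossip_max_send=10)
print(delta, offsets)
```

Here member "a" has an entry above its watermark and is included, while member "b" has nothing newer than sequence 1 and is skipped entirely.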

```mermaid
flowchart LR
    CurrentState --> Builder
    Builder -->|per member| Watermark{seq > watermark?}
    Watermark -- yes --> Delta[append to delta]
    Watermark -- no --> Skip[skip]
    Delta --> Commit[update offset]
```

The builder returns a MemberStateDelta record containing the new state and a callback to commit offsets once the remote peer acknowledges receipt (MemberStateDeltaBuilder.cs, MemberStateDelta.cs).

Isolation from the cluster

Gossip logic runs without direct dependencies on cluster internals. Gossiper wires the pieces together through GossiperOptions, supplying the member list, block list, event stream, and timing settings. This separation makes it possible to test the gossip loop in isolation and to substitute alternative implementations such as a custom IGossipTransport (Gossiper.cs).

Member states

  • Joined / Left – Calculated by MemberList.UpdateClusterTopology and included in the published topology (MemberList.cs).
  • Gracefully left – When a node shuts down gracefully it sets the cluster:left gossip key, waits two gossip intervals, and deregisters from the provider (Cluster.cs). Other nodes read this key and block those members (Gossiper.cs).
  • Blocked – Members are blocked when they leave or when their heartbeat expires. Heartbeat data is stored under cluster:heartbeat, and expired entries cause nodes to be added to the block list (Gossiper.cs).
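The heartbeat-expiry check behind the Blocked state can be sketched as a timestamp comparison; the threshold and clock values below are illustrative:

```python
# Sketch of heartbeat-based blocking: a member whose cluster:heartbeat
# timestamp is older than the allowed age gets added to the block list.
def expired_members(heartbeats_ms, now_ms, max_age_ms):
    return sorted(m for m, ts in heartbeats_ms.items() if now_ms - ts > max_age_ms)

blocked = expired_members({"a": 1_000, "b": 9_500}, now_ms=10_000, max_age_ms=5_000)
print(blocked)  # ['a']
```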

The combination of provider detection, event publication, and gossip propagation ensures that membership changes reach all cluster nodes without cluster-wide broadcasts.

TestKit extensions

Proto.TestKit provides GossipNetworkPartition, a sender middleware that drops GossipRequest messages to or from isolated addresses. Tests use it to simulate partitions and verify that gossip resumes once connectivity is restored (GossipNetworkPartition.cs).