Commit 032e3cb

dvkashapov, hpatro
authored and committed
Cluster: Avoid usage of light weight messages to nodes with not ready bidirectional links (#2817)
After a network failure, nodes that come back to the cluster do not always send and/or receive messages from other nodes in the shard. This fix avoids using light weight messages to nodes whose bidirectional links are not ready.

When a light message arrives before any normal message, the cluster link is freed, because on a just-established connection link->node is not assigned yet. It is assigned in getNodeFromLinkAndMsg right after the condition if (is_light). So on a cluster with heavy pubsub load a long loop of disconnects is possible, and we ran into it:

1. node A establishes cluster link to node B
2. node A propagates PUBLISH to node B
3. node B frees cluster link because link->node == null, as it has not received non-light messages yet
4. go to 1.

During this loop, subscribers of node B do not receive any messages published to node A.

So here we want to make sure that PING was sent (and link->node was initialized) on this connection before using lightweight messages.

---------

Signed-off-by: Daniil Kashapov <[email protected]>
Co-authored-by: Harkrishn Patro <[email protected]>
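
The disconnect loop described in the commit message can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_node`, `toy_link`, `toy_receive` and `toy_count_reconnects` are hypothetical names, not the real cluster code, and the receiver logic is reduced to the single rule the message describes (a light message on a link whose node is still unassigned causes the link to be freed).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real cluster structs. */
typedef struct toy_node { int id; } toy_node;
typedef struct toy_link { toy_node *node; } toy_link;

typedef enum { TOY_MSG_PING, TOY_MSG_LIGHT_PUBLISH } toy_msg_type;

/* Receiver side of the toy model: returns true if the link survives. */
static bool toy_receive(toy_link *link, toy_node *sender, toy_msg_type type) {
    if (type == TOY_MSG_LIGHT_PUBLISH) {
        /* Modeled rule: a light message cannot bind the link to a node,
         * so a link with node == NULL is dropped. */
        return link->node != NULL;
    }
    /* A normal message (here: PING) binds the link to its sender. */
    link->node = sender;
    return true;
}

/* Counts how many times the link is torn down while node A tries to
 * deliver `publishes` light PUBLISH messages. When ping_first is false,
 * every publish hits an unbound link and forces a reconnect. */
static int toy_count_reconnects(int publishes, bool ping_first) {
    toy_node a = {1};
    toy_link link = {NULL};
    int reconnects = 0;

    if (ping_first) toy_receive(&link, &a, TOY_MSG_PING);

    for (int i = 0; i < publishes; i++) {
        if (!toy_receive(&link, &a, TOY_MSG_LIGHT_PUBLISH)) {
            link.node = NULL; /* the re-established link starts unbound again */
            reconnects++;
        }
    }
    return reconnects;
}
```

In this model, calling `toy_count_reconnects(5, false)` tears the link down on every publish, while `toy_count_reconnects(5, true)` never does, which is the behavior the fix aims for by holding back light messages until a normal message has gone through.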
1 parent bf03b0c commit 032e3cb

2 files changed: +12 -2 lines changed


src/cluster_legacy.c

Lines changed: 12 additions & 0 deletions
@@ -4828,6 +4828,18 @@ void clusterSendUpdate(clusterLink *link, clusterNode *node) {
     clusterMsgSendBlockDecrRefCount(msgblock);
 }
 
+/* Inline functions that check support of light weight messages by node
+ * and avoid using light weight messages until the bidirectional
+ * link(s) have been established. */
+static inline bool nodeSupportsLightMsgHdrForPubSub(clusterNode *n) {
+    return n->link && n->pong_received >= n->link->ctime &&
+           (n->flags & CLUSTER_NODE_LIGHT_HDR_PUBLISH_SUPPORTED);
+}
+static inline bool nodeSupportsLightMsgHdrForModule(clusterNode *n) {
+    return n->link && n->pong_received >= n->link->ctime &&
+           (n->flags & CLUSTER_NODE_LIGHT_HDR_MODULE_SUPPORTED);
+}
+
 /* Create a MODULE message block.
  *
  * If is_light is 1, then build a message block with `clusterMsgLight` struct else `clusterMsg`. */
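
To see how the new predicate behaves, here is a minimal standalone sketch. It assumes simplified stand-in types (`gate_link`, `gate_node`) and a made-up flag value (`GATE_FLAG_LIGHT_HDR_PUBLISH`); only the shape of the condition is taken from the diff above. The point it illustrates is that the capability flag alone is no longer enough: the node must have an outbound link, and a PONG must have been received at or after that link's creation time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define GATE_FLAG_LIGHT_HDR_PUBLISH (1 << 0)

/* Simplified stand-ins for the link and node types (assumptions). */
typedef struct gate_link {
    int64_t ctime;          /* link creation time */
} gate_link;

typedef struct gate_node {
    gate_link *link;        /* outbound link, NULL while disconnected */
    int64_t pong_received;  /* time of the last PONG from this node */
    int flags;
} gate_node;

/* Mirrors the shape of the condition in the diff: light messages are
 * allowed only if a link exists, a PONG arrived at or after the link
 * was created, and the peer advertises support. */
static bool gate_can_use_light_publish(const gate_node *n) {
    return n->link != NULL && n->pong_received >= n->link->ctime &&
           (n->flags & GATE_FLAG_LIGHT_HDR_PUBLISH) != 0;
}
```

With a link created at time 200, a node whose last PONG is from time 100 (before the reconnect) is rejected, and it becomes eligible once a PONG at time 200 or later is recorded.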

src/cluster_legacy.h

Lines changed: 0 additions & 2 deletions
@@ -75,8 +75,6 @@ typedef struct clusterLink {
 #define nodeFailed(n) ((n)->flags & CLUSTER_NODE_FAIL)
 #define nodeCantFailover(n) ((n)->flags & CLUSTER_NODE_NOFAILOVER)
 #define nodeSupportsExtensions(n) ((n)->flags & CLUSTER_NODE_EXTENSIONS_SUPPORTED)
-#define nodeSupportsLightMsgHdrForPubSub(n) ((n)->flags & CLUSTER_NODE_LIGHT_HDR_PUBLISH_SUPPORTED)
-#define nodeSupportsLightMsgHdrForModule(n) ((n)->flags & CLUSTER_NODE_LIGHT_HDR_MODULE_SUPPORTED)
 #define nodeInNormalState(n) (!((n)->flags & (CLUSTER_NODE_HANDSHAKE | CLUSTER_NODE_MEET | CLUSTER_NODE_PFAIL | CLUSTER_NODE_FAIL)))
 
 /* Cluster messages header */
