Commit df5c79a

dvkashapov authored and hpatro committed
Cluster: Avoid usage of light weight messages to nodes with not ready bidirectional links (#2817)
After a network failure, nodes that come back to the cluster do not always send and/or receive messages from other nodes in the shard. This fix avoids using light weight messages for nodes whose bidirectional links are not ready yet.

When a light message arrives before any normal message, the cluster link is freed, because on a just-established connection link->node is not assigned yet. It is assigned in getNodeFromLinkAndMsg, right after the condition if (is_light). So on a cluster with heavy pubsub load a long loop of disconnects is possible, and we hit exactly this:

1. node A establishes cluster link to node B
2. node A propagates PUBLISH to node B
3. node B frees cluster link because link->node == NULL, as it has not received any non-light message yet
4. go to 1.

During this loop, subscribers of node B do not receive any messages published to node A.

So here we want to make sure that a PING was sent (and link->node was initialized) on this connection before using lightweight messages.

---------

Signed-off-by: Daniil Kashapov <[email protected]>
Co-authored-by: Harkrishn Patro <[email protected]>
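The receive-side behavior described above can be modeled with a minimal sketch. Everything here is a simplified stand-in and not the actual cluster code: `rxNode`, `rxLink`, and `rxReceive` are hypothetical names, and the only property taken from the commit message is that a light message never assigns `link->node`, while a normal message does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for clusterNode / clusterLink. */
typedef struct rxNode {
    int id;
} rxNode;

typedef struct rxLink {
    rxNode *node; /* NULL until a normal (non-light) message identifies the sender */
    bool freed;   /* set when the receiver drops the link */
} rxLink;

/* Model of the receiver: returns true if the message is accepted,
 * false if the link is dropped instead. */
static bool rxReceive(rxLink *link, rxNode *sender, bool is_light) {
    if (is_light) {
        /* A light header carries no sender identity, so the receiver
         * can only rely on a node already attached to the link. */
        if (link->node == NULL) {
            link->freed = true; /* step 3 of the loop: link is freed */
            return false;
        }
        return true;
    }
    /* A normal message (for example PING) lets the receiver resolve
     * the sender and attach it to the link. */
    link->node = sender;
    return true;
}
```

In this model, a light message on a fresh link is always rejected, whereas the same message sent after one normal message is accepted. That is the ordering the fix enforces on the sender side.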
1 parent 8753cb3 commit df5c79a

2 files changed, +13 -2 lines changed

src/cluster_legacy.c

Lines changed: 13 additions & 0 deletions
@@ -44,6 +44,7 @@
 #include "connection.h"
 #include "module.h"
 
+#include <stdbool.h>
 #include <stdlib.h>
 #include <sys/types.h>
 #include <sys/socket.h>
@@ -4400,6 +4401,18 @@ void clusterSendUpdate(clusterLink *link, clusterNode *node) {
     clusterMsgSendBlockDecrRefCount(msgblock);
 }
 
+/* Inline functions that check support of light weight messages by node
+ * and avoid using light weight messages until the bidirectional
+ * link(s) have been established. */
+static inline bool nodeSupportsLightMsgHdrForPubSub(clusterNode *n) {
+    return n->link && n->pong_received >= n->link->ctime &&
+           (n->flags & CLUSTER_NODE_LIGHT_HDR_PUBLISH_SUPPORTED);
+}
+static inline bool nodeSupportsLightMsgHdrForModule(clusterNode *n) {
+    return n->link && n->pong_received >= n->link->ctime &&
+           (n->flags & CLUSTER_NODE_LIGHT_HDR_MODULE_SUPPORTED);
+}
+
 /* Create a MODULE message block.
  *
  * If is_light is 1, then build a message block with `clusterMsgLight` struct else `clusterMsg`. */
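To illustrate what the new predicate checks, here is a self-contained sketch with stub types. The names `txLink`, `txNode`, `TX_LIGHT_HDR_PUBLISH`, `txSupportsLightPublish`, and `txChooseHeader` are hypothetical and only mirror the fields used in the diff (`link`, `link->ctime`, `pong_received`, `flags`). The reading that a PONG newer than the link's creation time implies a completed PING/PONG exchange on that link is an interpretation of the commit message, not something this sketch verifies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; the real constant lives in the cluster headers. */
#define TX_LIGHT_HDR_PUBLISH (1 << 0)

typedef long long tx_mstime; /* millisecond timestamp, as an assumption */

typedef struct txLink {
    tx_mstime ctime; /* when this outbound link was created */
} txLink;

typedef struct txNode {
    int flags;
    tx_mstime pong_received; /* time of the last PONG from this node */
    txLink *link;            /* outbound link, NULL while disconnected */
} txNode;

/* Same shape as the predicate in the diff: a link must exist, a PONG
 * must have arrived since the link was created, and the peer must
 * advertise support for the light header. */
static bool txSupportsLightPublish(const txNode *n) {
    return n->link != NULL && n->pong_received >= n->link->ctime &&
           (n->flags & TX_LIGHT_HDR_PUBLISH) != 0;
}

typedef enum { TX_HDR_NORMAL, TX_HDR_LIGHT } txHdr;

/* Sender-side choice: fall back to the full header until the link is ready. */
static txHdr txChooseHeader(const txNode *n) {
    return txSupportsLightPublish(n) ? TX_HDR_LIGHT : TX_HDR_NORMAL;
}
```

With a link created at time 100, a node whose last PONG is from time 50 (before the reconnect) gets the normal header, and the same node switches to the light header once a PONG at time 100 or later is recorded.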

src/cluster_legacy.h

Lines changed: 0 additions & 2 deletions
@@ -67,8 +67,6 @@ typedef struct clusterLink {
 #define nodeFailed(n) ((n)->flags & CLUSTER_NODE_FAIL)
 #define nodeCantFailover(n) ((n)->flags & CLUSTER_NODE_NOFAILOVER)
 #define nodeSupportsExtensions(n) ((n)->flags & CLUSTER_NODE_EXTENSIONS_SUPPORTED)
-#define nodeSupportsLightMsgHdrForPubSub(n) ((n)->flags & CLUSTER_NODE_LIGHT_HDR_PUBLISH_SUPPORTED)
-#define nodeSupportsLightMsgHdrForModule(n) ((n)->flags & CLUSTER_NODE_LIGHT_HDR_MODULE_SUPPORTED)
 #define nodeInNormalState(n) (!((n)->flags & (CLUSTER_NODE_HANDSHAKE | CLUSTER_NODE_MEET | CLUSTER_NODE_PFAIL | CLUSTER_NODE_FAIL)))
 
 /* This structure represent elements of node->fail_reports. */
