Commit 7485468

fix(network): remove destructive subscription cycling from ReformMesh (TM-B2)
Log analysis from chaos test TM-A1 revealed that subscription cycling
(unsubscribe/subscribe) in ReformMesh was destroying ALL existing
gossipsub meshes, not just the mesh for the reconnecting peer.

Timeline from logs:
- 23:22:18: Node1 receiving gossip from Node2 AND Node3 normally
- 23:22:19: ReformMesh triggered with subscription cycling
- 23:22:19+: ZERO gossip messages received from ANY peer

Root cause: When you unsubscribe() from a topic, gossipsub sends LEAVE
messages to ALL mesh peers for that topic. When you resubscribe(), the
mesh must be rebuilt via GRAFT - which is not guaranteed to happen.

Fix: Remove subscription cycling entirely. Only use add_explicit_peer()
which marks the peer for mesh inclusion without affecting other meshes.
Gossipsub will GRAFT the explicit peer during its next heartbeat (1s).

This preserves existing working meshes with other peers while still
ensuring the reconnecting peer is added to the mesh.
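
The failure mode in the commit message can be illustrated with a small toy model. This is a sketch under stated assumptions, not libp2p code: `ToyRouter`, the topic name `blocks`, and the peer name `node4` are hypothetical, and `subscribe` deliberately re-grafts nobody so that it models the worst case described above (an empty mesh after cycling).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a gossipsub router. Hypothetical type, not libp2p's.
#[derive(Default)]
struct ToyRouter {
    /// topic -> peers currently in our mesh for that topic
    mesh: BTreeMap<String, BTreeSet<String>>,
    /// peers that receive every message we publish, regardless of mesh
    explicit: BTreeSet<String>,
}

impl ToyRouter {
    /// Leaving a topic drops the whole mesh for it and returns the dropped peers.
    fn unsubscribe(&mut self, topic: &str) -> BTreeSet<String> {
        self.mesh.remove(topic).unwrap_or_default()
    }

    /// Re-joining starts from an empty mesh. Assumption: no peer is
    /// re-grafted here, which models the worst case from the commit message.
    fn subscribe(&mut self, topic: &str) {
        self.mesh.entry(topic.to_string()).or_default();
    }

    /// Registers an explicit peer without touching any mesh.
    fn add_explicit_peer(&mut self, peer: &str) {
        self.explicit.insert(peer.to_string());
    }

    /// Peers a publish on `topic` would reach: mesh peers plus explicit peers.
    fn publish_targets(&self, topic: &str) -> BTreeSet<String> {
        let mut targets = self.mesh.get(topic).cloned().unwrap_or_default();
        targets.extend(self.explicit.iter().cloned());
        targets
    }
}

/// Node1 with Node2 and Node3 already in its mesh for one topic.
fn healthy_router() -> ToyRouter {
    let mut router = ToyRouter::default();
    let peers = ["node2", "node3"].iter().map(|p| p.to_string()).collect();
    router.mesh.insert("blocks".to_string(), peers);
    router
}

/// Old behaviour: cycle the subscription, then add the explicit peer.
fn after_cycling(reconnecting: &str) -> ToyRouter {
    let mut router = healthy_router();
    let _dropped = router.unsubscribe("blocks");
    router.subscribe("blocks");
    router.add_explicit_peer(reconnecting);
    router
}

/// New behaviour: only add the explicit peer.
fn after_explicit_only(reconnecting: &str) -> ToyRouter {
    let mut router = healthy_router();
    router.add_explicit_peer(reconnecting);
    router
}

fn main() {
    let cycled = after_cycling("node4");
    let kept = after_explicit_only("node4");
    println!("cycled targets: {:?}", cycled.publish_targets("blocks"));
    println!("kept targets:   {:?}", kept.publish_targets("blocks"));
}
```

In this model the cycled router can reach only `node4`, while the explicit-only router still reaches `node2`, `node3`, and `node4`. Real gossipsub may rebuild part of the mesh after a resubscribe; the model only shows why that rebuild cannot be relied on.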
1 parent ebbc200 commit 7485468

1 file changed: +22 -26 lines changed

app/src/actors_v2/network/network_actor.rs

Lines changed: 22 additions & 26 deletions
@@ -2630,24 +2630,27 @@ impl Handler<NetworkMessage> for NetworkActor {
     Some(SwarmCommand::ReformMesh { peer_id, topics }) => {
         use libp2p::gossipsub::IdentTopic;

-        // TM-B1 FIX: Subscription cycling IS needed for mesh formation.
-        // The feedback loop was caused by GossipPeerSubscribed handler
-        // triggering more ReformMesh commands - that handler has been
-        // removed. Now subscription cycling only happens once per
-        // reconnection (triggered from ConnectionEstablished with
-        // deduplication).
+        // TM-B2 FIX: DO NOT use subscription cycling!
         //
-        // How this works:
-        // 1. unsubscribe() removes us from topic
-        // 2. subscribe() re-joins topic, triggering mesh rebuild
-        // 3. Gossipsub sends GRAFT to connected peers for mesh formation
-        // 4. Remote peer receives SUBSCRIBE but no longer triggers ReformMesh
-        //    (GossipPeerSubscribed handler was removed)
+        // Log analysis from chaos test TM-A1 shows that subscription
+        // cycling (unsubscribe/subscribe) is DESTRUCTIVE:
+        // - It sends LEAVE messages to ALL mesh peers for each topic
+        // - This destroys existing working meshes with other peers
+        // - The mesh rebuild via GRAFT is not guaranteed
+        // - Result: node ends up with NO mesh peers after ReformMesh
+        //
+        // Instead, we ONLY call add_explicit_peer() which:
+        // - Marks the peer for inclusion in the mesh
+        // - Does NOT affect meshes with other peers
+        // - Gossipsub will GRAFT this peer during heartbeat (1s interval)
+        //
+        // For faster mesh formation after partition recovery, we also
+        // send publish_many_peers to trigger immediate IWANT/IHAVE
+        // exchanges which can help establish mesh connections.

         let gossipsub = &mut swarm.behaviour_mut().gossipsub;

-        // Check current mesh status
-        let mut needs_reform = false;
+        // Check current mesh status for logging
         let mut missing_topics = Vec::new();
         for topic_str in &topics {
             let topic = IdentTopic::new(topic_str);
@@ -2656,33 +2659,26 @@ impl Handler<NetworkMessage> for NetworkActor {
                 .mesh_peers(&topic_hash)
                 .any(|p| *p == peer_id);
             if !in_mesh {
-                needs_reform = true;
                 missing_topics.push(topic_str.clone());
             }
         }

-        if needs_reform {
+        if !missing_topics.is_empty() {
             tracing::info!(
                 peer_id = %peer_id,
-                topics_count = topics.len(),
                 missing_topics = ?missing_topics,
-                "ReformMesh: cycling subscriptions to trigger GRAFT"
+                "ReformMesh: adding explicit peer (no subscription cycling)"
             );
-
-            // Cycle subscriptions to trigger mesh rebuild with GRAFT
-            for topic_str in &topics {
-                let topic = IdentTopic::new(topic_str);
-                let _ = gossipsub.unsubscribe(&topic);
-                let _ = gossipsub.subscribe(&topic);
-            }
         } else {
             tracing::debug!(
                 peer_id = %peer_id,
                 "ReformMesh: peer already in mesh for all topics"
             );
         }

-        // Also add as explicit peer for reliable message delivery
+        // Add as explicit peer - this is the ONLY safe mesh formation method
+        // Explicit peers are always included in publish fanout and will
+        // receive GRAFT during the next gossipsub heartbeat
         gossipsub.add_explicit_peer(&peer_id);
     }

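
The mesh check that now drives only the logging branch can be sketched as a pure function, which makes it easy to exercise without a running swarm. This is a minimal sketch under assumptions: `topics_missing_peer`, `sample_mesh`, the string peer IDs, and the topic names are all hypothetical, and a plain map stands in for what the handler reads through `mesh_peers()`.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the topics whose mesh does not contain `peer`, in input order.
/// `mesh` is a hypothetical snapshot: topic name -> peers in that topic's mesh.
fn topics_missing_peer(
    mesh: &HashMap<String, HashSet<String>>,
    peer: &str,
    topics: &[String],
) -> Vec<String> {
    topics
        .iter()
        .filter(|topic| {
            // A topic with no mesh entry counts as missing the peer.
            !mesh
                .get(topic.as_str())
                .map_or(false, |peers| peers.contains(peer))
        })
        .cloned()
        .collect()
}

/// Example snapshot: node2 is meshed on both topics, node3 only on "blocks".
fn sample_mesh() -> HashMap<String, HashSet<String>> {
    let mut mesh = HashMap::new();
    mesh.insert(
        "blocks".to_string(),
        ["node2", "node3"].iter().map(|p| p.to_string()).collect(),
    );
    mesh.insert(
        "txs".to_string(),
        ["node2"].iter().map(|p| p.to_string()).collect(),
    );
    mesh
}

fn main() {
    let topics: Vec<String> = ["blocks", "txs", "votes"]
        .iter()
        .map(|t| t.to_string())
        .collect();
    let missing = topics_missing_peer(&sample_mesh(), "node3", &topics);

    // Mirrors the two logging branches in the handler.
    if missing.is_empty() {
        println!("peer already in mesh for all topics");
    } else {
        println!("adding explicit peer, missing topics: {:?}", missing);
    }
}
```

Keeping the check free of side effects matches the intent of the change: the result decides which message is logged, and the explicit-peer registration happens unconditionally afterwards.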
