Remove Bogus Alarm

andrewjstone · andrewjstone · commit ad388eb59393 · 2025-08-02T23:39:24.000Z
It's not actually an error to receive a `CommitAdvance` while
coordinating for the same epoch. A `GetShare` from the coordinator
could have been delayed in the network, and the node that received it
may have already committed before the coordinator knew it was done
preparing. In essence, the following would happen:

1. The coordinator would send GetShare requests for the prior epoch
2. Enough nodes would reply so that the coordinator would start sending
prepares.
3. Enough nodes would ack prepares to commit
4. Nexus would poll and send commits. Other nodes would get those
commits, but not the coordinator.
5. A node that hadn't yet received the `GetShare` would get
a `CommitAdvance` or see the `Commit` from Nexus, get its
configuration, recompute its own share, and commit. It may have been
a prior coordinator with delayed deliveries to other nodes of `GetShare`
messages.
6. The node that just committed finally receives the `GetShare` and
sends back a `CommitAdvance` to the coordinator

This is all valid, and was similar to a proptest counterexample.
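
As a rough illustration of the resulting behavior, here is a minimal
sketch of how a coordinator might classify an incoming `CommitAdvance`.
The `Epoch` wrapper, `CommitAdvanceAction`, and `on_commit_advance` are
hypothetical names used only for this example and are not the crate's
real types. Only the equal-epoch case (log at info, raise no alarm,
return) is taken from this change. The conditions for the other two
branches are assumptions about the surrounding code.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's epoch type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Epoch(pub u64);

/// Hypothetical summary of what the coordinator does with a `CommitAdvance`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitAdvanceAction {
    /// Assumption: a later epoch has committed, so coordination is dropped.
    StopCoordinating,
    /// From this change: same epoch is benign, so log at info and return.
    LogAndIgnore,
    /// Assumption: the message refers to an epoch older than the one
    /// being coordinated.
    Stale,
}

/// Classify a `CommitAdvance` carrying `committed` while this node is
/// coordinating `coordinating`. No branch raises an alarm.
pub fn on_commit_advance(
    coordinating: Epoch,
    committed: Epoch,
) -> CommitAdvanceAction {
    match coordinating.cmp(&committed) {
        Ordering::Less => CommitAdvanceAction::StopCoordinating,
        Ordering::Equal => CommitAdvanceAction::LogAndIgnore,
        Ordering::Greater => CommitAdvanceAction::Stale,
    }
}
```

In step 6 above, the coordinator is still coordinating the epoch that
the replying node has already committed, so this sketch would return
`LogAndIgnore` rather than treating the message as a protocol violation.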
diff --git a/trust-quorum/src/alarm.rs b/trust-quorum/src/alarm.rs
@@ -6,7 +6,7 @@
 
 use serde::{Deserialize, Serialize};
 
-use crate::{Configuration, Epoch};
+use crate::{Configuration, Epoch, PlatformId};
 
 #[derive(
     Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize,
@@ -20,18 +20,11 @@ pub enum Alarm {
     /// coordinators will generate different key shares. However, since Nexus
     /// will not tell different nodes to coordinate the same configuration, this
     /// state should be impossible to reach.
-    MismatchedConfigurations { config1: Configuration, config2: Configuration },
-
-    /// We received a `CommitAdvance` while coordinating for the same epoch
-    ///
-    /// Reason: `CommitAdvance` is a reply for a key share request in an
-    /// old epoch that we don't have the latest committed coordination.
-    /// However we are actually the coordinator for the configuration in the
-    /// `CommitAdvance`. While it's possible that another node could learn
-    /// of the commit from Nexus before the coordinator, the coordinator will
-    /// never ask for a key share for that epoch. Therefore this state should be
-    /// impossible to reach.
-    CommitAdvanceForCoordinatingEpoch { config: Configuration },
+    MismatchedConfigurations {
+        config1: Configuration,
+        config2: Configuration,
+        from: PlatformId,
+    },
 
     /// The `keyShareComputer` could not compute this node's share
     ///
diff --git a/trust-quorum/src/node.rs b/trust-quorum/src/node.rs
@@ -325,6 +325,7 @@ impl Node {
                 ctx.raise_alarm(Alarm::MismatchedConfigurations {
                     config1: (*existing).clone(),
                     config2: config.clone(),
+                    from: from.clone(),
                 });
             }
         } else {
@@ -347,15 +348,12 @@ impl Node {
                 );
                 self.coordinator_state = None;
             } else if coordinating_epoch == config.epoch {
-                error!(
+                info!(
                     self.log,
                     "Received CommitAdvance while coordinating for same epoch!";
                     "from" => %from,
                     "epoch" => %config.epoch
                 );
-                ctx.raise_alarm(Alarm::CommitAdvanceForCoordinatingEpoch {
-                    config,
-                });
                 return;
             } else {
                 info!(