Skip to content

Commit ad388eb

Browse files
committed
Remove Bogus Alarm
It's not actually an error to receive a `CommitAdvance` while coordinating for the same epoch. The `GetShare` from the coordinator could have been delayed in the network` and the node that received it already committed before the coordinator knew it was done preparing. In essence, the following would happen: 1. The coordinator would send GetShare requests for the prior epoch 2. Enough nodes would reply so that the coordinator would start sending prepares. 3. Enough nodes would ack prepares to commit 4. Nexus would poll and send commits. Other nodes would get those commits, but not the coordinator 5. A node that hadn't yet received the `GetShare` would get a `CommitAdvance` or see the `Commit` from nexus and get it's configuration and recompute it's own share and commit. It may have been a prior coordinator with delayed deliveries to other nodes of `GetShare` messages. 6. The node that just committed finally receives the `GetShare` and sends back a `CommitAdvance` to the coordinator This is all valid, and was similar to a proptest counterexample
1 parent 681e851 commit ad388eb

File tree

2 files changed

+8
-17
lines changed

2 files changed

+8
-17
lines changed

trust-quorum/src/alarm.rs

Lines changed: 6 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
77
use serde::{Deserialize, Serialize};
88

9-
use crate::{Configuration, Epoch};
9+
use crate::{Configuration, Epoch, PlatformId};
1010

1111
#[derive(
1212
Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize,
@@ -20,18 +20,11 @@ pub enum Alarm {
2020
/// coordinators will generate different key shares. However, since Nexus
2121
/// will not tell different nodes to coordinate the same configuration, this
2222
/// state should be impossible to reach.
23-
MismatchedConfigurations { config1: Configuration, config2: Configuration },
24-
25-
/// We received a `CommitAdvance` while coordinating for the same epoch
26-
///
27-
/// Reason: `CommitAdvance` is a reply for a key share request in an
28-
/// old epoch that we don't have the latest committed coordination.
29-
/// However we are actually the coordinator for the configuration in the
30-
/// `CommitAdvance`. While it's possible that another node could learn
31-
/// of the commit from Nexus before the coordinator, the coordinator will
32-
/// never ask for a key share for that epoch. Therefore this state should be
33-
/// impossible to reach.
34-
CommitAdvanceForCoordinatingEpoch { config: Configuration },
23+
MismatchedConfigurations {
24+
config1: Configuration,
25+
config2: Configuration,
26+
from: PlatformId,
27+
},
3528

3629
/// The `keyShareComputer` could not compute this node's share
3730
///

trust-quorum/src/node.rs

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -325,6 +325,7 @@ impl Node {
325325
ctx.raise_alarm(Alarm::MismatchedConfigurations {
326326
config1: (*existing).clone(),
327327
config2: config.clone(),
328+
from: from.clone(),
328329
});
329330
}
330331
} else {
@@ -347,15 +348,12 @@ impl Node {
347348
);
348349
self.coordinator_state = None;
349350
} else if coordinating_epoch == config.epoch {
350-
error!(
351+
info!(
351352
self.log,
352353
"Received CommitAdvance while coordinating for same epoch!";
353354
"from" => %from,
354355
"epoch" => %config.epoch
355356
);
356-
ctx.raise_alarm(Alarm::CommitAdvanceForCoordinatingEpoch {
357-
config,
358-
});
359357
return;
360358
} else {
361359
info!(

0 commit comments

Comments
 (0)