feat(rpc): add tn_syncing, tn_epochInfo, tn_currentCommittee methods #539
TanguyDeTaxis wants to merge 8 commits into Telcoin-Association:main
Conversation
Force-pushed ae01506 to b4d8c3b
tn_syncing previously returned Synced based only on epoch comparison, ignoring execution lag. A node could have consensus data for the current epoch but still be behind on execution. Now requires both epoch AND execution to be caught up before declaring synced. Also adds currentEpoch/highestEpoch fields to SyncProgress and comprehensive tests for serialization and sync_status logic.
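The stricter decision described above can be sketched as a small predicate. This is a hedged illustration of the logic, not the actual Telcoin implementation: the type shape and the round/block comparison are assumptions.

```rust
// Sketch of the revised tn_syncing decision: a node is Synced only when BOTH
// its epoch and its execution have caught up. Types/fields are hypothetical.
#[derive(Debug, PartialEq)]
enum SyncStatus {
    Synced,
    Syncing { current_block: u64, highest_block: u64 },
}

fn sync_status(
    local_epoch: u64,
    network_epoch: u64,
    executed_block: u64,
    highest_consensus_block: u64,
) -> SyncStatus {
    let epoch_caught_up = local_epoch >= network_epoch;
    let execution_caught_up = executed_block >= highest_consensus_block;
    if epoch_caught_up && execution_caught_up {
        SyncStatus::Synced
    } else {
        SyncStatus::Syncing {
            current_block: executed_block,
            highest_block: highest_consensus_block,
        }
    }
}

fn main() {
    // Epoch matches but execution lags: previously reported Synced, now Syncing.
    assert_eq!(
        sync_status(5, 5, 900, 1000),
        SyncStatus::Syncing { current_block: 900, highest_block: 1000 }
    );
    assert_eq!(sync_status(5, 5, 1000, 1000), SyncStatus::Synced);
}
```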
Add a watch channel to track the current committee in ConsensusBus, enabling RPC and other components to access committee info without going through the primary node. Updated epoch manager to populate the channel when creating a committee.
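The publish/observe pattern behind that channel looks roughly like the following. The real ConsensusBus uses tokio's watch channel; this std-only stand-in (with a stub `Committee`) is just a sketch of the `send_replace`/`borrow` semantics so the flow is concrete.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stub for the committee value carried on the channel.
#[derive(Clone, Debug, PartialEq)]
struct Committee {
    epoch: u64,
}

// Minimal stand-in for a watch channel holding the latest committee.
#[derive(Clone, Default)]
struct CurrentCommittee(Arc<RwLock<Option<Committee>>>);

impl CurrentCommittee {
    // Analogous to watch::Sender::send_replace: overwrite and return the old value.
    fn send_replace(&self, c: Option<Committee>) -> Option<Committee> {
        std::mem::replace(&mut *self.0.write().unwrap(), c)
    }

    // Analogous to watch::Receiver::borrow: observe the latest value.
    fn borrow(&self) -> Option<Committee> {
        self.0.read().unwrap().clone()
    }
}

fn main() {
    let bus = CurrentCommittee::default();
    // Early startup: no committee has been published yet.
    assert_eq!(bus.borrow(), None);
    // Epoch manager publishes the committee; RPC can now observe it.
    bus.send_replace(Some(Committee { epoch: 3 }));
    assert_eq!(bus.borrow().map(|c| c.epoch), Some(3));
}
```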
Extend the tn RPC namespace with sync status, epoch info, and committee endpoints. Add SyncStatus, SyncProgress, EpochInfo, CommitteeInfo, ValidatorInfo types. Fix JSON-RPC error codes to use the standard server error range (-32000 to -32099). Update EngineToPrimary trait with the new methods and adapt faucet test mock accordingly.
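For the error-code fix, JSON-RPC 2.0 reserves -32000 to -32099 for implementation-defined server errors. A hedged sketch of keeping domain errors inside that range (the variant names here are invented, not the actual tn-rpc error type):

```rust
// Hypothetical tn-rpc error variants mapped into the JSON-RPC 2.0
// implementation-defined server error range (-32099..=-32000).
enum TnRpcError {
    NotSyncing,
    NoCommittee,
    NoEpochInfo,
}

fn error_code(err: &TnRpcError) -> i32 {
    match err {
        TnRpcError::NotSyncing => -32000,
        TnRpcError::NoCommittee => -32001,
        TnRpcError::NoEpochInfo => -32002,
    }
}

fn main() {
    // Every code must stay within the reserved server error range.
    for e in [TnRpcError::NotSyncing, TnRpcError::NoCommittee, TnRpcError::NoEpochInfo] {
        let code = error_code(&e);
        assert!((-32099..=-32000).contains(&code));
    }
}
```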
Force-pushed b4d8c3b to 8f2132f
Merge commit resolving conflicts in crates/consensus/primary/src/consensus_bus.rs
grantkee left a comment
@TanguyDeTaxis Thank you for taking this on. It's going in the right direction, but I think there is more to consider. I'm leaving feedback for you, but I am still reviewing/thinking about some of the implementation details. Your work reflects issue #530; however, I'm realizing there are still outstanding design decisions I need to figure out. I will follow up soon with more comments.
nit: please move everything after L47 to execution/tn-rpc/src/rpc_ext.rs - this was missed in a previous refactor and aligns better with the separation of domain logic
Thanks, agreed on separation. I moved this out of node/lib.rs into node/engine_to_primary_rpc.rs in f586ad9 so lib.rs is only wiring now. I kept it in node for now because it depends on node-side consensus bus/reth wiring, but I can move it into execution/tn-rpc if you want that boundary in this PR.
crates/node/src/lib.rs (Outdated)
```rust
    return SyncStatus::Synced;
}

// Block-level progress: compare executed blocks vs known consensus blocks.
```
I pointed out a similar problem in PR feedback for #542. This is comparing execution state with consensus state, which is no longer guaranteed to be 1:1 after #531.
After addressing #542, I think the same approach will work here
Agreed. In e98b524 I removed the 1:1 execution<->consensus assumption. Sync now tracks consensus progress and execution progress independently (round-based), and returns Synced only when epoch + consensus + execution are all caught up.
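The independent, round-based tracking described in this reply can be sketched as follows. This is an illustrative reconstruction under assumed names, not the code from e98b524: it distinguishes which side is lagging before anything is declared synced.

```rust
// Hedged sketch: track consensus and execution progress independently by
// round, so a node with current-epoch consensus data but lagging execution
// is never reported as caught up. All names are hypothetical.
#[derive(Debug, PartialEq)]
enum Lagging {
    None,
    Consensus,
    Execution,
}

fn lag(consensus_round: u64, executed_round: u64, network_round: u64) -> Lagging {
    if consensus_round < network_round {
        // Still fetching consensus output for the current epoch.
        Lagging::Consensus
    } else if executed_round < consensus_round {
        // Consensus data is present, but execution has not replayed it yet.
        Lagging::Execution
    } else {
        Lagging::None
    }
}

fn main() {
    assert_eq!(lag(10, 10, 10), Lagging::None);
    assert_eq!(lag(8, 8, 10), Lagging::Consensus);
    assert_eq!(lag(10, 7, 10), Lagging::Execution);
}
```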
crates/node/src/lib.rs (Outdated)
```rust
fn current_epoch_info(&self) -> Option<EpochInfo> {
    let (_, record) = self.db.last_record::<EpochRecords>()?;
    Some(record.into())
}
```
Epoch records are only generated after the epoch concludes, so this logic needs to be adjusted
Updated in e98b524. network_epoch is now derived from the highest downloaded epoch record + 1, with explicit handling for the epoch-0 dummy record.
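Since epoch records are only written when an epoch concludes, the derivation in this reply amounts to "highest record + 1, except the epoch-0 dummy". A hedged sketch, where the `(epoch, is_dummy)` tuple stands in for whatever the real record lookup returns:

```rust
// Sketch of deriving the network epoch from the highest downloaded epoch
// record. A record for epoch e means e has concluded, so the live epoch is
// e + 1; the epoch-0 placeholder record must not advance the epoch.
// The `(u64, bool)` shape is an assumption for illustration.
fn network_epoch(highest_record: Option<(u64, bool)>) -> u64 {
    match highest_record {
        None => 0,                     // no records yet: still in epoch 0
        Some((0, true)) => 0,          // epoch-0 dummy record: don't advance
        Some((epoch, _)) => epoch + 1, // epoch concluded; network is one ahead
    }
}

fn main() {
    assert_eq!(network_epoch(None), 0);
    assert_eq!(network_epoch(Some((0, true))), 0); // dummy record
    assert_eq!(network_epoch(Some((4, false))), 5); // epoch 4 closed
}
```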
crates/node/src/lib.rs (Outdated)
```rust
db: DB,
/// The block number when sync started, captured on first syncing status check.
/// Derived from the last EpochRecord's parent_state as specified in issue #530.
sync_starting_block: OnceLock<u64>,
```
I made RethEnv thread-safe and cheap to clone in #549 so we can use it in the RPC struct
```rust
pub struct EngineToPrimaryRpc<DB> {
    /// Access to the execution database.
    reth_env: RethEnv,
    //...snip
}
```

This will help address other feedback.
Done, I wired RethEnv directly into EngineToPrimaryRpc (new signature includes reth_env), and use it in sync and epoch RPC paths.
crates/node/src/lib.rs (Outdated)
```rust
// Epoch-based sync decision: compare network epoch (from committee) vs local epoch (from
// DB). This follows the issue #530 spec: "A node is considered synced when its
// execution state is within the current consensus epoch."
```
The content is good, and I appreciate well-commented code. That being said, please remove the reference to the "issue #530"
The comment might be more accurate by saying:
...compare network epoch (from last epoch record) vs local epoch (from execution DB). A node is considered synced when its execution state is within the current consensus epoch
Done. I removed the issue #530 wording and updated the comments to describe the data sources directly (network epoch from epoch records, local epoch from execution state).
```rust
self.consensus_bus.current_committee().send_replace(Some(committee.clone()));
```
This committee is the local committee - loaded from the latest execution state per epoch. I think the consensus_bus.current_committee().send_replace() should happen when a new epoch record is verified (~manager.rs:1238)
Good call. I now publish current_committee after persisting a new epoch record (manager.rs around save_epoch_record path), and also publish at committee load so RPC reflects the live epoch promptly.
crates/node/src/lib.rs (Outdated)
```rust
// Synced only when both epoch AND execution are caught up.
// A node can have consensus data for the current epoch but execution may still
// lag behind — we must wait for execution to complete before declaring synced.
```
Yes, I agree with this comment (which is more specific than the "execution state is within the current consensus epoch" from the issue). The problem is that even with ~6h epochs, a node might not be synced for some time.
Agreed, and this is now reflected in e98b524: even with long epochs we do not report Synced unless execution progress has also caught up.
crates/node/src/lib.rs (Outdated)
```rust
    return SyncStatus::Synced;
}

// Fallback when committee info is unavailable (e.g. early startup):
```
Another edge case to consider is the 0 epoch when there is only a dummy epoch record until the first epoch closes
Handled in e98b524: I added the epoch-0 dummy-record guard so network_epoch does not advance incorrectly before the first real epoch record.
```rust
pub struct SyncProgress {
    /// The block number the node started syncing from.
    pub starting_block: u64,
    /// The current block number the node has reached.
    pub current_block: u64,
    /// The highest known block number.
    pub highest_block: u64,
    /// The node's current local epoch.
    pub current_epoch: u64,
    /// The network's current epoch.
    pub highest_epoch: u64,
}
```
The starting_block, current_block, and highest_block fields should all mention and correlate to consensus blocks (aka ConsensusHeader). Please add another field to this struct, e.g. execution_block_height or something along those lines, that uses the RethEnv::canonical_tip() number
Implemented. I clarified these fields as consensus block metrics and added execution_block_height from reth canonical tip, with serialization/deserialization test updates.
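The revised struct could look roughly like this. A hedged sketch under the review's naming suggestions, not the merged code; the `execution_block_height` value is assumed to come from the reth canonical tip:

```rust
// Sketch of SyncProgress after the review: the three block fields refer to
// consensus blocks (ConsensusHeader heights), while execution_block_height
// tracks the execution chain separately. Field names are illustrative.
#[derive(Debug)]
struct SyncProgress {
    /// Consensus block the node started syncing from.
    starting_block: u64,
    /// Latest consensus block the node has processed.
    current_block: u64,
    /// Highest known consensus block.
    highest_block: u64,
    /// Execution height, fed from the reth canonical tip.
    execution_block_height: u64,
    /// The node's current local epoch.
    current_epoch: u64,
    /// The network's current epoch.
    highest_epoch: u64,
}

fn main() {
    let p = SyncProgress {
        starting_block: 100,
        current_block: 5000,
        highest_block: 10000,
        execution_block_height: 4200,
        current_epoch: 3,
        highest_epoch: 5,
    };
    // Consensus can run ahead of execution; the two are tracked separately.
    assert!(p.execution_block_height <= p.current_block);
    assert!(p.current_epoch <= p.highest_epoch);
}
```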
crates/node/src/lib.rs (Outdated)
```rust
// DB). This follows the issue #530 spec: "A node is considered synced when its
// execution state is within the current consensus epoch."
let network_epoch =
    self.consensus_bus.current_committee().borrow().as_ref().map(|c| c.epoch());
```
The network_epoch is most likely the highest downloaded epoch record (from DB) + 1 (assuming the node has successfully fetched all existing epoch records as outlined in SYNC.md)
local_epoch = epoch from the node's execution state (current_committee epoch, or canonical tip)
Implemented this approach in e98b524: network_epoch uses highest downloaded epoch record + 1, while local_epoch comes from execution state (canonical tip epoch), with committee fallback.
Thanks for the detailed feedback, @grantkee . I pushed a follow-up series tonight to address the sync/epoch/committee points and improve maintainability:
I also added/updated tests around sync status and active-epoch behavior. Happy to adjust further if you want a different interpretation for specific edge cases.
Summary
- Add tn_syncing, tn_epochInfo, and tn_currentCommittee RPC methods to the tn namespace
- Add a current_committee watch channel to ConsensusBus so RPC can access committee info
- Update tn_syncing to check both epoch and execution progress before declaring a node synced (addresses review feedback)
- Add currentEpoch/highestEpoch fields to the sync progress response for better validator visibility
- Fix JSON-RPC error codes to use the standard server error range (-32000 to -32099)

tn_syncing behavior

Returns false when synced, or a progress object when syncing:

```json
{
  "startingBlock": 100,
  "currentBlock": 5000,
  "highestBlock": 10000,
  "currentEpoch": 3,
  "highestEpoch": 5
}
```

A node is considered synced when:

1. it is an active committee validator (CvvActive), or
2. both its epoch and its execution progress have caught up with the network

Previously, condition 2 only checked the epoch: a node could report as synced while execution was still lagging behind consensus. This is now fixed.
Files changed
crates/consensus/primary/src/consensus_bus.rscurrent_committeewatch channelcrates/node/src/manager.rscrates/execution/tn-rpc/src/rpc_ext.rscrates/execution/tn-rpc/src/lib.rsEngineToPrimarytrait with new methodscrates/execution/tn-rpc/src/error.rscrates/node/src/lib.rscrates/execution/faucet/tests/it/faucet.rs