feat: improve node syncing reporting by akaladarshi · Pull Request #5589 · ChainSafe/forest

akaladarshi · 2025-04-21T18:48:48Z

Summary of changes

Changes introduced in this pull request:

Added a new SyncStatusReport for tracking syncing progress.
SyncStatusReport replaces SyncState for tracking Forest syncing.
Removed the Filecoin.SyncState API.
Introduced a new API, Forest.SyncStatus, that returns the sync status report.
Updated chain_follower to use the new SyncStatusReport instead of SyncState.

Reference issue to close (if applicable)

Closes #5539

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

lemmih

Looks good!

sudo-shashank · 2025-04-23T03:31:48Z

src/cli/subcommands/sync_cmd.rs

            Self::Wait { watch } => {
                let ticker = Ticker::new(0.., Duration::from_secs(1));
                let mut stdout = stdout();
+                let mut last_lines_printed = 0;


either this should be bool type or we can use a better var name to be clear

Done (Renamed it).

sudo-shashank · 2025-04-23T03:43:02Z

src/cli/subcommands/sync_cmd.rs

+                    print_sync_report_details(&report, &mut current_lines)?;
+
+                    last_lines_printed = current_lines;
+                    // Break if Synced and not watching


Exit feels more appropriate than Break as a reader

Suggested change

// Break if Synced and not watching

// Exit if synced and not in watch mode.

sudo-shashank · 2025-04-23T03:52:05Z

src/cli/subcommands/sync_cmd.rs

-fn format_tipset_cids(cids: &str) -> &str {
-    if cids.is_empty() { "[]" } else { cids }
+/// Prints the sync status report details.
+/// `line_count` is mutable and incremented for each line printed, used for clearing in `Wait`.


Suggested change

/// `line_count` is mutable and incremented for each line printed, used for clearing in `Wait`.

/// `line_count` is incremented for each printed line and is used for terminal cleanup (e.g., in `Wait` mode).

Completely removed this part, now returning the number of printed lines from the print_sync_report_details fn directly.

sudo-shashank · 2025-04-23T03:53:37Z

src/cli/subcommands/sync_cmd.rs

-    if cids.is_empty() { "[]" } else { cids }
+/// Prints the sync status report details.
+/// `line_count` is mutable and incremented for each line printed, used for clearing in `Wait`.
+/// Pass `&mut 0` if line counting/clearing is not needed (like in `Status` or final print).


Suggested change

/// Pass `&mut 0` if line counting/clearing is not needed (like in `Status` or final print).

/// Pass `&mut 0` to disable line counting/clearing (like in `status` or final print).

sudo-shashank · 2025-04-23T04:13:42Z

src/cli/subcommands/sync_cmd.rs

    check_snapshot_progress(client, true).await
 }
+
+/// Handles the initial check for snapshot download if the node is initializing.


Suggested change

/// Handles the initial check for snapshot download if the node is initializing.

/// Checks if snapshot download is required or in progress when the node is initializing.

/// If a snapshot download is in progress, it waits for completion before starting sync monitor.

sudo-shashank · 2025-04-23T04:15:38Z

src/cli/subcommands/sync_cmd.rs

+/// Handles the initial check for snapshot download if the node is initializing.
+async fn handle_initial_snapshot_check(client: &rpc::Client) -> anyhow::Result<()> {
+    let initial_report = SyncStatus::call(client, ()).await?;
+    // Use the public getter method instead of accessing the private field


This comment is not clear to me: which private field, and why?

Added the comment as a reminder to use getters instead of directly accessing private fields during refactoring, but forgot to remove it. Will update it.

Removed the comment.

sudo-shashank · 2025-04-23T04:25:00Z

@akaladarshi code changes overall LGTM 👍🏻, requested some minor changes.
Some comments could be clearer: they are most useful when they explain why we are doing something rather than what is already obvious from the code. Suggested a few 🙂

sudo-shashank · 2025-04-23T04:31:23Z

src/chain_sync/chain_follower.rs

+                let fork_info = ForkSyncInfo {
+                    target_tipset_key: last_ts.key().clone(),
+                    target_epoch: last_ts.epoch(),
+                    // The epoch from which sync activities (fetch/validate) need to start for this fork.


this comment makes more sense in ForkSyncInfo struct than here

sudo-shashank · 2025-04-23T04:31:38Z

src/chain_sync/chain_follower.rs

+                    target_sync_epoch_start: first_ts.epoch(),
+                    stage,
+                    validated_chain_head_epoch: current_validated_epoch,
+                    start_time, // Track when this fork's sync task was initiated


sudo-shashank · 2025-04-23T04:32:22Z

src/chain_sync/chain_follower.rs

+                    stage,
+                    validated_chain_head_epoch: current_validated_epoch,
+                    start_time, // Track when this fork's sync task was initiated
+                    last_updated: Some(now), // Mark the last update time


same here, I guess you already have it in ForkSyncInfo

sudo-shashank · 2025-04-23T04:43:08Z

src/cli/subcommands/sync_cmd.rs

                }

+                // Print the status report once, without line counting for clearing
+                print_sync_report_details(&sync_status, &mut 0)?;


Using Option instead of &mut 0 to disable line counting is more suitable here

Refactored it; the function now returns the number of printed lines itself.
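A minimal sketch of the refactor described in this thread, where the print function returns the number of lines it wrote instead of mutating a `&mut` counter. The `Report` struct, its fields, and the `clear_previous` helper are hypothetical stand-ins, not the actual Forest types or the code merged in this PR.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for the report returned by `Forest.SyncStatus`.
struct Report {
    status: String,
    epochs_behind: i64,
}

/// Prints the report and returns how many lines were written.
/// A caller that does not need to clear the terminal can ignore the count.
fn print_report(out: &mut impl Write, report: &Report) -> io::Result<usize> {
    let lines = [
        format!("Status: {}", report.status),
        format!("Epochs behind: {}", report.epochs_behind),
    ];
    for line in &lines {
        writeln!(out, "{line}")?;
    }
    Ok(lines.len())
}

/// Erases the lines written by the previous print using ANSI escape sequences.
fn clear_previous(out: &mut impl Write, lines: usize) -> io::Result<()> {
    for _ in 0..lines {
        // ESC[1A moves the cursor up one line, ESC[2K erases that line.
        write!(out, "\x1b[1A\x1b[2K")?;
    }
    Ok(())
}
```

With this shape, a watch loop can keep the count from the previous iteration and pass it to `clear_previous`, while a one-shot status command discards the return value, so neither `&mut 0` nor an `Option` parameter is needed.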

sudo-shashank · 2025-04-23T04:49:44Z

src/chain_sync/sync_status.rs

+    /// Node is significantly behind the network head and actively downloading/validating.
+    #[strum(to_string = "Syncing")]
+    Syncing,
+    /// Node is close to the network head (e.g., within a configurable threshold like 5 epochs).


is the threshold configurable somewhere?

No, it's not configurable. I don't think it needs to be, because it's only used for changing the sync status from syncing to synced; earlier it was 10.

right, the comment was misleading

sudo-shashank · 2025-04-23T05:03:06Z

src/chain_sync/sync_status.rs

+        match stateless_mode {
+            true => self.set_status(NodeSyncStatus::Offline),
+            false => {
+                if time_diff < seconds_per_epoch as u64 * 5 {


worth commenting why 5, best if we can avoid hardcoding

It was an arbitrarily selected value, just like SyncState had 10; I just decreased it to be closer to the node head. Will make it a constant, but open to suggestions for this value.

made it a constant.

@LesnyRumcajs any suggestions for this value here

sudo-shashank · 2025-04-23T05:06:35Z

src/chain_sync/sync_status.rs

+        self.set_network_head(network_head_epoch as ChainEpoch);
+        self.set_epochs_behind(network_head_epoch as i64 - current_chain_head_epoch);


as ChainEpoch is same as i64, lets standardise
or handle it when initialising network_head_epoch

or change calculate_expected_epoch return type to ChainEpoch

Everywhere we were converting the returned value of calculate_expected_epoch to i64, so I updated the return type of calculate_expected_epoch to i64.
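To illustrate the point of this thread, here is a rough sketch of an expected-epoch calculation that returns the epoch type directly, so call sites need no `as` casts. The function body and parameter types are assumptions for illustration; only the name `calculate_expected_epoch` and the fact that `ChainEpoch` is `i64` come from the discussion.

```rust
/// In the discussion, `ChainEpoch` is stated to be the same as `i64`.
type ChainEpoch = i64;

/// Hypothetical implementation: the epoch the network should have reached,
/// derived from wall-clock time. `block_delay_secs` must be non-zero.
fn calculate_expected_epoch(now_ts: u64, genesis_ts: u64, block_delay_secs: u32) -> ChainEpoch {
    // saturating_sub guards against a clock that reads earlier than genesis.
    (now_ts.saturating_sub(genesis_ts) / u64::from(block_delay_secs)) as ChainEpoch
}

/// With a `ChainEpoch` return type, the subtraction needs no conversion.
fn epochs_behind(network_head_epoch: ChainEpoch, current_head_epoch: ChainEpoch) -> ChainEpoch {
    network_head_epoch - current_head_epoch
}
```

The single cast now lives inside the function instead of being repeated at every caller.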

sudo-shashank · 2025-04-23T05:20:05Z

src/cli/subcommands/sync_cmd.rs

+    if *line_count > 0 {
+        // Only increment if we are in the Wait command context
+        *line_count += 1;


from this it's not clear how we determine we are in Wait context, adding a comment above conditional check will make more sense

also using Option::None instead of 0 is better

Removed this part.

sudo-shashank · 2025-04-23T05:26:58Z

src/lotus_json/mod.rs

    signature for crate::shim::crypto::Signature,
    signature_type for crate::shim::crypto::SignatureType,
    signed_message for  crate::message::SignedMessage,
-    sync_stage for crate::chain_sync::SyncStage,


we had some test in SyncStage, can we have some for SyncStatus?

The test was there only to check that SyncStage is compatible with LotusJson for comparison with the Lotus API; since SyncStatus is specific to Forest, we don't need those tests.

And to test SyncStatus we need mocks of all the other components it belongs to (ChainStore, StateManager etc.), I don't think we currently have that. We should though.

sudo-shashank · 2025-04-23T05:29:44Z

src/rpc/methods/eth.rs

+        match sync_status.get_status() == NodeSyncStatus::Syncing {
+            true => {


when matching on boolean values, I feel if/else is more suitable and readable

In general, I agree, though we already have several occurrences of such style in Forest, so it's not a huge deal. For this particular one, I'd avoid comparing booleans and suggest matching on sync_status.get_status() directly. This way, when a new state is introduced, we'll get a compilation error due to an unhandled enum variant.

I've implemented the match statement. Since we only care about the Syncing status and return an error in all other cases, I think using a wildcard (_) is more appropriate.
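The trade-off discussed in this thread can be sketched as follows. The enum lists only the variants visible in the review snippets and may be incomplete; both functions and their return values are hypothetical and do not reproduce the code in `src/rpc/methods/eth.rs`.

```rust
/// Simplified stand-in for the PR's enum (variants taken from the review snippets).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeSyncStatus {
    Syncing,
    Synced,
    Error,
    Offline,
}

/// Wildcard style: concise, but a variant added later lands in `_` silently.
fn check_wildcard(status: NodeSyncStatus) -> Result<&'static str, String> {
    match status {
        NodeSyncStatus::Syncing => Ok("sync in progress"),
        _ => Err(format!("node is not syncing (status: {status:?})")),
    }
}

/// Exhaustive style: adding a variant to the enum makes this match fail to
/// compile until the new case is handled explicitly.
fn check_exhaustive(status: NodeSyncStatus) -> Result<&'static str, String> {
    match status {
        NodeSyncStatus::Syncing => Ok("sync in progress"),
        NodeSyncStatus::Synced | NodeSyncStatus::Error | NodeSyncStatus::Offline => {
            Err(format!("node is not syncing (status: {status:?})"))
        }
    }
}
```

Both functions behave identically today; they differ only in what the compiler reports when the enum grows.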

sudo-shashank · 2025-04-23T05:46:17Z

@akaladarshi some more code feedback

LesnyRumcajs · 2025-04-23T09:21:23Z

CHANGELOG.md

 ### Breaking

 - [#5559](https://github.com/ChainSafe/forest/pull/5559) Change `Filecoin.ChainGetMinBaseFee` to `Forest.ChainGetMinBaseFee` with read access.
+- [#5589](https://github.com/ChainSafe/forest/pull/5589) Replace exiting `Filecoin.SyncState` API with new `Forest.SyncStatus` to track node syncing progress specific to forest.


Suggested change

- [#5589](https://github.com/ChainSafe/forest/pull/5589) Replace exiting `Filecoin.SyncState` API with new `Forest.SyncStatus` to track node syncing progress specific to forest.

- [#5589](https://github.com/ChainSafe/forest/pull/5589) Replace existing `Filecoin.SyncState` API with new `Forest.SyncStatus` to track node syncing progress specific to Forest.

sudo-shashank · 2025-04-24T02:42:34Z

src/chain_sync/sync_status.rs

+use std::sync::Arc;
+use tracing::log;
+
+// Node considered synced if the head is within this many epochs


Suggested change

// Node considered synced if the head is within this many epochs

// Node considered synced if the head is within this threshold.

sudo-shashank · 2025-04-24T02:45:48Z

src/chain_sync/sync_status.rs

+    }
+
+    pub(crate) fn get_status(&self) -> NodeSyncStatus {
+        self.status.clone()


clone can be avoided here

I think the cost is negligible since NodeSyncStatus is an enum; anyway, I added the Copy trait to the enum and removed the clone.
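A small sketch of the change described here, assuming a field-less enum: once the enum derives `Copy`, the getter can return the value without calling `clone()`. The struct is a reduced, hypothetical version of the report type.

```rust
/// Field-less enums are cheap to copy, so deriving `Copy` is reasonable.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeSyncStatus {
    Syncing,
    Synced,
    Error,
    Offline,
}

/// Hypothetical, reduced version of the report struct.
struct SyncStatusReport {
    status: NodeSyncStatus,
}

impl SyncStatusReport {
    /// Returns the status by value; `Copy` makes an explicit `clone()` unnecessary.
    fn get_status(&self) -> NodeSyncStatus {
        self.status
    }
}
```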

sudo-shashank · 2025-04-24T03:19:09Z

src/chain_sync/sync_status.rs

+    /// Node is significantly behind the network head and actively downloading/validating.
+    #[strum(to_string = "Syncing")]
+    Syncing,
+    /// Node is close to the network head (e.g., 5 epochs).


Suggested change

/// Node is close to the network head (e.g., 5 epochs).

/// Node is close to the network head, within the `SYNCED_EPOCH_THRESHOLD`

akaladarshi · 2025-04-24T11:15:33Z

I am also planning to remove the existing SyncSnapshotProgress API and include snapshot tracking directly into our new SyncStatus API.
This will allow us to have snapshot progress tracking and syncing progress in a single API. It will be done in a subsequent PR since the current PR is already very big. @LesnyRumcajs, do you have any thoughts on this?

LesnyRumcajs

Overall looks good, the output is more understandable than before.

LesnyRumcajs · 2025-04-24T14:40:36Z

src/chain_sync/sync_status.rs

+// Node considered synced if the head is within this threshold.
+const SYNCED_EPOCH_THRESHOLD: u64 = 5;


Hm, that's 2m30s for calibnet and mainnet. Did lower values give you trouble?

I didn't see any problem with this. I was experimenting with different numbers to see if there could be any issue, but didn't find one. So I just chose 5, since the previous value was 10 (@LesnyRumcajs was there any specific reason for choosing 10?)

I don't remember exactly. Lower values might have given flaky results in the healthcheck endpoint, i.e., frequent transitions between healthy and unhealthy due to forks. I'm okay with both.

Should I revert to 10? It's not much of a gap in terms of waiting for the node status to change to synced, as it's a one-time thing while starting up.

Let's revert to limit the changes. If someone reports that the difference is too big, we can always revisit this.
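The arithmetic behind this thread, as a hedged sketch: with the 30-second block delay implied by the "2m30s" remark, a threshold of 5 epochs is 150 seconds and the reverted value of 10 is 300 seconds. The helper names and the definition of the time difference (now minus the head's timestamp) are assumptions; only the constant name and the comparison shape come from the review snippets.

```rust
/// Node considered synced if the head is within this threshold (reverted to 10 per the thread).
const SYNCED_EPOCH_THRESHOLD: u64 = 10;

/// Threshold expressed in seconds for a given block delay.
fn synced_window_secs(seconds_per_epoch: u32) -> u64 {
    u64::from(seconds_per_epoch) * SYNCED_EPOCH_THRESHOLD
}

/// Hypothetical check: the head counts as synced when its timestamp is
/// less than the threshold window behind the current time.
fn is_synced(now_ts: u64, head_ts: u64, seconds_per_epoch: u32) -> bool {
    let time_diff = now_ts.saturating_sub(head_ts);
    time_diff < synced_window_secs(seconds_per_epoch)
}
```

A wider window makes the status less sensitive to short-lived forks, which matches the flakiness concern raised for the healthcheck endpoint.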

LesnyRumcajs · 2025-04-24T14:41:05Z

src/chain_sync/sync_status.rs

+    #[strum(to_string = "Synced")]
+    Synced,
+    /// An error occurred during the sync process.
+    #[strum(to_string = "error")]


Suggested change

#[strum(to_string = "error")]

#[strum(to_string = "Error")]

use same case

LesnyRumcajs · 2025-04-24T14:41:16Z

src/chain_sync/sync_status.rs

+    #[strum(to_string = "error")]
+    Error,
+    /// Node is configured to not sync (offline mode).
+    #[strum(to_string = "offline")]


Suggested change

#[strum(to_string = "offline")]

#[strum(to_string = "Offline")]

LesnyRumcajs · 2025-04-24T14:49:32Z

src/chain_sync/sync_status.rs

+        }
+    }
+
+    pub(crate) fn set_current_chain_head_key(&mut self, tipset_key: TipsetKey) {


I'd avoid having a structure with private fields and a bunch of setters and getters. If they are meant to be used outside of the struct, mark the fields as pub(crate). Otherwise, it'd be nice to encapsulate this if possible so that the user doesn't have to know the internal structure of it.

Will remove the setters and getters and use the pub(crate) directly on the fields.

LesnyRumcajs · 2025-04-24T14:50:36Z

src/chain_sync/sync_status.rs

+        let now = Utc::now();
+        let now_ts = now.timestamp() as u64;
+        let seconds_per_epoch = state_manager.chain_config().block_delay_secs;
+        let network_head_epoch = calculate_expected_epoch(
+            now_ts,
+            state_manager.chain_store().genesis_block_header().timestamp,
+            seconds_per_epoch,
+        );


I think this could be a separate method somewhere.

Everywhere else we are directly using the calculate_expected_epoch and passing the fields, but here we have introduced the variables just so we can reuse them. I don't think we need to introduce a separate method just for this specific use case.

sudo-shashank

https://filecoinproject.slack.com/archives/C029LPZ5N73/p1745504911544329

sudo-shashank · 2025-04-25T18:33:05Z

src/rpc/methods/sync.rs

    const PARAM_NAMES: [&'static str; 0] = [];
    const API_PATHS: BitFlags<ApiPaths> = ApiPaths::all();
    const PERMISSION: Permission = Permission::Read;



let's add DESCRIPTION

feat: add forest sync status

a365028

akaladarshi force-pushed the akaladarshi/refactor-forest-sync branch from ebf05a1 to a365028 Compare April 21, 2025 18:51

akaladarshi added 7 commits April 22, 2025 16:40

refactor: remove sync state

ff9509d

feat: update the sync status command

a924a67

fix: linter issues

a2c3289

rename sync status API

1c8a77e

refactor: sync commands for readbility

874f3a4

update changelog

d6a3c55

Merge branch 'main' into akaladarshi/refactor-forest-sync

b79e204

akaladarshi marked this pull request as ready for review April 22, 2025 13:07

akaladarshi requested a review from a team as a code owner April 22, 2025 13:07

akaladarshi requested review from hanabi1224 and sudo-shashank and removed request for a team April 22, 2025 13:07

lemmih approved these changes Apr 22, 2025

sudo-shashank requested changes Apr 23, 2025

LesnyRumcajs reviewed Apr 23, 2025

akaladarshi added 4 commits April 23, 2025 15:00

Merge branch 'main' into akaladarshi/refactor-forest-sync

0ee2cb0

address comments

01ce84f

refactor: snapshot progress

cd77904

Merge branch 'main' into akaladarshi/refactor-forest-sync

63b831f

akaladarshi requested review from LesnyRumcajs and sudo-shashank April 23, 2025 19:10

sudo-shashank reviewed Apr 24, 2025

sudo-shashank requested changes Apr 24, 2025

address comments

c573e63

akaladarshi requested a review from sudo-shashank April 24, 2025 03:56

fix: linter issues

f5d6045

sudo-shashank approved these changes Apr 24, 2025

sudo-shashank requested a review from lemmih April 24, 2025 07:13

LesnyRumcajs reviewed Apr 24, 2025

address comments

8b924ea

akaladarshi requested a review from LesnyRumcajs April 24, 2025 16:21

chore: add more detailed log

ce8f14b

sudo-shashank requested changes Apr 25, 2025

sudo-shashank approved these changes Apr 25, 2025

sudo-shashank added this pull request to the merge queue Apr 25, 2025

sudo-shashank removed this pull request from the merge queue due to a manual request Apr 25, 2025

sudo-shashank requested changes Apr 25, 2025

address comment

23a9b9a

sudo-shashank approved these changes Apr 28, 2025

sudo-shashank added this pull request to the merge queue Apr 28, 2025

Merged via the queue into ChainSafe:main with commit 8e7c0bb Apr 28, 2025
42 checks passed

akaladarshi deleted the akaladarshi/refactor-forest-sync branch April 28, 2025 11:14

akaladarshi mentioned this pull request Apr 28, 2025

fix: Filecoin.EthSyncing API #5601

Closed
