-
Notifications
You must be signed in to change notification settings - Fork 584
Add CLI tool to fix persistent frontier's root discrepancy #18116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: compatible
Are you sure you want to change the base?
Conversation
b548588 to
aa625e2
Compare
|
|
||
| let pending_coinbase t = Common.pending_coinbase t.common | ||
|
|
||
| let read_all_proofs_from_disk t = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this type could only be read from disk? Moreover, could you explain a bit how are these 3 different root data types being used in our codebase?
| open Frontier_base | ||
|
|
||
| (* Build path from frontier root to persistent root by walking backwards *) | ||
| let rec build_path_to_root ~(frontier : Transition_frontier.t) ~current_hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens here if we're passing in invalid target_hash, say 0 and a valid current_hash?
| Persistent_frontier.with_instance_exn persistent_frontier ~f | ||
| in | ||
| let clean_frontier () = | ||
| let%bind () = Transition_frontier.close ~loc:__LOC__ frontier in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of mixing sync/async code.
We should consider refactor the io in stdlib to make it more flexible.
Probably something for later though.
glyh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code lgtm. But it'd be better if we have some test coverage to ensure the correctness instead of eyeing the code.
Problem
Frontier persisting is fundamentally broken.
Persistent frontier is managed by an on-disk database coupled with a file containing the latest root hash. While the root file is updated immediately after the in-memory frontier switched from one root to another, storing of the transitions on disk (in the transitions database) comes with up to 9 blocks delay.
This setup allows for better efficiency, but when the Mina node starts, it detects that the root as present in the file is not equal to the root from the database, and treats the whole on-disk persistence as corrupt, removing frontier entirely. This discrepancy between root file and transition database can occur easily, e.g. on the Mina node exiting abruptly (unsure whether it's the case when node is finished with a stop command sent via client).
Solution
Tool that is introduced in this PR allows to fix the state of the persistent frontier's database by removing some transitions fromt he database until the root read from the file becomes a root of the database as well. Removing is performed by executing part of the regular root-moving routine.
In future we must consider embedding this recovery mechanism into Mina node itself, but as the first step it's good to be able to deliver it wrapped in a CLI tool.
Commit structure
PR contains a few straightforward commits exposing some functions across the codebase:
transitionfun from root dataWith the final commit introducing the new tool:
Testing
Explain how you tested your changes:
Checklist