# Healing Algorithm Explanation and Documentation (Before Path Based)

Healing is the last step of Snap Sync. Snap begins by downloading the leaves of the state and storage tries (account states and storage slots), and from those leaves we reconstruct the intermediate nodes (branches and extensions). Afterwards we may be left with a malformed trie, because that step resumes the download of leaves with a new state root whenever the old one times out.

The purpose of the healing algorithm is to "heal" that trie so that it ends up in a consistent state.

# Healing Conceptually

The malformed trie will still have large sections that are in a correct state: we already have all of the leaves in those sections, and those accounts were not modified in the blocks produced while the snap sync algorithm was running.

Example of a trie where 3 leaves were downloaded in block 1 and 1 was downloaded in block 2. The trie root is different from the state root of block 2, as one of the leaf nodes was modified in block 2.

The algorithm attempts to rebuild the trie by downloading the missing nodes, starting from the top. If a node is present in the database, that means that it and all of its child nodes are present as well. If not, we download the node and check whether its children are present, applying the algorithm recursively.

Iteration 1 of algorithm

Iteration 2 of algorithm

Iteration 3 of algorithm

Final state of trie after healing
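
Conceptually, the recursion looks like the following sketch. This is illustrative only, with simplified stand-in types; the real implementation is asynchronous, batches requests, and uses ethrex's own trie and database types:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only; the real code uses ethrex's
// own trie, database, and networking types.
type Path = Vec<u8>; // nibble path of a node, starting from the root

struct Node {
    children: Vec<Path>, // paths of this node's children
}

fn heal(path: &Path, db: &mut HashMap<Path, Node>, download: &dyn Fn(&Path) -> Node) {
    // Invariant: a node present in the database implies its whole subtrie is
    // present, so correct sections of the trie are skipped without downloads.
    if db.contains_key(path) {
        return;
    }
    // Otherwise, download the node and apply the algorithm to its children.
    let node = download(path);
    for child in &node.children {
        heal(child, db, download);
    }
    // Only now is the whole subtrie below `path` present, so storing the node
    // preserves the invariant (the real code defers this via the membatch).
    db.insert(path.clone(), node);
}
```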

# Implementation

The algorithm is currently implemented in ethrex in `crates/networking/p2p/sync/state_healings.rs` and `crates/networking/p2p/sync/storage_healing.rs`. All of our code examples are from the account state trie.

### API

The API used is the Ethereum capability snap/1, documented at https://github.com/ethereum/devp2p/blob/master/caps/snap.md. For healing, the only method used is `GetTrieNodes`. This method allows us to ask our peers for nodes in a trie. We request nodes by their **path**, not by their hash.

```rust
pub struct GetTrieNodes {
    pub id: u64,
    pub root_hash: H256,
    // [[acc_path, slot_path_1, slot_path_2,...]...]
    // The paths can be either full paths (hash) or
    // only the partial path (compact-encoded nibbles)
    pub paths: Vec<Vec<Bytes>>,
    pub bytes: u64,
}
```
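
For example, a state-healing request for a batch of missing account-trie nodes could be assembled roughly like this (a sketch: `compact_encode`, `pivot_state_root`, and `missing_paths` are hypothetical names, not ethrex's actual helpers):

```rust
// Sketch only: `compact_encode`, `pivot_state_root`, and `missing_paths` are
// hypothetical names standing in for ethrex's actual helpers and state.
fn build_state_healing_request(
    pivot_state_root: H256,
    missing_paths: &[Vec<u8>], // nibble paths of the nodes we still need
) -> GetTrieNodes {
    GetTrieNodes {
        id: 1, // request id, echoed back by the peer in the response
        root_hash: pivot_state_root,
        // For the account trie, each inner vector holds a single element:
        // the compact-encoded partial path of one missing node.
        paths: missing_paths
            .iter()
            .map(|path| vec![Bytes::from(compact_encode(path))])
            .collect(),
        // Soft limit on the total response size, in bytes.
        bytes: 512 * 1024,
    }
}
```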

### Staleness

The spec allows peers to stop responding if the request targets a state root older than 128 blocks. In that case, the response to `GetTrieNodes` will be empty. As such, our algorithm periodically checks whether the block has gone stale and, if so, stops executing. In that scenario, we must be sure that we leave the storage in a consistent state at any given time and don't break our invariants.

```rust
// Current Staleness logic code
// We check with a clock if we are stale
if !is_stale && current_unix_time() > staleness_timestamp {
    info!("state healing is stale");
    is_stale = true;
}
// We make sure that we have stored everything that we need to the database
if is_stale && nodes_to_heal.is_empty() && inflight_tasks == 0 {
    info!("Finished inflight tasks");
    db_joinset.join_all().await;
    break;
}
```

### Membatch

Our algorithm currently maintains an invariant: if we have a node in storage, then all of its children are in storage as well. Therefore, when we download a node, if some of its children are missing we can't immediately store it on disk. Our implementation stores such nodes in a temporary structure called the membatch, which holds the node and a count of how many of its children are missing. When a child gets stored, we decrement the missing-children counter of its parent. If that number reaches 0, we write the parent to the database.

In code, the membatch is currently a `HashMap<Nibbles, MembatchEntryValue>`, with the value being the following struct:

```rust
pub struct MembatchEntryValue {
    /// The node to be flushed into storage
    node: Node,
    /// How many of this node's children are not yet in storage
    children_not_in_storage_count: u64,
    /// The path of this node's parent
    parent_path: Nibbles,
}
```
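
The counter logic described above could be sketched as follows. This is a simplified, synchronous illustration, assuming module-level access to the struct's private fields and a hypothetical `write_to_db` callback; the real logic lives in the healing modules:

```rust
use std::collections::HashMap;

// Sketch of the counter logic; `write_to_db` is a hypothetical callback and
// this assumes access to the struct's fields from within the same module.
// Called for a node at `path` once none of its children are missing.
fn flush_upwards(
    mut path: Nibbles,
    membatch: &mut HashMap<Nibbles, MembatchEntryValue>,
    write_to_db: &mut dyn FnMut(Nibbles, Node),
) {
    loop {
        // Every child of `path` is now on disk, so writing it keeps the invariant.
        let Some(entry) = membatch.remove(&path) else {
            return;
        };
        write_to_db(path, entry.node);
        // Tell the parent that one more of its children has been stored.
        let Some(parent) = membatch.get_mut(&entry.parent_path) else {
            return; // the parent is already on disk (or `path` was the root)
        };
        parent.children_not_in_storage_count -= 1;
        if parent.children_not_in_storage_count > 0 {
            return; // the parent still waits on other children
        }
        // The parent is now complete as well: keep flushing up the trie.
        path = entry.parent_path;
    }
}
```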

## Known Optimization Issues

- The membatch gets cleared between iterations, while it could be preserved and the node hashes checked instead.
- When checking if a child is present in storage, we can also check if it is in the membatch. If it is, we can skip that download and act as if we had just downloaded that node (see the sketch after this list).
- The membatch is currently a `HashMap`; a `BTreeMap` or another structure may be faster in real use.
- Storage healing receives as a parameter a list of accounts that need to be healed, and it has to fetch their state before it can run. Those reads could be made more efficient.
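
A minimal sketch of the second point, reusing the `MembatchEntryValue` map from above; `is_in_storage` is a hypothetical stand-in for the real database check:

```rust
use std::collections::HashMap;

// Sketch of the proposed membatch-lookup optimization; `is_in_storage` is a
// hypothetical stand-in for the real database check.
fn child_needs_download(
    path: &Nibbles,
    membatch: &HashMap<Nibbles, MembatchEntryValue>,
    is_in_storage: &dyn Fn(&Nibbles) -> bool,
) -> bool {
    // A membatch hit means the node was already downloaded and is only
    // waiting on its own children, so requesting it again is wasted work.
    !is_in_storage(path) && !membatch.contains_key(path)
}
```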