Description
(I'm still trying to reproduce this locally. I got the crash with --hardfork-mode auto enabled - which we still have to rename - but the code looks like it could be susceptible to the same bug without that enabled. Unfortunately, I also did not save the exact crash message.)
I was periodically syncing a daemon to devnet, with and without --hardfork-mode auto, and the daemon crashed in this bit of the code while --hardfork-mode auto was enabled:
mina/src/lib/consensus/proof_of_stake.ml
Lines 2663 to 2672 in b0ab82e
let root_ledger_of_snapshot snapshot snapshot_config =
  O1trace.sync_thread "root_ledger_of_snapshot" (fun () ->
      match snapshot.ledger with
      | Ledger_snapshot.Ledger_root ledger ->
          Ok ledger
      | Ledger_snapshot.Genesis_epoch_ledger packed ->
          Genesis_ledger.Packed.create_root packed
            ~config:snapshot_config
            ~depth:Context.constraint_constants.ledger_depth () )
in
The create_root function threw an exception when trying to sync one of the epoch snapshots because the rocksdb checkpoint failed: the target directory of the checkpoint already existed. In other words, there was already an epoch ledger snapshot at the snapshot_config location while the daemon was still at the genesis epoch snapshot.
This was not failing in my local testing before - it may have started because of #17874. Before that PR, we'd do this in this situation:
mina/src/lib/consensus/proof_of_stake.ml
Lines 2668 to 2676 in 2c70f34
      | Ledger_snapshot.Genesis_epoch_ledger packed ->
          let fresh_root_ledger =
            Mina_ledger.Ledger.Root.create ~logger
              ~config:snapshot_config
              ~depth:Context.constraint_constants.ledger_depth
              ()
          in
          Genesis_ledger.Packed.populate_root packed
            fresh_root_ledger )
That Ledger.Root.create would open whatever database is present at that config location. (The code before we made all these root ledger handling changes did the same thing.) It would then overwrite the contents of the database with the genesis ledger, and then sync the ledger to the network. Thus, the daemon did not have to care about cleaning up an old epoch ledger database that was lying around.
I'm unsure of a few things:
- If this can be reproduced with --hardfork-mode auto, or if I can get this to show up without that enabled. (I'm still looking at it.)
- If the daemon was correctly at the genesis epoch ledger snapshots at the moment it crashed.
We might want to add some code to delete any snapshot backing that might be present at the snapshot_config location before creating a new root from genesis. Though, if this only shows up with --hardfork-mode auto, then this kind of failure might be the result of a bug elsewhere.
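To make the proposed cleanup concrete, here is a minimal, self-contained sketch using only the OCaml standard library (Sys.mkdir and Sys.rmdir need OCaml 4.12 or later). It is not the Mina code: create_checkpoint, remove_tree and create_root_from_genesis are hypothetical names, and create_checkpoint only imitates the behaviour described above, where the checkpoint fails if its target directory already exists. The real fix would presumably need to go through the root ledger config rather than a raw path.

```ocaml
(* Hypothetical stand-in for the checkpoint step: like the rocksdb checkpoint
   described in this issue, it refuses to write into an existing target. *)
let create_checkpoint ~target =
  if Sys.file_exists target then
    Error (Printf.sprintf "checkpoint target already exists: %s" target)
  else (
    Sys.mkdir target 0o755 ;
    Ok target )

(* Recursively delete a file or directory tree. Does nothing if [path] does
   not exist. *)
let rec remove_tree path =
  if Sys.file_exists path then
    if Sys.is_directory path then (
      Array.iter
        (fun entry -> remove_tree (Filename.concat path entry))
        (Sys.readdir path) ;
      Sys.rmdir path )
    else Sys.remove path

(* Assumed shape of the fix: clear any stale snapshot backing at [target]
   before creating the new root from genesis. *)
let create_root_from_genesis ~target =
  remove_tree target ;
  create_checkpoint ~target
```

With a stale directory in place, create_checkpoint returns an Error, while create_root_from_genesis removes the directory first and then succeeds. Whether silently deleting the old snapshot is acceptable depends on the second open question above: if the daemon was not supposed to be at the genesis epoch snapshot, the deletion would only hide the underlying bug.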