Investigate possible crash during epoch ledger snapshot syncing

(I'm still trying to reproduce this locally. I got the crash with `--hardfork-mode auto` enabled, a flag we still have to rename, but the code looks like it could be susceptible to the same bug without it. Unfortunately, I did not save the exact crash message.)

I was periodically syncing a daemon to devnet, with and without `--hardfork-mode auto`, and the daemon crashed in this bit of the code while `--hardfork-mode auto` was enabled:

https://github.com/MinaProtocol/mina/blob/b0ab82e68f7f00485557018e506ef402c630c84b/src/lib/consensus/proof_of_stake.ml#L2663-L2672

The `create_root` function threw an exception when trying to sync one of the epoch snapshots because the RocksDB checkpoint failed: the target directory of the checkpoint already existed. In other words, there was already an epoch ledger snapshot at the `snapshot_config` location while the daemon was still at the genesis epoch snapshot.

This was not failing in my local testing before - it may have started because of https://github.com/MinaProtocol/mina/pull/17874. Before that PR, we'd do this in this situation:

https://github.com/MinaProtocol/mina/blob/2c70f34e3a78a194fba5bb55230d5df5fec5efb5/src/lib/consensus/proof_of_stake.ml#L2668-L2676

That `Ledger.Root.create` would open whatever database was present at that `config` location. (The code before we made all these root ledger handling changes did the same thing.) It would then overwrite the contents of the database with the genesis ledger, and then sync the ledger to the network. Thus, the daemon did not have to care about cleaning up an old epoch ledger database that was lying around.
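To make the difference between the two behaviors concrete, here is a toy model in plain OCaml. It is only a sketch: ordinary directories stand in for ledger databases, and `create_checkpoint`, `open_or_create`, and `fresh_path` are hypothetical names, not functions from the Mina codebase or the RocksDB bindings.

```ocaml
(* Toy model only: plain directories stand in for ledger databases.
   All names here are hypothetical and are not part of Mina. *)

(* Models a checkpoint-style create: it refuses to write into a target
   directory that already exists. *)
let create_checkpoint ~target =
  if Sys.file_exists target then
    Error (Printf.sprintf "checkpoint target already exists: %s" target)
  else (
    Sys.mkdir target 0o755 ;
    Ok target )

(* Models the older open-or-create behavior: it reuses whatever is
   already at [dir], or creates the directory if nothing is there. *)
let open_or_create ~dir =
  if not (Sys.file_exists dir) then Sys.mkdir dir 0o755 ;
  Ok dir

(* Hypothetical helper: returns a unique path that does not exist yet. *)
let fresh_path () =
  let p = Filename.temp_file "snapshot" ".db" in
  Sys.remove p ; p

let () =
  let dir = fresh_path () in
  (* Simulate a stale snapshot directory left over from an earlier run. *)
  Sys.mkdir dir 0o755 ;
  ( match create_checkpoint ~target:dir with
  | Ok _ ->
      print_endline "checkpoint: ok"
  | Error msg ->
      print_endline ("checkpoint: " ^ msg) ) ;
  ( match open_or_create ~dir with
  | Ok _ ->
      print_endline "open_or_create: ok"
  | Error msg ->
      print_endline ("open_or_create: " ^ msg) ) ;
  Sys.rmdir dir
```

In this model the checkpoint-style call reports an error for the stale directory while the open-or-create call succeeds, which matches the change in behavior described above.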

I'm unsure of a few things:

- Whether this can be reproduced with `--hardfork-mode auto`, and whether I can get it to show up without that enabled. (I'm still looking at it.)
- Whether the daemon was correctly at the genesis epoch ledger snapshots at the moment it crashed.

We might want to add some code to delete any snapshot backing that might be present at the `snapshot_config` location before creating a new root from genesis. Though, if this only shows up with `--hardfork-mode auto`, then this kind of failure might be the result of a bug elsewhere.
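A minimal sketch of that cleanup ordering, again using plain directories as stand-ins. Everything here is an assumption for illustration: `remove_tree`, `create_root_from_genesis`, and `create_root_clean` are hypothetical names, and the real fix would have to work out which paths `snapshot_config` actually designates.

```ocaml
(* Sketch only: all names are hypothetical and not part of Mina. *)

(* Recursively delete [path]; does nothing if [path] is absent. *)
let rec remove_tree path =
  if Sys.file_exists path then
    if Sys.is_directory path then (
      Array.iter
        (fun entry -> remove_tree (Filename.concat path entry))
        (Sys.readdir path) ;
      Sys.rmdir path )
    else Sys.remove path

(* Stand-in for a root creation that requires a non-existent target
   directory, as a checkpoint-based create does. *)
let create_root_from_genesis ~directory =
  if Sys.file_exists directory then Error "target directory already exists"
  else (
    Sys.mkdir directory 0o755 ;
    Ok directory )

(* Proposed ordering: clear any stale backing first, then create. *)
let create_root_clean ~directory =
  remove_tree directory ;
  create_root_from_genesis ~directory
```

Deleting unconditionally is only safe if nothing else holds the old snapshot open, and it would also hide the symptom if the stale directory is itself the result of a bug elsewhere, so logging the deletion is probably worthwhile.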

	let root_ledger_of_snapshot snapshot snapshot_config =
	  O1trace.sync_thread "root_ledger_of_snapshot" (fun () ->
	      match snapshot.ledger with
	      | Ledger_snapshot.Ledger_root ledger ->
	          Ok ledger
	      | Ledger_snapshot.Genesis_epoch_ledger packed ->
	          Genesis_ledger.Packed.create_root packed
	            ~config:snapshot_config
	            ~depth:Context.constraint_constants.ledger_depth () )
	in

	| Ledger_snapshot.Genesis_epoch_ledger packed ->
	    let fresh_root_ledger =
	      Mina_ledger.Ledger.Root.create ~logger
	        ~config:snapshot_config
	        ~depth:Context.constraint_constants.ledger_depth
	        ()
	    in
	    Genesis_ledger.Packed.populate_root packed
	      fresh_root_ledger )

Investigate possible crash during epoch ledger snapshot syncing #17899
