Fix problems with genesis epoch ledgers being equal #17935
base: compatible
When both staking and next epoch ledgers have the same merkle root, reuse the staking ledger tar file instead of generating it twice. This prevents different s3_data_hash values caused by different tar creation timestamps, even though the ledger content is identical.
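The reuse described above can be sketched as follows. This is a minimal model, not the code from this PR: the `ledger` and `tar_info` records, `generate_tar`, and `generate_epoch_tars` are hypothetical names, and a counter stands in for the tar creation timestamp that makes two archives of identical content hash differently.

```ocaml
(* Hypothetical stand-ins; none of these names come from the Mina codebase. *)
type ledger = { merkle_root : string }

type tar_info = { path : string; s3_data_hash : string }

(* Counts how many times a tar file has been "generated". *)
let tar_generations = ref 0

(* Simulated tar generation: the hash depends on a generation counter,
   mimicking a creation timestamp embedded in the archive. *)
let generate_tar (l : ledger) : tar_info =
  incr tar_generations ;
  let stamp = string_of_int !tar_generations in
  { path = "ledger_" ^ l.merkle_root ^ ".tar.gz"
  ; s3_data_hash = Digest.to_hex (Digest.string (l.merkle_root ^ stamp))
  }

(* Generate the staking tar first, then reuse it for the next epoch ledger
   whenever the two merkle roots match. *)
let generate_epoch_tars ~(staking : ledger) ~(next : ledger) =
  let staking_tar = generate_tar staking in
  let next_tar =
    if String.equal staking.merkle_root next.merkle_root then staking_tar
    else generate_tar next
  in
  (staking_tar, next_tar)
```

With equal roots, both results are the same record, so the two `s3_data_hash` values cannot diverge; with different roots, each ledger still gets its own archive.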
!ci-build-me |
!ci-bypass-changelog |
Deferred.Or_error.return
  ( Some
      { Consensus.Genesis_epoch_data.Data.ledger =
          staking.ledger
I'm a bit concerned about this branch. We never assume ledgers to be shared between 2 "holders", AFAIK. Would there be some mutation/destroy of the ledgers on one end, causing the other end to panic?
We do actually have code that assumes we can do this, later on in the proof of stake code:
mina/src/lib/consensus/proof_of_stake.ml
Lines 478 to 485 in 9b1a90e
(* TODO: remove this duplicate of the genesis ledger *)
let genesis_epoch_ledger_staking, genesis_epoch_ledger_next =
  Option.value_map genesis_epoch_data
    ~default:(genesis_ledger, genesis_ledger)
    ~f:(fun { Genesis_epoch_data.staking; next } ->
      ( staking.ledger
      , Option.value_map next ~default:staking.ledger ~f:(fun next ->
            next.ledger ) ) )
That's using the genesis ledger as the genesis epoch snapshots, but it's similar. The proof of stake code also has special handling for genesis epoch snapshots; it specifically avoids closing or destroying them:
let close = function
  | Genesis_epoch_ledger _ ->
      ()
  | Ledger_root ledger ->
      Mina_ledger.Ledger.Root.close ledger

let remove ~config = function
  | Genesis_epoch_ledger _ ->
      ()
  | Ledger_root ledger ->
      Mina_ledger.Ledger.Root.close ledger ;
      Mina_ledger.Ledger.Root.Config.delete_backing config
The daemon should be set up so that the genesis ledgers are always treated as read-only sources of data during node operation. I think that is actually the case, though it's hard to prove a negative. So I guess I share your concern, but I think in this case it should be fine?
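To illustrate why a no-op `close` makes sharing safe, here is a self-contained sketch. The `db` record with a `closed` flag is a hypothetical stand-in for the real ledger types; only the constructor names mirror the snippet quoted above.

```ocaml
(* Hypothetical stand-in for a ledger database handle. *)
type db = { name : string; mutable closed : bool }

type snapshot =
  | Genesis_epoch_ledger of db (* possibly shared; treated as read-only *)
  | Ledger_root of db (* owned by this snapshot *)

(* Closing a genesis snapshot does nothing, so a second holder of the same
   underlying database is unaffected. *)
let close = function
  | Genesis_epoch_ledger _ ->
      ()
  | Ledger_root db ->
      db.closed <- true
```

Under this model, two `Genesis_epoch_ledger` values wrapping the same `db` can each be closed in any order without invalidating the other. It says nothing about mutation through other code paths, which is the part that remains hard to rule out.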
The only place I know of where we might possibly mutate the content of a genesis ledger database is in generate_ledger_tar, where we commit the mask over an underlying database. That's used in the AccountsOnly variant of load_ledger_spec; the daemon runs that when the genesis ledger is specified as a list of accounts in the daemon config. That commit mutation should all be in the setup of the genesis data, though, before the rest of the daemon starts using it.
I was curious if that could be susceptible to the same kind of bug this PR fixes, and it actually is. If you have a config file with genesis staking and next epoch ledgers given as lists of accounts, and those lists are equal, then the daemon will double-generate the epoch snapshot tar file. The daemon will still run correctly the first time, because it populates temporary databases when it generates the genesis tar files and reuses those temporary databases during that run. However, when the daemon starts up again, it will crash because it will try to double-unpack the tar file and double-load the ledger database.
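The restart failure can be modelled roughly like this. Everything here is an assumption made for illustration: `open_dirs`, `open_db`, `load_naive`, and `load_shared` are hypothetical names, and the rule that a directory can back only one open database is a simplification of the real behaviour.

```ocaml
(* Hypothetical model: a directory may back at most one open database. *)
let open_dirs : (string, unit) Hashtbl.t = Hashtbl.create 8

let open_db (dir : string) : (string, string) result =
  if Hashtbl.mem open_dirs dir then Error ("already open: " ^ dir)
  else (
    Hashtbl.add open_dirs dir () ;
    Ok dir )

(* Naive load: opens both ledgers independently, so it fails when the staking
   and next epoch ledgers resolve to the same directory. *)
let load_naive ~staking_dir ~next_dir =
  match open_db staking_dir with
  | Error e ->
      Error e
  | Ok s -> (
      match open_db next_dir with Error e -> Error e | Ok n -> Ok (s, n) )

(* Guarded load: reuses the staking handle when both directories coincide. *)
let load_shared ~staking_dir ~next_dir =
  match open_db staking_dir with
  | Error e ->
      Error e
  | Ok s ->
      if String.equal staking_dir next_dir then Ok (s, s)
      else (
        match open_db next_dir with Error e -> Error e | Ok n -> Ok (s, n) )
```

A guard of this shape is one possible way to handle the accounts-list case as well; whether it fits the actual loading code is not something this sketch establishes.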
That isn't a defect with this PR, exactly. It's not the highest priority thing to fix either, because the mainnet and devnet genesis epoch snapshots are always going to be different in practice. Also @georgeee has mentioned in the past that we might want to get rid of this feature anyway, since we generally expect people to specify their genesis ledgers with a root hash and separate database. (Though, if we don't intend on getting rid of it soon we should eventually fix this code too).
I think this change is okay. The hard fork integration test we have does encounter this scenario fairly regularly; that's a decent test that the daemon can function properly with the genesis epoch snapshots being the same mask over a database.
When running a test network with just the genesis ledger defined and exporting the fork config during the first epoch, the exported epoch ledgers (staking and next) point to the same value, and the node fails to start because it tries to initialize both epoch ledger DBs from the same directory.
This PR fixes this issue in the Mina node, and also ensures that when the genesis ledger app generates a config, it does not report different SHA hashes for the same ledger.