
Conversation

georgeee (Member) commented Oct 9, 2025

When running a test network with just a genesis ledger defined and doing a fork config export during the first epoch, the exported epoch ledgers (staking and next) will point to the same value, and the node will fail to start because it tries to initialize its epoch ledger DBs from the same directory.

This PR fixes this issue in the Mina node, and also ensures that when the genesis ledger app generates a config, it won't show different SHA hashes for the same ledger.
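In rough terms, the daemon-side change amounts to the following (a minimal sketch with assumed names; load_epoch_ledger and the (hash, seed) spec shape are illustrative, not the actual signatures):

open Core_kernel
open Async

(* Sketch only: when the next epoch ledger has the same merkle root as the
   staking one, share the already-loaded staking ledger instead of opening a
   second epoch ledger DB backed by the same directory. *)
let load_next_epoch_ledger ~load_epoch_ledger
    ~(staking : Consensus.Genesis_epoch_data.Data.t) ~staking_hash ~next_spec =
  match next_spec with
  | None ->
      Deferred.Or_error.return None
  | Some (hash, seed) when Mina_base.Ledger_hash.equal hash staking_hash ->
      (* Same merkle root: reuse the staking epoch ledger directly. *)
      Deferred.Or_error.return
        (Some
           { Consensus.Genesis_epoch_data.Data.ledger = staking.ledger; seed } )
  | Some (hash, seed) ->
      (* Different ledgers: load the next epoch ledger as before. *)
      Deferred.Or_error.map (load_epoch_ledger hash seed) ~f:Option.some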

Commits:

  • Fix issue with the same epoch ledger being used twice
  • Fix: Reuse epoch ledger tar when staking and next have same hash

Explain how you tested your changes:

  • Tested as part of the HF (hard fork) integration test

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

When both staking and next epoch ledgers have the same merkle root,
reuse the staking ledger tar file instead of generating it twice.
This prevents different s3_data_hash values caused by different
tar creation timestamps, even though the ledger content is identical.
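As a sketch of that reuse (with hypothetical generate_tar and copy_file helpers standing in for whatever the genesis ledger app actually calls to archive and copy):

open Async

(* Sketch only: generate the staking tar, then reuse it for the next epoch
   ledger whenever the two ledgers share a merkle root. *)
let generate_epoch_ledger_tars ~generate_tar ~copy_file ~staking_ledger
    ~staking_tar_path ~next_ledger ~next_tar_path =
  let open Deferred.Or_error.Let_syntax in
  let%bind () = generate_tar staking_ledger staking_tar_path in
  if
    Mina_base.Ledger_hash.equal
      (Mina_ledger.Ledger.merkle_root staking_ledger)
      (Mina_ledger.Ledger.merkle_root next_ledger)
  then
    (* Identical content: copy the staking tar byte-for-byte so both entries
       end up with the same s3_data_hash, independent of archiving
       timestamps. *)
    copy_file ~src:staking_tar_path ~dst:next_tar_path
  else generate_tar next_ledger next_tar_path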
georgeee requested a review from a team as a code owner on October 9, 2025, 20:14
georgeee (Member, Author) commented Oct 9, 2025

!ci-build-me

georgeee (Member, Author) commented Oct 9, 2025

!ci-bypass-changelog

Deferred.Or_error.return
  ( Some
      { Consensus.Genesis_epoch_data.Data.ledger = staking.ledger
glyh (Member) commented Oct 13, 2025

I'm a bit concerned about this branch. We never assume ledgers to be shared between two "holders", AFAIK. Would there be some mutation or destruction of the ledgers on one end, causing the other end to panic?

Member

We do actually have code that assumes we can do this, later on in the proof of stake code:

(* TODO: remove this duplicate of the genesis ledger *)
let genesis_epoch_ledger_staking, genesis_epoch_ledger_next =
  Option.value_map genesis_epoch_data
    ~default:(genesis_ledger, genesis_ledger)
    ~f:(fun { Genesis_epoch_data.staking; next } ->
      ( staking.ledger
      , Option.value_map next ~default:staking.ledger ~f:(fun next ->
            next.ledger ) ) )

That's using the genesis ledger as the genesis epoch snapshots, but it's similar. The proof of stake code also has special handling for genesis epoch snapshots; it specifically avoids closing or destroying them:

let close = function
  | Genesis_epoch_ledger _ ->
      ()
  | Ledger_root ledger ->
      Mina_ledger.Ledger.Root.close ledger

let remove ~config = function
  | Genesis_epoch_ledger _ ->
      ()
  | Ledger_root ledger ->
      Mina_ledger.Ledger.Root.close ledger ;
      Mina_ledger.Ledger.Root.Config.delete_backing config

The daemon should be set up so that the genesis ledgers are always treated as read-only sources of data during node operation. I think that is actually the case, though it's hard to prove a negative. So I guess I share your concern, but I think in this case it should be fine?

Member

The only place I know of where we might possibly mutate the content of a genesis ledger database is in generate_ledger_tar, where we commit the mask over an underlying database. That's used in the AccountsOnly variant of load_ledger_spec; the daemon runs that when the genesis ledger is specified as a list of accounts in the daemon config. That commit mutation should all be in the setup of the genesis data, though, before the rest of the daemon starts using it.
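Schematically, that setup path looks something like the following (illustrative only; populate_genesis_db is a made-up name and the exact ledger calls in generate_ledger_tar may differ):

open Core_kernel

(* Sketch only: build a mask over the on-disk database, write the configured
   accounts into it, then commit. *)
let populate_genesis_db ~(db : Mina_ledger.Ledger.Db.t) ~accounts =
  let mask = Mina_ledger.Ledger.of_database db in
  List.iter accounts ~f:(fun (account_id, account) ->
      ignore
        ( Mina_ledger.Ledger.get_or_create_account mask account_id account
          |> Or_error.ok_exn ) ) ;
  (* The commit is the one point where the underlying database is mutated,
     and it happens during genesis setup, before the daemon starts treating
     the ledger as read-only. *)
  Mina_ledger.Ledger.commit mask ;
  mask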

I was curious if that could be susceptible to the same kind of bug this PR fixes, and it actually is. If you have a config file with genesis staking and next epoch ledgers given as lists of accounts, and those lists are equal, then the daemon will double-generate the epoch snapshot tar file. The daemon will still run correctly the first time, because it populates temporary databases when it generates the genesis tar files and reuses those temporary databases during that run. However, when the daemon starts up again, it will crash because it will try to double-unpack the tar file and double-load the ledger database.

That isn't a defect with this PR, exactly. It's not the highest priority thing to fix either, because the mainnet and devnet genesis epoch snapshots are always going to be different in practice. Also @georgeee has mentioned in the past that we might want to get rid of this feature anyway, since we generally expect people to specify their genesis ledgers with a root hash and separate database. (Though, if we don't intend to get rid of it soon, we should eventually fix this code too.)

cjjdespres (Member) left a comment

I think this change is okay. The hard fork integration test we have does encounter this scenario fairly regularly; that's a decent test that the daemon can function properly with the genesis epoch snapshots being the same mask over a database.
