
irmin-pack: error unknown inode key #2276

@metanivek

Description


Tezos is experiencing a periodic issue on its testnet (it has not been observed on mainnet):

```
Sep 25 20:38:10.105:   Error:
Sep 25 20:38:10.105:     {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.105:
Sep 25 20:38:10.106: worker crashed [validator-peer]:
Sep 25 20:38:10.106:   Error:
Sep 25 20:38:10.106:     {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.106:
Sep 25 20:38:10.120: fetching branch of 3 blocks from peer idtw8Zm1g1bUJEPyqDP6tC8EvjFv4u
Sep 25 20:38:10.120: validation of block BLEzAQFH6kzPu3Q9wKpCK68UscAhfMQwGZHLrK7e3NSuzdwaHDr
Sep 25 20:38:10.120:   failed but precheck succeeded,
Sep 25 20:38:10.120:   Request pushed on 2023-09-25T20:38:10.120-00:00, treated in 4.988us, completed in 77.456us:
Sep 25 20:38:10.120:   {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.120: unexpected error while fetching headers:
Sep 25 20:38:10.120:   {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.122: request
Sep 25 20:38:10.122:   New branch BLm8Dx1TwWBN7DA2YfiULePTR7z5BaVAFDnvcqvLshe3GySoAeZ, locator length 423475
Sep 25 20:38:10.122:   failed
Sep 25 20:38:10.122:   (Request pushed on 2023-09-25T20:38:10.119-00:00, treated in 253us, completed in 2.519ms):
```
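Each failure logs the same serialized direct key. A small sketch of extracting the hash, pack-file offset, and length from such a log line, assuming only the JSON shape visible in the logs above (the variable names are illustrative, not irmin-pack API):

```python
import json

# The failing key exactly as it appears in the logs: a "Direct" key
# carrying (hash, offset in the pack file, length in bytes).
line = '{"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}'

key = json.loads(line)
kind, (hash_, offset, length) = next(iter(key.items()))
print(kind, hash_, offset, length)
```

This confirms the object the node failed to find sits at offset 22250291624 and spans 360 bytes.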

We do not yet have reproduction steps, but here is what we know:

  1. Has only been observed on testnet
  2. Happens roughly at the same time across nodes that are run by the foundation
  3. These nodes regularly export snapshots for network backup
  4. It is unclear whether the error is coming from the read-write instance or the read-only instance
  5. Running irmin 3.7.2

When this occurred earlier this year, we guessed it might be related to GC and read-only instances not properly clearing the LRU. This was fixed in #2200.

We think there could be a bug in reading data that has not yet been flushed to disk, since the append-only file does not allow reading beyond the last flushed offset. In principle this should never happen: higher-level code in pack_store prevents it via the staging hash table (although that code could use some inspection). Having the node's control file from the time of the error would help us disprove this idea, since we could compare the last flushed offset recorded in the control file against the offset in the logged key.
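The check described above reduces to a single comparison. A minimal sketch, assuming the control file's last flushed offset were available (`key_is_beyond_flushed` is a hypothetical helper, not irmin-pack API):

```python
def key_is_beyond_flushed(key_offset: int, key_length: int,
                          flushed_offset: int) -> bool:
    """Return True if the object referenced by the key lies (at least
    partly) past the last offset flushed to the append-only pack file.
    `flushed_offset` would be read from the control file."""
    return key_offset + key_length > flushed_offset

# With the logged key (offset 22250291624, length 360): if the control
# file's flushed offset were at or below the key's offset, the read
# would target unflushed data and could explain the error.
print(key_is_beyond_flushed(22250291624, 360, 22250291624))  # True
print(key_is_beyond_flushed(22250291624, 360, 22250292000))  # False
```

If the control file showed a flushed offset well past 22250291984 (offset + length), this hypothesis would be ruled out.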

Labels: tezos-support (Support for bugs related to Tezos)