RuntimeV2: no automatic recovery from corrupted checkpoints #1317

@PHA-SYSOPS

When a pruntimev2 instance has a corrupted checkpoint (you can reproduce this by restarting the Docker container while a snapshot is in progress), it reports that it cannot load the checkpoint and exits with an error. No attempt is made to load one of the other (backup) snapshots. To work around the issue, I have to manually delete the broken snapshot; pruntime then loads a previous version and all is well.

I would expect pruntime to try loading the previous version on such an error before dying. It does not have to try all the older snapshots; going back one is enough. If that works, you might want to delete or rename the broken snapshot. This would allow easy, automated recovery.
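
To make the request concrete, here is a minimal sketch of the fallback behaviour described above. It is written in Python for brevity and is not pruntime's actual code: the `checkpoint-*` file naming, the `.corrupt` suffix, `CheckpointError`, and `load_with_fallback` are all hypothetical names chosen for illustration, and the sketch assumes checkpoint file names sort by age.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple


class CheckpointError(Exception):
    """Raised by the (hypothetical) loader when a checkpoint cannot be read."""


def load_with_fallback(
    checkpoint_dir: Path,
    load: Callable[[Path], object],
    max_fallbacks: int = 1,
    pattern: str = "checkpoint-*",  # assumed naming scheme, not confirmed
) -> Tuple[object, Path]:
    """Try the newest checkpoint, then step back up to `max_fallbacks` times."""
    # Newest first; assumes names sort by age (e.g. zero-padded block number).
    candidates = sorted(checkpoint_dir.glob(pattern), reverse=True)
    # Skip files already marked as broken by an earlier run.
    candidates = [p for p in candidates if not p.name.endswith(".corrupt")]

    failed: List[Path] = []
    last_error: Optional[Exception] = None
    for path in candidates[: max_fallbacks + 1]:
        try:
            state = load(path)
        except CheckpointError as err:
            last_error = err
            failed.append(path)
            continue
        # An older checkpoint loaded: rename the broken ones so the next
        # start does not trip over them again.
        for bad in failed:
            bad.rename(bad.with_name(bad.name + ".corrupt"))
        return state, path

    # Nothing loaded: leave the files untouched for manual inspection.
    raise CheckpointError(f"no loadable checkpoint in {checkpoint_dir}") from last_error
```

With the default `max_fallbacks=1` this goes back exactly one snapshot, as suggested. Broken files are renamed rather than deleted, and only after an older snapshot has loaded successfully, so nothing is lost if every candidate fails.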
