kvnemesis: add crash operation support for fault injection testing
Previously, `kvnemesis` could only simulate graceful node stops and
restarts, which limited its ability to test crash recovery scenarios.
This was inadequate because real-world failures often involve abrupt
node crashes that lose unsynced writes and leave on-disk state that
must be recovered on restart.
This patch adds crash operation support to `kvnemesis`. It extends
`testcluster.TestCluster` with a `CrashServer` method that emulates a
crash by stopping a server and snapshotting its in-memory filesystems
at the last sync point using `vfs.MemFS.CrashClone`. The snapshot
reflects what would persist on disk after a real crash. The method
also isolates the crashed node from its peers by tripping circuit
breakers, simulating a network partition. The circuit breakers reset
automatically when the node restarts and connections are
re-established.
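The following is a minimal, self-contained Go sketch of the crash semantics described above; it is not the actual `testcluster` or Pebble code. The types `memFS` and `cluster` and the methods `crashClone`, `crashServer`, and `restartServer` are hypothetical. The sketch assumes a crash keeps exactly each file's synced prefix (ignoring directory-entry durability) and that isolation trips the breakers on connections in both directions.

```go
package main

import "fmt"

// file is a hypothetical in-memory file that remembers how many of its
// bytes were covered by the most recent Sync.
type file struct {
	data   []byte
	synced int
}

// memFS is a hypothetical stand-in for an in-memory filesystem; it is
// not Pebble's vfs.MemFS.
type memFS struct{ files map[string]*file }

func newMemFS() *memFS { return &memFS{files: map[string]*file{}} }

// Write appends p to the named file without making it durable.
func (fs *memFS) Write(name string, p []byte) {
	f, ok := fs.files[name]
	if !ok {
		f = &file{}
		fs.files[name] = f
	}
	f.data = append(f.data, p...)
}

// Sync marks every byte written so far to the named file as durable.
func (fs *memFS) Sync(name string) {
	if f, ok := fs.files[name]; ok {
		f.synced = len(f.data)
	}
}

// crashClone returns a copy that keeps only each file's synced prefix,
// approximating what survives an abrupt crash. Directory-entry
// durability is ignored: every file name survives, possibly empty.
func (fs *memFS) crashClone() *memFS {
	clone := newMemFS()
	for name, f := range fs.files {
		kept := append([]byte(nil), f.data[:f.synced]...)
		clone.files[name] = &file{data: kept, synced: len(kept)}
	}
	return clone
}

// cluster is a hypothetical test cluster. breakers[[2]int{a, b}] reports
// whether the breaker for connections from node a to node b is tripped.
type cluster struct {
	fs       map[int]*memFS
	running  map[int]bool
	breakers map[[2]int]bool
}

// setBreakers trips or resets every breaker on a connection touching id.
func (c *cluster) setBreakers(id int, tripped bool) {
	for pair := range c.breakers {
		if pair[0] == id || pair[1] == id {
			c.breakers[pair] = tripped
		}
	}
}

// crashServer stops node id, swaps in a crash clone of its filesystem,
// and isolates it from its peers.
func (c *cluster) crashServer(id int) {
	c.running[id] = false
	c.fs[id] = c.fs[id].crashClone()
	c.setBreakers(id, true)
}

// restartServer brings node id back; its breakers reset as connections
// are re-established.
func (c *cluster) restartServer(id int) {
	c.running[id] = true
	c.setBreakers(id, false)
}

func main() {
	c := &cluster{
		fs:       map[int]*memFS{1: newMemFS()},
		running:  map[int]bool{1: true, 2: true},
		breakers: map[[2]int]bool{{1, 2}: false, {2, 1}: false},
	}
	c.fs[1].Write("wal", []byte("abc"))
	c.fs[1].Sync("wal")
	c.fs[1].Write("wal", []byte("def")) // never synced, so lost in the crash
	c.crashServer(1)
	fmt.Println(string(c.fs[1].files["wal"].data), c.breakers[[2]int{2, 1}]) // abc true
	c.restartServer(1)
	fmt.Println(c.running[1], c.breakers[[2]int{2, 1}]) // true false
}
```

Modeling one breaker per directed connection makes the isolation symmetric and trivially undone on restart; the real implementation may trip breakers on only one side.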
The patch also adds `CrashNodeOperation` to the `kvnemesis` protobuf
schema, integrates it into the generator so that nodes are randomly
crashed during test runs, implements crash application in the applier,
and updates the validator to handle crash scenarios. The generator now
tracks crashed nodes separately from stopped nodes.
Release note: None
Informs: #64828