Done Criteria
After the 202601-2 wave of Curio hardening/performance improvements, take an SP out of service for ~5 minutes (e.g., simulating a power outage) and measure how long the SP takes to work through the resulting backlog.
Why Important
Power outages will occur and we want to make sure that these events don't tank an SP.
Notes
- A real power outage on 2026-01-21 affected calib.ezpdpz.net and calib2.ezpdpz.net. It lasted 5 minutes, but the nodes took ~12 hours to catch up (Slack thread). Note that this happened on calibration, where proving periods occur 12x more frequently than on mainnet, so we believe it approximates what would happen to a mainnet node with 12x as many datasets as these nodes had.
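
To make the incident figures comparable with the planned test, the catch-up time can be expressed as a ratio to the outage length. This is a minimal sketch: the helper name `recovery_ratio` is hypothetical, and the 12-hour figure is the approximate value reported above, not a precise measurement.

```python
def recovery_ratio(outage_minutes: float, recovery_minutes: float) -> float:
    """Return minutes of catch-up needed per minute of downtime."""
    if outage_minutes <= 0:
        raise ValueError("outage_minutes must be positive")
    return recovery_minutes / outage_minutes


# Figures from the 2026-01-21 calibration incident: 5 minutes down,
# roughly 12 hours (720 minutes) to catch up.
incident_ratio = recovery_ratio(outage_minutes=5, recovery_minutes=12 * 60)
print(f"catch-up minutes per outage minute: {incident_ratio:.0f}")
```

Running the same calculation on the post-hardening test result gives a single number to compare against the incident baseline.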