Description
There are two distinct but related issues when a host system running kes-agent enters a power-saving mode (sleep or hibernation). This only affects "Normal Mode" (used for testing and debugging), not "Service Mode" (used in production).
- Sleep (Suspend-to-RAM): Key Update Timer Drifts
When the system enters "sleep" mode, memory is preserved, so mlock is not broken. However, the process is suspended.
What happens: The key update timer (see Agent.hs#L257) is paused.
On wake-up: The timer resumes counting from where it left off, without recalculating or accounting for the time that elapsed while the system was asleep.
Impact: This causes the key update schedule to drift. For example, if the machine sleeps for 8 hours, the next key update will be delayed by 8 hours.
- Hibernation (Suspend-to-Disk): mlock is Defeated
When the system hibernates, the contents of RAM are written to disk.
What happens: The agent process is suspended, and its memory space (which contains sensitive key material) is written to the disk.
Impact: This completely breaks the security guarantee of mlock, which is intended to prevent sensitive data from ever being swapped to disk.
Scope & Context
This behavior primarily affects users running in "normal mode", which is defined as:
"Normal Mode", in which it runs as a regular process; the process starts immediately, does not fork or drop privileges, and writes log output to stdout. This mode is mainly useful for debugging and development purposes; it is not recommended for production use.
Production environments (such as stake pools or kes-agent backup machines) should always have sleep and hibernation disabled as a standard operational security practice. Even so, we should avoid a poor and confusing experience for developers and testers using "normal mode".
Expected Behavior
- On wake from Sleep: The agent should ideally detect the time jump and recalculate the next key update. If an update was scheduled to occur during the sleep period, it should be triggered immediately upon wake-up.
- Regarding Hibernation: This is a fundamentally incompatible state. The security guarantees are broken the moment hibernation is initiated.
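For the wake-from-sleep case, one possible approach is to wait on an absolute wall-clock deadline instead of a relative countdown. The sketch below is only an illustration and does not reflect the current code in Agent.hs. The names `nextDeadline`, `nextSleep`, `waitUntil`, and `maxChunkMicros` are hypothetical, and it assumes the system wall clock is correct after wake-up.

```haskell
import Control.Concurrent (threadDelay)
import Data.Time.Clock
  (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime, getCurrentTime)

-- | Hypothetical polling interval: the longest single sleep, in microseconds.
-- After a wake-up, an overdue update fires at most this late.
maxChunkMicros :: Int
maxChunkMicros = 1000000

-- | Absolute time of the next key update, given the update period and the
-- time of the previous update.
nextDeadline :: NominalDiffTime -> UTCTime -> UTCTime
nextDeadline period lastUpdate = addUTCTime period lastUpdate

-- | Pure core: how long to sleep next, given the current wall-clock time and
-- the absolute deadline. 'Nothing' means the deadline has already passed,
-- so the update should be triggered immediately.
nextSleep :: UTCTime -> UTCTime -> Maybe Int
nextSleep now deadline
  | remaining <= 0 = Nothing
  | otherwise      = Just (fromInteger (min (toInteger maxChunkMicros) micros))
  where
    remaining = diffUTCTime deadline now
    -- Computed as Integer so a very distant deadline cannot overflow Int.
    micros = ceiling (remaining * 1000000) :: Integer

-- | Block until the wall clock reaches the deadline. The clock is re-read
-- after every short sleep, so time spent suspended counts as elapsed.
waitUntil :: UTCTime -> IO ()
waitUntil deadline = do
  now <- getCurrentTime
  case nextSleep now deadline of
    Nothing -> pure ()
    Just us -> threadDelay us >> waitUntil deadline
```

With this shape, a machine that sleeps for 8 hours past a deadline would see `nextSleep` return `Nothing` on the first check after wake-up, so the update would run right away rather than 8 hours late. The trade-off is a periodic wake-up (once per `maxChunkMicros`) and sensitivity to wall-clock adjustments.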
Proposed Actions
Short-Term (Documentation): Update the documentation to explicitly state that kes-agent must be run on a system where sleep and hibernation modes are completely disabled. This warning should apply to all use cases, including development, backup, and production, to protect against both timer drift and mlock failure.
Long-Term (Code Fix): Investigate whether the timer logic can be changed so that the next key update accounts for time that elapsed while the system was asleep.