Pod expiration drifts when system is suspended #302

@nightkr

Affected Stackable version

dev (24.11 prerelease)

Current and expected behavior

@xeniape ran into an issue (Stackable employees: see Slack) where pods were left running with expired certificates after a while, rather than being evicted by commons-op as expected. Restarting commons-op evicted the pods.

Our current working hypothesis is that commons-op's re-reconciliation timer didn't advance while the computer was suspended, so the eviction was delayed by however long the suspend lasted.
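To make the hypothesis concrete, here is a small arithmetic sketch. It is not commons-op code: `observed_delay` and `overshoot` are hypothetical names, and the model simply assumes that the timer counts on a clock that is paused for the whole suspend.

```rust
use std::time::Duration;

/// Wall-clock time that passes before a timer fires, ASSUMING the timer
/// counts on a clock that stands still while the machine is suspended.
/// (Hypothetical model of the suspected behavior, not a measurement.)
fn observed_delay(requested: Duration, suspended: Duration) -> Duration {
    requested + suspended
}

/// How far past the intended eviction time the reconcile actually runs.
/// Under the assumption above this is exactly the time spent suspended.
fn overshoot(requested: Duration, suspended: Duration) -> Duration {
    observed_delay(requested, suspended) - requested
}
```

Under this model, a reconcile requeued for one hour on a machine that then sleeps for eight hours would run nine hours later in wall time, which would leave the pod with an expired certificate for most of that window.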

Possible solution

Either:

  1. Change the timer to use wall time instead of monotonic/CPU time
  2. Cap the re-reconciliation timer, causing spurious reconciles but at least limiting the issue
  3. Make the timer automatically expire when resuming from suspend
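As a rough illustration, options 1 and 2 could be combined along these lines. This is only a sketch using the Rust standard library: `MAX_REQUEUE`, its five-minute value, and `next_requeue` are hypothetical and do not come from commons-op or kube-rs.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical upper bound on the wait between two reconciles (option 2).
/// The value is an assumption chosen for illustration only.
const MAX_REQUEUE: Duration = Duration::from_secs(5 * 60);

/// How long to wait before reconciling again, given the wall-clock time at
/// which the pod should be evicted (option 1: compare against wall time on
/// every reconcile instead of trusting one long-running timer).
fn next_requeue(now: SystemTime, evict_at: SystemTime) -> Duration {
    // duration_since returns Err when evict_at is already in the past,
    // in which case the eviction is overdue and we should not wait at all.
    let remaining = evict_at.duration_since(now).unwrap_or(Duration::ZERO);
    // Never sleep longer than the cap, so a missed wake-up is bounded.
    remaining.min(MAX_REQUEUE)
}
```

Because the remaining time is recomputed from the wall clock on every reconcile, a suspend would only cost extra wall time on the wait that is in progress when it happens, rather than on the whole time until expiry. The price is the spurious reconciles mentioned in option 2.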

Whichever option we pick, we should probably also raise this upstream with kube-rs and either fix it there or highlight the issue somehow.
