Issue starting daemonset pods. #281

@juriskrumins

Description

I have an issue when SNR creates its daemonset and the related pods start.

Is it possible to make the apiTimeout = 10 * time.Second parameter (https://github.com/medik8s/self-node-remediation/blob/main/pkg/certificates/storage.go#L40) configurable?

The issue is that it takes our cluster more than 10 seconds to collect objects, so the operation gets cancelled by the timeout. Please see the error below:

2026-02-06T10:36:19.033313609Z  INFO    setup   Go Version: go1.25.3
2026-02-06T10:36:19.033518315Z  INFO    setup   Go OS/Arch: linux/amd64
2026-02-06T10:36:19.03352624Z   INFO    setup   Operator Version: v0.11.0-19-g3477dbe4
2026-02-06T10:36:19.033533955Z  INFO    setup   Git Commit: 3477dbe48d1c5f3e04213c5917ccb1013278d75f
2026-02-06T10:36:19.033539986Z  INFO    setup   Build Date: 2026-02-06T09:27:02+00:00
2026-02-06T10:36:19.033545466Z  INFO    setup   HTTP/2 for metrics and webhook server disabled
2026-02-06T10:36:19.033692033Z  INFO    setup   OLM injected certs for webhooks not found
2026-02-06T10:36:19.04833439Z   INFO    utils-taints    out of service taint strategy   {"isSupported": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:36:19.0484458Z    INFO    utils-taints    out of service taint strategy   {"isGA": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:36:19.048463043Z  INFO    setup   Starting as a self node remediation agent that should run as part of the daemonset
2026-02-06T10:36:19.083209261Z  INFO    setup   init grpc server
2026-02-06T10:36:19.08326669Z   INFO    setup   starting manager
2026-02-06T10:36:19.083658539Z  INFO    controller-runtime.metrics      Starting metrics server
2026-02-06T10:36:19.083743018Z  INFO    starting server {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:36:19.083890637Z  INFO    controller-runtime.metrics      Serving metrics server  {"bindAddress": ":8080", "secure": false}
2026-02-06T10:36:19.083914542Z  INFO    watchdog        watchdog started
2026-02-06T10:36:19.084137853Z  INFO    Starting EventSource    {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}
2026-02-06T10:36:19.184579064Z  INFO    peers   peer starting   {"name": "knode6"}
2026-02-06T10:36:29.085426918Z  ERROR   peerhealth.server       failed to get server credentials        {"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).Start
        /workspace/pkg/peerhealth/server.go:66
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:226
2026-02-06T10:36:29.085890192Z  INFO    Stopping and waiting for non leader election runnables
2026-02-06T10:36:29.19165137Z   INFO    Stopping and waiting for leader election runnables
2026-02-06T10:36:29.191883258Z  INFO    Starting Controller     {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.19213884Z   INFO    Starting workers        {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}
2026-02-06T10:36:29.191866105Z  INFO    watchdog        disarmed watchdog
2026-02-06T10:36:29.192178254Z  INFO    Shutdown signal received, waiting for all workers to finish     {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.192198572Z  INFO    All workers finished    {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.192244389Z  INFO    Stopping and waiting for caches
2026-02-06T10:36:29.193198649Z  INFO    Stopping and waiting for webhooks
2026-02-06T10:36:29.193533111Z  INFO    Stopping and waiting for HTTP servers
2026-02-06T10:36:29.193560502Z  INFO    controller-runtime.metrics      Shutting down metrics server with timeout of 1 minute
2026-02-06T10:36:29.193598163Z  INFO    shutting down server    {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:36:29.193667073Z  INFO    Wait completed, proceeding to shutdown the manager
2026-02-06T10:36:29.193685708Z  ERROR   setup   problem running manager {"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
main.main
        /workspace/main.go:168
runtime.main
        /usr/local/go/src/runtime/proc.go:285

For example, when I set it to 30 seconds, rebuild, and start the pod, we get:

2026-02-06T10:40:07.97615805Z   INFO    setup   Go Version: go1.25.3
2026-02-06T10:40:07.9764264Z    INFO    setup   Go OS/Arch: linux/amd64
2026-02-06T10:40:07.976435106Z  INFO    setup   Operator Version: v0.11.0-19-g3477dbe4
2026-02-06T10:40:07.976443938Z  INFO    setup   Git Commit: 3477dbe48d1c5f3e04213c5917ccb1013278d75f
2026-02-06T10:40:07.976453419Z  INFO    setup   Build Date: 2026-02-06T09:48:32+00:00
2026-02-06T10:40:07.976459713Z  INFO    setup   HTTP/2 for metrics and webhook server disabled
2026-02-06T10:40:07.976693578Z  INFO    setup   OLM injected certs for webhooks not found
2026-02-06T10:40:07.993838317Z  INFO    utils-taints    out of service taint strategy   {"isSupported": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:40:07.993886906Z  INFO    utils-taints    out of service taint strategy   {"isGA": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:40:07.993906594Z  INFO    setup   Starting as a self node remediation agent that should run as part of the daemonset
2026-02-06T10:40:08.035534838Z  INFO    setup   init grpc server
2026-02-06T10:40:08.035606437Z  INFO    setup   starting manager
2026-02-06T10:40:08.03838546Z   INFO    controller-runtime.metrics      Starting metrics server
2026-02-06T10:40:08.038462153Z  INFO    starting server {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:40:08.038601629Z  INFO    controller-runtime.metrics      Serving metrics server  {"bindAddress": ":8080", "secure": false}
2026-02-06T10:40:08.0386955Z    INFO    watchdog        watchdog started
2026-02-06T10:40:08.038903095Z  INFO    Starting EventSource    {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}
2026-02-06T10:40:08.139247066Z  INFO    peers   peer starting   {"name": "knode6"}
2026-02-06T10:40:37.340403099Z  INFO    Starting Controller     {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:40:37.340585961Z  INFO    Starting workers        {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}
2026-02-06T10:40:37.341658013Z  INFO    peerhealth.server       peer health server started
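
To illustrate the request, here is a minimal sketch of how the timeout could be read from the environment instead of being hard-coded. This is only an assumption about one possible approach, not the project's actual code: the variable name SNR_API_TIMEOUT and the helper parseAPITimeout are hypothetical, and the 10-second default simply mirrors the current value.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// defaultAPITimeout mirrors the 10-second value currently hard-coded in storage.go.
const defaultAPITimeout = 10 * time.Second

// apiTimeoutEnvVar is a hypothetical variable name, not an existing SNR setting.
const apiTimeoutEnvVar = "SNR_API_TIMEOUT"

// parseAPITimeout converts a duration string such as "30s" into a time.Duration.
// It falls back to the default when the value is empty, unparsable, or not positive.
func parseAPITimeout(raw string) time.Duration {
	if raw == "" {
		return defaultAPITimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultAPITimeout
	}
	return d
}

func main() {
	// In the operator this value would replace the apiTimeout constant.
	timeout := parseAPITimeout(os.Getenv(apiTimeoutEnvVar))
	fmt.Println("api timeout:", timeout)
}
```

With something like this, setting SNR_API_TIMEOUT=30s on the daemonset pods would give the same 30-second timeout I tested with, without needing a rebuild. Invalid values fall back to the default so a typo cannot disable the timeout.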

Thanks.
