Description
I have run into an issue when SNR creates its DaemonSet and the related pods are starting.
Would it be possible to make the apiTimeout = 10 * time.Second parameter (https://github.com/medik8s/self-node-remediation/blob/main/pkg/certificates/storage.go#L40) configurable?
The problem is that our cluster takes more than 10 seconds to collect the objects, so the request is cancelled by the timeout. Please see the error below:
2026-02-06T10:36:19.033313609Z INFO setup Go Version: go1.25.3
2026-02-06T10:36:19.033518315Z INFO setup Go OS/Arch: linux/amd64
2026-02-06T10:36:19.03352624Z INFO setup Operator Version: v0.11.0-19-g3477dbe4
2026-02-06T10:36:19.033533955Z INFO setup Git Commit: 3477dbe48d1c5f3e04213c5917ccb1013278d75f
2026-02-06T10:36:19.033539986Z INFO setup Build Date: 2026-02-06T09:27:02+00:00
2026-02-06T10:36:19.033545466Z INFO setup HTTP/2 for metrics and webhook server disabled
2026-02-06T10:36:19.033692033Z INFO setup OLM injected certs for webhooks not found
2026-02-06T10:36:19.04833439Z INFO utils-taints out of service taint strategy {"isSupported": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:36:19.0484458Z INFO utils-taints out of service taint strategy {"isGA": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:36:19.048463043Z INFO setup Starting as a self node remediation agent that should run as part of the daemonset
2026-02-06T10:36:19.083209261Z INFO setup init grpc server
2026-02-06T10:36:19.08326669Z INFO setup starting manager
2026-02-06T10:36:19.083658539Z INFO controller-runtime.metrics Starting metrics server
2026-02-06T10:36:19.083743018Z INFO starting server {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:36:19.083890637Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
2026-02-06T10:36:19.083914542Z INFO watchdog watchdog started
2026-02-06T10:36:19.084137853Z INFO Starting EventSource {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}
2026-02-06T10:36:19.184579064Z INFO peers peer starting {"name": "knode6"}
2026-02-06T10:36:29.085426918Z ERROR peerhealth.server failed to get server credentials {"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).Start
/workspace/pkg/peerhealth/server.go:66
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:226
2026-02-06T10:36:29.085890192Z INFO Stopping and waiting for non leader election runnables
2026-02-06T10:36:29.19165137Z INFO Stopping and waiting for leader election runnables
2026-02-06T10:36:29.191883258Z INFO Starting Controller {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.19213884Z INFO Starting workers {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}
2026-02-06T10:36:29.191866105Z INFO watchdog disarmed watchdog
2026-02-06T10:36:29.192178254Z INFO Shutdown signal received, waiting for all workers to finish {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.192198572Z INFO All workers finished {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:36:29.192244389Z INFO Stopping and waiting for caches
2026-02-06T10:36:29.193198649Z INFO Stopping and waiting for webhooks
2026-02-06T10:36:29.193533111Z INFO Stopping and waiting for HTTP servers
2026-02-06T10:36:29.193560502Z INFO controller-runtime.metrics Shutting down metrics server with timeout of 1 minute
2026-02-06T10:36:29.193598163Z INFO shutting down server {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:36:29.193667073Z INFO Wait completed, proceeding to shutdown the manager
2026-02-06T10:36:29.193685708Z ERROR setup problem running manager {"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
main.main
/workspace/main.go:168
runtime.main
/usr/local/go/src/runtime/proc.go:285
For example, when I set it to 30 seconds, rebuild, and start again, we get:
2026-02-06T10:40:07.97615805Z INFO setup Go Version: go1.25.3
2026-02-06T10:40:07.9764264Z INFO setup Go OS/Arch: linux/amd64
2026-02-06T10:40:07.976435106Z INFO setup Operator Version: v0.11.0-19-g3477dbe4
2026-02-06T10:40:07.976443938Z INFO setup Git Commit: 3477dbe48d1c5f3e04213c5917ccb1013278d75f
2026-02-06T10:40:07.976453419Z INFO setup Build Date: 2026-02-06T09:48:32+00:00
2026-02-06T10:40:07.976459713Z INFO setup HTTP/2 for metrics and webhook server disabled
2026-02-06T10:40:07.976693578Z INFO setup OLM injected certs for webhooks not found
2026-02-06T10:40:07.993838317Z INFO utils-taints out of service taint strategy {"isSupported": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:40:07.993886906Z INFO utils-taints out of service taint strategy {"isGA": true, "k8sMajorVersion": 1, "k8sMinorVersion": 31}
2026-02-06T10:40:07.993906594Z INFO setup Starting as a self node remediation agent that should run as part of the daemonset
2026-02-06T10:40:08.035534838Z INFO setup init grpc server
2026-02-06T10:40:08.035606437Z INFO setup starting manager
2026-02-06T10:40:08.03838546Z INFO controller-runtime.metrics Starting metrics server
2026-02-06T10:40:08.038462153Z INFO starting server {"name": "health probe", "addr": "0.0.0.0:8081"}
2026-02-06T10:40:08.038601629Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
2026-02-06T10:40:08.0386955Z INFO watchdog watchdog started
2026-02-06T10:40:08.038903095Z INFO Starting EventSource {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}
2026-02-06T10:40:08.139247066Z INFO peers peer starting {"name": "knode6"}
2026-02-06T10:40:37.340403099Z INFO Starting Controller {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2026-02-06T10:40:37.340585961Z INFO Starting workers {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}
2026-02-06T10:40:37.341658013Z INFO peerhealth.server peer health server started
Thanks.