[BUG] Hard crash and reboot of the hubagent if ClusterResourceSnapshot failed to list due to timeout #717

@d4rkhunt33r

Describe the bug

The hubagent crashes hard and restarts if listing ClusterResourceSnapshot objects fails due to a timeout.

Environment

  • Hub cluster details
    The hub cluster is an AWS EKS cluster.

The hubagent was installed with the following values:

replicaCount: 1
logVerbosity: 1
enableWebhook: false
webhookServiceName: fleetwebhook
enableGuardRail: false
webhookClientConnectionType: service
enableV1Alpha1APIs: false
enableV1Beta1APIs: true
resources:
  requests:
    cpu: 2
    memory: 8Gi
  limits:
    cpu: 4
    memory: 16Gi

To Reproduce

Steps to reproduce the behavior:

  • Install the hubagent in the cluster.
  • Create enough objects that a kubectl get clusterresourcesnapshot request takes more than 30 seconds.

The hubagent container then restarts, showing the following errors in the logs:

I0308 16:31:41.079738       1 controller/controller.go:190] "Starting controller" controller="cluster-resource-placement-controller-v1beta1"
I0308 16:31:41.147758       1 informer/informermanager.go:152] "Disabled an informer for a disappeared resource" res={"GroupVersionKind":{"Group":"","Version":"v1","Kind":"Event"},"GroupVersionResource":{"Group":"","Version":"v1","Resource":"events"},"IsClusterScoped":false}
W0308 16:32:21.369841       1 cache/reflector.go:535] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)
I0308 16:32:21.369941       1 trace/trace.go:236] Trace[1547232315]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (08-Mar-2024 16:31:21.299) (total time: 60070ms):
Trace[1547232315]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io) 60069ms (16:32:21.369)
Trace[1547232315]: [1m0.070028406s] [1m0.070028406s] END
E0308 16:32:21.369964       1 cache/reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1beta1.ClusterResourceSnapshot: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)

Expected behavior

The hubagent pod should not restart when the list call times out.

Additional context

I think this could be solved with an option to increase the timeout of the Kubernetes client.

Sorry for my English :S
