-
More precisely, LIST requests for pods, for example, have the following format:
These requests make a full LIST call to Etcd, and the Kubernetes API Server then does the filtering. I've observed 10 QPS of them, which is significant for such expensive calls (not counting LISTs for other resources such as secrets), especially when there are tens of thousands of pods in the cluster.
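To make the cost concrete, here is a simplified model of a LIST with an equality label selector that is not served from the cache: every stored pod is read and decoded, and only then filtered, so the work scales with the total number of pods rather than with the number of matches. `UncachedListModel` is a hypothetical class, not Kubernetes or Strimzi code, and the 30,000-pod figure used below is an illustrative assumption standing in for "tens of thousands".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an uncached LIST with an equality label selector.
final class UncachedListModel {
    record Pod(String name, Map<String, String> labels) {}

    // 'scanned' counts objects read from storage; 'returned' holds the matches sent to the client.
    record ListResult(List<Pod> returned, int scanned) {}

    static ListResult listWithSelector(List<Pod> allPodsInStorage, String key, String value) {
        List<Pod> matches = new ArrayList<>();
        int scanned = 0;
        for (Pod pod : allPodsInStorage) {
            scanned++; // every stored object is read and decoded, whether it matches or not
            if (value.equals(pod.labels().get(key))) {
                matches.add(pod);
            }
        }
        return new ListResult(matches, scanned);
    }

    // Objects read per second when each LIST scans every pod.
    static long objectsReadPerSecond(int totalPods, int listQps) {
        return (long) totalPods * listQps;
    }
}
```

For example, `objectsReadPerSecond(30_000, 10)` gives 300,000 pod objects read per second for pod LISTs alone, regardless of how few pods each selector actually matches.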
-
I think you should start by providing more details about how you use Strimzi, along with proper logs showing what is happening, where, and how. That might allow us to locate the exact area and see whether something can be done about it.
-
I will reach out to the customer to see if they can provide that information. I was investigating this from the Kubernetes Control Plane point of view, so unfortunately I can't say how Strimzi was used or deployed.
-
Describe the bug
strimzi-cluster-operator/0.31.1 does not use the Kubernetes API Server cache for listing resources. All LIST calls go directly to Etcd, which puts significant load on Etcd and causes Kubernetes Control Plane instability.
Example logs from Kubernetes API Server:
Similarly, Strimzi also lists configmaps/pods/persistentvolumeclaims/services/...
Short-term mitigation:
Set resourceVersion=0 on each LIST/GET request so that it is served from the Kubernetes API Server cache without any interaction with Etcd.
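A minimal sketch of this mitigation at the HTTP level, using only the JDK: the hypothetical helper `podListPath` builds the path of a namespaced pod LIST and optionally adds `resourceVersion=0`. The namespace `kafka` and the selector `strimzi.io/cluster=my-cluster` are example values only. Note that `resourceVersion=0` tells the API server that any resource version is acceptable, so the response may be stale compared to a quorum read from Etcd.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the path + query of a namespaced pod LIST request.
final class CachedListRequest {
    static String podListPath(String namespace, String labelSelector, boolean serveFromCache) {
        Map<String, String> query = new LinkedHashMap<>(); // keeps parameter order stable
        if (labelSelector != null && !labelSelector.isEmpty()) {
            query.put("labelSelector", labelSelector);
        }
        if (serveFromCache) {
            // "0" = any version is acceptable, so the API server may answer from its watch cache
            query.put("resourceVersion", "0");
        }
        String path = "/api/v1/namespaces/" + namespace + "/pods";
        if (query.isEmpty()) {
            return path;
        }
        return path + "?" + query.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

Without `resourceVersion`, the same request is a consistent read that goes to Etcd, which matches the behavior described in this issue.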
Long-term solution:
Migrate to the List and Watch pattern.
Relevant documentation: https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
https://cloud.google.com/kubernetes-engine/docs/concepts/planning-scalability#use_list_and_watch_pattern_instead_of_periodic_listing
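The List and Watch pattern can be sketched as a small local cache: one initial LIST seeds it and records the list's `resourceVersion`, after which WATCH events (`ADDED`, `MODIFIED`, `DELETED`, and optionally `BOOKMARK`) keep it current, and the watch is resumed from the last seen `resourceVersion`. `ListWatchCache` is a hypothetical, simplified class that stores only object names and resource versions; it is not Strimzi or Fabric8 code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified List and Watch cache (not Strimzi or Fabric8 code).
final class ListWatchCache {
    enum EventType { ADDED, MODIFIED, DELETED, BOOKMARK }

    record WatchEvent(EventType type, String name, String resourceVersion) {}

    private final Map<String, String> objects = new HashMap<>(); // object name -> resourceVersion
    private String lastResourceVersion;

    // Step 1: a single initial LIST seeds the cache and records the list's resourceVersion.
    void replace(Map<String, String> listedObjects, String listResourceVersion) {
        objects.clear();
        objects.putAll(listedObjects);
        lastResourceVersion = listResourceVersion;
    }

    // Step 2: WATCH events update the cache incrementally instead of re-listing.
    void apply(WatchEvent event) {
        switch (event.type()) {
            case ADDED, MODIFIED -> objects.put(event.name(), event.resourceVersion());
            case DELETED -> objects.remove(event.name());
            case BOOKMARK -> { } // no object change; only advances the resume point
        }
        lastResourceVersion = event.resourceVersion();
    }

    // Path used to (re)start the watch from where the cache left off.
    String watchPath(String namespace) {
        if (lastResourceVersion == null) {
            throw new IllegalStateException("initial LIST has not been applied yet");
        }
        return "/api/v1/namespaces/" + namespace
                + "/pods?watch=1&allowWatchBookmarks=true&resourceVersion=" + lastResourceVersion;
    }

    Map<String, String> snapshot() {
        return Map.copyOf(objects);
    }
}
```

A real client also has to re-LIST when the API server answers a watch with `410 Gone` because the stored `resourceVersion` has expired. Fabric8 informers (for example `SharedIndexInformer`) already implement this loop, which is likely the more practical route for the operator.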
Expected behavior
By default, Strimzi should use the Kubernetes API Server cache, and ideally the List and Watch pattern instead of repeated LIST calls.
Environment (please complete the following information):
Additional context
Related issue that I've opened in Fabric8: fabric8io/kubernetes-client#4670