Skip to content

Commit b475eb8

Browse files
authored
Rework control plane policies (#647)
* Switch control plane policies to correct namespace * Remove high-cpu pool scheduling * Disable VPA for etcd and kube-apiserver * Disable HPA for kube-apiserver * Disable kube-controller-manager client-side rate limiting * Drop obsolete kyverno tests and CLI * Update etcd labels
1 parent a3bb8d8 commit b475eb8

21 files changed

+32
-328
lines changed

Makefile

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -82,10 +82,6 @@ test: ## Run unit tests.
8282
test-integration: $(SETUP_ENVTEST) ## Run integration tests.
8383
./hack/test-integration.sh ./test/integration/...
8484

85-
.PHONY: test-kyverno
86-
test-kyverno: $(KYVERNO) ## Run kyverno policy tests.
87-
$(KYVERNO) test --remove-color -v 4 .
88-
8985
.PHONY: test-e2e
9086
test-e2e: $(GINKGO) ## Run e2e tests.
9187
./hack/test-e2e.sh $(GINKGO_FLAGS) ./test/e2e/... ./webhosting-operator/test/e2e/...
@@ -102,7 +98,7 @@ lint: $(GOLANGCI_LINT) ## Run golangci-lint against code.
10298
$(GOLANGCI_LINT) run ./... ./webhosting-operator/...
10399

104100
.PHONY: check
105-
check: lint test test-integration test-kyverno ## Check everything (lint + test + test-integration + test-kyverno).
101+
check: lint test test-integration ## Check everything (lint + test + test-integration).
106102

107103
.PHONY: verify-fmt
108104
verify-fmt: fmt ## Verify go code is formatted.

docs/evaluation.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -124,7 +124,8 @@ In addition to the described components, [kyverno](https://github.com/kyverno/ky
124124
In the cluster itself, kyverno policies are used for scheduling the sharder and webhosting-operator to the dedicated `sharding` worker pool and experiment to the dedicated `experiment` worker pool.
125125
This makes sure that these components run on machines isolated from other system components and don't content for compute resources during load tests.
126126

127-
Furthermore, kyverno policies are added to the control plane to ensure a static size of etcd, kube-apiserver, and kube-controller-manager (requests=limits for guaranteed resources, disable vertical autoscaling, 4 replicas of kube-apiserver to disable horizontal autoscaling) and schedule them to a dedicated worker pool using a non-overcommit flavor with more CPU cores per machine.
127+
Furthermore, kyverno policies are added to the control plane to ensure a static size of etcd, kube-apiserver, and kube-controller-manager (requests=limits for guaranteed resources, disable vertical autoscaling, 4 replicas of kube-apiserver and disable horizontal autoscaling).
128+
Also, kube-controller-manager's client-side rate limiting is disabled (ref https://github.com/timebertt/kubernetes-controller-sharding/pull/610, [SIG api-machinery recommendation](https://kubernetes.slack.com/archives/C0EG7JC6T/p1680889646346859?thread_ts=1680791299.631439&cid=C0EG7JC6T)) and HTTP/2 is disabled so that API requests are distributed across API server instances (ref https://github.com/gardener/gardener/issues/8810).
128129
This is done to make load test experiments more stable and their results more reproducible.
129130

130131
## Measurements

hack/config/policy/controlplane/etcd-main.yaml

Lines changed: 10 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ apiVersion: kyverno.io/v1
22
kind: Policy
33
metadata:
44
name: etcd-main
5-
namespace: shoot--timebertt--sharding
5+
namespace: shoot--ixywdlfvei--sharding
66
spec:
77
failurePolicy: Fail
88
rules:
@@ -15,8 +15,7 @@ spec:
1515
- Pod
1616
selector:
1717
matchLabels:
18-
instance: etcd-main
19-
name: etcd
18+
app.kubernetes.io/name: etcd-main
2019
mutate:
2120
patchStrategicMerge:
2221
spec:
@@ -33,22 +32,16 @@ spec:
3332
env:
3433
- name: GOMAXPROCS
3534
value: "12"
36-
# schedule etcd-main on high-cpu worker pool for stable performance
37-
- name: add-scheduling-constraints
35+
- name: disable-vpa
3836
match:
3937
any:
4038
- resources:
4139
kinds:
42-
- Pod
43-
selector:
44-
matchLabels:
45-
instance: etcd-main
46-
name: etcd
40+
- VerticalPodAutoscaler
41+
names:
42+
- etcd-main
4743
mutate:
48-
patchesJson6902: |-
49-
- op: add
50-
path: "/spec/tolerations/-"
51-
value: {"key":"high-cpu","operator":"Equal","value":"true","effect":"NoSchedule"}
52-
- op: replace
53-
path: "/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms"
54-
value: [{"matchExpressions": [{"key":"high-cpu","operator":"In","values":["true"]}]}]
44+
patchStrategicMerge:
45+
spec:
46+
updatePolicy:
47+
updateMode: Off

hack/config/policy/controlplane/kube-apiserver-scale.yaml

Lines changed: 0 additions & 31 deletions
This file was deleted.

hack/config/policy/controlplane/kube-apiserver.yaml

Lines changed: 15 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -2,34 +2,24 @@ apiVersion: kyverno.io/v1
22
kind: Policy
33
metadata:
44
name: kube-apiserver
5-
namespace: shoot--timebertt--sharding
5+
namespace: shoot--ixywdlfvei--sharding
66
spec:
77
failurePolicy: Fail
88
rules:
99
# set static replicas on kube-apiserver to ensure similar evaluation environment between load test runs
10-
# if the cluster is hibernated (spec.replicas=0), this rule is skipped
11-
- name: replicas
10+
- name: disable-hpa
1211
match:
1312
any:
1413
- resources:
1514
kinds:
16-
- Deployment
17-
selector:
18-
matchLabels:
19-
app: kubernetes
20-
role: apiserver
21-
preconditions:
22-
all:
23-
# Only patch spec.replicas if the control plane is not hibernated, i.e., if spec.replicas>=1.
24-
# NB: gardenlet deploys kube-apiserver with spec.replicas=null which is defaulted after the policy webhook call
25-
# to spec.replicas=1. Hence, treat spec.replicas=null the same way as spec.replicas=1.
26-
- key: "{{ request.object.spec.replicas || `1` }}"
27-
operator: GreaterThan
28-
value: 0
15+
- HorizontalPodAutoscaler
16+
names:
17+
- kube-apiserver
2918
mutate:
3019
patchStrategicMerge:
3120
spec:
32-
replicas: 4
21+
minReplicas: 4
22+
maxReplicas: 4
3323
# set static requests/limits on kube-apiserver to ensure similar evaluation environment between load test runs
3424
- name: resources
3525
match:
@@ -57,22 +47,16 @@ spec:
5747
env:
5848
- name: GOMAXPROCS
5949
value: "12"
60-
# schedule kube-apiserver on high-cpu worker pool for stable performance
61-
- name: add-scheduling-constraints
50+
- name: disable-vpa
6251
match:
6352
any:
6453
- resources:
6554
kinds:
66-
- Pod
67-
selector:
68-
matchLabels:
69-
app: kubernetes
70-
role: apiserver
55+
- VerticalPodAutoscaler
56+
names:
57+
- kube-apiserver-vpa
7158
mutate:
72-
patchesJson6902: |-
73-
- op: add
74-
path: "/spec/tolerations/-"
75-
value: {"key":"high-cpu","operator":"Equal","value":"true","effect":"NoSchedule"}
76-
- op: add
77-
path: "/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms"
78-
value: [{"matchExpressions": [{"key":"high-cpu","operator":"In","values":["true"]}]}]
59+
patchStrategicMerge:
60+
spec:
61+
updatePolicy:
62+
updateMode: Off

hack/config/policy/controlplane/kube-controller-manager.yaml

Lines changed: 4 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ apiVersion: kyverno.io/v1
22
kind: Policy
33
metadata:
44
name: kube-controller-manager
5-
namespace: shoot--timebertt--sharding
5+
namespace: shoot--ixywdlfvei--sharding
66
spec:
77
failurePolicy: Ignore
88
rules:
@@ -46,27 +46,8 @@ spec:
4646
spec:
4747
updatePolicy:
4848
updateMode: Off
49-
# schedule kube-controller-manager on high-cpu worker pool for stable performance
50-
- name: add-scheduling-constraints
51-
match:
52-
any:
53-
- resources:
54-
kinds:
55-
- Pod
56-
selector:
57-
matchLabels:
58-
app: kubernetes
59-
role: controller-manager
60-
mutate:
61-
patchesJson6902: |-
62-
- op: add
63-
path: "/spec/tolerations/-"
64-
value: {"key":"high-cpu","operator":"Equal","value":"true","effect":"NoSchedule"}
65-
- op: add
66-
path: "/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms"
67-
value: [{"matchExpressions": [{"key":"high-cpu","operator":"In","values":["true"]}]}]
68-
# increases kube-controller-manager's client-side rate limits to speed up garbage collection after executing load tests
69-
- name: increase-rate-limits
49+
# disable kube-controller-manager's client-side rate limits similar to webhosting-operator
50+
- name: disable-rate-limits
7051
match:
7152
any:
7253
- resources:
@@ -78,10 +59,7 @@ spec:
7859
patchesJson6902: |-
7960
- op: add
8061
path: /spec/template/spec/containers/0/command/-
81-
value: "--kube-api-qps=2000"
82-
- op: add
83-
path: /spec/template/spec/containers/0/command/-
84-
value: "--kube-api-burst=2200"
62+
value: "--kube-api-qps=-1"
8563
# disable HTTP2 in kube-controller-manager's so that API requests are distributed across API server instances
8664
- name: disable-http2
8765
match:

hack/config/policy/controlplane/kustomization.yaml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,5 +7,4 @@ kind: Kustomization
77
resources:
88
- etcd-main.yaml
99
- kube-apiserver.yaml
10-
- kube-apiserver-scale.yaml
1110
- kube-controller-manager.yaml

hack/config/policy/controlplane/tests/kube-apiserver-scale-awake/kyverno-test.yaml

Lines changed: 0 additions & 18 deletions
This file was deleted.

hack/config/policy/controlplane/tests/kube-apiserver-scale-awake/scale.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.

hack/config/policy/controlplane/tests/kube-apiserver-scale-awake/scale_expected.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.

0 commit comments

Comments
 (0)