keps/sig-auth/4872-harden-kubelet-cert-validation/README.md
Lines changed: 17 additions & 17 deletions
@@ -100,7 +100,7 @@ If a rogue node obtained a certificate for an IP it does not own and reroute tra
100
100
101
101
Provided an actor with control of a node can impersonate another node, the impact would be:
102
102
103
-
* Break confidentiality of the requests sent by the Kube-API server to the kubelet (e.g kubectl exec/logs).These are usually user-driven requests. That gives the threat actor the possibility of producing incorrect or mis-leading feedback. In the exec case, it could allow a threat actor to issue prompts for credentials. In addition, the exec commands might contain user secrets.
103
+
* Break confidentiality of the requests sent by the Kube-API server to the kubelet (e.g. kubectl exec/logs). These are usually user-driven requests. That gives the threat actor the possibility of producing incorrect or misleading feedback. In the exec case, it could allow a threat actor to issue prompts for credentials. In addition, the exec commands might contain user secrets.
104
104
* Break confidentiality of credentials if the client uses token based authentication. This is probably more common for non Kube-API server clients, given mTLS is common for Kube-API server to kubelet communication.
105
105
106
106
### Goals
@@ -114,7 +114,7 @@ Provided an actor with control of a node can impersonate another node, the impac
114
114
115
115
## Proposal
116
116
117
-
We propose that the Kube API server is modified to validate the Common Name (CN) of the kubelet's serving certificate is equal to `system:node:<nodename>`.
117
+
We propose that the Kube API server is modified to validate that the Common Name (CN) of the kubelet's serving certificate is equal to `system:node:<nodename>`.
118
118
`nodename` is the name of the Node object as reported by the kubelet. When the Kube-API server connects to the kubelet server (e.g. for logs, exec, port-forward), it always knows the Node it's connecting to.
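The check described above can be sketched roughly as follows. This is a minimal illustration, not the KEP's implementation: `validateKubeletCertCN` and `newCertWithCN` are hypothetical names, and the sketch assumes the caller already has the verified leaf certificate and the Node name it is connecting to.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// nodeUserNamePrefix is the prefix the proposal expects in the certificate CN.
const nodeUserNamePrefix = "system:node:"

// validateKubeletCertCN is a hypothetical helper. It returns an error unless
// the certificate's Common Name is exactly "system:node:<nodeName>".
func validateKubeletCertCN(cert *x509.Certificate, nodeName string) error {
	expected := nodeUserNamePrefix + nodeName
	if cert.Subject.CommonName != expected {
		return fmt.Errorf("kubelet serving certificate CN %q does not match expected %q",
			cert.Subject.CommonName, expected)
	}
	return nil
}

// newCertWithCN builds an in-memory certificate carrying only a CN.
// It exists purely for illustration; real certificates come from the TLS handshake.
func newCertWithCN(cn string) *x509.Certificate {
	return &x509.Certificate{Subject: pkix.Name{CommonName: cn}}
}

func main() {
	cert := newCertWithCN("system:node:node-a")
	fmt.Println(validateKubeletCertCN(cert, "node-a")) // CN matches the node name
	fmt.Println(validateKubeletCertCN(cert, "node-b")) // CN belongs to a different node
}
```

In a real TLS client, a check like this would presumably run after the standard chain verification, for example from a `VerifyConnection` callback on `tls.Config`; that placement is an assumption, not something the proposal text states.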
119
119
120
120
### User Stories (Optional)
@@ -148,7 +148,7 @@ Before enabling this feature on clusters with custom kubelet serving certificate
148
148
### Enabling the feature
149
149
150
150
We will introduce a feature flag `KubeletCertCNValidation` that will gate the usage of the new validation.
151
-
This gate will start off by default in Alpha, will be turned on by default in Beta and will be removed in GA.
151
+
This gate will start disabled by default in Alpha, will be turned on by default in Beta and will be removed in GA.
152
152
153
153
In addition, the validation will be opt-in and enabled through a new command-line flag `--enable-kubelet-cert-cn-validation`.
154
154
This flag can only be set if the `KubeletCertCNValidation` feature flag is enabled and if `--kubelet-certificate-authority` is set.
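As a rough sketch of the constraint above, the startup validation could look like the following. The `kubeletClientOptions` struct and its field names are hypothetical and do not correspond to the real kube-apiserver options types; only the flag and feature gate names come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeletClientOptions is a hypothetical stand-in for the relevant API server settings.
type kubeletClientOptions struct {
	FeatureGateEnabled     bool   // KubeletCertCNValidation feature gate
	CertificateAuthority   string // --kubelet-certificate-authority
	EnableCertCNValidation bool   // --enable-kubelet-cert-cn-validation
}

// validate rejects configurations where the opt-in flag is set without its prerequisites.
func (o kubeletClientOptions) validate() error {
	if !o.EnableCertCNValidation {
		return nil // validation not requested, nothing to check
	}
	if !o.FeatureGateEnabled {
		return errors.New("--enable-kubelet-cert-cn-validation requires the KubeletCertCNValidation feature gate")
	}
	if o.CertificateAuthority == "" {
		return errors.New("--enable-kubelet-cert-cn-validation requires --kubelet-certificate-authority")
	}
	return nil
}

func main() {
	// The flag is set and the gate is on, but no CA is configured, so this is rejected.
	opts := kubeletClientOptions{FeatureGateEnabled: true, EnableCertCNValidation: true}
	fmt.Println(opts.validate())
}
```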
@@ -157,7 +157,7 @@ Making the feature opt-in maintains compatibility with existing clusters using c
157
157
158
158
#### Metrics
159
159
160
-
In order to help cluster administrators determine if it's safe to enable the feature, we propose to add a new metric `kube_apiserver_validation_kubelet_cert_cn_total`. We will have two labels `success` and `failure`, allowing to track the number of errors due to the new CN validation.
160
+
In order to help cluster administrators determine if it's safe to enable the feature, we propose to add a new metric `kube_apiserver_validation_kubelet_cert_cn_total`. We will have two labels `success` and `failure`, allowing us to track the number of errors due to the new CN validation.
161
161
In addition, we will log the error including the node name, so cluster administrators can identify which nodes are affected and need to reissue their certificates.
162
162
163
163
If the feature gate is disabled or if `--kubelet-certificate-authority` is not set, we won't publish the metric or run any validation code at all.
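To make the intended bookkeeping concrete, here is a small standard-library sketch of a counter keyed by `success` and `failure`, plus the log line carrying the node name. It deliberately avoids any real metrics library: `resultCounter` and `recordCNValidation` are hypothetical names, and the actual metric would be registered through the API server's own metrics machinery.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// errCNMismatch is a placeholder error used only in this example.
var errCNMismatch = errors.New("CN mismatch")

// resultCounter is a hypothetical stand-in for
// kube_apiserver_validation_kubelet_cert_cn_total, keyed by result.
type resultCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newResultCounter() *resultCounter {
	return &resultCounter{counts: map[string]uint64{}}
}

func (c *resultCounter) inc(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[result]++
}

func (c *resultCounter) get(result string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[result]
}

// recordCNValidation counts the outcome and, on failure, logs the node name
// so administrators can tell which nodes need reissued certificates.
func recordCNValidation(c *resultCounter, nodeName string, err error) {
	if err != nil {
		c.inc("failure")
		log.Printf("kubelet serving certificate CN validation failed for node %q: %v", nodeName, err)
		return
	}
	c.inc("success")
}

func main() {
	c := newResultCounter()
	recordCNValidation(c, "node-a", nil)
	recordCNValidation(c, "node-b", errCNMismatch)
	fmt.Println(c.get("success"), c.get("failure"))
}
```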
@@ -170,9 +170,9 @@ The purpose of the metric is to easily/cheaply tell administrators if they can f
170
170
### TLS insecure
171
171
172
172
Currently, if the Kube-API server is not configured with a `--kubelet-certificate-authority` the TLS client for kubelet server will skip the server certificate validation.
173
-
Additionally, `logs` requests allow to configure`InsecureSkipTLSVerifyBackend` per request to skip the server certificate validation.
173
+
Additionally, `logs` requests allow configuring `InsecureSkipTLSVerifyBackend` per request to skip the server certificate validation.
174
174
175
-
To align with this behavior, we won't allow to enable the validation if `--kubelet-certificate-authority` is not set and we won't execute the CN validation if `InsecureSkipTLSVerifyBackend` is set to true.
175
+
To align with this behavior, we won't allow enabling the validation if `--kubelet-certificate-authority` is not set and we won't execute the CN validation if `InsecureSkipTLSVerifyBackend` is set to true.
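The per-request decision described above could be expressed as a single predicate. This is only a sketch under assumed names: `shouldValidateCN` and its parameters are hypothetical, with `validationEnabled` standing for the feature gate and opt-in flag both being on.

```go
package main

import "fmt"

// shouldValidateCN is a hypothetical helper that decides, per request,
// whether the CN check runs.
func shouldValidateCN(validationEnabled bool, kubeletCAFile string, insecureSkipTLSVerifyBackend bool) bool {
	// Without a kubelet CA the TLS client skips server certificate validation
	// entirely, so the CN check is not run either.
	if !validationEnabled || kubeletCAFile == "" {
		return false
	}
	// A request that opts out of backend TLS verification also opts out of the CN check.
	return !insecureSkipTLSVerifyBackend
}

func main() {
	fmt.Println(shouldValidateCN(true, "/etc/kubernetes/ca.crt", false))
	fmt.Println(shouldValidateCN(true, "/etc/kubernetes/ca.crt", true))
}
```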
176
176
177
177
### Test Plan
178
178
@@ -198,10 +198,10 @@ On top of testing the validation itself, we will test that:
198
198
##### Integration tests
199
199
200
200
Integration tests will be added to ensure the following:
201
-
* Validation for custom certificates works if feature flag is not enabled.
202
-
* Validation for custom certificates works if feature flag enabled and `--enable-kubelet-cert-cn-validation` is not set or set to false.
203
-
* Validation for custom certificates fails if feature flag enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
204
-
* Validation for kubernetes issued certificates works if feature flag enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
201
+
* Validation for custom certificates works if the feature flag is not enabled.
202
+
* Validation for custom certificates works if the feature flag is enabled and `--enable-kubelet-cert-cn-validation` is not set or set to false.
203
+
* Validation for custom certificates fails if the feature flag is enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
204
+
* Validation for Kubernetes-issued certificates works if the feature flag is enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
205
205
206
206
##### e2e tests
207
207
@@ -255,7 +255,7 @@ Enabling the validation does change the default certificate validation behavior.
255
255
256
256
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
257
257
258
-
Yes, the feature can be disabled once enabled by just setting the command-line flag to true.
258
+
Yes, the feature can be disabled once enabled by not setting the command-line flag.
259
259
260
260
###### What happens if we reenable the feature if it was previously rolled back?
261
261
@@ -275,7 +275,7 @@ Already running workloads won't be impacted but cluster users won't be able to a
275
275
276
276
###### What specific metrics should inform a rollback?
277
277
278
-
`kube_apiserver_validation_kubelet_cert_cn_total` can help inform a rollback. A non-zero value for the `failure` label will require invetsigation: if the rejected requests are going to legitimate nodes, the feature should be rolled back until kuebeler serving certificates are reissued.
278
+
`kube_apiserver_validation_kubelet_cert_cn_total` can help inform a rollback. A non-zero value for the `failure` label will require investigation: if the rejected requests are going to legitimate nodes, the feature should be rolled back until kubelet serving certificates are reissued.
279
279
280
280
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
281
281
@@ -291,8 +291,8 @@ No.
291
291
###### How can an operator determine if the feature is in use by workloads?
292
292
293
293
The cluster administrators can check the flags passed to the kube-apiserver if they have access to the control plane nodes.
294
-
If the `--enable-kubelet-cert-cn-validation` flag set to true, the feature is being used.
295
-
Alternatively the can check the `kubernetes_feature_enabled` metric.
294
+
If the `--enable-kubelet-cert-cn-validation` flag is set to true, the feature is being used.
295
+
Alternatively, they can check the `kubernetes_feature_enabled` metric.
296
296
297
297
###### How can someone using this feature know that it is working for their instance?
298
298
@@ -302,7 +302,7 @@ Alternatively the can check the `kubernetes_feature_enabled` metric.
302
302
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
303
303
304
304
The average `apiserver_request_duration_seconds` for logs/exec/port-forward requests is within reasonable limits.
305
-
A raising value after enabling this feature could signal overhead introduced by the extra validation.
305
+
A rising value after enabling this feature could signal overhead introduced by the extra validation.
306
306
307
307
In addition, the number of TLS connections made from API server to nodes should not increase.
308
308
@@ -366,15 +366,15 @@ It's part of the API server, so the feature will be unavailable.
366
366
367
367
-[API server can't connect to Nodes with custom kubelet serving certificates that don't follow the `system:node:<node-name>` convention]
368
368
- Detection: `kubectl logs` returns a certificate validation error.
369
-
- Mitigations: disable the validation byt not setting `--enable-kubelet-cert-cn-validation` flag.
369
+
- Mitigations: disable the validation by not setting the `--enable-kubelet-cert-cn-validation` flag.
370
370
- Diagnostics: error is returned by the API server, no additional logging needed.
371
371
- Testing: We will have tests for this, this is basically testing that the feature works.
372
372
373
373
###### What steps should be taken if SLOs are not being met to determine the problem?