Commit ed8f3e2

Make the validation opt-in
1 parent 79fb591 commit ed8f3e2

1 file changed

+24
-25
lines changed
  • keps/sig-auth/4872-harden-kubelet-cert-validation

keps/sig-auth/4872-harden-kubelet-cert-validation/README.md

Lines changed: 24 additions & 25 deletions
@@ -138,7 +138,10 @@ This will require cluster administrators to reissue any non-conforming certifica
138138
### Risks and Mitigations
139139

140140
This could disrupt existing clusters that are using custom kubelet serving certificates.
141-
These clusters will need to reissue their certificates before enabling this feature. We will allow to disable the validation through a command-line flag to allow for a smooth transition.
141+
142+
In order to maintain compatibility by default with these clusters even after this feature goes GA, we will make it opt-in.
143+
144+
Before enabling this feature on clusters with custom kubelet serving certificates, cluster administrators will need to reissue those certificates.
142145

143146
## Design Details
144147

@@ -147,33 +150,29 @@ These clusters will need to reissue their certificates before enabling this feat
147150
We will introduce a feature flag `KubeletCertCNValidation` that will gate the usage of the new validation.
148151
This gate will be off by default in Alpha, on by default in Beta, and removed in GA.
149152

150-
In addition, we will allow to disable the validation through a command-line flag `--disable-kubelet-cert-cn-validation`.
151-
This flag can only be set if the `KubeletCertCNValidation` feature flag is enabled.
152-
This flag will allow cluster administrators to opt-out of this validation if they are using custom kubelet serving certificates that don't follow the `system:node:<nodename>` convention even after the feature gate is removed.
153+
In addition, the validation will be opt-in and enabled through a new command-line flag `--enable-kubelet-cert-cn-validation`.
154+
This flag can only be set if the `KubeletCertCNValidation` feature flag is enabled and if `--kubelet-certificate-authority` is set.
155+
156+
Making the feature opt-in maintains compatibility with existing clusters using custom kubelet serving certificates that don't follow the `system:node:<nodename>` convention even after the feature gate is removed.
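
The flag constraints described above could be checked at startup roughly as follows. This is a minimal sketch, not the KEP's implementation: the `cnValidationOptions` type, its field names, and `validateCNValidationOptions` are hypothetical and only model the three inputs named in the text (the `KubeletCertCNValidation` gate, `--enable-kubelet-cert-cn-validation`, and `--kubelet-certificate-authority`).

```go
package main

import "errors"

// cnValidationOptions is a hypothetical container for the three inputs
// the text describes; it is not a real kube-apiserver type.
type cnValidationOptions struct {
	FeatureGateEnabled bool   // KubeletCertCNValidation feature gate
	EnableCNValidation bool   // --enable-kubelet-cert-cn-validation
	KubeletCAFile      string // --kubelet-certificate-authority
}

// validateCNValidationOptions rejects configurations where the opt-in flag
// is set without its prerequisites. Leaving the flag unset is always valid,
// which is what keeps existing clusters working by default.
func validateCNValidationOptions(o cnValidationOptions) error {
	if !o.EnableCNValidation {
		return nil
	}
	var errs []error
	if !o.FeatureGateEnabled {
		errs = append(errs, errors.New(
			"--enable-kubelet-cert-cn-validation requires the KubeletCertCNValidation feature gate"))
	}
	if o.KubeletCAFile == "" {
		errs = append(errs, errors.New(
			"--enable-kubelet-cert-cn-validation requires --kubelet-certificate-authority"))
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}
```

Collecting both errors instead of returning on the first one is a design choice of this sketch, so that an administrator sees every missing prerequisite in a single startup failure.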
153157

154158
#### Metrics
155159

156160
In order to help cluster administrators determine if it's safe to enable the feature, we propose to add a new metric `kube_apiserver_validation_kubelet_cert_cn_errors` that will track the number of errors due to the new CN validation.
157161
In addition, we will log the error including the node name, so cluster administrators can identify which nodes are affected and need to reissue their certificates.
158162

159-
If the feature gate is disabled, we won't publish the metric or run any validation code at all.
163+
If the feature gate is disabled or if `--kubelet-certificate-authority` is not set, we won't publish the metric or run any validation code at all.
160164

161-
If the feature gate is enabled but the feature is disabled (with `--disable-kubelet-cert-cn-validation`), we will still add the validation code to the HTTP transport, however, if the validation fails we won't return an error, we will just increment the metric counter.
165+
If the feature gate is enabled and the kubelet CA is set (`--kubelet-certificate-authority`) but the validation is not enabled, we will still run the validation code to collect the metric. However, if the validation fails we won't return an error; we will only increment the metric counter.
162166

163167
We intentionally don't add the node name to the metric to avoid a high cardinality.
164168
The purpose of the metric is to easily/cheaply tell administrators if they can flip the feature on or not. If the answer is no (counter is greater than 0), the rest of the necessary information to detect the offending nodes will come from logs.
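
The three behaviors described in this section (no validation at all, count-only, and enforcing) can be summarized in a short sketch. All identifiers below (`cnMode`, `cnModeFor`, `checkServingCertCN`, `cnErrors`) are hypothetical, the in-process counter merely stands in for the proposed `kube_apiserver_validation_kubelet_cert_cn_errors` metric, and incrementing the counter in enforcing mode is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cnMode is a hypothetical name for how the CN check behaves.
type cnMode int

const (
	cnOff        cnMode = iota // no validation code runs, no metric published
	cnReportOnly               // mismatches are counted but never returned
	cnEnforce                  // mismatches are counted and returned as errors
)

// cnModeFor maps the feature gate, the presence of
// --kubelet-certificate-authority, and the opt-in flag to a mode.
func cnModeFor(gateEnabled, caSet, validationEnabled bool) cnMode {
	if !gateEnabled || !caSet {
		return cnOff
	}
	if !validationEnabled {
		return cnReportOnly
	}
	return cnEnforce
}

// cnErrors stands in for the proposed error-count metric. It carries no
// node name label, matching the low-cardinality choice in the text.
var cnErrors atomic.Int64

// checkServingCertCN compares the certificate CN with the
// system:node:<nodename> convention and applies the given mode.
func checkServingCertCN(mode cnMode, certCN, nodeName string) error {
	if mode == cnOff {
		return nil
	}
	want := "system:node:" + nodeName
	if certCN == want {
		return nil
	}
	cnErrors.Add(1)
	if mode == cnReportOnly {
		return nil
	}
	return fmt.Errorf("kubelet serving certificate CN %q does not match %q", certCN, want)
}
```

With this shape, an administrator could run in count-only mode, confirm the counter stays at zero, and then set the opt-in flag.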
165169

166-
167-
We will remove the metric once the feature is GA.
168-
169-
> TODO: let's discuss this in the review. We could consider adding the node name to the metric or even keeping the metric post GA if it's valuable.
170-
171170
### TLS insecure
172171

173172
Currently, if the kube-apiserver is not configured with `--kubelet-certificate-authority`, the TLS client for the kubelet server skips server certificate validation.
174173
Additionally, `logs` requests can set `InsecureSkipTLSVerifyBackend` per request to skip the server certificate validation.
175174

176-
To align with this behavior, we won't execute the CN validation if `--kubelet-certificate-authority` is not set or if `InsecureSkipTLSVerifyBackend` is set to true.
175+
To align with this behavior, we won't allow the validation to be enabled if `--kubelet-certificate-authority` is not set, and we won't execute the CN validation if `InsecureSkipTLSVerifyBackend` is set to true.
177176

178177
### Test Plan
179178

@@ -195,11 +194,12 @@ Existing test coverage for the packages we anticipate modifying:
195194
##### Integration tests
196195

197196
Integration tests will be added to ensure the following:
198-
* An error is returned if `--disable-kubelet-cert-cn-validation` is set but `KubeletCertCNValidation` feature flag is not enabled.
197+
* An error is returned if `--enable-kubelet-cert-cn-validation` is set but `KubeletCertCNValidation` feature flag is not enabled.
198+
* An error is returned if the feature `KubeletCertCNValidation` is enabled, `--enable-kubelet-cert-cn-validation` is set to true but `--kubelet-certificate-authority` is not set.
199199
* Validation for custom certificates works if feature flag is not enabled.
200-
* Validation for custom certificates works if feature flag enabled and `--disable-kubelet-cert-cn-validation` is set to true.
201-
* Validation for custom certificates fails if feature flag enabled and `--disable-kubelet-cert-cn-validation` is set to false or not set.
202-
* Validation for kubernetes issued certificates works if feature flag enabled and `--disable-kubelet-cert-cn-validation` is set to false or not set.
200+
* Validation for custom certificates works if the feature flag is enabled and `--enable-kubelet-cert-cn-validation` is not set or set to false.
201+
* Validation for custom certificates fails if the feature flag is enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
202+
* Validation for Kubernetes-issued certificates works if the feature flag is enabled, `--kubelet-certificate-authority` is set and `--enable-kubelet-cert-cn-validation` is set to true.
203203

204204
##### e2e tests
205205

@@ -222,9 +222,7 @@ We believe is likely end-to-end tests won't be needed as unit and integration te
222222

223223
### Upgrade / Downgrade Strategy
224224

225-
Once feature flag is on by default (starting in Beta), administrators using custom serving certs
226-
can use the proposed flag to disable the extra validation and maintain current behavior.
227-
They will be able to use this flag even after the feature flag is removed.
225+
The feature is opt-in and it can be disabled at any time by just not setting the `--enable-kubelet-cert-cn-validation` flag.
228226

229227
### Version Skew Strategy
230228

@@ -240,16 +238,17 @@ Not applicable.
240238
- Feature gate name: `KubeletCertCNValidation`
241239
- Components depending on the feature gate: kube-apiserver
242240
- [x] Other
243-
- Describe the mechanism: kube-apiserver command-line flag `--disable-kubelet-cert-cn-validation`
241+
- Describe the mechanism: kube-apiserver command-line flag `--enable-kubelet-cert-cn-validation`
244242
- Will enabling / disabling the feature require downtime of the control
245243
plane? No. But requires restarting the kube-apiserver.
246244
- Will enabling / disabling the feature require downtime or reprovisioning
247245
of a node? No.
248246

249247
###### Does enabling the feature change any default behavior?
250248

251-
Yes. If a cluster is using custom kubelet serving certificates that don't follow the same convention as kubernetes issued certificates (CN is `system:node:<node-name>`),
252-
enabling this feature will make any connection initiated by the kube-api server fail (logs, exec and port-forwarding).
249+
Enabling the feature gate doesn't change any behavior.
250+
251+
Enabling the validation does change the default certificate validation behavior.
253252

254253
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
255254

@@ -289,7 +288,7 @@ No.
289288
###### How can an operator determine if the feature is in use by workloads?
290289

291290
The cluster administrators can check the flags passed to the kube-apiserver if they have access to the control plane nodes.
292-
If the `--disable-kubelet-cert-cn-validation` flag is not set or set to false, the feature is being used.
291+
If the `--enable-kubelet-cert-cn-validation` flag is set to true, the feature is being used.
293292
Alternatively, they can check the `kubernetes_feature_enabled` metric.
294293

295294
###### How can someone using this feature know that it is working for their instance?
@@ -367,7 +366,7 @@ It's part of the API server, so the feature will be unavailable.
367366

368367
- [API server can't connect to Nodes with custom kubelet serving certificates that don't follow the `system:node:<node-name>` convention]
369368
- Detection: `kubectl logs` returns a certificate validation error.
370-
- Mitigations: disable the validation with the `--disable-kubelet-cert-cn-validation` flag.
369+
- Mitigations: disable the validation by not setting the `--enable-kubelet-cert-cn-validation` flag.
371370
- Diagnostics: error is returned by the API server, no additional logging needed.
372371
- Testing: We will have tests for this, this is basically testing that the feature works.
373372

@@ -377,7 +376,7 @@ It's part of the API server, so the feature will be unavailable.
377376

378377
## Drawbacks
379378

380-
This could disrupt clusters that are using custom kubelet serving certificates. These clusters will need to reissue their certificates before enabling this feature.
379+
None.
381380

382381
## Alternatives
383382
