Skip to content

Commit 467165c

Browse files
committed
respond to review comments
Signed-off-by: Mike Brown <[email protected]>
1 parent b37e36d commit 467165c

File tree

2 files changed

+41
-26
lines changed

2 files changed

+41
-26
lines changed

keps/sig-node/2535-ensure-secret-pulled-images/README.md

Lines changed: 41 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
6161

6262
## Summary
6363

64-
We will add support in kubelet for the pullIfNotPresent image pull policy, for
64+
We will add support in kubelet for the `pullIfNotPresent` image pull policy, for
6565
ensuring images pulled with pod imagePullSecrets are re-authenticated for other
6666
pods that do not have the same imagePullSecret/auths used to successfully pull
6767
the images in the first place.
@@ -88,6 +88,8 @@ order to use a present image.
8888
This means that the image pull policy alwaysPull would no longer be required in
8989
every scenario to ensure image access rights by pods.
9090

91+
*** The issue and these changes improving the security posture without requiring the forcing of pull always, will be documented in the kubernetes image pull policy documentation. The new feature gate should also be documented in release notes. ***
92+
9193
## Motivation
9294

9395
There have been customer requests for improving upon kubernetes' ability to
@@ -159,11 +161,11 @@ to set the feature gate to true to gain these this Secure by Default benefit.
159161
### Risks and Mitigations
160162

161163
Image authentications with a registry may expire. To mitigate expirations a
162-
a timeout could be used to force re-authentication. The timeout could be a
164+
a timeout will be used to force re-authentication. The timeout could be a
163165
container runtime feature or a `kubelet` feature. If at the container runtime,
164166
images would not be present during the EnsureImagesExist step, thus would have
165167
to be pulled and authenticated if necessary. This timeout feature will be
166-
implemented in beta.
168+
implemented in alpha.
167169

168170
Since images can be pre-loaded, loaded outside the `kubelet` process, and
169171
garbage collected.. the list of images that required authentication in `kubelet`
@@ -180,13 +182,19 @@ or expect preloaded images since boot.
180182

181183
Kubelet will track, in memory, a hash map for the credentials that were successfully used to pull an image. It has been decided that the hash map will be persisted to disk, in alpha.
182184

185+
The persisted "cache" will undergo cleanup operations on a timely basis (by default once an hour).
186+
187+
The persistence of the on storage cache is mainly for restarting kubelet and/or node reboot.
188+
189+
The max size of the cache will scale with the number of unique cache entries * the number of unique images that have not been garbage collected. It is not expected that this will be a significant number of bytes. Will be verified by actual use in Alpha and subsequent metrics in Beta.
190+
183191
See `/var/lib/kubelet/image_manager_state` in [kubernetes/kubernetes#114847](https://github.com/kubernetes/kubernetes/pull/114847)
184192

185193
> ```
186194
> {
187195
> "images": {
188196
> "sha256:eb6cbbefef909d52f4b2b29f8972bbb6d86fc9dba6528e65aad4f119ce469f7a": {
189-
> "authHash": {
197+
> "authHash": { ** per review comment use SHA256 here vs hash **
190198
> "115b8808c3e7f073": {
191199
> "ensured": true,
192200
> "dueDate": "2023-05-30T05:26:53.76740982+08:00"
@@ -203,7 +211,7 @@ See PR linked above for detailed design / behavior documentation.
203211
### Test Plan
204212
205213
For alpha, exhaustive Kubelet unit tests will be provided. Functions affected by the feature gate will be run with the feature gate on and with the feature gate off. Unit buckets will be provided for:
206-
- HashAuth - (new, small) returns a hash code for a CRI pull image auth [link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-ca08601dfd2fdf846f066d0338dc332beddd5602ab3a71b8fac95b419842da63R704-R751)
214+
- HashAuth - (new, small) returns a hash code for a CRI pull image auth [link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-ca08601dfd2fdf846f066d0338dc332beddd5602ab3a71b8fac95b419842da63R704-R751) ** per review comment will use SHA256 **
207215
- shouldPullImage - (modified, large sized change) determines if image should be pulled based on presence, and image pull policy, and now with the feature gate on if the image has been pulled/ensured by a secret. A unit test bucket did not exist for this function. The unit bucket will cover a matrix for:
208216
```
209217
pullIfNotPresent := &v1.Container{
@@ -230,7 +238,7 @@ For alpha, exhaustive Kubelet unit tests will be provided. Functions affected by
230238
```
231239
[TestShouldPullImage link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-7297f08c72da9bf6479e80c03b45e24ea92ccb11c0031549e51b51f88a91f813R311-R438)
232240
233-
PersistHashMeta()
241+
PersistHashMeta() ** will be persisting SHA256 entries vs hash **
234242
235243
At beta we should revisit if integration buckets are warranted for e2e node and/or cri-tools/critest, and after gathering feedback.
236244
@@ -244,16 +252,21 @@ At beta we should revisit if integration buckets are warranted for e2e node and/
244252
#### Deprecation
245253
246254
N/A in alpha
255+
TBD subsequent to alpha
247256
248257
### Upgrade / Downgrade Strategy
249258
250259
### Version Skew Strategy
251260
252261
N/A for alpha
262+
TBD subsequent to alpha
253263
254264
## Production Readiness Review Questionnaire
255265
256266
### Feature Enablement and Rollback
267+
- At Alpha this feature will be disabled by default with a feature gate.
268+
- At Beta this feature will be enabled by default with the feature gate.
269+
- At GA the ability to gate the feature will be removed leaving the feature enabled.
257270
258271
###### How can this feature be enabled / disabled in a live cluster?
259272
@@ -274,40 +287,44 @@ Yes.
274287
275288
Will go back to working as designed.
276289
290+
enj comment: Admin would need to go back to whatever old way they were using to enforce this image pull auth check. And also, as the feature is rolling out to kubelets (which is slow), they need to retain any API server based checks until rollout has completed.
291+
277292
###### Are there any tests for feature enablement/disablement?
278293
279294
Yes, tests run both enabled and disabled.
280295
281296
### Rollout, Upgrade and Rollback Planning
282-
N/A
297+
TBD
283298
284299
###### How can a rollout or rollback fail? Can it impact already running workloads?
285300
286-
N/A
301+
TBD
287302
288303
###### What specific metrics should inform a rollback?
289304
290-
N/A
305+
TBD needed for Beta
291306
292307
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
293308
294-
N/A
309+
TBD
295310
296311
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
297312
298-
N/A
313+
TBD
299314
300315
### Monitoring Requirements
301316
302-
N/A
317+
TBD
303318
304319
###### How can an operator determine if the feature is in use by workloads?
305320
306-
Can check if images pulled with credentials by a first pod, are also pulled with credentials by a second pod that is
321+
For alpha can check if images pulled with credentials by a first pod, are also pulled with credentials by a second pod that is
307322
using the pull if not present image pull policy. Will show up as network events. Though only the manifests will be
308323
revalidated against the container image repository, large contents will not be pulled. Thus one could monitor traffic
309324
to the registry.
310325
326+
For beta will add metrics allowing an admin to determine how often an image has been reauthenticated to an image registry because of cache expiration or due to reuse across pods that have different authentication information. Success metrics will also be provided highlighting cache hits.
327+
311328
###### How can someone using this feature know that it is working for their instance?
312329
313330
Can test for an image pull failure event coming from a second pod that does not have credentials to pull the image
@@ -319,27 +336,27 @@ where the image is present and the image pull policy is if not present.
319336
320337
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
321338
322-
N/A
339+
TBD
323340
324341
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
325342
326-
N/A
343+
TBD
327344
328345
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
329346
330-
N/A
347+
TBD needed for Beta
331348
332349
### Dependencies
333350
334-
N/A for alpha
351+
TBD
335352
336353
###### Does this feature depend on any specific services running in the cluster?
337354
338355
No.
339356
340357
### Scalability
341358
342-
N/A
359+
TBD
343360
344361
###### Will enabling / using this feature result in any new API calls?
345362
@@ -370,27 +387,27 @@ When switched on see above.
370387
371388
### Troubleshooting
372389
373-
N/A
390+
TBD
374391
375392
###### How does this feature react if the API server and/or etcd is unavailable?
376393
377-
N/A
394+
TBD
378395
379396
###### What are other known failure modes?
380397
381-
N/A
398+
TBD
382399
383400
###### What steps should be taken if SLOs are not being met to determine the problem?
384401
385402
Check logs.
386403
387404
## Implementation History
388405
389-
tbd
406+
TBD
390407
391408
## Drawbacks [optional]
392409
393-
Why should this KEP _not_ be implemented. N/A
410+
Why should this KEP _not_ be implemented. TBD
394411
395412
## Alternatives [optional]
396413
@@ -402,4 +419,4 @@ ensure the image instead of kubelet.
402419
403420
## Infrastructure Needed [optional]
404421
405-
tbd
422+
TBD

keps/sig-node/2535-ensure-secret-pulled-images/kep.yaml

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,6 @@ reviewers:
1515
approvers:
1616
- "@dchen1107"
1717
- "@derekwaynecarr"
18-
prr-approvers:
19-
- "@johnbelarmic"
2018
stage: alpha
2119
latest-milestone: "v1.28"
2220
milestone:

0 commit comments

Comments
 (0)