Skip to content

Commit 91360e2

Browse files
authored
Merge pull request #4431 from pacoxu/ensure-image-pull-secret
kep-2535: not persist on disk and add PullImageSecretRecheckDuration flag to define different behaviors
2 parents 4983957 + 34df433 commit 91360e2

File tree

2 files changed

+56
-55
lines changed

2 files changed

+56
-55
lines changed

keps/sig-node/2535-ensure-secret-pulled-images/README.md

Lines changed: 50 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,9 @@
1515
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
1616
- [Risks and Mitigations](#risks-and-mitigations)
1717
- [Design Details](#design-details)
18+
- [Test Plan](#test-plan)
19+
- [Prerequisite testing updates](#prerequisite-testing-updates)
20+
- [Unit tests](#unit-tests)
1821
- [Integration tests](#integration-tests)
1922
- [e2e tests](#e2e-tests)
2023
- [Graduation Criteria](#graduation-criteria)
@@ -54,13 +57,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
5457
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
5558
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
5659

57-
5860
[kubernetes.io]: https://kubernetes.io/
5961
[kubernetes/enhancements]: https://git.k8s.io/enhancements
60-
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
6162
[kubernetes/website]: https://git.k8s.io/website
6263

63-
6464
## Summary
6565

6666
We will add support in kubelet for the `pullIfNotPresent` image pull policy, for
@@ -133,6 +133,9 @@ runtimes through the CRI wrt. how they should treat the caching of images on a
133133
node. Such as store for public use but only if encrypted. Or Store for private
134134
use un-encrypted...
135135

136+
This feature will not change the behavior of pod with image pull policy `Always`
137+
and `Never`.
138+
136139
## Proposal
137140

138141
For alpha `kubelet` will keep a list, across reboots of host and restart of
@@ -143,18 +146,18 @@ For beta an API will be considered to manage the ensure metadata.
143146
`kubelet` will ensure any image in the list is always pulled if an authentication
144147
used is not present, thus enforcing authentication / re-authentication.
145148

146-
147149
### User Stories
148150

149151
#### Story 1
152+
150153
User with multiple tenants will be able to support all image pull policies without
151154
concern that one tenant will gain access to an image that they don't have rights to.
152155

153156
#### Story 2
157+
154158
User will will no longer have to inject the Pull Always Image Pull Policy to
155159
ensure all tenants have rights to the images that are already present on a host.
156160

157-
158161
### Notes/Constraints/Caveats (Optional)
159162

160163
With the default of the feature gate being off, users / cloud providers will have
@@ -173,42 +176,36 @@ Since images can be pre-loaded, loaded outside the `kubelet` process, and
173176
garbage collected.. the list of images that required authentication in `kubelet`
174177
will not be a source of truth for how all images were pulled that are in the
175178
container runtime cache. To mitigate, images can be garbage collected at boot.
176-
And we will persist ensure metadata across reboot of host, and restart
179+
And for alpha, we will not persist ensure metadata across reboot of host, and restart
177180
of kubelet, and possibly look at a way to add ensure metadata for images loaded
178181
outside of kubelet. In beta we will add a switch to enable re-auth on boot for
179182
admins seeking that instead of having to garbage collect where they do not use
180183
or expect preloaded images since boot.
181184

182-
183185
## Design Details
184186

185-
Kubelet will track, in memory, a hash map for the credentials that were successfully used to pull an image. It has been decided that the hash map will be persisted to disk, in alpha.
187+
Kubelet will track, in memory, a hash map for the credentials that were successfully
188+
used to pull an image.
186189

187-
The persisted "cache" will undergo cleanup operations on a timely basis (by default once an hour).
190+
See PR linked above for detailed design / behavior documentation.
188191

189-
The persistence of the on storage cache is mainly for restarting kubelet and/or node reboot.
192+
Kubelet will add a new flag, named `PullImageSecretRecheckDuration` to make
193+
the expired duration configurable. The default value could be 1d. For a pod with
194+
IfNotPresent image pull policy and an image pull secret, kubelet will recheck
195+
the secret after `PullImageSecretRecheckDuration`.
190196

191-
The max size of the cache will scale with the number of unique cache entries * the number of unique images that have not been garbage collected. It is not expected that this will be a significant number of bytes. Will be verified by actual use in Alpha and subsequent metrics in Beta.
197+
Use image pull policy `Always` if user want to recheck the secret everytime.
192198

193-
See `/var/lib/kubelet/image_manager_state` in [kubernetes/kubernetes#114847](https://github.com/kubernetes/kubernetes/pull/114847)
199+
For image pull policy "if not present", when admin/user doesn't want to automatically
200+
recheck the secret, set `PullImageSecretRecheckDuration` to 0 to disable it(which means
201+
never recheck).
194202

195-
> ```
196-
> {
197-
> "images": {
198-
> "sha256:eb6cbbefef909d52f4b2b29f8972bbb6d86fc9dba6528e65aad4f119ce469f7a": {
199-
> "authHash": { ** per review comment use SHA256 here vs hash **
200-
> "115b8808c3e7f073": {
201-
> "ensured": true,
202-
> "dueDate": "2023-05-30T05:26:53.76740982+08:00"
203-
> }
204-
> },
205-
> "name": "daocloud.io/daocloud/dce-registry-tool:3.0.8"
206-
> }
207-
> }
208-
> }
209-
> ```
203+
Note: using the tag `:latest` is equivalent to using the image pull policy `Always.`
210204

211-
See PR linked above for detailed design / behavior documentation.
205+
Note: since the cache is not persisted to disk, a recheck will happen every kubelet restart.
206+
This is acceptable because kubelet only restarts during upgrades or in maintenance modes.
207+
In other words, it should be relatively infrequent(and much less frequent than the default
208+
value of `PullImageSecretRecheckDuration`).
212209

213210
### Test Plan
214211

@@ -218,41 +215,43 @@ necessary to implement this enhancement.
218215

219216
##### Prerequisite testing updates
220217

221-
222218
##### Unit tests
223219

224220
For alpha, exhaustive Kubelet unit tests will be provided. Functions affected by the feature gate will be run with the feature gate on and with the feature gate off. Unit buckets will be provided for:
221+
225222
- HashAuth - (new, small) returns a hash code for a CRI pull image auth [link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-ca08601dfd2fdf846f066d0338dc332beddd5602ab3a71b8fac95b419842da63R704-R751) ** per review comment will use SHA256 **
226223
- shouldPullImage - (modified, large sized change) determines if image should be pulled based on presence, and image pull policy, and now with the feature gate on if the image has been pulled/ensured by a secret. A unit test bucket did not exist for this function. The unit bucket will cover a matrix for:
227-
```
228-
pullIfNotPresent := &v1.Container{
224+
225+
```golang
226+
pullIfNotPresent := &v1.Container{
229227
..
230-
ImagePullPolicy: v1.PullIfNotPresent,
231-
}
232-
pullNever := &v1.Container{
228+
ImagePullPolicy: v1.PullIfNotPresent,
229+
}
230+
pullNever := &v1.Container{
233231
..
234232
ImagePullPolicy: v1.PullNever,
235-
}
236-
pullAlways := &v1.Container{
237-
..
233+
}
234+
pullAlways := &v1.Container{
235+
..
238236
ImagePullPolicy: v1.PullAlways,
239-
}
240-
tests := []struct {
241-
description string
242-
container *v1.Container
243-
imagePresent bool
244-
pulledBySecret bool
245-
ensuredBySecret bool
246-
expectedWithFGOff bool
247-
expectedWithFGOn bool
248-
}
237+
}
238+
tests := []struct {
239+
description string
240+
container *v1.Container
241+
imagePresent bool
242+
pulledBySecret bool
243+
ensuredBySecret bool
244+
expectedWithFGOff bool
245+
expectedWithFGOn bool
246+
}
249247
```
250-
[TestShouldPullImage link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-7297f08c72da9bf6479e80c03b45e24ea92ccb11c0031549e51b51f88a91f813R311-R438)
251248

252-
PersistHashMeta() ** will be persisting SHA256 entries vs hash **
249+
[TestShouldPullImage link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-7297f08c72da9bf6479e80c03b45e24ea92ccb11c0031549e51b51f88a91f813R311-R438)
253250

254251
Additionally, for Alpha we will update this readme with an enumeration of the core packages being touched by the PR to implement this enhancement and provide the current unit coverage for those in the form of:
252+
255253
- <package>: <date> - <current test coverage>
254+
256255
The data will be read from:
257256
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
258257

@@ -314,6 +313,7 @@ TBD subsequent to alpha
314313
## Production Readiness Review Questionnaire
315314

316315
### Feature Enablement and Rollback
316+
317317
- At Alpha this feature will be disabled by default with a feature gate.
318318
- At Beta this feature will be enabled by default with the feature gate.
319319
- At GA the ability to gate the feature will be removed leaving the feature enabled.
@@ -342,6 +342,7 @@ Will go back to working as designed.
342342
Yes, tests run both enabled and disabled.
343343

344344
### Rollout, Upgrade and Rollback Planning
345+
345346
TBD
346347

347348
###### How can a rollout or rollback fail? Can it impact already running workloads?

keps/sig-node/2535-ensure-secret-pulled-images/kep.yaml

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -2,25 +2,25 @@ title: Ensure Secret Pulled Images
22
kep-number: 2535
33
authors:
44
- "@mikebrow"
5+
- "@pacoxu"
56
owning-sig: sig-node
6-
participating-sigs:
7-
- sig-node
87
status: implementable
98
creation-date: 2021-03-10
109
reviewers:
1110
- "@Random-Liu"
1211
- "@yujuhong"
1312
- "@mrunalp"
1413
- "@adisky"
14+
- "@haircommander"
1515
approvers:
1616
- "@dchen1107"
1717
- "@derekwaynecarr"
1818
stage: alpha
19-
latest-milestone: "v1.29"
19+
latest-milestone: "v1.30"
2020
milestone:
21-
alpha: "v1.29"
22-
beta: "v1.30"
23-
stable: "v1.32"
21+
alpha: "v1.30"
22+
beta: "v1.31"
23+
stable: "v1.33"
2424
feature-gates:
2525
- name: KubeletEnsureSecretPulledImages
2626
components:

0 commit comments

Comments
 (0)