Skip to content

Commit a4a8b2c

Browse files
committed
KEP-2535: update stylistic nits
Signed-off-by: Peter Hunt <[email protected]>
1 parent 61d1b11 commit a4a8b2c

File tree

1 file changed

+52
-61
lines changed
  • keps/sig-node/2535-ensure-secret-pulled-images

1 file changed

+52
-61
lines changed

keps/sig-node/2535-ensure-secret-pulled-images/README.md

Lines changed: 52 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,9 @@
1515
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
1616
- [Risks and Mitigations](#risks-and-mitigations)
1717
- [Design Details](#design-details)
18+
- [Test Plan](#test-plan)
19+
- [Prerequisite testing updates](#prerequisite-testing-updates)
20+
- [Unit tests](#unit-tests)
1821
- [Integration tests](#integration-tests)
1922
- [e2e tests](#e2e-tests)
2023
- [Graduation Criteria](#graduation-criteria)
@@ -60,67 +63,59 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
6063

6164
## Summary
6265

63-
We will add support in kubelet for the `pullIfNotPresent` image pull policy, for
64-
ensuring images pulled with pod imagePullSecrets are re-authenticated for other
65-
pods that do not have the same imagePullSecret/auths used to successfully pull
66-
the images in the first place.
66+
We will add support in the kubelet for an admin to enable the ability to ensure an image that is already present on a node because
67+
a pod with `ImagePullSecrets` previously pulled it is reauthenticated when a new pod with different `ImagePullSecrets` attempts to use the same image,
68+
when the `ImagePullPolicy` is `IfNotPresent`.
6769

68-
This policy change will have no affect on the `pull always` image pull
69-
policy or for images that are preloaded.
70+
In other words: ensure the pull secrets are rechecked for each new set of credentials, and ensure a pod has access to those images.
7071

71-
However, for the `pull never` policy if a first pod successfully pulled an image
72+
This policy change will have no affect on the `Always` `ImagePullPolicy` or for images that are preloaded.
73+
74+
However, for the `Never` policy if a first pod successfully pulled an image
7275
with credential and then a second pod with pull never tried to use the image,
7376
when the feature gate is on the second pod will receive an error message, where
7477
before and with the feature gate off the second pod would be able to use the image
7578
pulled with credentials by the first pod.
7679

77-
This new feature will be enabled with a feature gate in alpha. This feature
78-
improves the security posture for privacy/security of image contents by forcing
79-
images pulled with an imagePullSecret/auth of a first pod to be re-authenticated
80-
for a second pod even if the image is already present through the secure pull of
81-
the first pod.
82-
83-
The new behavior means that if a first pod results in an image pulled with
84-
imagePullSecrets a second pod would have to also have rights to the image in
85-
order to use a present image.
86-
87-
This means that the image pull policy alwaysPull would no longer be required in
88-
every scenario to ensure image access rights by pods.
80+
This new feature will be enabled with a feature gate in alpha, as well as two kubelet configuration
81+
fields `pullImageSecretRecheck` and `pullImageSecretRecheckPeriod`.
8982

9083
*** The issue and these changes improving the security posture without requiring the forcing of pull always, will be documented in the kubernetes image pull policy documentation. The new feature gate should also be documented in release notes. ***
9184

9285
## Motivation
9386

9487
There have been customer requests for improving upon kubernetes' ability to
95-
secure images pulled with auth. on a node. Issue
88+
secure images pulled with auth on a node. Issue
9689
[#18787](https://github.com/kubernetes/kubernetes/issues/18787) has been around
9790
for a while.
9891

99-
To secure images one currently needs to inject `AllwaysPullImages` into pod
92+
To secure images one currently needs to inject `Always` `ImagePullPolicy` into pod
10093
specs via an admission plugin. As @liggitt [notes](https://github.com/kubernetes/kubernetes/issues/18787#issuecomment-532280931)
10194
the `pull` does not re-pull already-pulled layers of the image, but simply
10295
resolves/verifies the image manifest has not changed in the registry (which
10396
incidentally requires authenticating to private registries, which enforces the
10497
image access). That means in the normal case (where the image has not changed
105-
since the last pull), the request size is O(kb). However, the `pull` does put
106-
the registry in the critical path of starting a container, since an unavailable
107-
registry will fail the pull image manifest check (with or without proper
108-
authentication.)
98+
since the last pull), the request size is O(kb).
99+
100+
However, the `pull` does put the registry in the critical path of starting a container,
101+
since an unavailable registry will fail the pull image manifest check (with or without proper authentication.)
102+
103+
Thus, the motivation is to allow users to ensure the kubelet requires an image pull auth check for each new set of credentials,
104+
regardless of whether the image is already present on the node.
109105

110106
### Goals
111107

112-
Modify the current pullIfNotPresent policy management enforced by `kubelet` to
113-
ensure the images pulled with a secret by `kubelet` since boot. During the
114-
EnsureImagesExist step `kubelet` will require authentication of present images
115-
pulled with auth since boot.
108+
Modify the current behavior of images with an `IfNotPresent` `ImagePullPolicy` enforced by the kubelet to
109+
ensure the images pulled with a secret by the kubelet are authenticated by the CRI implementation. During the
110+
EnsureImagesExist step the kubelet will require authentication of present images pulled with auth since boot.
116111

117112
Optimize to only force re-authentication for a pod container image when the
118-
secret used to pull the container image is not present. IOW if an image is
119-
pulled with authentication for a first pod, subsequent pods that have the same
120-
authentication information should not need to re-authenticate.
113+
`ImagePullSecrets` used to pull the container image has not already been authenticated.
114+
IOW if an image is pulled with authentication for a first pod, subsequent pods that have the same
115+
authentication information should not need to re-authenticate, unless the kubelet's `pullImageSecretRecheckPeriod` has passed.
121116

122-
Images already present at boot or loaded externally to `kubelet` or successfully
123-
pulled through `kubelet` with no imagePullSecret/authentication required will
117+
Images already present at boot or loaded externally to the kubelet or successfully
118+
pulled through the kubelet with no `ImagePullSecrets`/authentication required will
124119
not require authentication.
125120

126121
### Non-Goals
@@ -130,17 +125,17 @@ runtimes through the CRI wrt. how they should treat the caching of images on a
130125
node. Such as store for public use but only if encrypted. Or Store for private
131126
use un-encrypted...
132127

133-
This feature will not change the behavior of pod with image pull policy `Always`
128+
This feature will not change the behavior of pod with `ImagePullPolicy` `Always`
134129
and `Never`.
135130

136131
## Proposal
137132

138-
For alpha `kubelet` will keep a list, across reboots of host and restart of
133+
For alpha the kubelet will keep a list, across reboots of host and restart of
139134
kubelet, of container images that required authentication and a list of the
140135
authentications that successfully pulled the image.
141136
For beta an API will be considered to manage the ensure metadata.
142137

143-
`kubelet` will ensure any image in the list is always pulled if an authentication
138+
The kubelet will ensure any image in the list is always pulled if an authentication
144139
used is not present, thus enforcing authentication / re-authentication.
145140

146141

@@ -153,7 +148,7 @@ concern that one tenant will gain access to an image that they don't have rights
153148

154149
#### Story 2
155150

156-
User will will no longer have to inject the Pull Always Image Pull Policy to
151+
User will no longer have to inject the `PullAlways` imagePullPolicy to
157152
ensure all tenants have rights to the images that are already present on a host.
158153

159154
### Notes/Constraints/Caveats (Optional)
@@ -163,39 +158,34 @@ to set the feature gate to true to gain these this Secure by Default benefit.
163158

164159
### Risks and Mitigations
165160

166-
Image authentications with a registry may expire. To mitigate expirations a
167-
a timeout will be used to force re-authentication. The timeout could be a
168-
container runtime feature or a `kubelet` feature. If at the container runtime,
169-
images would not be present during the EnsureImagesExist step, thus would have
170-
to be pulled and authenticated if necessary. This timeout feature will be
171-
implemented in alpha.
161+
- Image authentications with a registry may expire.
162+
- To mitigate expirations a timeout will be used to force re-authentication.
163+
This timeout will be configured as a kubelet configuration field `pullImageSecretRecheckPeriod`.
164+
This timeout feature will be implemented in alpha.
172165

173-
Since images can be pre-loaded, loaded outside the `kubelet` process, and
174-
garbage collected.. the list of images that required authentication in `kubelet`
175-
will not be a source of truth for how all images were pulled that are in the
176-
container runtime cache. To mitigate, images can be garbage collected at boot.
177-
And we will persist ensure metadata across reboot of host, and restart
178-
of kubelet, and possibly look at a way to add ensure metadata for images loaded
179-
outside of kubelet. In beta we will add a switch to enable re-auth on boot for
180-
admins seeking that instead of having to garbage collect where they do not use
181-
or expect preloaded images since boot.
166+
- Images can be "pre-loaded", or pulled behind the kubelet's back before it starts.
167+
In this case, the kubelet is not managing the credentials for these images.
168+
- To mitigate, metadata will be persisted across reboot. The kubelet will compare previously
169+
cached credentials against the images that exist. On a new image pull, the kubelet will use
170+
its saved cache and revalidate as necessary.
171+
In other words: even if the images are already cached, if new images are present that have not
172+
previously been authenticated against a pods credentials, then the image will be revalidated.
182173

183174

184175
## Design Details
185176

186-
Kubelet will track, in memory, a hash map for the credentials that were successfully used to pull an image.
187-
It has been decided that the hash map will be persisted to disk, in alpha.
188-
189-
The persisted "cache" will undergo cleanup operations on a timely basis (by default once an hour).
177+
The kubelet will track, in memory, a pulled image auth cache for the credentials that were successfully used to pull an image.
178+
This cache will be persisted to disk, to allow nodes that are "disconnected", or unable to reach the registry to boot up, assuming
179+
they have previously been able to access a registry and authenticated the images present.
190180

191-
The persistence of the on storage cache is mainly for restarting kubelet and/or node reboot.
181+
The persisted cache will be cleaned up every `pullImageSecretRecheckPeriod`.
192182

193183
The max size of the cache will scale with the number of unique cache entries * the number of unique images that have not been garbage collected.
194184
It is not expected that this will be a significant number of bytes. Will be verified by actual use in Alpha and subsequent metrics in Beta.
195185

196186
See `/var/lib/kubelet/image_manager_state` in [kubernetes/kubernetes#114847](https://github.com/kubernetes/kubernetes/pull/114847)
197187

198-
> ```
188+
```
199189
> {
200190
> "images": {
201191
> "sha256:eb6cbbefef909d52f4b2b29f8972bbb6d86fc9dba6528e65aad4f119ce469f7a": {
@@ -209,7 +199,7 @@ See `/var/lib/kubelet/image_manager_state` in [kubernetes/kubernetes#114847](htt
209199
> }
210200
> }
211201
> }
212-
> ```
202+
```
213203

214204
See PR linked above for detailed design / behavior documentation.
215205

@@ -256,7 +246,7 @@ For alpha, exhaustive Kubelet unit tests will be provided. Functions affected by
256246

257247
[TestShouldPullImage link](https://github.com/kubernetes/kubernetes/pull/94899/files#diff-7297f08c72da9bf6479e80c03b45e24ea92ccb11c0031549e51b51f88a91f813R311-R438)
258248

259-
PersistHashMeta() ** will be persisting SHA256 entries vs hash **
249+
PersistMeta() ** will be persisting SHA256 entries vs hash **
260250

261251
Additionally, for Alpha we will update this readme with an enumeration of the core packages being touched by the PR to implement this enhancement and provide the current unit coverage for those in the form of:
262252

@@ -475,6 +465,7 @@ Why should this KEP _not_ be implemented. TBD
475465
- Set the flag at some other scope e.g. pod spec (doing it at the pod spec was rejected by SIG-Node).
476466
- For beta/ga we may revisit/replace the in memory hash map in kubelet design, with an extension to the CRI API for having the container runtime
477467
ensure the image instead of kubelet.
468+
- Discussions went back and forth as to whether to persist the cache across reboots. It was decided to do so.
478469

479470
## Infrastructure Needed [optional]
480471

0 commit comments

Comments
 (0)