Commit 37837bf

Address review from smarterclayton
Signed-off-by: Sascha Grunert <[email protected]>
1 parent 3f72dce commit 37837bf

1 file changed

+175
-7
lines changed

keps/sig-node/4639-oci-volume-source/README.md

Lines changed: 175 additions & 7 deletions
@@ -98,6 +98,7 @@ tags, and then generate with `hack/update-toc.sh`.
9898
- [Registry authentication](#registry-authentication)
9999
- [CRI](#cri)
100100
- [Container Runtimes](#container-runtimes)
101+
- [Filesystem representation](#filesystem-representation)
101102
- [SELinux](#selinux)
102103
- [Test Plan](#test-plan)
103104
- [Prerequisite testing updates](#prerequisite-testing-updates)
@@ -188,11 +189,15 @@ which go beyond running particular images.
188189

189190
- Introduce a new `VolumeSource` type that allows mounting OCI images and/or artifacts.
190191
- Simplify the process of sharing files among containers in a pod.
192+
- Provide a runtime guideline for how artifact files and directories should be
193+
mounted.
191194

192195
### Non-Goals
193196

194197
- This proposal does not aim to replace existing `VolumeSource` types.
195-
- This proposal does not address other use cases for OCI objects beyond file sharing among containers in a pod.
198+
- This proposal does not address other use cases for OCI objects beyond file or
199+
directory sharing among containers in a pod.
200+
- This proposal does not aim to support mounting thousands of images and artifacts in a single pod.
196201

197202
## Proposal
198203

@@ -222,9 +227,13 @@ of OCI distribution.
222227

223228
#### Story 3
224229

225-
As a data scientist, MLOps engineer, or AI developer, I want to mount large language models or machine learning models in a pod alongside a model-server, so that I can efficiently serve the models
226-
without including them in the model-server container image. I want to package these models in an OCI object to take advantage of OCI distribution and ensure
227-
efficient model deployment. This allows to separate the model specifications/content from the executables that process them.
230+
As a data scientist, MLOps engineer, or AI developer, I want to mount large
231+
language model weights or machine learning model weights in a pod alongside a
232+
model-server, so that I can efficiently serve them without including them in the
233+
model-server container image. I want to package these in an OCI object to take
234+
advantage of OCI distribution and ensure efficient model deployment. This allows me
235+
to separate the model specifications/content from the executables that process
236+
them.
228237

229238
#### Story 4
230239

@@ -258,12 +267,21 @@ the OS or version of the scanning software.
258267
- **Use Case:** Allows the distribution of non-container content using the same infrastructure and tools developed for OCI images.
259268

260269
**OCI Object:**
261-
- Umbrella term encompassing both OCI images and OCI artifacts. It represents any object that conforms to the OCI specifications for storage and distribution.
270+
- Umbrella term encompassing both OCI images and OCI artifacts. It represents
271+
any object that conforms to the OCI specifications for storage and
272+
distribution and can be represented as a file or filesystem by an OCI container runtime.
262273

263274
### Risks and Mitigations
264275

265-
- **Security Risks:** Allowing direct mounting of OCI images introduces potential attack vectors. Mitigation includes thorough security reviews and
266-
limiting access to trusted registries. Limiting to OCI artifacts (non-runnable content) or read-only mode may lessen the security risk.
276+
- **Security Risks:**
277+
- Allowing direct mounting of OCI images introduces potential attack
278+
vectors. Mitigation includes thorough security reviews and limiting access
279+
to trusted registries. Limiting to OCI artifacts (non-runnable content)
280+
and read-only mode will lessen the security risk.
281+
- Path traversal attacks are a high risk for introducing security
282+
vulnerabilities. Container Runtimes should re-use their existing
283+
implementations to merge layers as well as secure join symbolic links in
284+
the container storage to prevent such issues.
267285
- **Compatibility Risks:** Existing webhooks that apply policies to the images used by a pod will need to be updated to expect an image to be specified as a `VolumeSource`.
268286
- **Performance Risks:** Large images or artifacts could impact performance. Mitigation includes optimizations in the implementation and providing
269287
guidance on best practices for users.
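
The path traversal risk listed above can be illustrated with a naive pre-check on tar entry names. This is only a sketch: `layer_is_safe` is a hypothetical helper, not something container runtimes provide, and it does not cover symbolic links, which is why the proposal points to the runtimes' existing secure-join implementations instead.

```bash
set -eu

# layer_is_safe is a hypothetical helper (an assumption for illustration).
# It reads tar entry names on stdin and fails if any entry is an absolute
# path or contains a ".." path component.
layer_is_safe() {
  ! grep -Eq '(^/)|((^|/)\.\.(/|$))'
}

printf '%s\n' 'dir/' 'dir/file' 'file' | layer_is_safe && echo "ok: safe layer"
printf '%s\n' 'dir/file' '../../etc/passwd' | layer_is_safe || echo "rejected: traversal"
printf '%s\n' '/etc/passwd' | layer_is_safe || echo "rejected: absolute path"
```

Against a real layer, the entry names could be fed in with something like `tar -tzf layer0.tar | layer_is_safe`.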
@@ -310,6 +328,8 @@ potential enhancements may be required:
310328
311329
**Lifecycling and Garbage Collection:**
312330
- Reuse the existing kubelet logic for managing the lifecycle of OCI objects.
331+
- Extend the existing image garbage collection so that it does not remove an OCI volume
332+
image if a pod is still referencing it.
313333
314334
**Artifact-Specific Configuration:**
315335
- Introduce new configuration options to handle the unique requirements of different types of OCI artifacts.
@@ -404,6 +424,154 @@ fail to run on the node.
404424
For security reasons, volume mounts should set the [`noexec`] and `ro`
405425
(read-only) options by default.
406426

427+
##### Filesystem representation
428+
429+
Container Runtimes are expected to return a `mountpoint`, which is a single
430+
directory containing the unpacked (in case of tarballs) and merged layer files
431+
from the image or artifact. If an OCI artifact has multiple layers (in the same
432+
way as for container images), then the runtime is expected to merge them
433+
together. Duplicate files from distinct layers will be overwritten by the
434+
higher indexed layer.
435+
436+
Runtimes are expected to be able to handle layers as tarballs (like they do for
437+
images right now) as well as plain single files. How the runtimes implement the
438+
expected output and which media types they want to support is deferred to them
439+
for now. Kubernetes only defines the expected output as a single directory
440+
containing the (unpacked) content.
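
The merge rule described above can be sketched with plain `tar`, assuming gzip-compressed tarball layers that are applied in manifest order. All paths and file names below are illustrative assumptions and not part of the proposal; real runtimes use their own storage drivers and also handle details such as OCI whiteout files, which this sketch ignores.

```bash
set -eu

# Hypothetical scratch locations; a real runtime uses its own storage.
work="$(mktemp -d)"
merged="$work/merged"
mkdir -p "$work/l0" "$work/l1" "$merged"

# Both layers ship "file" with different content; only layer 0 ships "other".
echo "from-layer0" > "$work/l0/file"
echo "only-in-layer0" > "$work/l0/other"
echo "from-layer1" > "$work/l1/file"
tar -czf "$work/layer0.tar" -C "$work/l0" .
tar -czf "$work/layer1.tar" -C "$work/l1" .

# Apply the layers in order: the later (higher-indexed) layer overwrites
# files that an earlier layer already wrote into the merged directory.
for layer in "$work/layer0.tar" "$work/layer1.tar"; do
  tar -xzf "$layer" -C "$merged"
done

cat "$merged/file"   # prints "from-layer1"
cat "$merged/other"  # prints "only-in-layer0"
```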
441+
442+
###### Example using ORAS
443+
444+
Assuming the following directory structure:
445+
446+
```console
./
├── dir/
│   └── file
└── file
```
452+
453+
```console
$ cat dir/file
layer0

$ cat file
layer1
```
460+
461+
Then we can manually create two distinct layers by:
462+
463+
```bash
tar cfvz layer0.tar dir
tar cfvz layer1.tar file
```
467+
468+
We also need a `config.json`, ideally indicating the requested architecture:
469+
470+
```bash
jq --null-input '.architecture = "amd64" | .os = "linux"' > config.json
```
473+
474+
Now use [ORAS](https://oras.land) to push the distinct layers:
475+
476+
```bash
oras push --config config.json:application/vnd.oci.image.config.v1+json \
    localhost:5000/image:v1 \
    layer0.tar:application/vnd.oci.image.layer.v1.tar+gzip \
    layer1.tar:application/vnd.oci.image.layer.v1.tar+gzip
```
482+
483+
```console
✓ Uploaded layer1.tar 129/129 B 100.00% 73ms
  └─ sha256:0c26e9128651086bd9a417c7f0f3892e3542000e1f1fe509e8fcfb92caec96d5
✓ Uploaded application/vnd.oci.image.config.v1+json 47/47 B 100.00% 126ms
  └─ sha256:4a2128b14c6c3699084cd60f24f80ae2c822f9bd799b24659f9691cbbfccae6b
✓ Uploaded layer0.tar 166/166 B 100.00% 132ms
  └─ sha256:43ceae9994ffc73acbbd123a47172196a52f7d1d118314556bac6c5622ea1304
✓ Uploaded application/vnd.oci.image.manifest.v1+json 752/752 B 100.00% 40ms
  └─ sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
Pushed [registry] localhost:5000/image:v1
ArtifactType: application/vnd.oci.image.config.v1+json
Digest: sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
```
496+
497+
The resulting manifest looks like:
498+
499+
```bash
oras manifest fetch localhost:5000/image:v1 | jq .
```
502+
503+
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:4a2128b14c6c3699084cd60f24f80ae2c822f9bd799b24659f9691cbbfccae6b",
    "size": 47
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:43ceae9994ffc73acbbd123a47172196a52f7d1d118314556bac6c5622ea1304",
      "size": 166,
      "annotations": {
        "org.opencontainers.image.title": "layer0.tar"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:0c26e9128651086bd9a417c7f0f3892e3542000e1f1fe509e8fcfb92caec96d5",
      "size": 129,
      "annotations": {
        "org.opencontainers.image.title": "layer1.tar"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-14T07:49:06Z"
  }
}
```
535+
536+
The container runtime can now pull the artifact with the `mount = true` CRI
537+
field set, for example using an experimental [`crictl pull --mount` flag](https://github.com/kubernetes-sigs/cri-tools/compare/master...saschagrunert:oci-volumesource-poc):
538+
539+
```bash
sudo crictl pull --mount localhost:5000/image:v1
```
542+
543+
```console
Image is up to date for localhost:5000/image@sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
Image mounted to: /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged
```
547+
548+
And the returned `mountpoint` contains the unpacked layers as a directory tree:
549+
550+
```bash
sudo tree /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged
```
553+
554+
```console
/var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged
├── dir
│   └── file
└── file

2 directories, 2 files
```
562+
563+
```console
$ sudo cat /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged/dir/file
layer0

$ sudo cat /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged/file
layer1
```
570+
571+
ORAS (and other tools) are also able to push multiple files or directories
572+
within a single layer. This should be supported by container runtimes in the
573+
same way.
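
As a rough illustration of the single-layer case, the following sketch packs a directory and a file into one tarball and unpacks it again. It uses only `tar` and a hypothetical scratch directory; how a given tool builds such a layer and how a runtime unpacks it are not specified here.

```bash
set -eu

# Hypothetical scratch locations, used for illustration only.
work="$(mktemp -d)"
mkdir -p "$work/src/dir" "$work/merged"
echo "layer0" > "$work/src/dir/file"
echo "layer1" > "$work/src/file"

# Pack the directory and the file into ONE tarball, i.e. a single layer.
tar -czf "$work/layer.tar" -C "$work/src" dir file

# Unpacking that single layer yields the same tree as the two-layer example.
tar -xzf "$work/layer.tar" -C "$work/merged"
(cd "$work/merged" && find . -mindepth 1 | sort)
```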
574+
407575
##### SELinux
408576

409577
Traditionally, the container runtime is responsible for applying SELinux labels
