@@ -482,8 +480,8 @@ While the `imagePullPolicy` is working on container level, the introduced
482
480
values `IfNotPresent`, `Always` and `Never`, but will only pull once per pod.
483
481
484
482
Technically it means that we need to pull in [`SyncPod`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1049)
485
-
for OCI objects on a pod level and not during [`EnsureImageExists`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/images/image_manager.go#L102)
486
-
before the container gets started.
483
+
for OCI objects on a pod level and not for each container during [`EnsureImageExists`](https://github.com/kubernetes/kubernetes/blob/b498eb9/pkg/kubelet/images/image_manager.go#L102)
484
+
before they get started.
487
485
488
486
If users want to re-pull artifacts when referencing moving tags like `latest`,
489
487
then they need to restart / evict the pod.
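
The pod-level pull behavior described above can be sketched as follows. This is a minimal illustration, not the kubelet implementation: `ImageVolume`, `shouldPull` and `pullsForPod` are hypothetical names, and the `present` map stands in for whatever the kubelet knows about objects already on the node. The pull decision is taken once per volume during pod sync, independent of how many containers mount that volume.

```go
package main

import "fmt"

// PullPolicy mirrors the three policy values named in the text.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
	PullNever        PullPolicy = "Never"
)

// ImageVolume is a hypothetical, simplified view of a pod-level OCI volume.
type ImageVolume struct {
	Name      string
	Reference string
	Policy    PullPolicy
}

// shouldPull reports whether a pull is needed for the given policy,
// based on whether the object is already present on the node.
func shouldPull(policy PullPolicy, present bool) bool {
	switch policy {
	case PullAlways:
		return true
	case PullNever:
		return false
	default: // IfNotPresent
		return !present
	}
}

// pullsForPod returns the references to pull during one pod sync.
// It iterates over the pod's volumes only; containers are not consulted,
// so a volume mounted by several containers is still evaluated once.
func pullsForPod(volumes []ImageVolume, present map[string]bool) []string {
	var pulls []string
	for _, v := range volumes {
		if shouldPull(v.Policy, present[v.Reference]) {
			pulls = append(pulls, v.Reference)
		}
	}
	return pulls
}

func main() {
	volumes := []ImageVolume{
		{Name: "data", Reference: "localhost:5000/image:v1", Policy: PullIfNotPresent},
		{Name: "cache", Reference: "localhost:5000/other:latest", Policy: PullNever},
	}
	// Nothing is present on the node yet, so only the first volume is pulled.
	fmt.Println(pullsForPod(volumes, map[string]bool{}))
}
```

Because the decision is only re-evaluated on pod sync in this sketch, a moving tag such as `latest` is not re-pulled while the pod keeps running, which matches the restart / evict requirement stated above.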
@@ -500,50 +498,44 @@ container image.
500
498
#### CRI
501
499
502
500
The CRI API is already capable of managing container images [via the `ImageService`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L146-L161).
503
-
Those RPCs will be re-used for managing OCI artifacts, while the [`ImageSpec`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L798-L813)
504
-
as well as [`PullImageResponse`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L1530-L1534)
505
-
will be extended to mount the OCI object to a local path:
501
+
Those RPCs will be re-used for managing OCI artifacts, while the [`Mount`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L220-L247)
502
+
message will be extended to mount an OCI object using the existing [`ImageSpec`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L798-L813)
503
+
on container creation:
506
504
507
505
```protobuf
508
-
509
-
// ImageSpec is an internal representation of an image.
510
-
message ImageSpec {
511
-
// …
512
-
513
-
// Indicate that the OCI object should be mounted.
514
-
bool mount = 20;
515
-
516
-
// SELinux label to be used.
517
-
string mount_label = 21;
518
-
}
519
-
520
-
message PullImageResponse {
506
+
// Mount specifies a host volume to mount into a container.
507
+
message Mount {
521
508
// …
522
509
523
-
// Absolute local path where the OCI object got mounted.
524
-
string mountpoint = 2;
510
+
// Mount an image reference (image ID, with or without digest), which is a
511
+
// special use case for image volume mounts. If this field is set, then
512
+
// host_path should be unset. All OCI mounts are per feature definition
513
+
// readonly. The kubelet does a PullImage RPC and evaluates the returned
514
+
// PullImageResponse.image_ref value, which is then set to the
515
+
// ImageSpec.image field. Runtimes are expected to mount the image as
516
+
// required.
517
+
// Introduced in the OCI Volume Source KEP: https://kep.k8s.io/4639
518
+
ImageSpec image = 9;
525
519
}
526
520
```
527
521
528
522
This allows re-using the existing kubelet logic for managing the OCI objects,
529
523
with the caveat that the new `VolumeSource` won't be isolated in a dedicated
530
524
plugin as part of the existing [volume manager](https://github.com/kubernetes/kubernetes/tree/6d0aab2/pkg/kubelet/volumemanager).
531
525
532
-
The added `mount_label` allow the kubelet to support SELinux contexts.
526
+
Runtimes are already aware of the correct SELinux parameters during container
527
+
creation and will re-use them for the OCI object mounts.
533
528
534
-
The kubelet will use the `mountpoint` on container creation
535
-
(by calling the `CreateContainer` RPC) to indicate the additional required volume mount ([`ContainerConfig.Mount`](https://github.com/kubernetes/cri-api/blob/3a66d9d/pkg/apis/runtime/v1/api.proto#L1102))
536
-
from the runtime. The runtime needs to ensure that mount and also manages its
537
-
lifecycle, for example to remove the bind mount on container removal.
529
+
The kubelet will take the `PullImageResponse.image_ref` returned on pull and assign
530
+
it to `Mount.image.image` together with the other fields for `Mount.image`. The
531
+
runtime will then mount the OCI object directly on container creation assuming
532
+
it's already present on disk. The runtime also manages the lifecycle of the
533
+
mount, for example to remove the OCI bind mount on container removal as well as
534
+
the object mount on the `RemoveImage` RPC.
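
As a rough sketch of the kubelet side of this step, the snippet below builds a mount entry from a pulled image reference. The `ImageSpec` and `Mount` structs are simplified local stand-ins for the CRI messages, not the generated `cri-api` types, and `imageVolumeMount` is a hypothetical helper name. It assumes the rules from the `Mount` comment: `host_path` stays unset and the mount is read-only.

```go
package main

import (
	"errors"
	"fmt"
)

// ImageSpec is a simplified stand-in for the CRI ImageSpec message.
type ImageSpec struct {
	Image string
}

// Mount is a simplified stand-in for the CRI Mount message with the
// proposed image field.
type Mount struct {
	ContainerPath string
	HostPath      string
	Readonly      bool
	Image         *ImageSpec
}

// imageVolumeMount (hypothetical helper) turns the image_ref returned by
// PullImage into a mount for the container creation request.
func imageVolumeMount(imageRef, containerPath string) (Mount, error) {
	if imageRef == "" {
		return Mount{}, errors.New("PullImage returned an empty image_ref")
	}
	return Mount{
		ContainerPath: containerPath,
		// HostPath is intentionally left empty for image mounts.
		Readonly: true, // OCI mounts are read-only per feature definition.
		Image:    &ImageSpec{Image: imageRef},
	}, nil
}

func main() {
	ref := "localhost:5000/image@sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9"
	m, err := imageVolumeMount(ref, "/volume")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("container_path=%s readonly=%t image=%s\n", m.ContainerPath, m.Readonly, m.Image.Image)
}
```

The same mount value can be reused for every container in the pod that references the volume, since the pull already happened once at pod level.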
538
535
539
536
The kubelet tracks the information about which OCI object is used by which
540
-
sandbox and therefore manages the lifecycle of them.
541
-
542
-
The proposal also considers smaller CRI changes, for example to add a list of
543
-
mounted volume paths to the `ImageStatusResponse.Image` message returned by the
544
-
`ImageStatus` RPC. This allows providing the right amount of information between
545
-
the kubelet and the runtime to ensure that no context gets lost in restart
546
-
scenarios.
537
+
sandbox and therefore manages their lifecycle for garbage collection
538
+
purposes.
547
539
548
540
The overall flow for container creation will look like this:
549
541
@@ -554,32 +546,30 @@ sequenceDiagram
554
546
Note left of K: During pod sync
555
547
Note over K,C: CRI
556
548
K->>+C: RPC: PullImage
557
-
Note right of C: Pull and mount<br/>OCI object
558
-
C-->>-K: PullImageResponse.Mountpoint
549
+
Note right of C: Pull OCI object
550
+
C-->>-K: PullImageResponse.image_ref
559
551
Note left of K: Add mount points<br/> to container<br/>creation request
560
552
K->>+C: RPC: CreateContainer
561
-
Note right of C: Add bind mounts<br/>from object mount<br/>point to container
553
+
Note right of C: Mount OCI object
554
+
Note right of C: Add OCI bind mounts<br/>from OCI object<br/>to container
562
555
C-->>-K: CreateContainerResponse
563
556
```
564
557
565
558
1. **Kubelet Initiates Image Pull**:
566
559
- During pod setup, the kubelet initiates the pull for the OCI object based on the volume source.
567
-
- The kubelet passes the necessary indicator to mount the object to the container runtime.
568
560
569
561
2. **Runtime Handles Mounting**:
570
-
- The container runtime mounts the OCI object as a filesystem using the metadata provided by the kubelet.
571
-
- The runtime returns the mount point information to the kubelet.
562
+
- The runtime returns the image reference information to the kubelet.
572
563
573
564
3. **Redirecting of the Mountpoint**:
574
-
- The kubelet uses the returned mount point to build the container creation request for each container using that mount.
575
-
- The kubelet initiates the container creation and the runtime creates the required bind mounts to the target location.
565
+
- The kubelet uses the returned image reference to build the container creation request for each container using that mount.
566
+
- The kubelet initiates the container creation and the runtime creates the required OCI object mount as well as bind mounts to the target location.
576
567
This is the currently implemented behavior for all other mounts and should require no actual container runtime code change.
577
568
578
569
4. **Lifecycle Management**:
579
570
- The container runtime manages the lifecycle of the mounts, ensuring they are created during pod setup and cleaned up upon sandbox removal.
580
571
581
572
5. **Tracking and Coordination**:
582
-
- The kubelet and runtime coordinate to track pods requesting mounts to avoid removing containers with volumes in use.
583
573
- During image garbage collection, the runtime provides the kubelet with the necessary mount information to ensure proper cleanup.
584
574
585
575
6. **SELinux Context Handling**:
@@ -597,19 +587,17 @@ sequenceDiagram
597
587
598
588
#### Container Runtimes
599
589
600
-
Container runtimes need to support the new `mount` field, otherwise the
601
-
feature cannot be used. The kubelet will verify if the returned `mountpoint`
602
-
actually exists on disk to check the feature availability, because Protobuf will
603
-
strip the field in a backwards compatible way for older runtimes. Pods using the
604
-
new `VolumeSource` combined with a not supported container runtime version will
605
-
fail to run on the node.
590
+
Container runtimes need to support the new `Mount.image` field, otherwise the
591
+
feature cannot be used. Pods using the new `VolumeSource` combined with an
591
+
unsupported container runtime version will fail to run on the node, because the
593
+
`Mount.host_path` field is not set for those mounts.
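
The compatibility behavior above can be illustrated with a small sketch of how a runtime might resolve the source of a mount. All names here are hypothetical (`resolveSource`, the `supportsImageMounts` flag, and the local `Mount` / `ImageSpec` stand-ins), and real runtimes may report the failure differently. The point is only that a runtime unaware of `Mount.image` sees a mount without `host_path` and cannot satisfy it.

```go
package main

import (
	"errors"
	"fmt"
)

// ImageSpec and Mount are simplified stand-ins for the CRI messages.
type ImageSpec struct {
	Image string
}

type Mount struct {
	ContainerPath string
	HostPath      string
	Image         *ImageSpec
}

// resolveSource (hypothetical) returns a description of the mount source.
// supportsImageMounts models whether the runtime understands Mount.image.
// An older runtime never looks at the field, so it only sees HostPath.
func resolveSource(m Mount, supportsImageMounts bool) (string, error) {
	if supportsImageMounts && m.Image != nil && m.Image.Image != "" {
		if m.HostPath != "" {
			return "", errors.New("host_path must be unset when image is set")
		}
		return "image:" + m.Image.Image, nil
	}
	if m.HostPath == "" {
		return "", errors.New("mount has no host_path")
	}
	return "host:" + m.HostPath, nil
}

func main() {
	m := Mount{ContainerPath: "/volume", Image: &ImageSpec{Image: "localhost:5000/image:v1"}}

	src, err := resolveSource(m, true)
	fmt.Println(src, err)

	// A runtime without Mount.image support fails on the same request.
	src, err = resolveSource(m, false)
	fmt.Println(src, err)
}
```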
606
594
607
595
For security reasons, volume mounts should set the [`noexec`] and `ro`
608
596
(read-only) options by default.
609
597
610
598
##### Filesystem representation
611
599
612
-
Container Runtimes are expected to return a `mountpoint`, which is a single
600
+
Container Runtimes are expected to manage a `mountpoint`, which is a single
613
601
directory containing the unpacked (in case of tarballs) and merged layer files
614
602
from the image or artifact. If an OCI artifact has multiple layers (in the same
615
603
way as for container images), then the runtime is expected to merge them
The container runtime can now pull the artifact with the `mount = true` CRI
720
-
field set, for example using an experimental [`crictl pull --mount` flag](https://github.com/kubernetes-sigs/cri-tools/compare/master...saschagrunert:oci-volumesource-poc):
721
-
722
-
```bash
723
-
sudo crictl pull --mount localhost:5000/image:v1
724
-
```
725
-
726
-
```console
727
-
Image is up to date for localhost:5000/image@sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
728
-
Image mounted to: /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged
729
-
```
730
-
731
-
And the returned `mountpoint` contains the unpacked layers as directory tree:
732
-
733
-
```bash
734
-
sudo tree /var/lib/containers/storage/overlay/7ee9a1dcea9f152b10590871e55e485b249cd42ea912111ff9f99ab663c1001a/merged