|
| 1 | +# KEP-1845: Image Compatibility with NFD |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Currently, there is no standard solution for describing container image requirements in relation to hardware or operating systems. |
| 6 | +Cloud-native technologies are being adopted by high-demand industries where container compatibility is critical for service performance and cluster preparation. |
| 7 | +This proposal introduces the concept of NFD image compatibility metadata. |
| 8 | +NFD features, expressed as NodeFeatureRule CRs, can be attached to images to specify their requirements for a host or operating system. |
| 9 | + |
| 10 | +The document has been prepared based on the experience and progress of the [OCI Image Compatibility working group](https://github.com/opencontainers/wg-image-compatibility/tree/main/docs/proposals). |
| 11 | + |
| 12 | +## Motivation |
| 13 | + |
| 14 | +Image compatibility metadata will help container image authors describe compatibility requirements in a standardized way. |
| 15 | +This metadata will be uploaded with the image to the image registry. |
| 16 | +As a result, container compatibility requirements will become discoverable and programmable, supporting various consumers and use cases where applications require a specific compatible environment. |
| 17 | + |
| 18 | +### Goals |
| 19 | + |
| 20 | +#### Phase 1 |
| 21 | + |
| 22 | +- Use existing NFD features via the NodeFeatureRule API to describe container image requirements. |
| 23 | +- Create a new OCI artifact type for compatibility metadata. |
| 24 | +- Allow verification of node compatibility, including nodes that are not yet part of the k8s cluster. |
| 25 | +- Add or extend the sources with missing features. |
| 26 | + |
| 27 | +#### Phase 2 |
| 28 | + |
| 29 | +Phase 2 is forward-looking and only outlines the general direction. |
| 30 | +After the completion of Phase 1, either this document should be updated, or a new proposal should be created that considers the following points: |
| 31 | + |
| 32 | +- Update or generate pods with appropriate node selectors via a mutation webhook or a scheduler plugin. |
| 33 | + |
| 34 | +### Non-Goals |
| 35 | + |
| 36 | +- Make image compatibility a hard requirement for the NFD installation/usage. |
| 37 | +- Cover application ABI compatibility. |
| 38 | + |
| 39 | +## Proposal |
| 40 | + |
| 41 | +Build a new NFD client tool with the following initial scope: |
| 42 | + |
| 43 | +- Create, read, update, and delete (CRUD) the OCI artifact. |
| 44 | +- Validate nodes based on provided metadata. |
| 45 | +- Run directly on a host which is not part of the Kubernetes cluster, or run as a Kubernetes job on a Kubernetes node. |
| 46 | + |
| 47 | +### Design Details |
| 48 | + |
| 49 | +#### OCI Artifact |
| 50 | + |
| 51 | +[An OCI artifact](https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage) should be created to store image compatibility metadata on the image side. |
| 52 | +The artifact can be connected with an image over [the subject field](https://github.com/opencontainers/distribution-spec/blob/11b8e3fba7d2d7329513d0cff53058243c334858/spec.md#pushing-manifests-with-subject). |
| 53 | + |
| 54 | +##### Manifest |
| 55 | + |
| 56 | +```json |
| 57 | +{ |
| 58 | + "schemaVersion": 2, |
| 59 | + "mediaType": "application/vnd.oci.image.manifest.v1+json", |
| 60 | + "artifactType": "application/vnd.k8s.nfd.image-compatibility.v1", |
| 61 | + "config": { |
| 62 | + "mediaType": "application/vnd.oci.empty.v1+json", |
| 63 | + "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a", |
| 64 | + "size": 2 |
| 65 | + }, |
| 66 | + "layers": [ |
| 67 | + { |
| 68 | + "mediaType": "application/vnd.k8s.nfd.image-compatibility.spec.v1+yaml", |
| 69 | + "digest": "sha256:4a47f8ae4c713906618413cb9795824d09eeadf948729e213a1ba11a1e31d052", |
| 70 | + "size": 1710 |
| 71 | + } |
| 72 | + ], |
| 73 | + "subject": { |
| 74 | + "mediaType": "application/vnd.oci.image.manifest.v1+json", |
| 75 | + "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", |
| 76 | + "size": 7682 |
| 77 | + }, |
| 78 | + "annotations": { |
| 79 | + "org.opencontainers.image.created": "2024-03-27T08:08:08Z" |
| 80 | + } |
| 81 | +} |
| 82 | +``` |
| 83 | + |
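The manifest above can be assembled mechanically from the spec payload and the subject descriptor. The following is a minimal sketch, not the NFD client implementation: the helper names `descriptor` and `build_manifest` are hypothetical, the payload bytes are a stand-in, and the timestamp is stored under the standard OCI annotation key `org.opencontainers.image.created`. It also shows where the `config` digest comes from: it is the SHA-256 of the two-byte empty JSON object `{}`.

```python
import hashlib
import json


def descriptor(media_type, blob):
    """Build an OCI content descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_manifest(spec_yaml, subject, created):
    """Assemble a compatibility artifact manifest (hypothetical helper)."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": "application/vnd.k8s.nfd.image-compatibility.v1",
        # The empty config blob is the literal JSON document "{}".
        "config": descriptor("application/vnd.oci.empty.v1+json", b"{}"),
        "layers": [
            descriptor(
                "application/vnd.k8s.nfd.image-compatibility.spec.v1+yaml",
                spec_yaml,
            )
        ],
        "subject": subject,
        "annotations": {"org.opencontainers.image.created": created},
    }


# Subject descriptor copied from the manifest example; payload is a stand-in.
subject = {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
    "size": 7682,
}
payload = b"version: v1alpha1\ncompatibilities: []\n"
manifest = build_manifest(payload, subject, "2024-03-27T08:08:08Z")
print(json.dumps(manifest, indent=2))
```

Pushing the manifest and its blobs to a registry is out of scope for this sketch.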
| 84 | +##### Artifact Payload (Schema) |
| 85 | + |
| 86 | +- **version** - *string* |
| 87 | +This REQUIRED property specifies the version of the API being used. |
| 88 | + |
| 89 | +- **compatibilities** - *array of object* |
| 90 | +This REQUIRED property is a list of compatibility sets. |
| 91 | + |
| 92 | + - **rules** - *object* |
| 93 | + This REQUIRED property is a reference to the spec of [NodeFeatureRule API](https://kubernetes-sigs.github.io/node-feature-discovery/v0.16/usage/custom-resources.html#nodefeaturerule). |
| 94 | + The spec makes it possible to describe image requirements using the discovered features from NFD sources. |
| 95 | + For further reading, please review [the documentation](https://kubernetes-sigs.github.io/node-feature-discovery/v0.16/usage/customization-guide.html#nodefeaturerule-custom-resource). |
| 96 | + |
| 97 | + - **weight** - *int* |
| 98 | + This OPTIONAL property specifies the [node affinity weight](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity-weight). |
| 99 | + |
| 100 | + - **tag** - *string* |
| 101 | + This OPTIONAL property allows grouping or dividing of compatibility sets. |
| 102 | + |
| 103 | + - **description** - *string* |
| 104 | + This OPTIONAL property is intended for a brief description of a compatibility set. |
| 105 | + |
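The REQUIRED and OPTIONAL constraints listed above can be checked with a few lines of code. This is a minimal sketch that assumes the payload has already been parsed into a Python mapping; `validate_spec` is a hypothetical name, and the check accepts `rules` as either an object or a list, since the intended shape is an assumption here. It does not validate the NodeFeatureRule content itself.

```python
def validate_spec(spec):
    """Return a list of schema violations; an empty list means the spec passed."""
    errors = []
    if not isinstance(spec.get("version"), str):
        errors.append("version: REQUIRED string")
    sets = spec.get("compatibilities")
    if not isinstance(sets, list):
        errors.append("compatibilities: REQUIRED array of objects")
        return errors

    optional = {"weight": int, "tag": str, "description": str}
    for i, cs in enumerate(sets):
        where = f"compatibilities[{i}]"
        if not isinstance(cs, dict):
            errors.append(f"{where}: must be an object")
            continue
        # Assumption: a missing or empty rules field is treated as an error.
        if not isinstance(cs.get("rules"), (list, dict)) or not cs["rules"]:
            errors.append(f"{where}.rules: REQUIRED")
        for field, expected in optional.items():
            value = cs.get(field)
            if value is None:
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"{where}.{field}: must be {expected.__name__}")
    return errors


good = {"version": "v1alpha1", "compatibilities": [{"rules": [{"name": "cpu"}], "weight": 1}]}
bad = {"version": "v1alpha1", "compatibilities": [{"weight": "high"}]}
print(validate_spec(good))
print(validate_spec(bad))
```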
| 106 | +Example |
| 107 | + |
| 108 | +```yaml |
| 109 | +version: v1alpha1 |
| 110 | +compatibilities: |
| 111 | +- tag: "preferred" |
| 112 | +  weight: 10 |
| 113 | +  description: "Preferred node configuration" |
| 114 | +  rules: |
| 115 | +  - name: "kernel and cpu" |
| 116 | +    matchFeatures: |
| 117 | +    - feature: kernel.loadedmodule |
| 118 | +      matchExpressions: |
| 119 | +        vfio-pci: {op: Exists} |
| 120 | +    - feature: cpu.model |
| 121 | +      matchExpressions: |
| 122 | +        vendor_id: {op: In, value: ["Intel", "AMD"]} |
| 123 | +- tag: "fallback" |
| 124 | +  weight: 1 |
| 125 | +  description: "Minimal required configuration" |
| 126 | +  rules: |
| 127 | +  - name: "cpu" |
| 128 | +    matchFeatures: |
| 129 | +    - feature: cpu.model |
| 130 | +      matchExpressions: |
| 131 | +        vendor_id: {op: In, value: ["Intel", "AMD"]} |
| 132 | +``` |
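To illustrate how a consumer might evaluate compatibility sets such as the ones in the example, here is a simplified sketch. It is not the NFD rule engine: it supports only the `Exists` and `In` operators, models discovered features as a plain nested dictionary, and assumes that a set matches when all of its rules match and that the highest `weight` wins. All function names are hypothetical.

```python
def match_expression(elements, key, expr):
    """Evaluate one match expression against the elements of a feature."""
    op = expr["op"]
    if op == "Exists":
        return key in elements
    if op == "In":
        return key in elements and elements[key] in expr.get("value", [])
    raise ValueError(f"operator not supported in this sketch: {op}")


def rule_matches(rule, features):
    """A rule matches when every expression of every matchFeatures term holds."""
    for term in rule.get("matchFeatures", []):
        elements = features.get(term["feature"], {})
        for key, expr in term["matchExpressions"].items():
            if not match_expression(elements, key, expr):
                return False
    return True


def best_compatibility(spec, features):
    """Return the matching compatibility set with the highest weight, or None."""
    matching = [
        cs for cs in spec["compatibilities"]
        if all(rule_matches(rule, features) for rule in cs["rules"])
    ]
    return max(matching, key=lambda cs: cs.get("weight", 0), default=None)


cpu_term = {"feature": "cpu.model",
            "matchExpressions": {"vendor_id": {"op": "In", "value": ["Intel", "AMD"]}}}
module_term = {"feature": "kernel.loadedmodule",
               "matchExpressions": {"vfio-pci": {"op": "Exists"}}}
spec = {
    "version": "v1alpha1",
    "compatibilities": [
        {"tag": "preferred", "weight": 10,
         "rules": [{"name": "kernel and cpu", "matchFeatures": [module_term, cpu_term]}]},
        {"tag": "fallback", "weight": 1,
         "rules": [{"name": "cpu", "matchFeatures": [cpu_term]}]},
    ],
}

# Hypothetical discovered features for two nodes.
node_a = {"kernel.loadedmodule": {"vfio-pci": ""}, "cpu.model": {"vendor_id": "Intel"}}
node_b = {"kernel.loadedmodule": {}, "cpu.model": {"vendor_id": "Intel"}}
print(best_compatibility(spec, node_a)["tag"])
print(best_compatibility(spec, node_b)["tag"])
```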
| 133 | + |
| 134 | +##### Discovery |
| 135 | + |
| 136 | +A compatibility artifact shall be associated with either an image index or a specific image via the subject field of the OCI Image Spec. |
| 137 | +The Referrers API should be used to discover artifacts. |
| 138 | +If an image has multiple artifacts, it is up to the client to choose the correct one. |
| 139 | +By default, it is recommended to select the most recent artifact based on the 'created' timestamp. |
| 140 | + |
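The default selection described above can be sketched as follows. The sketch assumes the Referrers API response has already been fetched and parsed into a dictionary shaped like an OCI image index, and that each descriptor carries the standard `org.opencontainers.image.created` annotation. `select_latest` is a hypothetical helper, and the digests in the sample data are made up.

```python
from datetime import datetime

ARTIFACT_TYPE = "application/vnd.k8s.nfd.image-compatibility.v1"
CREATED = "org.opencontainers.image.created"


def parse_created(descriptor):
    """Parse the RFC 3339 'created' annotation, or return None if absent."""
    value = descriptor.get("annotations", {}).get(CREATED)
    if value is None:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_latest(referrers_index):
    """Pick the most recently created compatibility artifact descriptor."""
    dated = []
    for desc in referrers_index.get("manifests", []):
        if desc.get("artifactType") != ARTIFACT_TYPE:
            continue  # ignore unrelated referrers such as signatures or SBOMs
        created = parse_created(desc)
        if created is not None:
            dated.append((created, desc))
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:" + "a" * 64, "artifactType": ARTIFACT_TYPE,
         "annotations": {CREATED: "2024-03-27T08:08:08Z"}},
        {"digest": "sha256:" + "b" * 64, "artifactType": ARTIFACT_TYPE,
         "annotations": {CREATED: "2024-05-01T12:00:00Z"}},
        {"digest": "sha256:" + "c" * 64, "artifactType": "application/example.other",
         "annotations": {CREATED: "2024-06-01T00:00:00Z"}},
    ],
}
latest = select_latest(index)
print(latest["digest"])
```

Descriptors without a timestamp are skipped in this sketch; a real client might prefer to report them instead.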
| 141 | +#### NFD client |
| 142 | + |
| 143 | +A new standalone command-line utility should be implemented for the NFD project that shares the same functionality as the [nfd kubectl plugin](https://nfd.sigs.k8s.io/usage/kubectl-plugin). |
| 144 | +Both clients should implement the following commands: |
| 145 | + |
| 146 | +- `validate` - validate a NodeFeatureRule object (implemented in kubectl plugin). |
| 147 | +- `test` - test a NodeFeatureRule object against a node (implemented in kubectl plugin). |
| 148 | +- `dryrun` - process a NodeFeatureRule file against a local NodeFeature file to dry run the rule against a node before applying it to a cluster (implemented in kubectl plugin). |
| 149 | +- `compat` - compatibility command with the following subcommands: |
| 150 | +  - `attach-spec` - create an artifact with image compatibility specification and attach to the image (initially users have to create the spec by hand). |
| 151 | +  - `remove-spec` - remove an artifact with image compatibility specification from the image. |
| 152 | +  - `validate-spec` - validate an artifact and image compatibility specification. |
| 153 | +  - `validate-node` - validate image compatibility against a node. |
| 154 | + |
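The command tree listed above can be pictured with a small parser skeleton. This is only an illustration of the command and subcommand layout: the actual client is not specified here, the `--image` flag is a hypothetical placeholder rather than a documented option, and `registry.example/app:v1` is a made-up image reference.

```python
import argparse


def build_parser():
    """Build a parser mirroring the proposed command tree (illustrative only)."""
    parser = argparse.ArgumentParser(prog="nfd")
    commands = parser.add_subparsers(dest="command", required=True)

    # Commands already implemented in the kubectl plugin.
    for name in ("validate", "test", "dryrun"):
        commands.add_parser(name)

    # New compatibility command with its subcommands.
    compat = commands.add_parser("compat")
    compat_sub = compat.add_subparsers(dest="subcommand", required=True)
    for name in ("attach-spec", "remove-spec", "validate-spec", "validate-node"):
        sub = compat_sub.add_parser(name)
        # Hypothetical flag: the proposal does not define any options.
        sub.add_argument("--image", required=True)
    return parser


args = build_parser().parse_args(
    ["compat", "validate-node", "--image", "registry.example/app:v1"]
)
print(args.command, args.subcommand, args.image)
```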
| 155 | +### Test Plan |
| 156 | + |
| 157 | +To ensure the proper functioning of the nfd client, the following test plan should be executed: |
| 158 | + |
| 159 | +- **Unit Tests:** Write unit tests for the client. |
| 160 | +- **Manual e2e Tests:** Run the nfd client with sample data to create, read, update, and delete an artifact and to validate a local host. |