| 1 | +# OpenContainers Content Descriptors |
| 2 | + |
| 3 | +OCI has several components that come together to describe an image. |
| 4 | +References between components form a [Merkle Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Merkle_tree). |
| 5 | +The references in the _Merkle DAG_ are expressed through _Content Descriptors_. |
| 6 | +A _Content Descriptor_ (or _Descriptor_) describes the disposition of the targeted content. |
| 7 | +A _Descriptor_ includes the type of the content, an independently verifiable content identifier known as a "digest", and the size of the raw content in bytes. |
| 8 | + |
| 9 | +Descriptors SHOULD be embedded in other formats to securely reference external content. |
| 10 | + |
| 11 | +Other formats SHOULD use descriptors to securely reference external content. |
| 12 | + |
| 13 | +## Properties |
| 14 | + |
| 15 | +The following describe the primary set of properties that make up a _Descriptor_. |
| 16 | + |
| 17 | +- **`mediaType`** *string* |
| 18 | + |
| 19 | + This REQUIRED property contains the MIME type of the referenced object. |
| 20 | + |
| 21 | +- **`digest`** *string* |
| 22 | + |
| 23 | + This REQUIRED property is the _digest_ of the targeted content, meeting the requirements outlined in [Digests and Verification](#digests-and-verification). |
| 24 | + Retrieved content SHOULD be verified against this digest when consumed via untrusted sources. |
| 25 | + |
| 26 | +- **`size`** *int* |
| 27 | + This REQUIRED property specifies the size in bytes of the blob. |
| 28 | + This property exists so that a client will have an expected size for the content before validating. |
| 29 | + If the length of the retrieved content does not match the specified length, the content SHOULD NOT be trusted. |
| 30 | + |
| 31 | +### Reserved |
| 32 | + |
| 33 | +The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications: |
| 34 | + |
| 35 | +- **`urls`** *array* |
| 36 | + |
| 37 | + This key is RESERVED for future versions of the specification. |
| 38 | + |
| 39 | +- **`data`** *string* |
| 40 | + |
| 41 | + This key is RESERVED for future versions of the specification. |
| 42 | + |
| 43 | +All other fields may be included in other OCI specifications. |
| 44 | +Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification. |
| 45 | + |
| 46 | +## Digests and Verification |
| 47 | + |
| 48 | +The _digest_ component of a _Descriptor_ acts as a content identifier, employing [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage) for the OCI image format. |
| 49 | +It uniquely identifies content by taking a collision-resistant hash of the bytes. |
| 50 | +Such an identifier can be independently calculated and verified by selection of a common _algorithm_. |
| 51 | +If such an identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the identifier independently, and be certain that the correct content was obtained. |
| 52 | +Put simply, the identifier is a property of the content. |
| 53 | + |
| 54 | +To disambiguate from other concepts, we call this identifier a _digest_. |
| 55 | +A _digest_ is a serialized hash result, consisting of an _algorithm_ and a _hex_ portion. |
| 56 | +The _algorithm_ identifies the methodology used to calculate the digest, which is shared by implementations. |
| 57 | +The _hex_ portion is the hex-encoded result of the hash. |
| 58 | + |
| 59 | +We define a _digest_ string to match the following grammar: |
| 60 | + |
| 61 | +``` |
| 62 | +digest := algorithm ":" hex |
| 63 | +algorithm := /[A-Za-z0-9_+.-]+/ |
| 64 | +hex := /[A-Fa-f0-9]+/ |
| 65 | +``` |
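The grammar above can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the names `DIGEST_RE` and `parse_digest` are hypothetical. It assumes the _algorithm_ portion admits any ASCII letter, so that an identifier such as `sha256` matches.

```python
import re

# Assumption: the algorithm portion admits any ASCII letter, digit, or one of
# "_", "+", ".", "-", so that identifiers such as "sha256" are accepted.
DIGEST_RE = re.compile(r"(?P<algorithm>[A-Za-z0-9_+.-]+):(?P<hex>[A-Fa-f0-9]+)")


def parse_digest(digest):
    """Split a digest string into (algorithm, hex), or raise ValueError."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError("invalid digest: %r" % digest)
    return match.group("algorithm"), match.group("hex")
```

Using `fullmatch` rather than `match` rejects trailing characters, including a trailing newline, that a `$` anchor would tolerate.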
| 66 | + |
| 67 | +Some examples of _digests_ include the following: |
| 68 | + |
| 69 | +digest | description | |
| 70 | +----------------------------------------------------------------------------------|------------------------------------------------ |
| 71 | +sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Common sha256 based digest | |
| 72 | + |
| 73 | +Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the _digest_. |
| 74 | +The size of the content SHOULD be verified to reduce hash collision space. |
| 75 | +Heavy processing before calculating a hash SHOULD be avoided. |
| 76 | +Implementations MAY employ some canonicalization to ensure stable content identifiers. |
| 77 | + |
| 78 | +### Algorithms |
| 79 | + |
| 80 | +While the _algorithm_ does allow one to implement a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256). |
| 81 | + |
| 82 | +Let's use a simple example in pseudo-code to demonstrate a digest calculation and its verification. |
| 83 | +In the pseudo-code, `H` is the selected hash algorithm, identified by string `<alg>`: |
| 84 | +``` |
| 85 | +let ID(C) = Descriptor.digest |
| 86 | +let C = <bytes> |
| 87 | +let D = '<alg>:' + EncodeHex(H(C)) |
| 88 | +let verified = ID(C) == D |
| 89 | +``` |
| 90 | +Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field. |
| 91 | +Content `C` is a string of bytes. |
| 92 | +Function `H` returns the hash of `C` in bytes, which is passed to function `EncodeHex` and prefixed with `<alg>:` to obtain the _digest_. |
| 93 | +The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`. |
| 94 | +After verification, the following is true: |
| 95 | + |
| 96 | +``` |
| 97 | +D == ID(C) == '<alg>:' + EncodeHex(H(C)) |
| 98 | +``` |
| 99 | + |
| 100 | +The _digest_ is confirmed as the content identifier by independently calculating the _digest_. |
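The pseudo-code above maps closely onto Python's standard `hashlib` module. The following is a minimal sketch rather than a reference implementation; the function names `compute_digest` and `verify` are hypothetical. It assumes the `<alg>` string is a name that `hashlib.new` accepts (as `sha256` is) and that the hex portion is lowercase.

```python
import hashlib


def compute_digest(content, algorithm="sha256"):
    """Return '<alg>:' + EncodeHex(H(C)) for the given bytes."""
    # Assumption: the digest's algorithm name is also a valid hashlib name.
    hasher = hashlib.new(algorithm)
    hasher.update(content)
    return algorithm + ":" + hasher.hexdigest()


def verify(content, expected_digest):
    """Recompute the digest of `content` and compare it to ID(C)."""
    algorithm, _, _ = expected_digest.partition(":")
    return compute_digest(content, algorithm) == expected_digest
```

For example, `verify(b"", "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")` holds, since that is the SHA-256 hash of the empty byte string.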
| 101 | + |
| 102 | +#### SHA-256 |
| 103 | + |
| 104 | +[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics. |
| 105 | +Implementations MUST implement SHA-256 digest verification for use in descriptors. |
| 106 | + |
| 107 | +## Examples |
| 108 | + |
| 109 | +The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", of size 7682 bytes: |
| 110 | + |
| 111 | +```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson |
| 112 | +{ |
| 113 | + "mediaType": "application/vnd.oci.image.manifest.v1+json", |
| 114 | + "size": 7682, |
| 115 | + "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" |
| 116 | +} |
| 117 | +``` |
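A consumer of such a descriptor might check the required properties, compare the size first (a cheap check), and only then hash the content. The following Python sketch illustrates that order under stated assumptions; `load_descriptor` and `content_matches` are hypothetical names, and the digest's algorithm is assumed to be a name known to `hashlib`.

```python
import hashlib
import json

REQUIRED_PROPERTIES = ("mediaType", "digest", "size")


def load_descriptor(raw):
    """Parse a JSON descriptor and ensure the REQUIRED properties exist."""
    descriptor = json.loads(raw)
    missing = [key for key in REQUIRED_PROPERTIES if key not in descriptor]
    if missing:
        raise ValueError("descriptor is missing: " + ", ".join(missing))
    return descriptor


def content_matches(descriptor, content):
    """Return True only if both the size and the digest match the content."""
    # Check the length first: a mismatch means the content is not trusted,
    # and no hashing is needed.
    if len(content) != descriptor["size"]:
        return False
    # Assumption: the algorithm name is accepted by hashlib.new.
    algorithm, _, hex_part = descriptor["digest"].partition(":")
    return hashlib.new(algorithm, content).hexdigest() == hex_part
```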