Commit d49d055

spec: describe descriptors and digests
Add a definition for content descriptors and digests, which are used as content identifiers in the OCI image specification.

Signed-off-by: Stephen J Day <[email protected]>
1 parent a4abdf4 commit d49d055

2 files changed, 124 additions and 0 deletions

descriptor.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
<!--[metadata]>
+++
draft = true
+++
<![end-metadata]-->

# OpenContainers Content Descriptors

The OCI image format has several components that come together to describe an image.
References between components form a [Merkle Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Merkle_tree).
The references in the _Merkle DAG_ are expressed through _Content Descriptors_.
A _Content Descriptor_, or _Descriptor_, describes the disposition of the targeted content.
A _Descriptor_ includes the type of the content, an independently verifiable content identifier known as a "digest", and the size of the raw content in bytes.

Other formats SHOULD use descriptors to securely reference external content.

## Properties

The following properties make up the primary set of fields in a _Descriptor_.

- **`mediaType`** *string*

    This REQUIRED property contains the MIME type of the referenced object.

- **`digest`** *string*

    This REQUIRED property is the _digest_ of the targeted content, meeting the requirements outlined in [Digests and Verification](#digests-and-verification).
    Retrieved content SHOULD be verified against this digest when consumed via untrusted sources.

- **`size`** *int*

    This REQUIRED property specifies the size, in bytes, of the raw content.
    This property exists so that a client knows the expected size of the content before validating it.
    If the length of the retrieved content does not match the specified length, the content SHOULD NOT be trusted.

### Reserved

The following field keys MUST NOT be used in descriptors specified in other OCI specifications:

- **`urls`** *array*

    This key is RESERVED for future versions of the specification.

- **`data`** *string*

    This key is RESERVED for future versions of the specification.

All other fields may be included in other OCI specifications.
Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.
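
A format that builds on descriptors might enforce the reserved keys by inspecting the raw JSON object before decoding it into a typed structure. The following Go sketch assumes that enforcement means reporting any use of `urls` or `data`; `checkReserved` is a hypothetical helper, not an API defined by this specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reservedKeys lists the descriptor field keys reserved for future
// versions of the specification.
var reservedKeys = []string{"urls", "data"}

// checkReserved decodes a descriptor as a generic JSON object and returns
// the reserved keys it uses, in the order of reservedKeys.
func checkReserved(raw []byte) ([]string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err // not a JSON object at all
	}
	var found []string
	for _, k := range reservedKeys {
		if _, ok := fields[k]; ok {
			found = append(found, k)
		}
	}
	return found, nil
}

func main() {
	raw := []byte(`{
  "mediaType": "application/octet-stream",
  "size": 0,
  "digest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "urls": []
}`)
	found, err := checkReserved(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("reserved keys in use:", found) // prints "reserved keys in use: [urls]"
}
```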

## Digests and Verification

The _digest_ component of a _Descriptor_ acts as a content identifier, employing [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage) for the OCI image format.
It uniquely identifies content by taking a collision-resistant hash of the content's bytes.
Such an identifier can be calculated and verified independently, provided that all parties select a common _algorithm_.
If such an identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the identifier independently, and be certain that the correct content was obtained.
Put simply, the identifier is a property of the content.
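
To make content addressability concrete, here is a toy in-memory store in Go whose keys are derived from the stored bytes. It assumes the `sha256:<hex>` digest form described in the rest of this section; the `store` type and its methods are hypothetical and deliberately ignore persistence and concurrency.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// store is a toy content-addressable store: each blob's key is computed
// from the blob itself, so the identifier is a property of the content.
type store struct {
	blobs map[string][]byte
}

func newStore() *store { return &store{blobs: make(map[string][]byte)} }

// digestOf computes a "sha256:<hex>" identifier for b.
func digestOf(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Put stores a private copy of b and returns its content identifier.
func (s *store) Put(b []byte) string {
	id := digestOf(b)
	s.blobs[id] = append([]byte(nil), b...)
	return id
}

// Get fetches the blob for id and recomputes the identifier before
// returning it, treating the backing map as an untrusted source.
func (s *store) Get(id string) ([]byte, error) {
	b, ok := s.blobs[id]
	if !ok {
		return nil, errors.New("store: not found")
	}
	if digestOf(b) != id {
		return nil, fmt.Errorf("store: content does not match %s", id)
	}
	return b, nil
}

func main() {
	s := newStore()
	id := s.Put([]byte("hello"))
	b, err := s.Get(id)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, string(b))
}
```

Because `Get` recomputes the identifier, tampering with the stored bytes is detected even though the lookup key itself is unchanged.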

To disambiguate from other concepts, we call this identifier a _digest_.
A _digest_ is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion.
The _algorithm_ identifies the methodology, shared by implementations, that is used to calculate the digest.
The _hex_ portion is the hex-encoded result of the hash.

We define a _digest_ string to match the following grammar:

```
digest    := algorithm ":" hex
algorithm := /[A-Za-z0-9_+.-]+/
hex       := /[A-Fa-f0-9]+/
```
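
The grammar translates directly into an anchored regular expression. In this Go sketch the algorithm character class is assumed to include all ASCII letters and digits, so that algorithm names such as `sha256` are accepted; `parseDigest` is a hypothetical helper name. The grammar does not tie the length of the _hex_ portion to the algorithm, so a check such as "64 characters for SHA-256" would be a separate, algorithm-specific step.

```go
package main

import (
	"fmt"
	"regexp"
)

// digestRegexp transcribes the digest grammar into one anchored pattern:
// group 1 is the algorithm, group 2 the hex portion.
var digestRegexp = regexp.MustCompile(`^([A-Za-z0-9_+.-]+):([A-Fa-f0-9]+)$`)

// parseDigest splits a digest string into its algorithm and hex portions,
// rejecting anything that does not match the grammar.
func parseDigest(s string) (algorithm, hexPart string, err error) {
	m := digestRegexp.FindStringSubmatch(s)
	if m == nil {
		return "", "", fmt.Errorf("invalid digest %q", s)
	}
	return m[1], m[2], nil
}

func main() {
	alg, hexPart, err := parseDigest("sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b")
	if err != nil {
		panic(err)
	}
	fmt.Println(alg, len(hexPart))
}
```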

Some examples of _digests_ include the following:

| digest | description |
|--------|-------------|
| sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Common sha256 based digest |

Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the _digest_.
The size of the content SHOULD be verified as well, to protect against [Length Extension Attacks](https://en.wikipedia.org/wiki/Length_extension_attack).
Heavy processing of content before calculating a hash SHOULD be avoided.
Implementations MAY employ some canonicalization to ensure stable content identifiers.

### Algorithms

While the _algorithm_ component allows a wide variety of algorithms to be used, compliant implementations SHOULD use [SHA-256](#sha-256).

A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by the string `<alg>`:
```
let ID(C) = Descriptor.digest
let C = <bytes>
let D = '<alg>:' + EncodeHex(H(C))
let verified = ID(C) == D
```
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
Content `C` is a string of bytes.
Function `H` returns the hash of `C` as bytes; the result is passed to function `EncodeHex` and prefixed with `<alg>:` to obtain the _digest_.
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
After verification, the following is true:

```
D == ID(C) == '<alg>:' + EncodeHex(H(C))
```

The _digest_ is confirmed as the content identifier by independently calculating the _digest_.
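
The pseudo-code maps almost line for line onto Go's standard library. In this sketch, the table from `<alg>` names to hash constructors and the helper names `computeDigest` and `verified` are assumptions for illustration; only `sha256` is called for by this document, and `sha512` appears solely to show how the table extends.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
	"strings"
)

// algorithms maps an <alg> string to a constructor for the hash H.
var algorithms = map[string]func() hash.Hash{
	"sha256": sha256.New,
	"sha512": sha512.New,
}

// computeDigest returns D = '<alg>:' + EncodeHex(H(C)).
func computeDigest(alg string, c []byte) (string, error) {
	newH, ok := algorithms[alg]
	if !ok {
		return "", fmt.Errorf("unknown algorithm %q", alg)
	}
	h := newH()
	h.Write(c) // hash.Hash's Write never returns an error
	return alg + ":" + hex.EncodeToString(h.Sum(nil)), nil
}

// verified reports whether ID(C), the value taken from Descriptor.digest,
// equals D computed independently with the algorithm named in ID(C).
func verified(id string, c []byte) bool {
	alg, _, ok := strings.Cut(id, ":")
	if !ok {
		return false
	}
	d, err := computeDigest(alg, c)
	return err == nil && id == d
}

func main() {
	c := []byte("abc")
	d, err := computeDigest("sha256", c)
	if err != nil {
		panic(err)
	}
	fmt.Println(d, verified(d, c))
}
```

Selecting `H` from the `<alg>` prefix of `ID(C)` means an unrecognized algorithm fails verification instead of silently falling back to a default.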

#### SHA-256

[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for its ubiquity, reasonable size, and security characteristics.
Implementations MUST implement SHA-256 digest verification for use in descriptors.

## Examples

The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:

```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "size": 7682,
  "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270"
}
```

media-types.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

The following `mediaType` MIME types are used by the formats described here, and the resources they reference:

- `application/vnd.oci.descriptor.v1+json`: [Content Descriptor](descriptor.md)
- `application/vnd.oci.image.manifest.list.v1+json`: [Manifest list](manifest.md#manifest-list)
- `application/vnd.oci.image.manifest.v1+json`: [Image manifest format](manifest.md#image-manifest)
- `application/vnd.oci.image.serialization.rootfs.tar.gzip`: ["Layer", as a gzipped tar archive](serialization.md#creating-an-image-filesystem-changeset)
