descriptor.md
Lines changed: 32 additions & 19 deletions
Original file line number
Diff line number
Diff line change
@@ -57,58 +57,71 @@ The following field keys are reserved and MUST NOT be used by other specificatio
57
57
All other fields may be included in other OCI specifications.
58
58
Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.
59
59
60
-
## Digests and Verification
60
+
## Digests
61
61
62
62
The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
63
63
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
64
-
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
64
+
If the _digest_ can be communicated in a secure manner, one can verify content from an insecure source by recalculating the digest independently, ensuring the content has not been modified.
65
65
66
-
The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion.
67
-
The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
66
+
The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
67
+
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.
68
68
69
-
The digest string MUST match the following grammar:
`multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8` | Multihash | No |
86
+
`sha256+b64u:LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564` | SHA-256 with urlsafe base64 | No|
82
87
83
-
* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
84
-
* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
85
-
* Heavy processing before calculating a hash SHOULD be avoided.
86
-
* Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers.
88
+
Please see [Registered algorithms](#registered-algorithms) for a list of supported algorithms.
89
+
90
+
Implementations SHOULD allow digests that are unsupported to pass validation if they comply with the above grammar.
91
+
While `sha256` will only use hex-encoded digests, support for separators in _algorithm_ and alphanumeric characters in _encoded_ is included to allow for future extension of digest support.
92
+
As an example, we can parameterize the encoding and algorithm as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`, which would be considered valid but unsupported by this specification.
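The distinction between "valid" and "supported" can be sketched as follows. The grammar itself is not reproduced in this excerpt, so the regular expression below is an assumption inferred from the surrounding prose (lowercase alphanumeric algorithm components joined by separators, and an alphanumeric encoded portion that may also contain `=`, `_`, and `-`). The names `classify_digest` and `SUPPORTED_ALGORITHMS` are hypothetical.

```python
import re

# ASSUMPTION: approximates the digest grammar, which this excerpt does not show.
DIGEST_PATTERN = re.compile(
    r"(?P<algorithm>[a-z0-9]+(?:[+._-][a-z0-9]+)*):(?P<encoded>[a-zA-Z0-9=_-]+)"
)

# Hypothetical registry; this sketch treats only sha256 as supported.
SUPPORTED_ALGORITHMS = {"sha256"}


def classify_digest(digest: str) -> str:
    """Return 'invalid', 'unsupported', or 'supported' for a digest string."""
    match = DIGEST_PATTERN.fullmatch(digest)
    if match is None:
        # Does not comply with the (assumed) grammar at all.
        return "invalid"
    if match["algorithm"] not in SUPPORTED_ALGORITHMS:
        # Well-formed, but the algorithm is not one this sketch knows.
        return "unsupported"
    return "supported"
```

Under these assumptions, a caller would reject only the `invalid` case and let `unsupported` digests pass validation, deferring the decision on whether the content can actually be verified.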
93
+
94
+
### Verification
95
+
96
+
Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
97
+
Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
98
+
Heavy processing before calculating a hash SHOULD be avoided.
99
+
Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers.
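A minimal sketch of the recommended ordering, assuming the caller already holds the expected size and digest from a descriptor and that the digest uses `sha256` with hex encoding. The function name `verify_content` and its parameters are hypothetical.

```python
import hashlib


def verify_content(content: bytes, expected_size: int, expected_digest: str) -> bool:
    """Check the size first, then the digest, before the content is consumed."""
    # Cheap check first: a size mismatch rules the content out without hashing.
    if len(content) != expected_size:
        return False

    algorithm, separator, encoded = expected_digest.partition(":")
    if not separator or algorithm != "sha256":
        # ASSUMPTION: this sketch only knows how to verify sha256 digests.
        raise ValueError(f"cannot verify digest: {expected_digest!r}")

    # Hash the raw bytes directly; no parsing or decompression happens first.
    return hashlib.sha256(content).hexdigest() == encoded
```

Only after `verify_content` returns `True` would the bytes be handed to heavier processing such as parsing or unpacking.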
87
100
88
101
### Digest calculations
89
102
90
103
A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by string `<alg>`:
91
104
```
92
105
let ID(C) = Descriptor.digest
93
106
let C = <bytes>
94
-
let D = '<alg>:' + EncodeHex(H(C))
107
+
let D = '<alg>:' + Encode(H(C))
95
108
let verified = ID(C) == D
96
109
```
97
110
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
98
111
Content `C` is a string of bytes.
99
-
Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest.
112
+
Function `H` returns the hash of `C` in bytes and is passed to function `Encode` and prefixed with the algorithm to obtain the digest.
100
113
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
101
114
After verification, the following is true:
102
115
103
116
```
104
-
D == ID(C) == '<alg>:' + EncodeHex(H(C))
117
+
D == ID(C) == '<alg>:' + Encode(H(C))
105
118
```
106
119
107
120
The _digest_ is confirmed as the content identifier by independently calculating the _digest_.
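The pseudo-code above might translate to Python roughly as follows. This is a sketch under two assumptions: `H` is looked up by name through `hashlib`, and `Encode` is lowercase hex encoding, as the text describes for `sha256`. The helper names `calculate_digest` and `is_verified` are hypothetical.

```python
import hashlib


def calculate_digest(content: bytes, alg: str = "sha256") -> str:
    """Compute D = '<alg>:' + Encode(H(C)), with Encode assumed to be lowercase hex."""
    hasher = hashlib.new(alg)  # H, selected by the algorithm identifier
    hasher.update(content)     # C, the content bytes
    return f"{alg}:{hasher.hexdigest()}"


def is_verified(descriptor: dict, content: bytes) -> bool:
    """Compare ID(C), taken from the descriptor, with an independently computed D."""
    identifier = descriptor["digest"]   # ID(C)
    alg = identifier.partition(":")[0]  # the '<alg>' portion before the colon
    return identifier == calculate_digest(content, alg)
```

For example, `calculate_digest(b"")` yields `sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`, the SHA-256 digest of empty content.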
108
121
109
122
### Registered algorithms
110
123
111
-
While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
124
+
While the _algorithm_ component of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
112
125
113
126
The following algorithm identifiers are currently defined by this specification:
* Object names in the `blobs` subdirectories are composed of a directory for each hash algorithm, the children of which will contain the actual content.
56
-
* A blob, referenced with digest `<alg>:<hex>` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs/<alg>/<hex>`.
57
-
* The character set of the entry name for `<hex>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
56
+
* A blob, referenced with digest `<alg>:<encoded>` (per [descriptor](descriptor.md#digests)), MUST have its content stored in a file under `blobs/<alg>/<encoded>`.
57
+
* The character set of the entry name for `<encoded>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests).
58
58
* For example `sha256:5b` will map to the layout `blobs/sha256/5b`.
59
59
* The blobs directory MAY contain blobs which are not referenced by any of the [refs](#indexjson-file).
60
60
* The blobs directory MAY be missing referenced blobs, in which case the missing blobs SHOULD be fulfilled by an external blob store.
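The mapping from a digest to its location in the layout can be sketched as below. The function name `blob_path` is hypothetical, and full grammar validation of `<alg>` and `<encoded>` is left out; the sketch only rejects empty parts and path separators.

```python
from pathlib import PurePosixPath


def blob_path(digest: str) -> PurePosixPath:
    """Map '<alg>:<encoded>' to the relative path 'blobs/<alg>/<encoded>'."""
    alg, separator, encoded = digest.partition(":")
    if not separator or not alg or not encoded:
        raise ValueError(f"malformed digest: {digest!r}")
    # Minimal safety check only; a real implementation would validate both
    # parts against the digest grammar before using them as entry names.
    if "/" in alg or "/" in encoded:
        raise ValueError(f"digest contains a path separator: {digest!r}")
    return PurePosixPath("blobs") / alg / encoded
```

With this sketch, `blob_path("sha256:5b")` produces `blobs/sha256/5b`, matching the example in the list above.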