Commit d75e562

descriptor: make definition of digest consistent
After some changes to the schema to open up the character set and add separators to the digest algorithm, this change set ensures we have a consistent definition for the components of a digest. The specification has been updated to clarify this decision as well as ensure the specification matches the validation components across the board.

The portion of a digest known as `hex` is now known as `encoded` to correspond with the wider character set allowed.

Signed-off-by: Stephen J Day <[email protected]>
1 parent 2e9f3dd commit d75e562

5 files changed

+27
-20
lines changed

descriptor.md

Lines changed: 22 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -61,24 +61,31 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL
6161

6262
The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
6363
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
64-
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
64+
If the _digest_ can be communicated in a secure manner, one can verify content from an insecure source by recalculating the digest independently, ensuring the content has not been modified.
6565

66-
The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion.
67-
The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
66+
The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
67+
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.
6868

69-
The digest string MUST match the following grammar:
69+
A digest string MUST match the following grammar:
7070

7171
```
72-
digest := algorithm ":" hex
73-
algorithm := /[a-z0-9_+.-]+/
74-
hex := /[a-f0-9]+/
72+
digest := algorithm ":" encoded
73+
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
74+
encoded := /[a-zA-Z0-9]+/
7575
```
76+
Some example digests include the following:
7677

77-
Some example digest strings include the following:
78+
digest | algorithm | Supported |
79+
------------------------------------------------------------------------|---------------------|-----------|
80+
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) | Yes |
81+
sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b3727429080fb337591abd3e44453b954555b7a0812e1081c39b740293f765eae731f5a65ed1 | [SHA-512](#sha-512) | Yes |
82+
multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8 | Multihash | No |
7883

79-
digest string | algorithm |
80-
------------------------------------------------------------------------|---------------------|
81-
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
84+
Please see [Registered Algorithms](#registered-identifiers) for a list of supported algorithms.
85+
86+
Implementations SHOULD allow digests that are unsupported to pass validation if they comply with the above grammar.
87+
While `sha256` will only use hex-encoded digests, separators in _algorithm_ and alphanumeric characters in _encoded_ are supported to allow for future extension of digest support.
88+
As an example, we can parameterize the encoding and algorithm as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`, which would be considered valid but unsupported by this specification.
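
The distinction between "valid" and "supported" can be illustrated with a minimal Python sketch. It is not part of the specification: the function names and the `SUPPORTED_ALGORITHMS` set are assumptions made for this example, and only the two regular expressions are taken from the grammar above.

```python
import re

# Grammar components copied from the specification text above.
ALGORITHM = r"[a-z0-9]+(?:[+._-][a-z0-9]+)*"
ENCODED = r"[a-zA-Z0-9]+"
DIGEST_RE = re.compile(rf"({ALGORITHM}):({ENCODED})")

# Hypothetical registry for this sketch; the specification's own list of
# registered algorithms is authoritative.
SUPPORTED_ALGORITHMS = {"sha256", "sha512"}


def is_valid_digest(digest: str) -> bool:
    """Return True if the whole string matches the digest grammar."""
    return DIGEST_RE.fullmatch(digest) is not None


def split_digest(digest: str) -> tuple[str, str]:
    """Split a digest into (algorithm, encoded), raising on a grammar violation."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"digest does not match grammar: {digest!r}")
    return match.group(1), match.group(2)


def is_supported(digest: str) -> bool:
    """Return True if the digest is valid and its algorithm is in the registry."""
    algorithm, _ = split_digest(digest)
    return algorithm in SUPPORTED_ALGORITHMS
```

Under these assumptions, `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8` passes `is_valid_digest` while `is_supported` returns `False`, which matches the "valid but unsupported" case described above.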
8289

8390
* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
8491
* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
@@ -91,24 +98,24 @@ A _digest_ is calculated by the following pseudo-code, where `H` is the selected
9198
```
9299
let ID(C) = Descriptor.digest
93100
let C = <bytes>
94-
let D = '<alg>:' + EncodeHex(H(C))
101+
let D = '<alg>:' + Encode(H(C))
95102
let verified = ID(C) == D
96103
```
97104
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
98105
Content `C` is a string of bytes.
99-
Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest.
106+
Function `H` returns the hash of `C` in bytes and is passed to function `Encode` and prefixed with the algorithm to obtain the digest.
100107
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
101108
After verification, the following is true:
102109

103110
```
104-
D == ID(C) == '<alg>:' + EncodeHex(H(C))
111+
D == ID(C) == '<alg>:' + Encode(H(C))
105112
```
106113

107114
The _digest_ is confirmed as the content identifier by independently calculating the _digest_.
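
As a rough illustration of the pseudo-code above, the following Python sketch computes `D` and compares it with `ID(C)`. It assumes that `Encode` is lowercase hex for the `sha256` and `sha512` identifiers; the `HASHES` mapping and the function names are hypothetical and exist only for this example.

```python
import hashlib

# Hypothetical mapping from algorithm identifier to a hashlib constructor.
HASHES = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def compute_digest(content: bytes, algorithm: str = "sha256") -> str:
    """Compute D = '<alg>:' + Encode(H(C)), assuming Encode is lowercase hex."""
    hasher = HASHES[algorithm]
    return f"{algorithm}:{hasher(content).hexdigest()}"


def verify(content: bytes, expected_digest: str) -> bool:
    """Return True if the content hashes to the digest taken from the descriptor."""
    algorithm, _, _ = expected_digest.partition(":")
    if algorithm not in HASHES:
        # Valid-but-unsupported digests cannot be verified by this sketch.
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return compute_digest(content, algorithm) == expected_digest
```

A caller would fetch the bytes from the untrusted source, then accept them only if `verify(content, descriptor_digest)` returns `True`.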
108115

109116
### Registered algorithms
110117

111-
While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
118+
While the _algorithm_ component of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
112119

113120
The following algorithm identifiers are currently defined by this specification:
114121

image-layout.md

Lines changed: 2 additions & 2 deletions
@@ -53,8 +53,8 @@ afff3924849e458c5ef237db5f89539274d5e609db5db935ed3959c90f1f2d51 ./blobs/sha256/
5353
## Blobs
5454

5555
* Object names in the `blobs` subdirectories are composed of a directory for each hash algorithm, the children of which will contain the actual content.
56-
* A blob, referenced with digest `<alg>:<hex>` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs/<alg>/<hex>`.
57-
* The character set of the entry name for `<hex>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
56+
* A blob, referenced with digest `<alg>:<encoded>` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs/<alg>/<encoded>`.
57+
* The character set of the entry name for `<encoded>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
5858
* For example `sha256:5b` will map to the layout `blobs/sha256/5b`.
5959
* The blobs directory MAY contain blobs which are not referenced by any of the [refs](#indexjson-file).
6060
* The blobs directory MAY be missing referenced blobs, in which case the missing blobs SHOULD be fulfilled by an external blob store.
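
The mapping from a digest to its location in the layout can be sketched in Python as follows. This is an illustrative helper, not part of the specification: `blob_path` is a hypothetical name, and the pattern is the digest grammar from the descriptor document.

```python
import re
from pathlib import PurePosixPath

# Digest grammar from the descriptor specification: algorithm ":" encoded.
DIGEST_RE = re.compile(r"([a-z0-9]+(?:[+._-][a-z0-9]+)*):([a-zA-Z0-9]+)")


def blob_path(digest: str) -> str:
    """Map '<alg>:<encoded>' to 'blobs/<alg>/<encoded>' within an image layout."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"digest does not match grammar: {digest!r}")
    algorithm, encoded = match.groups()
    return str(PurePosixPath("blobs", algorithm, encoded))
```

Validating against the grammar before building the path also means neither component can contain `/` or be `..`, so the result stays inside the `blobs` directory.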

schema/content-descriptor.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
1313
"$ref": "defs.json#/definitions/int64"
1414
},
1515
"digest": {
16-
"description": "the cryptographic checksum digest of the object, in the pattern '<hash>:<hexadecimal digest>'",
16+
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
1717
"$ref": "defs-descriptor.json#/definitions/digest"
1818
},
1919
"urls": {

schema/defs-descriptor.json

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77
"pattern": "^[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}$"
88
},
99
"digest": {
10-
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<digest>'",
10+
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
1111
"type": "string",
1212
"pattern": "^[a-z0-9]+(?:[+._-][a-z0-9]+)*:[a-zA-Z0-9]+$"
1313
},

schema/image-index-schema.json

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@
3131
"$ref": "defs.json#/definitions/int64"
3232
},
3333
"digest": {
34-
"description": "the cryptographic checksum digest of the object, in the pattern '<hash>:<hexadecimal digest>'",
34+
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
3535
"$ref": "defs-descriptor.json#/definitions/digest"
3636
},
3737
"urls": {
