| 1 | +# Background |
| 2 | + |
| 3 | +## What is the intended use case? |
| 4 | + |
| 5 | +This can be used anywhere digital signatures are needed. |
| 6 | + |
| 7 | +The initial application is for signing software supply chain metadata in [TUF] |
| 8 | +and [in-toto]. |
| 9 | + |
| 10 | +## Why do we need this? |
| 11 | + |
| 12 | +There is no other simple, foolproof signature scheme that we are aware of. |
| 13 | + |
| 14 | +* Raw signatures are too fragile. Every public key must be used for exactly |
| 15 | + one purpose over exactly one message type, lest the system be vulnerable to |
| 16 | + [confusion attacks](#motivation). In many cases, this results in a difficult |
| 17 | + key management problem. |
| 18 | + |
| 19 | +* [TUF] and [in-toto] currently use a scheme that avoids these problems but is |
| 20 | + JSON-specific and relies on [canonicalization](#motivation), which is an |
| 21 | + unnecessarily large attack surface. |
| 22 | + |
| 23 | +* [JWS] is JSON-specific, complicated, and error-prone. |
| 24 | + |
| 25 | +* [PASETO] is JSON-specific and too opinionated. For example, it mandates |
| 26 | + Ed25519 signatures, which may not be useful in all cases. |
| 27 | + |
| 28 | +The intent of this project is to define a minimal signature scheme that avoids |
| 29 | +these issues. |
| 30 | + |
| 31 | +## Design requirements |
| 32 | + |
| 33 | +The [protocol](protocol.md): |
| 34 | + |
| 35 | +* MUST reduce the possibility of a client misinterpreting the payload (e.g. |
| 36 | + interpreting a JSON message as protobuf) |
| 37 | +* MUST support arbitrary payload types (e.g. not just JSON) |
| 38 | +* MUST support arbitrary crypto primitives, libraries, and key management |
| 39 | + systems (e.g. Tink vs OpenSSL, Google KMS vs Amazon KMS) |
| 40 | +* SHOULD avoid depending on canonicalization for security |
| 41 | +* SHOULD NOT require unnecessary encoding (e.g. base64) |
| 42 | +* SHOULD NOT require the verifier to parse the payload before verifying |
| 43 | + |
| 44 | +The [data structure](envelope.md): |
| 45 | + |
| 46 | +* MUST include both message and signature(s) |
| 47 | + * NOTE: Detached signatures are supported by having the included message |
| 48 | + contain a cryptographic hash of the external data. |
| 49 | +* MUST support multiple signatures in one structure / file |
| 50 | +* SHOULD discourage users from reading the payload without verifying the |
| 51 | + signatures |
| 52 | +* SHOULD be easy to parse using common libraries (e.g. JSON) |
| 53 | +* SHOULD support a hint indicating what signing key was used |
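
The note on detached signatures above can be sketched in Python as follows. This is only an illustration under stated assumptions: the JSON layout and the field names `digest` and `sha256` are hypothetical and are not defined by this document. The signed message carries a hash of the external data, and the consumer compares that hash after the envelope signature has been verified.

```python
import hashlib
import json


def make_detached_payload(external_data: bytes) -> bytes:
    """Build a small message that commits to external_data via SHA-256."""
    digest = hashlib.sha256(external_data).hexdigest()
    # Hypothetical layout; the field names are illustrative only.
    return json.dumps({"digest": {"sha256": digest}}, sort_keys=True).encode("utf-8")


def matches_external_data(payload: bytes, external_data: bytes) -> bool:
    """Call only after the envelope signature over `payload` has been verified."""
    claimed = json.loads(payload)["digest"]["sha256"]
    return claimed == hashlib.sha256(external_data).hexdigest()
```

The message returned by `make_detached_payload` would then be signed and placed in the envelope like any other message.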
| 54 | + |
| 55 | +## Motivation |
| 56 | + |
| 57 | +There are two concerns with the current [in-toto]/[TUF] signature envelope. |
| 58 | + |
| 59 | +First, the signature scheme depends on [Canonical JSON], which has one practical |
| 60 | +problem and two theoretical ones: |
| 61 | + |
| 62 | +1. Practical problem: It requires the payload to be JSON or convertible to |
| 63 | + JSON. While this happens to be true of in-toto and TUF today, a generic |
| 64 | + signature layer should be able to handle arbitrary payloads. |
| 65 | +1. Theoretical problem 1: Two semantically different payloads could have the |
| 66 | + same canonical encoding. Although there are currently no known attacks on |
| 67 | + Canonical JSON, there have been attacks in the past on other |
| 68 | + canonicalization schemes |
| 69 | + ([example](https://latacora.micro.blog/2019/07/24/how-not-to.html#canonicalization)). |
| 70 | + It is safer to avoid canonicalization altogether. |
| 71 | +1. Theoretical problem 2: It requires the verifier to parse the payload before |
| 72 | + verifying, which is error-prone (it is too easy to forget to verify) and |
| 73 | + unnecessarily enlarges the attack surface. |
| 74 | + |
| 75 | +The preferred solution is to transmit the encoded byte stream exactly as it was |
| 76 | +signed, which the verifier verifies before parsing. This is what is done in |
| 77 | +[JWS] and [PASETO], for example. |
| 78 | + |
| 79 | +Second, the scheme does not include an authenticated "context" indicator to |
| 80 | +ensure that the signer and verifier interpret the payload in exactly the same way. |
| 81 | +For example, if in-toto were extended to support CBOR and protobuf encoding, the |
| 82 | +signer could get a CI/CD system to produce a CBOR message saying X and then a |
| 83 | +verifier to interpret it as a protobuf message saying Y. While we don't know of |
| 84 | +an exploitable attack on in-toto or TUF today, potential changes could introduce |
| 85 | +such a vulnerability. The signature scheme should be resilient against these |
| 86 | +classes of attacks. See [example attack](hypothetical_signature_attack.ipynb) |
| 87 | +for more details. |
| 88 | + |
| 89 | +## Reasoning |
| 90 | + |
| 91 | +Our goal was to create a signature envelope that is as simple and foolproof as |
| 92 | +possible. Alternatives such as [JWS] are extremely complex and error-prone, |
| 93 | +while others such as [PASETO] are overly specific. (Both are also |
| 94 | +JSON-specific.) We believe our proposal strikes the right balance of simplicity, |
| 95 | +usefulness, and security. |
| 96 | + |
| 97 | +Rationales for specific decisions: |
| 98 | + |
| 99 | +- Why use base64 for payload and sig? |
| 100 | + |
| 101 | + - Because JSON strings do not allow binary data, so we need to either |
| 102 | + encode the data or escape it. Base64 is a standard, reasonably |
| 103 | + space-efficient way of doing so. Protocols that have a first-class |
| 104 | + concept of "bytes", such as protobuf or CBOR, do not need to use base64. |
| 105 | + |
| 106 | +- Why sign raw bytes rather than base64 encoded bytes (as per JWS)? |
| 107 | + |
| 108 | + - Because it's simpler. Base64 is only needed for putting binary data in a |
| 109 | + text field, such as JSON. In other formats, such as protobuf or CBOR, |
| 110 | + base64 isn't needed at all. |
| 111 | + |
| 112 | +- Why does payloadType need to be signed? |
| 113 | + |
| 114 | + - See [Motivation](#motivation). |
| 115 | + |
| 116 | +- Why use PAE? |
| 117 | + |
| 118 | + - Because we need an unambiguous way of serializing two fields, |
| 119 | + payloadType and payload. PAE is already documented and good enough. No |
| 120 | + need to reinvent the wheel. |
| 121 | + |
| 122 | +- Why use a URI for payloadType rather than |
| 123 | + [Media Type](https://www.iana.org/assignments/media-types/media-types.xhtml) |
| 124 | + (a.k.a. MIME type)? |
| 125 | + |
| 126 | + - Because Media Type only indicates how to parse but does not indicate |
| 127 | + purpose, schema, or versioning. If it were just "application/json", for |
| 128 | + example, then every application would need to impose some "type" field |
| 129 | + within the payload, lest we have vulnerabilities similar to those that |
| 130 | + arise when payloadType is not signed. |
| 131 | + - Also, URIs don't need to be registered while Media Types do. |
| 132 | + |
| 133 | +- Why not stay backwards compatible by requiring the payload to always be JSON |
| 134 | + with a "_type" field? Then if you want a non-JSON payload, you could simply |
| 135 | + have a field that contains the real payload, e.g. `{"_type":"my-thing", |
| 136 | + "value":"base64…"}`. |
| 137 | + |
| 138 | + 1. It encourages users to add a "_type" field to their payload, which in |
| 139 | + turn: |
| 140 | + - (a) Ties the payload type to the authentication type. Ideally the |
| 141 | + two would be independent. |
| 142 | + - (b) May conflict with other uses of that same field. |
| 143 | + - (c) May require the user to specify type multiple times with |
| 144 | + different field names, e.g. with "@context" for |
| 145 | + [JSON-LD](https://json-ld.org/). |
| 146 | + 2. It would incur double base64 encoding overhead for non-JSON payloads. |
| 147 | + 3. It is more complex than PAE. |
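
Several of the rationales above (signing raw bytes, signing payloadType, using PAE, and applying base64 only for JSON transport) can be illustrated together in a minimal Python sketch. It makes several assumptions. The PAE layout is the PASETO-style one (a little-endian 64-bit piece count, then each piece prefixed by its little-endian 64-bit length); this document does not define the encoding, so the protocol document remains authoritative. The `signatures` field name is hypothetical, and `sign` and `verify` are caller-supplied callables standing in for whatever crypto library or key management system is in use.

```python
import base64
import json
import struct


def pae(pieces):
    """PASETO-style Pre-Authentication Encoding (layout assumed, not normative)."""
    out = struct.pack("<Q", len(pieces))        # number of pieces
    for piece in pieces:
        out += struct.pack("<Q", len(piece))    # length prefix removes ambiguity
        out += piece
    return out


def make_envelope(payload_type, payload, sign):
    """Sign the raw bytes; base64 is applied only to fit them into JSON."""
    sig = sign(pae([payload_type.encode("utf-8"), payload]))
    return json.dumps({
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": payload_type,
        "signatures": [{"sig": base64.b64encode(sig).decode("ascii")}],
    })


def open_envelope(envelope_json, verify):
    """Verify first; only then hand the payload bytes to a payload parser."""
    env = json.loads(envelope_json)
    payload = base64.b64decode(env["payload"])
    payload_type = env["payloadType"]
    signed_bytes = pae([payload_type.encode("utf-8"), payload])
    for entry in env["signatures"]:
        if verify(signed_bytes, base64.b64decode(entry["sig"])):
            return payload_type, payload
    raise ValueError("no valid signature")
```

Because the lengths are part of the signed bytes, `pae([b"a", b"bc"])` and `pae([b"ab", b"c"])` differ even though plain concatenation of the pieces would not. Changing `payloadType` after signing likewise invalidates the signature.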
| 148 | + |
| 149 | +## Backwards Compatibility |
| 150 | + |
| 151 | +Backwards compatibility with the old [in-toto]/[TUF] format will be handled by |
| 152 | +the application. For in-toto, this is explained in the corresponding |
| 153 | +application-specific change proposal, |
| 154 | +[ITE-5](https://github.com/in-toto/ITE/pull/13). For TUF, it follows the |
| 155 | +principles laid out in |
| 156 | +[TAP-14](https://github.com/theupdateframework/taps/blob/master/tap14.md). |
| 157 | + |
| 158 | +Verifiers can differentiate between the |
| 159 | +[old](https://github.com/in-toto/docs/blob/master/in-toto-spec.md#42-file-formats-general-principles) |
| 160 | +and new envelope format by detecting the presence of the `payload` field (new |
| 161 | +format) vs `signed` field (old format). |
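
The field-based detection described above could look roughly like this in Python. The function name `envelope_format` is hypothetical, and rejecting documents that contain both or neither field is an assumption of this sketch, not something this document specifies.

```python
import json


def envelope_format(raw: bytes) -> str:
    """Return "new" or "old" depending on which top-level field is present."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("envelope must be a JSON object")
    has_payload = "payload" in doc   # new envelope format
    has_signed = "signed" in doc     # old in-toto/TUF format
    if has_payload and not has_signed:
        return "new"
    if has_signed and not has_payload:
        return "old"
    # Assumption: treat ambiguous or unrecognized input as an error.
    raise ValueError("ambiguous or unrecognized envelope")
```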
| 162 | + |
| 163 | +[Canonical JSON]: http://wiki.laptop.org/go/Canonical_JSON |
| 164 | +[in-toto]: https://in-toto.io |
| 165 | +[JWS]: https://tools.ietf.org/html/rfc7515 |
| 166 | +[PASETO]: https://github.com/paragonie/paseto/blob/master/docs/01-Protocol-Versions/Version2.md#sig |
| 167 | +[TUF]: https://theupdateframework.io |