|
| 1 | +# MSC2801: Make it explicit that event bodies are untrusted data |
| 2 | + |
| 3 | +As the Matrix Specification stands today, it is easy for client developers to |
| 4 | +assume that all events will be structured as documented. For example, the |
| 5 | +`displayname` key of an [`m.room.member` |
| 6 | +event](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-member) is |
| 7 | +specified to be "`string` or `null`"; and the `info` key of an [`m.image` |
| 8 | +message event](https://matrix.org/docs/spec/client_server/r0.6.1#m-image) must |
| 9 | +be a Javascript object. |
| 10 | + |
| 11 | +In reality, these are not safe assumptions. This MSC proposes that the |
| 12 | +specification be updated to make it clear that developers should treat all |
| 13 | +event data as untrusted. |
| 14 | + |
| 15 | +## Reasons why events may not match the specification |
| 16 | + |
| 17 | +Firstly, let's examine the reasons why such malformed events can exist. Some of |
| 18 | +these reasons may appear to have trivial solutions; these are discussed below. |
| 19 | + |
| 20 | + 1. Most obviously, Synapse as it currently stands does very little validation |
| 21 | + of event bodies sent over the Client-Server API. There is nothing stopping |
| 22 | + any user sending a malformed `m.room.message` event with a simple `curl` |
| 23 | + command or Element-Web's `/devtools`. |
| 24 | + |
| 25 | + Any event sent in this way will be returned to the clients of all the other |
| 26 | + users in the room. |
| 27 | + |
| 28 | + 2. Events may be encrypted. Any client implementing E2EE must be prepared to |
| 29 | + deal with any encrypted content, since by definition a server cannot |
| 30 | + validate it. |
| 31 | + |
| 32 | + 3. In order to allow the Matrix protocol to be extensible, server |
| 33 | + implementations must tolerate unknown event types, and allow them to be |
| 34 | + passed between clients. It is obvious that a pair of custom clients |
| 35 | + implementing a `com.example.special.event` event type cannot rely on a |
| 36 | + standard server implementation to do any validation for them. |
| 37 | + |
| 38 | + However, this problem extends to event types and keys which have been added |
| 39 | + to the specification. For example, the [`m.room.encrypted` event |
| 40 | + type](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-encrypted) |
| 41 | + was added in Client-Server API r0.4.0. It therefore follows that a server |
| 42 | + implementing CS-API r0.3.0 would have no way to validate an |
| 43 | + `m.room.encrypted` event, so if a client is connected to such a server, it |
| 44 | + could receive malformed events. |
| 45 | + |
| 46 | + 4. To extend from point 3, the problem is not even resolved by upgrading the |
| 47 | + server. There may now be rooms which contain historical events which would |
| 48 | + no longer be accepted, but these will still be served by the server. |
| 49 | + |
| 50 | + This problem also applies to non-room data such as account data. For |
| 51 | + example, Client-Server API r0.6.0 added the [`m.identity_server` account |
| 52 | + data event |
| 53 | + type](https://matrix.org/docs/spec/client_server/r0.6.1#m-identity-server). |
| 54 | + It is possible, if unlikely, that a client could have uploaded an |
| 55 | + `m.identity_server` event before the administrator upgraded the server. |
| 56 | + |
| 57 | + 5. Event redaction removes certain keys from an event. This is a bit of a |
| 58 | + trivial case, though it is worth noting that the rules for event redaction |
| 59 | + vary between room versions, so it's possible to see a variety of "partial" |
| 60 | + events. |
| 61 | + |
| 62 | + 6. All the cases above can occur without federation. Federation adds |
| 63 | + additional complexities due to the structure of Matrix rooms. In |
| 64 | + particular, a server implementation cannot simply ignore any malformed |
| 65 | + events since such events may either be critical to the structure of the |
| 66 | + room (for example, they may be `m.room.membership` events), or at the very |
| 67 | + least ignoring them would leave "holes" in the event graph which would |
| 68 | + prevent correct back-pagination. |
| 69 | + |
| 70 | +## Ideas for mitigating the problem |
| 71 | + |
| 72 | +The problems above appear to have some easy solutions. Let's describe some of |
| 73 | +those ideas, before considering the fundamental problem that none of them can |
| 74 | +solve. |
| 75 | + |
| 76 | +### Validate all events when they are submitted |
| 77 | + |
| 78 | +In this idea, we would require all server implementations to strictly validate |
| 79 | +any data which is sent over the Client-Server API, to ensure that it complied |
| 80 | +with the specified formats. |
| 81 | + |
| 82 | +This evidently solves problem 1 above, in that it would prevent local users from |
| 83 | +creating malformed events of any event types that the server supports; however, |
| 84 | +it would do nothing to address any of the other problems. |
| 85 | + |
| 86 | +### Validate events at the point they are served to a client |
| 87 | + |
| 88 | +We could require that server implementations validate any data that they are |
| 89 | +about to serve to a client; we might recommend that malformed data be stripped |
| 90 | +out of the response, or redacted, or similar. |
| 91 | + |
| 92 | +It is worth mentioning that this would be tricky to implement efficiently on |
| 93 | +the server side, but it at least helps address most of the problems above, such |
| 94 | +as historical data, or malformed events received over federation. |
| 95 | + |
| 96 | +### Have servers re-validate data on upgrade |
| 97 | + |
| 98 | +Similar to the previous idea, but rather than validating data each time it is |
| 99 | +served to a client, any stored data could be re-validated to check that it |
| 100 | +complies with new validation requirements. |
| 101 | + |
| 102 | +This could be more efficient in the case that changes to validation rules are |
| 103 | +rare, but it could still be a huge amount of data processing on a large server. |
| 104 | + |
| 105 | +### Create new room versions which require events to be well-formed |
| 106 | + |
| 107 | +As outlined above, one of the big problems in this area is how we deal with |
| 108 | +events sent over federation; in particular, if subsets of the servers in a room |
| 109 | +have different ideas as to which events are "valid", then their concepts of the |
| 110 | +room state can begin to drift, and the room can eventually become |
| 111 | +"split-brained". This makes it hard to simply say, for example, |
| 112 | +"`m.room.member` events with a non-string `displayname` are invalid and should |
| 113 | +not form part of the room state": we have a risk that some servers will accept |
| 114 | +the event, and some will not. |
| 115 | + |
| 116 | +One approach to solving this is via [room |
| 117 | +versions](https://matrix.org/docs/spec/index#room-versions). By specifying that |
| 118 | +a change of rules only applies for a future room version, we can eliminate this |
| 119 | +potential disagreement. |
| 120 | + |
| 121 | +The process of changing a room from one version to another is intrusive, not |
| 122 | +least because it requires that all servers in a room support the new room |
| 123 | +version (or risk being locked out). For that reason, it is extremely |
| 124 | +undesirable that any new feature require a new room version: whenever possible, |
| 125 | +it should be possible to use new features in existing rooms. It therefore |
| 126 | +follows that we cannot rely on room versions to provide validation of event |
| 127 | +data. |
| 128 | + |
| 129 | +### Create a single new room version which applies all event validation rules |
| 130 | + |
| 131 | +This idea is included for completeness, though it is unclear how it would work |
| 132 | +in practice. |
| 133 | + |
| 134 | +It has been suggested that we create a new room version which explicitly states |
| 135 | +that events which fail the current event schema, whatever that is at that |
| 136 | +moment in time, should be rejected. |
| 137 | + |
| 138 | +Let's imagine that in future, the `m.room.member` event schema is extended to |
| 139 | +include an optional `location` key, which, if given, must be a string. The |
| 140 | +implication of this idea is that servers should reject any `m.room.member` |
| 141 | +event whose `location` is not a string. We now have a problem: any servers in |
| 142 | +the room which are updated to the latest spec will reject such malformed |
| 143 | +events, but any other servers yet to be upgraded will allow it. So is that user |
| 144 | +in the room or not? |
| 145 | + |
| 146 | +Even if all the servers in the room can be upgraded at once, what about any |
| 147 | +`m.room.member` events which were sent before the rule change? |
| 148 | + |
| 149 | +## The fundamental problem |
| 150 | + |
| 151 | +The ideas above all mitigate the problems discussed earlier to a greater or |
| 152 | +lesser extent, and may indeed be worth doing on their own merits. However, none |
| 153 | +of them can address the problem of outdated server implementations. |
| 154 | + |
| 155 | +For example, consider the case of a new key being added to an event body, say |
| 156 | +`m.relates_to`. Now, we may have decided as above that all new specced keys |
| 157 | +must be validated by the server, so `vN+1` of Synapse dutifully implements such |
| 158 | +validation and refuses to accept events with a malformed `m.relates_to`. |
| 159 | + |
| 160 | +The problem comes for users whose server is still Synapse `vN`. It knows |
| 161 | +nothing of `m.relates_to`, so accepts and passes it through even if |
| 162 | +malformed. The only potential solution is for clients seeking to implement |
| 163 | +`m.relates_to` to refuse to talk to servers which do not declare support for |
| 164 | +it. |
| 165 | + |
| 166 | +However, this is an entirely client-side feature: it is illogical to require |
| 167 | +that servers must be upgraded before it can be used. Consider that the hosted |
| 168 | +element-web at `https://app.element.io` is upgraded to support the new feature; |
| 169 | +in this scenario, that would lock out any user whose homeserver had not yet |
| 170 | +been upgraded. This is not an acceptable user experience. |
| 171 | + |
| 172 | +In short, we are left with the reality that clients must still handle the |
| 173 | +unvalidated data. |
| 174 | + |
| 175 | +## Conclusions |
| 176 | + |
| 177 | +Short of closely coupling server and client versions - which violates the |
| 178 | +fundamental ethos of the Matrix project - there is nothing that can completely |
| 179 | +prevent clients from having to handle untrusted data. In addition, encrypted |
| 180 | +events eliminate any possibility of server-side validation. |
| 181 | + |
| 182 | +With that in mind, the advantages of the ideas above are diminished. If clients |
| 183 | +must handle untrusted data in some circumstances, why not in all? "You can |
| 184 | +trust the content of this data structure, provided you have checked that the |
| 185 | +server knows how to validate it, in which case you need to treat it as |
| 186 | +untrusted" is not a useful message for a client developer. |
| 187 | + |
| 188 | +It may be possible to assert that specific, known cases can be treated as |
| 189 | +trusted data, but these should be called out as specific cases. The default |
| 190 | +message should be that clients must treat all event data as untrusted. |
0 commit comments