Skip to content

Commit b81d9fc

Browse files
turt2liverichvdh
authored andcommitted
Merge pull request #2801 from matrix-org/rav/proposal/untrusted_event_data
MSC2801: Make it explicit that event bodies are untrusted data
2 parents deb15ac + 52dfaaa commit b81d9fc

File tree

1 file changed

+190
-0
lines changed

1 file changed

+190
-0
lines changed
Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
# MSC2801: Make it explicit that event bodies are untrusted data
2+
3+
As the Matrix Specification stands today, it is easy for client developers to
4+
assume that all events will be structured as documented. For example, the
5+
`displayname` key of an [`m.room.member`
6+
event](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-member) is
7+
specified to be "`string` or `null`"; and the `info` key of an [`m.image`
8+
message event](https://matrix.org/docs/spec/client_server/r0.6.1#m-image) must
9+
be a Javascript object.
10+
11+
In reality, these are not safe assumptions. This MSC proposes that the
12+
specification be updated to make it clear that developers should treat all
13+
event data as untrusted.
14+
15+
## Reasons why events may not match the specification
16+
17+
Firstly, let's examine the reasons why such malformed events can exist. Some of
18+
these reasons may appear to have trivial solutions; these are discussed below.
19+
20+
1. Most obviously, Synapse as it currently stands does very little validation
21+
of event bodies sent over the Client-Server API. There is nothing stopping
22+
any user sending a malformed `m.room.message` event with a simple `curl`
23+
command or Element-Web's `/devtools`.
24+
25+
Any event sent in this way will be returned to the clients of all the other
26+
users in the room.
27+
28+
2. Events may be encrypted. Any client implementing E2EE must be prepared to
29+
deal with any encrypted content, since by definition a server cannot
30+
validate it.
31+
32+
3. In order to allow the Matrix protocol to be extensible, server
33+
implementations must tolerate unknown event types, and allow them to be
34+
passed between clients. It is obvious that a pair of custom clients
35+
implementing a `com.example.special.event` event type cannot rely on a
36+
standard server implementation to do any validation for them.
37+
38+
However, this problem extends to event types and keys which have been added
39+
to the specification. For example, the [`m.room.encrypted` event
40+
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-encrypted)
41+
was added in Client-Server API r0.4.0. It therefore follows that a server
42+
implementing CS-API r0.3.0 would have no way to validate an
43+
`m.room.encrypted` event, so if a client is connected to such a server, it
44+
could receive malformed events.
45+
46+
4. To extend from point 3, the problem is not even resolved by upgrading the
47+
server. There may now be rooms which contain historical events which would
48+
no longer be accepted, but these will still be served by the server.
49+
50+
This problem also applies to non-room data such as account data. For
51+
example, Client-Server API r0.6.0 added the [`m.identity_server` account
52+
data event
53+
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-identity-server).
54+
It is possible, if unlikely, that a client could have uploaded an
55+
`m.identity_server` event before the administrator upgraded the server.
56+
57+
5. Event redaction removes certain keys from an event. This is a bit of a
58+
trivial case, though it is worth noting that the rules for event redaction
59+
vary between room versions, so it's possible to see a variety of "partial"
60+
events.
61+
62+
6. All the cases above can occur without federation. Federation adds
63+
additional complexities due to the structure of Matrix rooms. In
64+
particular, a server implementation cannot simply ignore any malformed
65+
events since such events may either be critical to the structure of the
66+
room (for example, they may be `m.room.membership` events), or at the very
67+
least ignoring them would leave "holes" in the event graph which would
68+
prevent correct back-pagination.
69+
70+
## Ideas for mitigating the problem
71+
72+
The problems above appear to have some easy solutions. Let's describe some of
73+
those ideas, before considering the fundamental problem that none of them can
74+
solve.
75+
76+
### Validate all events when they are submitted
77+
78+
In this idea, we would require all server implementations to strictly validate
79+
any data which is sent over the Client-Server API, to ensure that it complied
80+
with the specified formats.
81+
82+
This evidently solves problem 1 above, in that it would prevent local users from
83+
creating malformed events of any event types that the server supports; however,
84+
it would do nothing to address any of the other problems.
85+
86+
### Validate events at the point they are served to a client
87+
88+
We could require that server implementations validate any data that they are
89+
about to serve to a client; we might recommend that malformed data be stripped
90+
out of the response, or redacted, or similar.
91+
92+
It is worth mentioning that this would be tricky to implement efficiently on
93+
the server side, but it at least helps address most of the problems above, such
94+
as historical data, or malformed events received over federation.
95+
96+
### Have servers re-validate data on upgrade
97+
98+
Similar to the previous idea, but rather than validating data each time it is
99+
served to a client, any stored data could be re-validated to check that it
100+
complies with new validation requirements.
101+
102+
This could be more efficient in the case that changes to validation rules are
103+
rare, but it could still be a huge amount of data processing on a large server.
104+
105+
### Create new room versions which require events to be well-formed
106+
107+
As outlined above, one of the big problems in this area is how we deal with
108+
events sent over federation; in particular, if subsets of the servers in a room
109+
have different ideas as to which events are "valid", then their concepts of the
110+
room state can begin to drift, and the room can eventually become
111+
"split-brained". This makes it hard to simply say, for example,
112+
"`m.room.member` events with a non-string `displayname` are invalid and should
113+
not form part of the room state": we have a risk that some servers will accept
114+
the event, and some will not.
115+
116+
One approach to solving this is via [room
117+
versions](https://matrix.org/docs/spec/index#room-versions). By specifying that
118+
a change of rules only applies for a future room version, we can eliminate this
119+
potential disagreement.
120+
121+
The process of changing a room from one version to another is intrusive, not
122+
least because it requires that all servers in a room support the new room
123+
version (or risk being locked out). For that reason, it is extremely
124+
undesirable that any new feature require a new room version: whenever possible,
125+
it should be possible to use new features in existing rooms. It therefore
126+
follows that we cannot rely on room versions to provide validation of event
127+
data.
128+
129+
### Create a single new room version which applies all event validation rules
130+
131+
This idea is included for completeness, though it is unclear how it would work
132+
in practice.
133+
134+
It has been suggested that we create a new room version which explicitly states
135+
that events which fail the current event schema, whatever that is at that
136+
moment in time, should be rejected.
137+
138+
Let's imagine that in future, the `m.room.member` event schema is extended to
139+
include an optional `location` key, which, if given, must be a string. The
140+
implication of this idea is that servers should reject any `m.room.member`
141+
event whose `location` is not a string. We now have a problem: any servers in
142+
the room which are updated to the latest spec will reject such malformed
143+
events, but any other servers yet to be upgraded will allow it. So is that user
144+
in the room or not?
145+
146+
Even if all the servers in the room can be upgraded at once, what about any
147+
`m.room.member` events which were sent before the rule change?
148+
149+
## The fundamental problem
150+
151+
The ideas above all mitigate the problems discussed earlier to a greater or
152+
lesser extent, and may indeed be worth doing on their own merits. However, none
153+
of them can address the problem of outdated server implementations.
154+
155+
For example, consider the case of a new key being added to an event body, say
156+
`m.relates_to`. Now, we may have decided as above that all new specced keys
157+
must be validated by the server, so `vN+1` of Synapse dutifully implements such
158+
validation and refuses to accept events with a malformed `m.relates_to`.
159+
160+
The problem comes for users whose server is still Synapse `vN`. It knows
161+
nothing of `m.relates_to`, so accepts and passes it through even if
162+
malformed. The only potential solution is for clients seeking to implement
163+
`m.relates_to` to refuse to talk to servers which do not declare support for
164+
it.
165+
166+
However, this is an entirely client-side feature: it is illogical to require
167+
that servers must be upgraded before it can be used. Consider that the hosted
168+
element-web at `https://app.element.io` is upgraded to support the new feature;
169+
in this scenario, that would lock out any user whose homeserver had not yet
170+
been upgraded. This is not an acceptable user experience.
171+
172+
In short, we are left with the reality that clients must still handle the
173+
unvalidated data.
174+
175+
## Conclusions
176+
177+
Short of closely coupling server and client versions - which violates the
178+
fundamental ethos of the Matrix project - there is nothing that can completely
179+
prevent clients from having to handle untrusted data. In addition, encrypted
180+
events eliminate any possibility of server-side validation.
181+
182+
With that in mind, the advantages of the ideas above are diminished. If clients
183+
must handle untrusted data in some circumstances, why not in all? "You can
184+
trust the content of this data structure, provided you have checked that the
185+
server knows how to validate it, in which case you need to treat it as
186+
untrusted" is not a useful message for a client developer.
187+
188+
It may be possible to assert that specific, known cases can be treated as
189+
trusted data, but these should be called out as specific cases. The default
190+
message should be that clients must treat all event data as untrusted.

0 commit comments

Comments
 (0)