Commit 8c8d5e3

MSC2716: Incrementally importing history into existing rooms
A proposal for letting ASes specify event parents and timestamps when submitting events, letting them much more effectively insert past conversation history. cc @tulir for feedback, as the main consumer of the ?ts= API today...
1 parent 8eb1c53 commit 8c8d5e3

1 file changed

+115
-0
lines changed

@@ -0,0 +1,115 @@
# MSC2716: Incrementally importing history into existing rooms

## Problem

Matrix has historically been unable to easily import existing history into a
room that already exists. This is a major problem when bridging existing
conversations into Matrix, particularly if the scrollback is being
incrementally or lazily imported.

For instance, an NNTP bridge might work by letting a user join a room that
maps to a given newsgroup, first showing an empty room, and then importing the
most recent 1000 newsgroup posts for that room to flesh out some history. The
bridge might then choose to slowly import additional posts for that newsgroup
in the background, until however many decades of backfill were complete.
Finally, as more archives surface, they might also need to be manually and
gradually added into the history of the room, slowly building up the complete
history of the conversations over time.

This is currently not supported because:
* There is no way to set historical room state in a room via the CS or AS API -
you can only edit current room state.
* There is no way to create messages in the context of historical room state in
a room via the CS or AS API - you can only create events relative to current room
state.
* There is currently no way to override the timestamp on an event via the AS API.
(We used to have the concept of [timestamp
massaging](https://matrix.org/docs/spec/application_service/r0.1.2#timestamp-massaging),
but it never got properly specified.)

## Proposal

1. We let the AS API override the parent(s) of an event when injecting it into
the room, thus letting bridges consciously specify the topological ordering of
the room DAG. We do this by adding a `parent` querystring parameter on the
`PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}` and
`PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}` endpoints.
The `parent` parameter can be repeated multiple times to specify multiple parent
event IDs of the event being submitted. An event must not have more than 20 parents.
If no `parent` parameter is present, the server assumes the event is being
appended to the current timeline and calculates the parents as normal. If an
unrecognised event ID is specified as a `parent`, the request fails with a 404.

2. We also let the AS API override ('massage') the `origin_server_ts` timestamp applied
to sent events. We do this by adding a `ts` querystring parameter on the
`PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}` and
`PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}` endpoints, specifying
the value to apply to `origin_server_ts` on the event (UNIX epoch milliseconds).

3. Finally, we can add an optional `"m.historical": true` field to events to
indicate that they are historical at the point of being added to a room, and
as such servers should not serve them to clients via the CS `/sync` API -
instead preferring clients to discover them by paginating scrollback via
`/messages`.

This lets history be injected at the right place topologically in the room. For
instance, different eras of the room could end up as branches off the original
`m.room.create` event, each first setting up the contextual room state for that
era before the block of imported history. So, you could end up with something
like this:

```
m.room.create
     |\
     | \___________________________________
     |  \                                  \
     |   \                                  \
live timeline   previous 1000 messages   another block of ancient history
```
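
From the bridge side, the `parent` and `ts` parameters and the branching shown in the diagram could be used roughly as follows. This is a minimal sketch, not part of the proposal: `build_send_url`, `import_block` and the `send` callback are hypothetical names, and `send` is assumed to perform the PUT and return the new event's ID. Chaining each imported message onto the previous one is one possible strategy for building a branch off `m.room.create`.

```python
from urllib.parse import quote, urlencode

MAX_PARENTS = 20  # limit stated in the proposal


def build_send_url(base_url, room_id, event_type, txn_id, parents=(), ts=None):
    """Build the PUT .../send/... URL with the proposed `parent` and `ts` params.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(parents) > MAX_PARENTS:
        raise ValueError("an event must not have more than 20 parents")
    path = "/_matrix/client/r0/rooms/{}/send/{}/{}".format(
        quote(room_id, safe=""),
        quote(event_type, safe=""),
        quote(txn_id, safe=""),
    )
    # `parent` is repeated once per parent event ID.
    query = [("parent", parent) for parent in parents]
    if ts is not None:
        # `ts` is UNIX epoch milliseconds, applied to origin_server_ts.
        query.append(("ts", str(int(ts))))
    return base_url + path + ("?" + urlencode(query) if query else "")


def import_block(send, create_event_id, messages):
    """Import (ts, content) pairs, oldest first, as a branch off m.room.create.

    `send(parents, ts, content)` is an assumed callback that submits the
    event and returns its event ID.
    """
    parent = create_event_id
    event_ids = []
    for ts, content in messages:
        # Each event's single parent is the previously imported event.
        parent = send([parent], ts, content)
        event_ids.append(parent)
    return event_ids
```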

We consciously don't support the new `parent` and `ts` parameters on the
various helper syntactic-sugar APIs like `/kick` and `/ban`. If a bridge/bot is
smart enough to be faking history, it is already in the business of dealing
with raw events, and should not be using the syntactic sugar APIs.
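
On the homeserver side, the rules for the `parent` parameter in item 1 might be sketched as below. This is an illustration under assumptions, not a reference implementation: `resolve_parents` and `UnknownParent` are hypothetical names, "calculates the parents as normal" is taken here to mean using the room's current forward extremities, and the proposal does not say which error is returned when more than 20 parents are supplied, so a plain `ValueError` stands in for it.

```python
MAX_PARENTS = 20  # limit stated in the proposal


class UnknownParent(Exception):
    """In this sketch, the request handler would map this to an HTTP 404."""


def resolve_parents(requested, known_event_ids, forward_extremities):
    """Return the parent event IDs to use for a newly submitted event.

    requested: values of the repeated `parent` query parameter (may be empty).
    known_event_ids: set of event IDs the server recognises (assumed lookup).
    forward_extremities: parents the server would normally pick (assumption).
    """
    if not requested:
        # No `parent` given: append to the current timeline as normal.
        return list(forward_extremities)
    if len(requested) > MAX_PARENTS:
        # Error type is an assumption; the proposal only states the limit.
        raise ValueError("an event must not have more than 20 parents")
    for event_id in requested:
        if event_id not in known_event_ids:
            raise UnknownParent(event_id)
    return list(requested)
```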

## Potential issues

There are a bunch of security considerations here - see below.

## Alternatives

We could insist that we use the SS API to import history in this manner rather than
extending the AS API. However, it seems unnecessarily burdensome to make bridge authors
understand the SS API, especially when we already have so many AS API bridges. Hence these
minor extensions to the existing AS API.

Another way of doing this might be to store the different eras of the room as
different versions of the room, using `m.room.tombstone` events to form a
linked list of the eras. This has the advantage of isolating room state
between different eras of the room, simplifying state resolution calculations
and avoiding risk of any cross-talk. It's also easier to reason about, and
avoids exposing the DAG to bridge developers. However, it would require
better presentation of room versions in clients, and it would require support
for retrospectively specifying the `predecessor` of the current room when you
retrospectively import history. Currently `predecessor` is in the immutable
`m.room.create` event of a room, so cannot be changed retrospectively - and
doing so in a safe and race-free manner sounds Hard.

## Security considerations

This allows an AS to tie the room DAG in knots by specifying inappropriate
event IDs as parents, potentially DoSing the state resolution algorithm, or
triggering undesired state resolution results. However, this is already possible
via the SS API today, and given that the AS API requires the homeserver admin to
explicitly authorise the AS in question, this doesn't feel too bad.

This also makes it much easier for an AS to maliciously spoof history. This
is a bit unavoidable given the nature of the feature, and is also possible
today via the SS API.

If the state changes from under us due to importing history, we have no way to
tell the client about it. This is an [existing
bug](https://github.com/matrix-org/synapse/issues/4508) that can be triggered
today by SS API traffic, so is orthogonal to this proposal.

## Unstable prefix

Feels unnecessary.
