| 1 | +# MSC2716: Incrementally importing history into existing rooms |
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +Matrix has historically been unable to easily import existing history into a |
| 6 | +room that already exists. This is a major problem when bridging existing |
| 7 | +conversations into Matrix, particularly if the scrollback is being |
| 8 | +incrementally or lazily imported. |
| 9 | + |
| 10 | +For instance, an NNTP bridge might work by letting a user join a room that |
| 11 | +maps to a given newsgroup, first showing an empty room, and then importing the |
| 12 | +most recent 1000 newsgroup posts for that room to flesh out some history. The |
| 13 | +bridge might then choose to slowly import additional posts for that newsgroup |
| 14 | +in the background, until however many decades of backfill were complete. |
| 15 | +Finally, as more archives surface, they might also need to be added manually |
| 16 | +and gradually to the history of the room - slowly building up the complete |
| 17 | +history of the conversations over time. |
| 18 | + |
| 19 | +This is currently not supported because: |
| 20 | + * There is no way to set historical room state in a room via the CS or AS API - |
| 21 | + you can only edit current room state. |
| 22 | + * There is no way to create messages in the context of historical room state in |
| 23 | + a room via the CS or AS API - you can only create events relative to current room |
| 24 | + state. |
| 25 | + * There is currently no way to override the timestamp on an event via the AS API. |
| 26 | + (We used to have the concept of [timestamp |
| 27 | + massaging](https://matrix.org/docs/spec/application_service/r0.1.2#timestamp-massaging), |
| 28 | + but it was never properly specified.) |
| 29 | + |
| 30 | +## Proposal |
| 31 | + |
| 32 | + 1. We let the AS API override the parent(s) of an event when injecting it into |
| 33 | + the room, thus letting bridges consciously specify the topological ordering of |
| 34 | + the room DAG. We do this by adding a `parent` querystring parameter on the |
| 35 | + `PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}` and |
| 36 | + `PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}` endpoints. |
| 37 | + The `parent` parameter can be repeated multiple times to specify multiple parent |
| 38 | + event IDs of the event being submitted. An event must not have more than 20 parents. |
| 39 | + If no `parent` parameter is supplied, the server assumes the event is being |
| 40 | + appended to the current timeline and calculates the parents as normal. If an |
| 41 | + unrecognised event ID is specified as a `parent`, the request fails with a 404. |
| 42 | + |
| 43 | + 2. We also let the AS API override ('massage') the `origin_server_ts` timestamp applied |
| 44 | + to sent events. We do this by adding a `ts` querystring parameter on the |
| 45 | + `PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}` and |
| 46 | + `PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}` endpoints, specifying |
| 47 | + the value to apply to `origin_server_ts` on the event (UNIX epoch milliseconds). |
| 48 | + |
| 49 | + 3. Finally, we can add an optional `"m.historical": true` field to events to |
| 50 | + indicate that they are historical at the point of being added to a room, and |
| 51 | + as such servers should not serve them to clients via the CS `/sync` API - |
| 52 | + instead preferring clients to discover them by paginating scrollback via |
| 53 | + `/messages`. |
| 54 | + |
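As a rough illustration of points 1 and 2, the sketch below builds the request URL for the `/send` endpoint with repeated `parent` parameters and a `ts` override. The helper name `build_send_url`, the `base_url` argument, and the example event IDs are hypothetical and not part of the proposal; authentication and the request body are omitted. The client-side check on the number of parents simply mirrors the limit of 20 stated above.

```python
from urllib.parse import quote, urlencode

MAX_PARENTS = 20  # limit stated in the proposal


def build_send_url(base_url, room_id, event_type, txn_id, parents=(), ts=None):
    """Build the PUT URL for sending an event (hypothetical helper).

    `parents` is a sequence of parent event IDs; each becomes its own
    `parent` query parameter. `ts` is the desired `origin_server_ts`
    in UNIX epoch milliseconds.
    """
    if len(parents) > MAX_PARENTS:
        raise ValueError("an event must not have more than 20 parents")
    path = "/_matrix/client/r0/rooms/{}/send/{}/{}".format(
        quote(room_id, safe=""),
        quote(event_type, safe=""),
        quote(txn_id, safe=""),
    )
    # Repeat the `parent` key once per parent event ID.
    params = [("parent", p) for p in parents]
    if ts is not None:
        params.append(("ts", str(int(ts))))
    query = urlencode(params)
    # With no parameters the URL is the ordinary send URL, so the server
    # appends the event to the current timeline as normal.
    return base_url + path + ("?" + query if query else "")


url = build_send_url(
    "https://hs.example",          # assumed homeserver base URL
    "!room:example.org",
    "m.room.message",
    "txn1",
    parents=["$a:example.org", "$b:example.org"],
    ts=1262304000000,
)
```

Omitting both `parents` and `ts` yields the unmodified endpoint URL, which matches the fallback behaviour described in point 1.
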
| 55 | +This lets history be injected at the right place topologically in the room. For instance, different eras of the room could |
| 56 | +end up as branches off the original `m.room.create` event, each first setting up the contextual room state for that era before |
| 57 | +the block of imported history. So, you could end up with something like this: |
| 58 | + |
| 59 | +``` |
| 60 | +m.room.create |
| 61 | + |\ |
| 62 | + | \___________________________________ |
| 63 | + | \ \ |
| 64 | + | \ \ |
| 65 | +live timeline previous 1000 messages another block of ancient history |
| 66 | +``` |
| 67 | + |
| 68 | +We consciously don't support the new `parent` and `ts` parameters on the |
| 69 | +various helper syntactic-sugar APIs like `/kick` and `/ban`. If a bridge/bot is |
| 70 | +smart enough to be faking history, it is already in the business of dealing |
| 71 | +with raw events, and should not be using the syntactic sugar APIs. |
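
To make the point about raw events concrete: a kick is represented in a room as an `m.room.member` state event whose state key is the kicked user and whose content has `membership` set to `leave`. A bridge importing a historical kick could therefore use the `/state` endpoint with the new parameters instead of `/kick`. The sketch below is illustrative only; the helper name `historical_kick_request` and the `base_url` argument are hypothetical, and authentication and the choice of sending user are omitted.

```python
import json
from urllib.parse import quote, urlencode


def historical_kick_request(base_url, room_id, target_user_id, parents, ts, reason=None):
    """Return (method, url, body) for a raw membership event standing in for /kick.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    path = "/_matrix/client/r0/rooms/{}/state/m.room.member/{}".format(
        quote(room_id, safe=""),
        quote(target_user_id, safe=""),  # state key is the kicked user
    )
    # One `parent` parameter per parent event ID, plus the timestamp override.
    params = [("parent", p) for p in parents] + [("ts", str(int(ts)))]
    content = {"membership": "leave"}
    if reason is not None:
        content["reason"] = reason
    return "PUT", base_url + path + "?" + urlencode(params), json.dumps(content)


method, url, body = historical_kick_request(
    "https://hs.example",  # assumed homeserver base URL
    "!r:x",
    "@bob:x",
    ["$e:x"],
    1000,
    reason="spam",
)
```
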
| 72 | + |
| 73 | +## Potential issues |
| 74 | + |
| 75 | +This proposal raises several security considerations - see the "Security considerations" section below. |
| 76 | + |
| 77 | +## Alternatives |
| 78 | + |
| 79 | +We could insist that we use the SS API to import history in this manner rather than |
| 80 | +extending the AS API. However, it seems unnecessarily burdensome to make bridge authors |
| 81 | +understand the SS API, especially when we already have so many AS API bridges. Hence these |
| 82 | +minor extensions to the existing AS API. |
| 83 | + |
| 84 | +Another way of doing this might be to store the different eras of the room as |
| 85 | +different versions of the room, using `m.room.tombstone` events to form a |
| 86 | +linked list of the eras. This has the advantage of isolating room state |
| 87 | +between different eras of the room, simplifying state resolution calculations |
| 88 | +and avoiding risk of any cross-talk. It's also easier to reason about, and |
| 89 | +avoids exposing the DAG to bridge developers. However, it would require |
| 90 | +better presentation of room versions in clients, and it would require support |
| 91 | +for retrospectively specifying the `predecessor` of the current room when you |
| 92 | +retrospectively import history. Currently `predecessor` is in the immutable |
| 93 | +`m.room.create` event of a room, so cannot be changed retrospectively - and |
| 94 | +doing so in a safe and race-free manner sounds Hard. |
| 95 | + |
| 96 | +## Security considerations |
| 97 | + |
| 98 | +This allows an AS to tie the room DAG in knots by specifying inappropriate |
| 99 | +event IDs as parents, potentially DoSing the state resolution algorithm, or |
| 100 | +triggering undesired state resolution results. However, this is already possible via the |
| 101 | +SS API today, and given that the AS API requires the homeserver admin to |
| 102 | +explicitly authorise the AS in question, this doesn't feel too bad. |
| 103 | + |
| 104 | +This also makes it much easier for an AS to maliciously spoof history. This |
| 105 | +is a bit unavoidable given the nature of the feature, and is also possible |
| 106 | +today via the SS API. |
| 107 | + |
| 108 | +If the state changes from under us due to importing history, we have no way to |
| 109 | +tell the client about it. This is an [existing |
| 110 | +bug](https://github.com/matrix-org/synapse/issues/4508) that can be triggered |
| 111 | +today by SS API traffic, so is orthogonal to this proposal. |
| 112 | + |
| 113 | +## Unstable prefix |
| 114 | + |
| 115 | +Feels unnecessary. |