Fix historical messages backfilling in random order on remote homeservers (MSC2716) #11114
Changes from 6 commits
New changelog entry (`@@ -0,0 +1 @@`):

> Fix [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) historical messages backfilling in random order on remote homeservers.
```diff
@@ -416,6 +416,18 @@ async def backfill(
         events = await self._federation_client.backfill(
```
---

@erikjohnston After experimenting with bigger datasets using the Gitter import script and writing bigger Complement tests, I'm seeing problems with our whole […] approach.

With all of the massaging I'm trying to do in this PR to make it work for the basic scrollback case, it won't solve all of the problems. Technically, it shouldn't really matter how […]. Even if we can backfill messages in order, it still doesn't guarantee the same […].

This will be even more of an exact use case when we add the MSC3030 jump to date API endpoint, so the static archives can navigate and jump to a certain date.

The real solution around event ordering looks related to these issues, where we don't rely on […].
---

Can we sort the events returned by […]?
---

That may help for the specific backfill case I'm sorta optimizing here in this PR, where the user is scrolling back in time sequentially. But it doesn't help the case where you jump back in time via permalink or the jump to date endpoint and start the […]. That's the most obvious case of it going wrong, but it could also go wrong in […].

---

I was thinking of doing the sorted_topologically on the receiving side? That way it doesn't matter what order the other server returns them in.

Ideally ideally, we wouldn't use depths and instead use an "online topological sorting" algorithm (e.g. the Katriel–Bodlaender algorithm) to ensure that we correctly handle disjoint chunks of the graph becoming connected. But that is fiddly to do, and I'm not quite sure how things like pagination tokens would work if the ordering could change.
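A receiving-side topological sort of a backfill chunk could look roughly like the following sketch: Kahn's algorithm over the `prev_events` references that point inside the chunk, with `depth` as a tie-breaker. The function name and event shape here are hypothetical, and this is not Synapse's actual `sorted_topologically` helper.

```python
from collections import deque
from typing import Dict, List


def sort_events_topologically(events: List[dict]) -> List[dict]:
    """Order a chunk of events so every event comes after the prev_events
    it references (oldest -> newest), ignoring references to events that
    are outside the chunk. Toy sketch for illustration only."""
    by_id: Dict[str, dict] = {ev["event_id"]: ev for ev in events}
    # in-degree = number of prev_events that are inside this chunk
    indegree = {eid: 0 for eid in by_id}
    children: Dict[str, List[str]] = {eid: [] for eid in by_id}
    for ev in events:
        for parent in ev.get("prev_events", []):
            if parent in by_id:
                indegree[ev["event_id"]] += 1
                children[parent].append(ev["event_id"])
    # Kahn's algorithm: repeatedly emit events with no unmet in-chunk
    # parents; sort the initial frontier by depth so disconnected pieces
    # come out in a sane order.
    queue = deque(
        sorted(
            (eid for eid, deg in indegree.items() if deg == 0),
            key=lambda eid: by_id[eid].get("depth", 0),
        )
    )
    ordered: List[dict] = []
    while queue:
        eid = queue.popleft()
        ordered.append(by_id[eid])
        for child in children[eid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return ordered
```

Sorting on the receiving side this way means the order the remote server returns events in stops mattering, which is the point of the suggestion above.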
---

I think sorting on both the sending and receiving sides would be good to try, to ensure the best chance of it working (just for server differences between sending and receiving). But it's still not bullet-proof in the permalink case, where […].

Does the chunking you worked on achieve this? #3785 Worth reviving?

Any good references for the Katriel–Bodlaender algorithm? Something more grokable than a dense PDF 😵 -- maybe how that algorithm works doesn't matter, and the aspect I'm curious about is how we store this topological order and how it's updated if we insert an event in the middle. I assume it's more than just […].

I see matrix-org/gomatrixserverlib#187, which discusses the "online topological sorting" and additionally the chunking stuff.
For reference, current pagination tokens: https://github.com/matrix-org/matrix-doc/issues/574

Could pagination tokens just become the event ID itself, and then you just use the direction to paginate from there?
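The "event ID plus direction" idea in the question above could be sketched like this. The token format, class name, and serialization are purely hypothetical illustrations, not Matrix's actual pagination tokens:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventToken:
    """Hypothetical pagination token: anchor on an event ID and paginate
    in a direction from it, instead of encoding a stream/topological
    position that could change if the ordering changes."""

    event_id: str
    direction: str  # "b" = backwards (older), "f" = forwards (newer)

    def serialize(self) -> str:
        return f"{self.direction}:{self.event_id}"

    @classmethod
    def deserialize(cls, token: str) -> "EventToken":
        # partition on the first ":" only, since event IDs may contain ":"
        direction, _, event_id = token.partition(":")
        return cls(event_id=event_id, direction=direction)
```

Such a token survives reordering because it names an event rather than a position, which is why it is attractive when the topological order is mutable.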
---

I have not found any helpful references to that algorithm beyond compsci papers. The actual algorithm is relatively fine; the major problem is picking a data structure for the labels that satisfies the necessary properties and is efficient to store in the DB: […]

While they suggest just using integers, that is a bit problematic for us, as that would require potentially rewriting a lot of rows in the DB. There's no reason that they need to be integers; the previous work used fractions to allow inserting nodes in between existing nodes without renumbering everything (and it also operated on chunks rather than individual events, iirc).
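The fraction trick can be illustrated in a few lines (a toy sketch, not the previous chunking implementation): between any two fractional labels there is always a midpoint, so a new node can be slotted in between existing ones without renumbering the rows on either side.

```python
from fractions import Fraction


def label_between(before: Fraction, after: Fraction) -> Fraction:
    """Pick an ordering label strictly between two existing labels.
    With integer labels this can force renumbering neighbouring rows;
    with fractions the midpoint always exists."""
    assert before < after
    return (before + after) / 2


a = Fraction(1)  # label of an existing earlier node
b = Fraction(2)  # label of the node right after it
mid = label_between(a, b)  # a new node between them
mid2 = label_between(a, mid)  # and there is always room to keep inserting
```

The trade-off is that the fractions grow in precision as insertions pile up between the same two neighbours, but no existing row ever needs its label rewritten.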
It would need to be a set of events to handle forks in the graph, annoyingly, as you basically need to serialise the state of a breadth-first search algorithm, iyswim. I.e. if you have a DAG […].

---

I think it would be valuable to resurrect the chunking project fwiw; it's just a faff without concrete benefits to end users, so it's just never been prioritised.

---

The conversation for topological sorting when we receive backfill events continues in #11114 (comment). For chunking and online topological sorting, we discussed this in an out-of-band call: […]

---
```diff
             dest, room_id, limit=limit, extremities=extremities
         )
+        logger.info(
+            "from remote server: got backfill response events=%s",
+            [
+                {
+                    "event_id": ev.event_id,
+                    "type": ev["type"],
+                    "depth": ev["depth"],
+                    "content": ev["content"].get("body", None),
+                }
+                for ev in events
+            ],
+        )
 
         if not events:
             return
```
```diff
@@ -429,7 +441,39 @@ async def backfill(
                     f"room {ev.room_id}, when we were backfilling in {room_id}"
                 )
 
-        await self._process_pulled_events(dest, events, backfilled=True)
+        await self._process_pulled_events(
+            dest,
+            # The /backfill response should start from `?v` and include the
+            # events that preceded it (so the list will be newest -> oldest). We
+            # reverse that order so the messages are oldest -> newest and we can
+            # persist the backfilled events without constantly having to go fetch
+            # missing prev_events, which are probably included in the same
+            # backfill chunk.
+            # TODO: If we try to reverse this list, the stream_ordering will be backwards
+            # reversed(events),
+            events,
+            backfilled=True,
+        )
+
+        for ev in events:
+            event_after_persisted = await self._store.get_event(
+                ev.event_id, allow_none=True
+            )
+
+            if event_after_persisted:
+                logger.info(
+                    "from remote server: processed backfilled event_id=%s type=%s depth=%s stream_ordering=%s content=%s",
+                    ev.event_id,
+                    event_after_persisted["type"],
+                    event_after_persisted["depth"],
+                    event_after_persisted.internal_metadata.stream_ordering,
+                    event_after_persisted["content"].get("body", None),
+                )
+            else:
+                logger.info(
+                    "from remote server: processed backfilled event_id=%s failed to lookup",
+                    ev.event_id,
+                )
 
     async def _get_missing_events_for_pdu(
         self, origin: str, pdu: EventBase, prevs: Set[str], min_depth: int
```
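The TODO about `reversed(events)` in the hunk above can be illustrated with a simplified model of how backfilled events receive negative, decreasing `stream_ordering` values in the order they are processed. This model is an assumption for illustration, not Synapse's actual persister:

```python
from typing import Dict, List


def assign_stream_ordering(events_in_processing_order: List[str]) -> Dict[str, int]:
    """Simplified model: backfilled events get stream_ordering -1, -2, -3, ...
    in the order they are persisted."""
    return {ev: -(i + 1) for i, ev in enumerate(events_in_processing_order)}


# /backfill returns events newest -> oldest (walking prev_events from `?v`).
# Processing them in that raw order, the oldest event ends up with the most
# negative stream_ordering, which matches history.
raw_order = assign_stream_ordering(["$event3", "$event2", "$event1"])

# Processing the reversed list (oldest -> newest) makes prev_events resolvable
# as we go, but now the *newest* event gets the most negative stream_ordering:
# "the stream_ordering will be backwards", per the TODO.
reversed_order = assign_stream_ordering(["$event1", "$event2", "$event3"])
```

So reversing helps `prev_events` resolution within the chunk but flips the resulting `stream_ordering`, which is the tension the inline comment is wrestling with.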
```diff
@@ -1229,7 +1273,12 @@ def prep(event: EventBase) -> Optional[Tuple[EventBase, EventContext]]:
             return event, context
 
         events_to_persist = (x for x in (prep(event) for event in fetched_events) if x)
-        await self.persist_events_and_notify(room_id, tuple(events_to_persist))
+        await self.persist_events_and_notify(
+            room_id,
+            tuple(events_to_persist),
+            # TODO: Maybe mark missing events fetched during backfill as backfilled too :/
+            backfilled=True,
+        )
 
     async def _check_event_auth(
         self,
```