This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit fef2e79

Fix historical messages backfilling in random order on remote homeservers (MSC2716) (#11114)
Fix #11091
Fix #10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`)

1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`).
    - Technically, it shouldn't really matter how `/backfill` returns things, but this makes the `stream_ordering` a little more consistent from the origin to the remote homeservers, so that the order of messages from `/messages` is consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)).
    - Even now that we return backfilled messages in order, the same `stream_ordering` (and, more importantly, the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) is still not guaranteed on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between, insert the permalinked message as the next message in the `stream_ordering`, and totally throw off the sort.
        - This will be even more the case when we add the [MSC3030 jump-to-date API endpoint](matrix-org/matrix-spec-proposals#3030) so the static archives can navigate and jump to a certain date.
        - We're solving this in the future by switching to [online topological ordering](matrix-org/gomatrixserverlib#187) and [chunking](#3785), which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking.
2. As we navigate `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break on `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth`, so it's best to be prescriptive about which ones to process first. In reality, the code probably already looped over the historical messages as expected because the database is already in order.
3. Made the historical state chain and historical event chain float on their own by having no `prev_events`, instead of a fake `prev_event`, which caused backfill to get clogged with an unresolvable event. Fixes #11091 and #10764.
4. We no longer find connected insertion events by looking for a potential `prev_event` connection to the current event we're iterating over. We now rely solely on marker events, which, when processed, add the insertion event as an extremity that the federating homeserver can ask about when the time comes.
    - Related discussion: #11114 (comment)

Before | After
--- | ---
![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png)

#### Why aren't we sorting topologically when receiving backfill events?

> The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](matrix-org/matrix-spec-proposals#2716) which could change whatever implementation anyway.
>
> As mentioned, ideally we would do this, but the code necessary to make the fake edges gets confusing and gives an impression of "just whyyyy" (feels icky). This problem also dissolves with online topological ordering.
>
> -- #11114 (comment)

See #11114 (comment) for the technical difficulties.
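The `(depth, stream_ordering)` ordering described in points 1 and 2 can be sketched as a simple tuple sort. This is a minimal illustration, not Synapse's real code: `Event` here is a stand-in dataclass (Synapse's `EventBase` is far richer), but the key function matches the described ordering, with `stream_ordering` breaking ties between MSC2716 batch events that share a single `depth`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Stand-in for Synapse's EventBase; only the fields used in the sort."""

    event_id: str
    depth: int
    stream_ordering: int


def sort_backfill_events(events: List[Event]) -> List[Event]:
    """Order newest -> oldest: higher depth first, then higher
    stream_ordering as the tie-breaker (matters for MSC2716 batches,
    which insert many events at the same depth)."""
    return sorted(events, key=lambda e: (e.depth, e.stream_ordering), reverse=True)


batch = [
    Event("$a", depth=5, stream_ordering=10),
    Event("$b", depth=5, stream_ordering=12),  # same depth as $a: tie-break
    Event("$c", depth=7, stream_ordering=3),
]
print([e.event_id for e in sort_backfill_events(batch)])  # -> ['$c', '$b', '$a']
```

Without the tie-breaker, the relative order of `$a` and `$b` would depend on whatever order the database happened to return them in.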
1 parent cf06783 commit fef2e79

File tree

9 files changed: +342 −149 lines


changelog.d/11114.bugfix

Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+Fix [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) historical messages backfilling in random order on remote homeservers.
```

synapse/handlers/federation.py

Lines changed: 23 additions & 4 deletions
```diff
@@ -166,9 +166,14 @@ async def _maybe_backfill_inner(
         oldest_events_with_depth = (
             await self.store.get_oldest_event_ids_with_depth_in_room(room_id)
         )
-        insertion_events_to_be_backfilled = (
-            await self.store.get_insertion_event_backwards_extremities_in_room(room_id)
-        )
+
+        insertion_events_to_be_backfilled: Dict[str, int] = {}
+        if self.hs.config.experimental.msc2716_enabled:
+            insertion_events_to_be_backfilled = (
+                await self.store.get_insertion_event_backward_extremities_in_room(
+                    room_id
+                )
+            )
         logger.debug(
             "_maybe_backfill_inner: extremities oldest_events_with_depth=%s insertion_events_to_be_backfilled=%s",
             oldest_events_with_depth,
@@ -271,11 +276,12 @@ async def _maybe_backfill_inner(
         ]

         logger.debug(
-            "room_id: %s, backfill: current_depth: %s, limit: %s, max_depth: %s, extrems: %s filtered_sorted_extremeties_tuple: %s",
+            "room_id: %s, backfill: current_depth: %s, limit: %s, max_depth: %s, extrems (%d): %s filtered_sorted_extremeties_tuple: %s",
             room_id,
             current_depth,
             limit,
             max_depth,
+            len(sorted_extremeties_tuple),
             sorted_extremeties_tuple,
             filtered_sorted_extremeties_tuple,
         )
@@ -1047,6 +1053,19 @@ async def on_backfill_request(
         limit = min(limit, 100)

         events = await self.store.get_backfill_events(room_id, pdu_list, limit)
+        logger.debug(
+            "on_backfill_request: backfill events=%s",
+            [
+                "event_id=%s,depth=%d,body=%s,prevs=%s\n"
+                % (
+                    event.event_id,
+                    event.depth,
+                    event.content.get("body", event.type),
+                    event.prev_event_ids(),
+                )
+                for event in events
+            ],
+        )

         events = await filter_events_for_server(self.storage, origin, events)
```
synapse/handlers/federation_event.py

Lines changed: 29 additions & 5 deletions
```diff
@@ -508,7 +508,11 @@ async def backfill(
                 f"room {ev.room_id}, when we were backfilling in {room_id}"
             )

-        await self._process_pulled_events(dest, events, backfilled=True)
+        await self._process_pulled_events(
+            dest,
+            events,
+            backfilled=True,
+        )

     async def _get_missing_events_for_pdu(
         self, origin: str, pdu: EventBase, prevs: Set[str], min_depth: int
@@ -626,11 +630,24 @@ async def _process_pulled_events(
             backfilled: True if this is part of a historical batch of events (inhibits
                 notification to clients, and validation of device keys.)
         """
+        logger.debug(
+            "processing pulled backfilled=%s events=%s",
+            backfilled,
+            [
+                "event_id=%s,depth=%d,body=%s,prevs=%s\n"
+                % (
+                    event.event_id,
+                    event.depth,
+                    event.content.get("body", event.type),
+                    event.prev_event_ids(),
+                )
+                for event in events
+            ],
+        )

         # We want to sort these by depth so we process them and
         # tell clients about them in order.
         sorted_events = sorted(events, key=lambda x: x.depth)
-
         for ev in sorted_events:
             with nested_logging_context(ev.event_id):
                 await self._process_pulled_event(origin, ev, backfilled=backfilled)
@@ -992,6 +1009,8 @@ async def _process_received_pdu(

         await self._run_push_actions_and_persist_event(event, context, backfilled)

+        await self._handle_marker_event(origin, event)
+
         if backfilled or context.rejected:
             return

@@ -1071,8 +1090,6 @@ async def _process_received_pdu(
             event.sender,
         )

-        await self._handle_marker_event(origin, event)
-
     async def _resync_device(self, sender: str) -> None:
         """We have detected that the device list for the given user may be out
         of sync, so we try and resync them.
@@ -1323,7 +1340,14 @@ def prep(event: EventBase) -> Optional[Tuple[EventBase, EventContext]]:
             return event, context

         events_to_persist = (x for x in (prep(event) for event in fetched_events) if x)
-        await self.persist_events_and_notify(room_id, tuple(events_to_persist))
+        await self.persist_events_and_notify(
+            room_id,
+            tuple(events_to_persist),
+            # Mark these events backfilled as they're historic events that will
+            # eventually be backfilled. For example, missing events we fetch
+            # during backfill should be marked as backfilled as well.
+            backfilled=True,
+        )

     async def _check_event_auth(
         self,
```
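The `_handle_marker_event` move in the diff above can be illustrated with a minimal control-flow sketch. This is an assumed simplification, not Synapse's real code: `process_received_pdu` and the marker-type check are stand-ins, but the point matches the diff, marker handling now runs *before* the early return for backfilled events, so markers received via backfill still register their insertion event as an extremity.

```python
from typing import Dict, List

# MSC2716 marker event type (unstable prefix, per the proposal).
MARKER_TYPE = "org.matrix.msc2716.marker"


def process_received_pdu(event: Dict, backfilled: bool, handled_markers: List[str]) -> None:
    """Simplified stand-in for Synapse's _process_received_pdu."""
    # ... persisting the event and running push actions is elided ...

    # This used to live *after* the `backfilled` early return below, so
    # marker events arriving via backfill were never processed.
    if event.get("type") == MARKER_TYPE:
        handled_markers.append(event["event_id"])

    if backfilled:
        return  # backfilled events skip client notification / device resync

    # ... remaining non-backfill processing is elided ...


handled: List[str] = []
process_received_pdu(
    {"type": MARKER_TYPE, "event_id": "$marker1"},
    backfilled=True,
    handled_markers=handled,
)
print(handled)  # -> ['$marker1']
```

With the old ordering, the same call with `backfilled=True` would have returned before the marker check, leaving `handled` empty.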

synapse/handlers/message.py

Lines changed: 16 additions & 4 deletions
```diff
@@ -490,12 +490,12 @@ async def create_event(
         requester: Requester,
         event_dict: dict,
         txn_id: Optional[str] = None,
+        allow_no_prev_events: bool = False,
         prev_event_ids: Optional[List[str]] = None,
         auth_event_ids: Optional[List[str]] = None,
         require_consent: bool = True,
         outlier: bool = False,
         historical: bool = False,
-        allow_no_prev_events: bool = False,
         depth: Optional[int] = None,
     ) -> Tuple[EventBase, EventContext]:
         """
@@ -510,6 +510,10 @@ async def create_event(
             requester
             event_dict: An entire event
             txn_id
+            allow_no_prev_events: Whether to allow this event to be created an empty
+                list of prev_events. Normally this is prohibited just because most
+                events should have a prev_event and we should only use this in special
+                cases like MSC2716.
             prev_event_ids:
                 the forward extremities to use as the prev_events for the
                 new event.
@@ -604,10 +608,10 @@ async def create_event(
         event, context = await self.create_new_client_event(
             builder=builder,
             requester=requester,
+            allow_no_prev_events=allow_no_prev_events,
             prev_event_ids=prev_event_ids,
             auth_event_ids=auth_event_ids,
             depth=depth,
-            allow_no_prev_events=allow_no_prev_events,
         )

         # In an ideal world we wouldn't need the second part of this condition. However,
@@ -764,6 +768,7 @@ async def create_and_send_nonmember_event(
         self,
         requester: Requester,
         event_dict: dict,
+        allow_no_prev_events: bool = False,
         prev_event_ids: Optional[List[str]] = None,
         auth_event_ids: Optional[List[str]] = None,
         ratelimit: bool = True,
@@ -781,6 +786,10 @@ async def create_and_send_nonmember_event(
         Args:
             requester: The requester sending the event.
             event_dict: An entire event.
+            allow_no_prev_events: Whether to allow this event to be created an empty
+                list of prev_events. Normally this is prohibited just because most
+                events should have a prev_event and we should only use this in special
+                cases like MSC2716.
             prev_event_ids:
                 The event IDs to use as the prev events.
                 Should normally be left as None to automatically request them
@@ -880,16 +889,20 @@ async def create_new_client_event(
         self,
         builder: EventBuilder,
         requester: Optional[Requester] = None,
+        allow_no_prev_events: bool = False,
         prev_event_ids: Optional[List[str]] = None,
         auth_event_ids: Optional[List[str]] = None,
         depth: Optional[int] = None,
-        allow_no_prev_events: bool = False,
     ) -> Tuple[EventBase, EventContext]:
         """Create a new event for a local client

         Args:
             builder:
             requester:
+            allow_no_prev_events: Whether to allow this event to be created an empty
+                list of prev_events. Normally this is prohibited just because most
+                events should have a prev_event and we should only use this in special
+                cases like MSC2716.
             prev_event_ids:
                 the forward extremities to use as the prev_events for the
                 new event.
@@ -908,7 +921,6 @@ async def create_new_client_event(
         Returns:
             Tuple of created event, context
         """
-
         # Strip down the auth_event_ids to only what we need to auth the event.
         # For example, we don't need extra m.room.member that don't match event.sender
         full_state_ids_at_event = None
```
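The `allow_no_prev_events` contract documented in these docstrings can be sketched as a guard. This is an illustrative stand-in under assumed names: Synapse's real `create_event` builds and authenticates a full event, but the validation rule is the one the docstrings describe.

```python
from typing import Dict, List, Optional


def create_event(
    prev_event_ids: Optional[List[str]] = None,
    allow_no_prev_events: bool = False,
) -> Dict:
    """Stand-in sketch: events normally need at least one prev_event;
    an empty list is only permitted when allow_no_prev_events is set
    (special cases like MSC2716 floating batches)."""
    prev_event_ids = prev_event_ids or []
    if not prev_event_ids and not allow_no_prev_events:
        raise ValueError(
            "events need at least one prev_event unless "
            "allow_no_prev_events=True (e.g. MSC2716 floating batches)"
        )
    return {"prev_events": list(prev_event_ids)}


# A normal event hangs off an existing one; an MSC2716 batch start floats.
print(create_event(["$existing"]))                    # -> {'prev_events': ['$existing']}
print(create_event([], allow_no_prev_events=True))    # -> {'prev_events': []}
```

Calling `create_event([])` without the flag raises, which is exactly the "prohibited by default" behaviour the docstring spells out.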

synapse/handlers/room_batch.py

Lines changed: 22 additions & 22 deletions
```diff
@@ -13,10 +13,6 @@
 logger = logging.getLogger(__name__)


-def generate_fake_event_id() -> str:
-    return "$fake_" + random_string(43)
-
-
 class RoomBatchHandler:
     def __init__(self, hs: "HomeServer"):
         self.hs = hs
@@ -182,11 +178,12 @@ async def persist_state_events_at_start(
         state_event_ids_at_start = []
         auth_event_ids = initial_auth_event_ids.copy()

-        # Make the state events float off on their own so we don't have a
-        # bunch of `@mxid joined the room` noise between each batch
-        prev_event_id_for_state_chain = generate_fake_event_id()
+        # Make the state events float off on their own by specifying no
+        # prev_events for the first one in the chain so we don't have a bunch of
+        # `@mxid joined the room` noise between each batch.
+        prev_event_ids_for_state_chain: List[str] = []

-        for state_event in state_events_at_start:
+        for index, state_event in enumerate(state_events_at_start):
             assert_params_in_dict(
                 state_event, ["type", "origin_server_ts", "content", "sender"]
             )
@@ -222,7 +219,10 @@ async def persist_state_events_at_start(
                     content=event_dict["content"],
                     outlier=True,
                     historical=True,
-                    prev_event_ids=[prev_event_id_for_state_chain],
+                    # Only the first event in the chain should be floating.
+                    # The rest should hang off each other in a chain.
+                    allow_no_prev_events=index == 0,
+                    prev_event_ids=prev_event_ids_for_state_chain,
                     # Make sure to use a copy of this list because we modify it
                     # later in the loop here. Otherwise it will be the same
                     # reference and also update in the event when we append later.
@@ -242,7 +242,10 @@ async def persist_state_events_at_start(
                     event_dict,
                     outlier=True,
                     historical=True,
-                    prev_event_ids=[prev_event_id_for_state_chain],
+                    # Only the first event in the chain should be floating.
+                    # The rest should hang off each other in a chain.
+                    allow_no_prev_events=index == 0,
+                    prev_event_ids=prev_event_ids_for_state_chain,
                     # Make sure to use a copy of this list because we modify it
                     # later in the loop here. Otherwise it will be the same
                     # reference and also update in the event when we append later.
@@ -253,15 +256,14 @@ async def persist_state_events_at_start(
             state_event_ids_at_start.append(event_id)
             auth_event_ids.append(event_id)
             # Connect all the state in a floating chain
-            prev_event_id_for_state_chain = event_id
+            prev_event_ids_for_state_chain = [event_id]

         return state_event_ids_at_start

     async def persist_historical_events(
         self,
         events_to_create: List[JsonDict],
         room_id: str,
-        initial_prev_event_ids: List[str],
         inherited_depth: int,
         auth_event_ids: List[str],
         app_service_requester: Requester,
@@ -277,9 +279,6 @@ async def persist_historical_events(
             events_to_create: List of historical events to create in JSON
                 dictionary format.
             room_id: Room where you want the events persisted in.
-            initial_prev_event_ids: These will be the prev_events for the first
-                event created. Each event created afterwards will point to the
-                previous event created.
             inherited_depth: The depth to create the events at (you will
                 probably by calling inherit_depth_from_prev_ids(...)).
             auth_event_ids: Define which events allow you to create the given
@@ -291,11 +290,14 @@ async def persist_historical_events(
         """
         assert app_service_requester.app_service

-        prev_event_ids = initial_prev_event_ids.copy()
+        # Make the historical event chain float off on its own by specifying no
+        # prev_events for the first event in the chain which causes the HS to
+        # ask for the state at the start of the batch later.
+        prev_event_ids: List[str] = []

         event_ids = []
         events_to_persist = []
-        for ev in events_to_create:
+        for index, ev in enumerate(events_to_create):
             assert_params_in_dict(ev, ["type", "origin_server_ts", "content", "sender"])

             assert self.hs.is_mine_id(ev["sender"]), "User must be our own: %s" % (
@@ -319,6 +321,9 @@ async def persist_historical_events(
                     ev["sender"], app_service_requester.app_service
                 ),
                 event_dict,
+                # Only the first event in the chain should be floating.
+                # The rest should hang off each other in a chain.
+                allow_no_prev_events=index == 0,
                 prev_event_ids=event_dict.get("prev_events"),
                 auth_event_ids=auth_event_ids,
                 historical=True,
@@ -370,7 +375,6 @@ async def handle_batch_of_events(
         events_to_create: List[JsonDict],
         room_id: str,
         batch_id_to_connect_to: str,
-        initial_prev_event_ids: List[str],
         inherited_depth: int,
         auth_event_ids: List[str],
         app_service_requester: Requester,
@@ -385,9 +389,6 @@ async def handle_batch_of_events(
             room_id: Room where you want the events created in.
             batch_id_to_connect_to: The batch_id from the insertion event you
                 want this batch to connect to.
-            initial_prev_event_ids: These will be the prev_events for the first
-                event created. Each event created afterwards will point to the
-                previous event created.
             inherited_depth: The depth to create the events at (you will
                 probably by calling inherit_depth_from_prev_ids(...)).
             auth_event_ids: Define which events allow you to create the given
@@ -436,7 +437,6 @@ async def handle_batch_of_events(
         event_ids = await self.persist_historical_events(
             events_to_create=events_to_create,
             room_id=room_id,
-            initial_prev_event_ids=initial_prev_event_ids,
             inherited_depth=inherited_depth,
             auth_event_ids=auth_event_ids,
             app_service_requester=app_service_requester,
```
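The floating-chain construction in `room_batch.py` can be sketched in isolation. This is an illustrative sketch under assumed names (`build_floating_chain` and the dict shape are not Synapse's real API): the first event in the batch has an empty `prev_events` list and `allow_no_prev_events` set, so it floats on its own instead of pointing at a fake event ID, and each subsequent event hangs off the one before it.

```python
from typing import Dict, List


def build_floating_chain(events_to_create: List[Dict]) -> List[Dict]:
    """Chain a batch of historical events: the first floats (no
    prev_events), the rest each point at the previous event."""
    prev_event_ids: List[str] = []  # first event floats: no prev_events
    chain: List[Dict] = []
    for index, ev in enumerate(events_to_create):
        created = {
            "event_id": f"$event{index}",  # hypothetical IDs for illustration
            # Only the first event in the chain should be floating.
            "allow_no_prev_events": index == 0,
            "prev_events": list(prev_event_ids),
            "content": ev,
        }
        chain.append(created)
        # The next event hangs off this one.
        prev_event_ids = [created["event_id"]]
    return chain


chain = build_floating_chain([{"body": "hist1"}, {"body": "hist2"}])
print(chain[0]["prev_events"], chain[1]["prev_events"])  # -> [] ['$event0']
```

Compare this with the removed `generate_fake_event_id()` approach, where the first event pointed at `$fake_...`, an event ID that could never be resolved, clogging backfill on remote homeservers.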
