Optimize backfill receiving to have less missing prev_event thrashing (scratch) #13864

Conversation
Fix #13856

`_invalidate_caches_for_event` doesn't run in monolith mode, which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT). Additionally, there is a bug with the key being wrong, so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Wrong:

```py
self.have_seen_event.invalidate((room_id, event_id))
```

Correct:

```py
self.have_seen_event.invalidate(((room_id, event_id),))
```
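To see why the nesting matters: the cache key is the full tuple of arguments passed to the cached function, and `have_seen_event` takes a single argument that is itself a `(room_id, event_id)` pair. A minimal sketch with a toy cache (not Synapse's actual cache class) to illustrate:

```py
from typing import Dict, Tuple

class ToyCache:
    """Toy stand-in for a cache keyed on the cached function's argument tuple."""

    def __init__(self) -> None:
        self._data: Dict[Tuple, bool] = {}

    def set(self, key_args: Tuple, value: bool) -> None:
        self._data[key_args] = value

    def invalidate(self, key_args: Tuple) -> None:
        # `key_args` must be the *tuple of arguments*, not the inner
        # tuple's contents spread out.
        self._data.pop(key_args, None)

cache = ToyCache()
# The cached function takes ONE argument: a (room_id, event_id) pair.
cache.set((("!room:test", "$event"),), True)

# Wrong: this key never existed, so nothing is invalidated.
cache.invalidate(("!room:test", "$event"))
assert (("!room:test", "$event"),) in cache._data

# Correct: the single-argument pair is nested inside the key tuple.
cache.invalidate((("!room:test", "$event"),))
assert (("!room:test", "$event"),) not in cache._data
```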
…rder) and persist in the oldest -> newest to get the least missing prev_event fetch thrashing
      …not-being-invalidated
Copying what #13796 is doing
As mentioned by @erikjohnston, #13865 (comment)
…lidated' into maddlittlemods/msc2716-many-batches-optimization Conflicts: tests/storage/databases/main/test_events_worker.py
…into maddlittlemods/msc2716-many-batches-optimization Conflicts: synapse/handlers/federation.py synapse/storage/databases/main/cache.py synapse/storage/databases/main/event_federation.py
…sertion event rejected
…in so everything is valid. We are going to lose the benefit of keeping the join noise out of the timeline, and will probably have to hide "historical" state on the client.
```py
def test_process_pulled_events_asdf(self) -> None:
    main_store = self.hs.get_datastores().main
    state_storage_controller = self.hs.get_storage_controllers().state

    def _debug_event_string(event: EventBase) -> str:
        debug_body = event.content.get("body", event.type)
        maybe_state_key = getattr(event, "state_key", None)
        return f"event_id={event.event_id},depth={event.depth},body={debug_body}({maybe_state_key}),prevs={event.prev_event_ids()}"

    known_event_dict: Dict[str, Tuple[EventBase, List[EventBase]]] = {}

    def _add_to_known_event_list(
        event: EventBase, state_events: Optional[List[EventBase]] = None
    ) -> None:
        if state_events is None:
            state_map = self.get_success(
                state_storage_controller.get_state_for_event(event.event_id)
            )
            state_events = list(state_map.values())

        known_event_dict[event.event_id] = (event, state_events)

    async def get_room_state_ids(
        destination: str, room_id: str, event_id: str
    ) -> JsonDict:
        self.assertEqual(destination, self.OTHER_SERVER_NAME)
        known_event_info = known_event_dict.get(event_id)
        if known_event_info is None:
            self.fail(
                f"stubbed get_room_state_ids: Event ({event_id}) not part of our known events list"
            )

        known_event, known_event_state_list = known_event_info
        logger.info(
            "stubbed get_room_state_ids: destination=%s event_id=%s auth_event_ids=%s",
            destination,
            event_id,
            known_event.auth_event_ids(),
        )

        # self.assertEqual(event_id, missing_event.event_id)
        return {
            "pdu_ids": [
                state_event.event_id for state_event in known_event_state_list
            ],
            "auth_chain_ids": known_event.auth_event_ids(),
        }

    async def get_room_state(
        room_version: RoomVersion, destination: str, room_id: str, event_id: str
    ) -> StateRequestResponse:
        self.assertEqual(destination, self.OTHER_SERVER_NAME)
        known_event_info = known_event_dict.get(event_id)
        if known_event_info is None:
            self.fail(
                f"stubbed get_room_state: Event ({event_id}) not part of our known events list"
            )

        known_event, known_event_state_list = known_event_info
        logger.info(
            "stubbed get_room_state: destination=%s event_id=%s auth_event_ids=%s",
            destination,
            event_id,
            known_event.auth_event_ids(),
        )

        auth_event_ids = known_event.auth_event_ids()
        auth_events = []
        for auth_event_id in auth_event_ids:
            # look up the auth event (not the pulled event) in our known events
            known_event_info = known_event_dict.get(auth_event_id)
            if known_event_info is None:
                self.fail(
                    f"stubbed get_room_state: Auth event ({auth_event_id}) is not part of our known events list"
                )
            known_auth_event, _ = known_event_info
            auth_events.append(known_auth_event)

        return StateRequestResponse(
            state=known_event_state_list,
            auth_events=auth_events,
        )

    async def get_event(destination: str, event_id: str, timeout=None):
        self.assertEqual(destination, self.OTHER_SERVER_NAME)
        known_event_info = known_event_dict.get(event_id)
        if known_event_info is None:
            self.fail(
                f"stubbed get_event: Event ({event_id}) not part of our known events list"
            )

        known_event, _ = known_event_info
        return {"pdus": [known_event.get_pdu_json()]}

    self.mock_federation_transport_client.get_room_state_ids.side_effect = (
        get_room_state_ids
    )
    self.mock_federation_transport_client.get_room_state.side_effect = (
        get_room_state
    )
    self.mock_federation_transport_client.get_event.side_effect = get_event

    # create the room
    room_creator = self.appservice.sender
    room_id = self.helper.create_room_as(
        room_creator=self.appservice.sender, tok=self.appservice.token
    )
    room_version = self.get_success(main_store.get_room_version(room_id))

    event_before = self.get_success(
        inject_event(
            self.hs,
            room_id=room_id,
            sender=room_creator,
            type=EventTypes.Message,
            content={"body": "eventBefore0", "msgtype": "m.text"},
        )
    )
    _add_to_known_event_list(event_before)

    event_after = self.get_success(
        inject_event(
            self.hs,
            room_id=room_id,
            sender=room_creator,
            type=EventTypes.Message,
            content={"body": "eventAfter0", "msgtype": "m.text"},
        )
    )
    _add_to_known_event_list(event_after)

    state_map = self.get_success(
        state_storage_controller.get_state_for_event(event_before.event_id)
    )

    room_create_event = state_map.get((EventTypes.Create, ""))
    pl_event = state_map.get((EventTypes.PowerLevels, ""))
    as_membership_event = state_map.get((EventTypes.Member, room_creator))
    assert room_create_event is not None
    assert pl_event is not None
    assert as_membership_event is not None

    for state_event in state_map.values():
        _add_to_known_event_list(state_event)

    # This should be the successor of the event we want to insert next to
    # (the successor of event_before is event_after).
    inherited_depth = event_after.depth

    historical_base_auth_event_ids = [
        room_create_event.event_id,
        pl_event.event_id,
    ]
    historical_state_events = list(state_map.values())
    historical_state_event_ids = [
        state_event.event_id for state_event in historical_state_events
    ]

    maria_mxid = "@maria:test"
    maria_membership_event, _ = self.get_success(
        create_event(
            self.hs,
            room_id=room_id,
            sender=maria_mxid,
            state_key=maria_mxid,
            type=EventTypes.Member,
            content={
                "membership": "join",
            },
            # It all works when I add a prev_event for the floating
            # insertion event but the event no longer floats.
            # It's able to resolve state at the prev_events though.
            prev_event_ids=[event_before.event_id],
            # allow_no_prev_events=True,
            # prev_event_ids=[],
            auth_event_ids=historical_base_auth_event_ids,
            state_event_ids=historical_state_event_ids,
            depth=inherited_depth,
        )
    )
    _add_to_known_event_list(maria_membership_event, historical_state_events)

    historical_state_events.append(maria_membership_event)
    historical_state_event_ids.append(maria_membership_event.event_id)

    batch_id = random_string(8)
    next_batch_id = random_string(8)
    insertion_event, _ = self.get_success(
        create_event(
            self.hs,
            room_id=room_id,
            sender=room_creator,
            type=EventTypes.MSC2716_INSERTION,
            content={
                EventContentFields.MSC2716_NEXT_BATCH_ID: next_batch_id,
                EventContentFields.MSC2716_HISTORICAL: True,
            },
            # The difference from the actual room /batch_send is that this is normally
            # floating as well. But seems to work once we connect it to the
            # floating historical state chain.
            prev_event_ids=[maria_membership_event.event_id],
            # allow_no_prev_events=True,
            # prev_event_ids=[],
            auth_event_ids=[
                *historical_base_auth_event_ids,
                as_membership_event.event_id,
            ],
            state_event_ids=historical_state_event_ids,
            depth=inherited_depth,
        )
    )
    _add_to_known_event_list(insertion_event, historical_state_events)

    historical_message_event, _ = self.get_success(
        create_event(
            self.hs,
            room_id=room_id,
            sender=maria_mxid,
            type=EventTypes.Message,
            content={"body": "Historical message", "msgtype": "m.text"},
            prev_event_ids=[insertion_event.event_id],
            auth_event_ids=[
                *historical_base_auth_event_ids,
                maria_membership_event.event_id,
            ],
            depth=inherited_depth,
        )
    )
    _add_to_known_event_list(historical_message_event, historical_state_events)

    batch_event, _ = self.get_success(
        create_event(
            self.hs,
            room_id=room_id,
            sender=room_creator,
            type=EventTypes.MSC2716_BATCH,
            content={
                EventContentFields.MSC2716_BATCH_ID: batch_id,
                EventContentFields.MSC2716_HISTORICAL: True,
            },
            prev_event_ids=[historical_message_event.event_id],
            auth_event_ids=[
                *historical_base_auth_event_ids,
                as_membership_event.event_id,
            ],
            depth=inherited_depth,
        )
    )
    _add_to_known_event_list(batch_event, historical_state_events)

    base_insertion_event, base_insertion_event_context = self.get_success(
        create_event(
            self.hs,
            room_id=room_id,
            sender=room_creator,
            type=EventTypes.MSC2716_INSERTION,
            content={
                EventContentFields.MSC2716_NEXT_BATCH_ID: batch_id,
                EventContentFields.MSC2716_HISTORICAL: True,
            },
            prev_event_ids=[event_before.event_id],
            auth_event_ids=[
                *historical_base_auth_event_ids,
                as_membership_event.event_id,
            ],
            state_event_ids=historical_state_event_ids,
            depth=inherited_depth,
        )
    )
    _add_to_known_event_list(base_insertion_event, historical_state_events)

    # Chronological
    pulled_events: List[EventBase] = [
        # Beginning of room (oldest messages)
        # *list(state_map.values()),
        room_create_event,
        pl_event,
        as_membership_event,
        state_map.get((EventTypes.JoinRules, "")),
        state_map.get((EventTypes.RoomHistoryVisibility, "")),
        event_before,
        # HISTORICAL MESSAGE END
        insertion_event,
        historical_message_event,
        batch_event,
        base_insertion_event,
        # HISTORICAL MESSAGE START
        event_after,
        # Latest in the room (newest messages)
    ]

    # The order that we get after passing reverse chronological events in;
    # that mostly passes. Only the insertion event is rejected but the
    # historical messages appear in /messages scrollback.
    # pulled_events: List[EventBase] = [
    #     # Beginning of room (oldest messages)
    #     # *list(state_map.values()),
    #     room_create_event,
    #     pl_event,
    #     as_membership_event,
    #     state_map.get((EventTypes.JoinRules, "")),
    #     state_map.get((EventTypes.RoomHistoryVisibility, "")),
    #     event_before,
    #     event_after,
    #     base_insertion_event,
    #     batch_event,
    #     historical_message_event,
    #     insertion_event,
    #     # Latest in the room (newest messages)
    # ]

    import logging

    logger = logging.getLogger(__name__)
    logger.info(
        "pulled_events=%s",
        json.dumps(
            [_debug_event_string(event) for event in pulled_events],
            indent=4,
        ),
    )

    for event, _ in known_event_dict.values():
        if event.internal_metadata.outlier:
            self.fail("Our pristine events should not be marked as an outlier")

    self.get_success(
        self.hs.get_federation_event_handler()._process_pulled_events(
            self.OTHER_SERVER_NAME,
            [
                # Make copies of events since Synapse modifies the
                # internal_metadata in place and we want to keep our
                # pristine copies
                make_event_from_dict(pulled_event.get_pdu_json(), room_version)
                for pulled_event in pulled_events
            ],
            backfilled=True,
        )
    )

    from_token = self.get_success(
        self.hs.get_event_sources().get_current_token_for_pagination(room_id)
    )
    actual_events_in_room_reverse_chronological, _ = self.get_success(
        main_store.paginate_room_events(
            room_id, from_key=from_token.room_key, limit=100, direction="b"
        )
    )

    # We have to reverse the list to make it chronological.
    actual_events_in_room_chronological = list(
        reversed(actual_events_in_room_reverse_chronological)
    )

    expected_event_order = [
        # Beginning of room (oldest messages)
        # *list(state_map.values()),
        room_create_event,
        as_membership_event,
        pl_event,
        state_map.get((EventTypes.JoinRules, "")),
        state_map.get((EventTypes.RoomHistoryVisibility, "")),
        event_before,
        # HISTORICAL MESSAGE END
        insertion_event,
        historical_message_event,
        batch_event,
        base_insertion_event,
        # HISTORICAL MESSAGE START
        event_after,
        # Latest in the room (newest messages)
    ]

    event_id_diff = {event.event_id for event in expected_event_order} - {
        event.event_id for event in actual_events_in_room_chronological
    }
    event_diff_ordered = [
        event for event in expected_event_order if event.event_id in event_id_diff
    ]
    event_id_extra = {
        event.event_id for event in actual_events_in_room_chronological
    } - {event.event_id for event in expected_event_order}
    event_extra_ordered = [
        event
        for event in actual_events_in_room_chronological
        if event.event_id in event_id_extra
    ]
    assertion_message = (
        "Debug info:\nActual events missing from expected list: %s\nActual events contain %d additional events compared to expected: %s\nExpected event order: %s\nActual event order: %s"
        % (
            json.dumps(
                [_debug_event_string(event) for event in event_diff_ordered],
                indent=4,
            ),
            len(event_extra_ordered),
            json.dumps(
                [_debug_event_string(event) for event in event_extra_ordered],
                indent=4,
            ),
            json.dumps(
                [_debug_event_string(event) for event in expected_event_order],
                indent=4,
            ),
            json.dumps(
                [
                    _debug_event_string(event)
                    for event in actual_events_in_room_chronological
                ],
                indent=4,
            ),
        )
    )

    # assert (
    #     actual_events_in_room_chronological == expected_event_order
    # ), assertion_message

    self.assertEqual(
        [event.event_id for event in actual_events_in_room_chronological],
        [event.event_id for event in expected_event_order],
        assertion_message,
    )
```
This is a nice test for figuring out the mess with historical events from MSC2716 being rejected.
It eliminates all the federation variables when trying to do the same thing in Complement, and it's so much faster to iterate on: seconds vs. minutes.
```py
# It all works when I add a prev_event for the floating
# insertion event but the event no longer floats.
# It's able to resolve state at the prev_events though.
prev_event_ids=[event_before.event_id],
```
This can work without connecting it to `event_before`, but then we rely on `maria_membership_event` not being gossiped about during backfill, because if it is, it will be rejected and we can't use a rejected event to auth the following historical events.
Call chain: `_process_pulled_event` → `_compute_event_context_with_maybe_missing_prevs` → `compute_event_context`

The reason it works when it's not gossiped about is that `_compute_event_context_with_maybe_missing_prevs` fills in the `state_ids_before_event` and resolves the state magically for us without rejecting.
`synapse/handlers/federation_event.py` lines 971 to 1037 in `5f659d4`:
```py
logger.info(
    "Event %s is missing prev_events %s: calculating state for a "
    "backwards extremity",
    event_id,
    shortstr(missing_prevs),
)

# Calculate the state after each of the previous events, and
# resolve them to find the correct state at the current event.
try:
    # Determine whether we may be about to retrieve partial state
    # Events may be un-partial stated right after we compute the partial state
    # flag, but that's okay, as long as the flag errs on the conservative side.
    partial_state_flags = await self._store.get_partial_state_events(seen)
    partial_state = any(partial_state_flags.values())

    # Get the state of the events we know about
    ours = await self._state_storage_controller.get_state_groups_ids(
        room_id, seen, await_full_state=False
    )

    # state_maps is a list of mappings from (type, state_key) to event_id
    state_maps: List[StateMap[str]] = list(ours.values())

    # we don't need this any more, let's delete it.
    del ours

    # Ask the remote server for the states we don't
    # know about
    for p in missing_prevs:
        logger.info("Requesting state after missing prev_event %s", p)

        with nested_logging_context(p):
            # note that if any of the missing prevs share missing state or
            # auth events, the requests to fetch those events are deduped
            # by the get_pdu_cache in federation_client.
            remote_state_map = (
                await self._get_state_ids_after_missing_prev_event(
                    dest, room_id, p
                )
            )

            state_maps.append(remote_state_map)

    room_version = await self._store.get_room_version_id(room_id)
    state_map = await self._state_resolution_handler.resolve_events_with_store(
        room_id,
        room_version,
        state_maps,
        event_map={event_id: event},
        state_res_store=StateResolutionStore(self._store),
    )
except Exception:
    logger.warning(
        "Error attempting to resolve state at missing prev_events",
        exc_info=True,
    )
    raise FederationError(
        "ERROR",
        403,
        "We can't get valid state history.",
        affected=event_id,
    )

return await self._state_handler.compute_event_context(
    event, state_ids_before_event=state_map, partial_state=partial_state
)
```
I have a suspicion that we will also stop doing this sort of thing, since we also removed auth event resolving in #12943.
```py
events_to_create=events_to_create,
room_id=room_id,
inherited_depth=inherited_depth,
state_chain_event_id_to_connect_to=state_chain_event_id_to_connect_to,
```
These changes make this go from before -> after, where we now connect the historical batch to the historical state chain, and connect the state chain to the prev_event so everything has valid prev_events and auth_events to resolve organically from the DAG.

We do lose the benefit of removing the "@mxid joined the room" noise between each batch, but we might have to solve this on the client by hiding historical state.
Before
```mermaid
flowchart BT
    A --- annotation1>"Note: older events are at the top"]
    subgraph live timeline
        marker1>m.room.marker] ----> B -----------------> A
    end

    subgraph batch0
        batch0-batch[[m.room.batch]] --> batch0-2(("2")) --> batch0-1((1)) --> batch0-0((0)) --> batch0-insertion[/m.room.insertion\]
    end
    subgraph batch1
        batch1-batch[[m.room.batch]] --> batch1-2(("2")) --> batch1-1((1)) --> batch1-0((0)) --> batch1-insertion[/m.room.insertion\]
    end

    subgraph batch2
        batch2-batch[[m.room.batch]] --> batch2-2(("2")) --> batch2-1((1)) --> batch2-0((0)) --> batch2-insertion[/m.room.insertion\]
    end

    batch0-insertion -.-> memberBob0(["m.room.member (bob)"]) --> memberAlice0(["m.room.member (alice)"])
    batch1-insertion -.-> memberBob1(["m.room.member (bob)"]) --> memberAlice1(["m.room.member (alice)"])
    batch2-insertion -.-> memberBob2(["m.room.member (bob)"]) --> memberAlice2(["m.room.member (alice)"])

    marker1 -.-> batch0-insertionBase
    batch0-insertionBase[/m.room.insertion\] ---------------> A
    batch0-batch -.-> batch0-insertionBase
    batch1-batch -.-> batch0-insertion
    batch2-batch -.-> batch1-insertion

    %% make the annotation links invisible
    linkStyle 0 stroke-width:2px,fill:none,stroke:none;
```
After
```mermaid
flowchart BT
    A --- annotation1>"Note: older events are at the top"]
    subgraph live timeline
        marker1>m.room.marker] ----> B -----------------> A
    end

    subgraph batch0
        batch0-batch[[m.room.batch]] --> batch0-2(("2")) --> batch0-1((1)) --> batch0-0((0)) --> batch0-insertion[/m.room.insertion\]
    end
    subgraph batch1
        batch1-batch[[m.room.batch]] --> batch1-2(("2")) --> batch1-1((1)) --> batch1-0((0)) --> batch1-insertion[/m.room.insertion\]
    end

    subgraph batch2
        batch2-batch[[m.room.batch]] --> batch2-2(("2")) --> batch2-1((1)) --> batch2-0((0)) --> batch2-insertion[/m.room.insertion\]
    end

    batch0-insertion --> memberBob0(["m.room.member (bob)"]) --> memberAlice0(["m.room.member (alice)"]) --> A
    batch1-insertion --> memberBob1(["m.room.member (bob)"]) --> memberAlice1(["m.room.member (alice)"]) --> A
    batch2-insertion --> memberBob2(["m.room.member (bob)"]) --> memberAlice2(["m.room.member (alice)"]) --> A

    marker1 -.-> batch0-insertionBase
    batch0-insertionBase[/m.room.insertion\] ---------------> A
    batch0-batch -.-> batch0-insertionBase
    batch1-batch -.-> batch0-insertion
    batch2-batch -.-> batch1-insertion

    %% make the annotation links invisible
    linkStyle 0 stroke-width:2px,fill:none,stroke:none;
```
Split out to #13971
Pulled from the scratch changes in #13864.
Addressing:

> So Synapse is fast enough to merge this MSC2716 Complement test for importing many batches, matrix-org/complement#214 (comment)
Complement tests: matrix-org/complement#214
Dev notes
Why are we seeing some of these historical events being rejected?
I think it's because of changes to the auth-event reconciliation in #12943 (comment), which I was brought in on but didn't realize the magnitude of the change since the MSC2716 tests still passed. Although I'm not sure I realized that it actually removed one of the MSC2716 tests from the Synapse code base.
Why aren't we sorting topologically when receiving backfill events?
See #11114 (comment)
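For reference, a minimal sketch of what topologically sorting a pulled batch could look like (illustrative only, not what Synapse does; the `Event` type here is a hypothetical stand-in), ordering events so each one's in-batch prev_events come first:

```py
from collections import deque
from typing import Dict, List, NamedTuple

class Event(NamedTuple):
    event_id: str
    prev_event_ids: List[str]

def sort_oldest_to_newest(events: List[Event]) -> List[Event]:
    by_id: Dict[str, Event] = {e.event_id: e for e in events}
    # Count how many *in-batch* prev_events each event depends on.
    in_degree = {
        e.event_id: sum(1 for p in e.prev_event_ids if p in by_id) for e in events
    }
    # Map each event to the in-batch events that depend on it.
    dependents: Dict[str, List[str]] = {e.event_id: [] for e in events}
    for e in events:
        for p in e.prev_event_ids:
            if p in by_id:
                dependents[p].append(e.event_id)

    # Kahn's algorithm: start from events with no in-batch dependencies
    # (their prev_events are already persisted or unknown to the batch).
    queue = deque(eid for eid, deg in in_degree.items() if deg == 0)
    ordered: List[Event] = []
    while queue:
        eid = queue.popleft()
        ordered.append(by_id[eid])
        for dep in dependents[eid]:
            in_degree[dep] -= 1
            if in_degree[dep] == 0:
                queue.append(dep)
    return ordered
```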
How is `stream_ordering` given out?

Persisting events: see `_persist_events_and_state_updates` for where we normally assign `stream_ordering` and continue down to `_persist_events_txn`.
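A rough sketch of the idea (names hypothetical, not the real implementation): live events draw positive `stream_ordering` values from a forwards counter at persist time, while backfilled events draw negative values from a backwards counter so they sort before everything live:

```py
from typing import List

class StreamOrderingSketch:
    """Hypothetical stand-in for how persist-time stream orderings are handed out."""

    def __init__(self) -> None:
        self._forwards = 0  # live events: 1, 2, 3, ...
        self._backwards = 0  # backfilled events: -1, -2, -3, ...

    def next_forwards(self, n: int) -> List[int]:
        start = self._forwards + 1
        self._forwards += n
        return list(range(start, start + n))

    def next_backwards(self, n: int) -> List[int]:
        start = self._backwards - 1
        self._backwards -= n
        return list(range(start, start - n, -1))
```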
Pretty print list

This one is great if you're printing a JSON-like thing:
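Presumably the `json.dumps(..., indent=4)` pattern used in the test above, which puts every item on its own indented line:

```py
import json
import logging

logger = logging.getLogger(__name__)

logger.info(
    "pulled_events=%s",
    json.dumps(
        [_debug_event_string(event) for event in pulled_events],
        indent=4,
    ),
)
```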
This one sucks because it doesn't print the first and last items on indented new lines
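This is presumably `pprint`, which wraps a long list with the first item on the same line as the opening bracket and the closing bracket on the last item's line:

```py
import pprint

# For a list too long for one line, pformat produces e.g.:
#     ['item0',
#      'item1',
#      'item2']
logger.info("pulled_events=%s", pprint.pformat(pulled_events))
```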
Random