- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2.1k
Don't waste time processing backfill events that we never show to the client #15632
Description
As discovered while working on #15617 and a natural extension of #15585 to process previously failed backfill events in the background (follow-up to #13623),
It turns out, when we make a /messages request and trigger a backfill, Synapse will happily waste time trying hard to process events (_process_pulled_events) in the foreground that we end up never showing to clients. The one particularly bad culprit is the org.matrix.dummy_event which Synapse automatically puts in the DAG to resolve outstanding forward extremities. Since it has 10x random prev_events, it takes a while to fetch the state for each prev_event and persist. We determine whether some events should be shown to clients in the _check_filter_send_to_client(...) function
Lines 482 to 519 in ad50510
| def _check_filter_send_to_client( | |
| event: EventBase, | |
| clock: Clock, | |
| retention_policy: RetentionPolicy, | |
| sender_ignored: bool, | |
| ) -> _CheckFilter: | |
| """Apply checks for sending events to client | |
| Returns: | |
| True if might be allowed to be sent to clients, False if definitely not. | |
| """ | |
| if event.type == EventTypes.Dummy: | |
| return _CheckFilter.DENIED | |
| if not event.is_state() and sender_ignored: | |
| return _CheckFilter.DENIED | |
| # Until MSC2261 has landed we can't redact malicious alias events, so for | |
| # now we temporarily filter out m.room.aliases entirely to mitigate | |
| # abuse, while we spec a better solution to advertising aliases | |
| # on rooms. | |
| if event.type == EventTypes.Aliases: | |
| return _CheckFilter.DENIED | |
| # Don't try to apply the room's retention policy if the event is a state | |
| # event, as MSC1763 states that retention is only considered for non-state | |
| # events. | |
| if not event.is_state(): | |
| max_lifetime = retention_policy.max_lifetime | |
| if max_lifetime is not None: | |
| oldest_allowed_ts = clock.time_msec() - max_lifetime | |
| if event.origin_server_ts < oldest_allowed_ts: | |
| return _CheckFilter.DENIED | |
| return _CheckFilter.MAYBE_ALLOWED | 
org.matrix.dummy_event can be particularly bad because they usually include a lot of disparate prev_events which take a while for us to work out the state for (long _get_state_groups_from_groups times).
Potential solution
We should use this same _check_filter_send_to_client(...) criteria when determining whether we should backfill a given event in the background or foreground.
We should be mindful an event that is filtered from the client, might still be used as a prev_event from another event in the backfill response that we will show to clients. And if we kick off the function to process org.matrix.dummy_event in the background, while the other event in the foreground, we will probably end up duplicating the work depending on the race.
We don't really have these same race -> duplicate work concerns with #15585 (events which previously failed to backfill) because it's a "fool me once" sort of situation and they are already potentially disconnected from what we were trying to backfill anyway given they failed before.
Also of interest, filter_events_for_server(...)