Don't waste time processing backfill events that we never show to the client

*As discovered while working on https://github.com/matrix-org/synapse/pull/15617 and a natural extension of https://github.com/matrix-org/synapse/pull/15585 to process previously failed backfill events in the background (follow-up to https://github.com/matrix-org/synapse/issues/13623),*

It turns out, when we make a `/messages` request and trigger a backfill, Synapse will happily waste time trying hard to process events ([`_process_pulled_events`](https://github.com/matrix-org/synapse/blob/1e89976b268c296e1fd8fface36ade29c0354254/synapse/handlers/federation_event.py#L819-L898)) in the foreground that we end up never showing to clients. The one particularly bad culprit is the `org.matrix.dummy_event` which Synapse automatically puts in the DAG to resolve outstanding forward extremities. Since it has 10x random `prev_events`, it takes a while to fetch the state for each `prev_event` and persist. We determine whether some events should be shown to clients in the `_check_filter_send_to_client(...)` function

https://github.com/matrix-org/synapse/blob/ad50510a06d035a674f0eeed5db5dd3060bc0b1c/synapse/visibility.py#L482-L519

`org.matrix.dummy_event` can be particularly bad because they usually include a lot of disparate `prev_events` which take a while for us to work out the state for (long [`_get_state_groups_from_groups`](https://github.com/matrix-org/synapse/blob/41b9def9f2c02118796e147f63abf23bc2d7dc04/synapse/storage/databases/state/bg_updates.py#L86-L250) times).


### Potential solution

We should use this same `_check_filter_send_to_client(...)` criteria when determining whether we should backfill a given event in the background or foreground.

We should be mindful an event that is filtered from the client, might still be used as a `prev_event` from another event in the backfill response that we will show to clients. And if we kick off the function to process `org.matrix.dummy_event` in the background, while the other event in the foreground, we will probably end up duplicating the work depending on the race.

We don't really have these same race -> duplicate work concerns with https://github.com/matrix-org/synapse/pull/15585 (events which previously failed to backfill) because it's a "fool me once" sort of situation and they are already potentially disconnected from what we were trying to backfill anyway given they failed before.

---

Also of interest, `filter_events_for_server(...)`

	def _check_filter_send_to_client(
	event: EventBase,
	clock: Clock,
	retention_policy: RetentionPolicy,
	sender_ignored: bool,
	) -> _CheckFilter:
	"""Apply checks for sending events to client

	Returns:
	True if might be allowed to be sent to clients, False if definitely not.
	"""

	if event.type == EventTypes.Dummy:
	return _CheckFilter.DENIED

	if not event.is_state() and sender_ignored:
	return _CheckFilter.DENIED

	# Until MSC2261 has landed we can't redact malicious alias events, so for
	# now we temporarily filter out m.room.aliases entirely to mitigate
	# abuse, while we spec a better solution to advertising aliases
	# on rooms.
	if event.type == EventTypes.Aliases:
	return _CheckFilter.DENIED

	# Don't try to apply the room's retention policy if the event is a state
	# event, as MSC1763 states that retention is only considered for non-state
	# events.
	if not event.is_state():
	max_lifetime = retention_policy.max_lifetime

	if max_lifetime is not None:
	oldest_allowed_ts = clock.time_msec() - max_lifetime

	if event.origin_server_ts < oldest_allowed_ts:
	return _CheckFilter.DENIED

	return _CheckFilter.MAYBE_ALLOWED

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Don't waste time processing backfill events that we never show to the client #15632

Potential solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Don't waste time processing backfill events that we never show to the client #15632

Description

Potential solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions