Skip to content
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/18540.feature
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is the right architecture for dealing with such redactions. Pulling out all events sent by the user has a few potential issues: a) if there are a lot of events it might take a lot of time, and b) won't redact events that we haven't backfilled over federation yet. The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without a redactions?

An alternate method may be to have a separate table that is room_redactions(room_id, sender), which we then check (as well as redactions table) when fetching the event? The new room_redactions would be updated when we add the ban/kick to the current_state_events table, and removed if it gets replaced?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSC also seems to suggest that events should get unredacted if the ban is replaced with another state event without a redactions?

They shouldn't get unredacted, but the auto-redaction should cease for newly received events. I'll try to clarify this in the MSC itself.

An example series of events:

  1. Alice sends a message
  2. Bob bans Alice, with redact_events: true
  3. [Servers and clients redact Alice's message in step 1]
  4. Due to propagation delays, Alice's second message arrives after the ban. The server (and client, if it received it) redacts that message too.
  5. Later, Bob unbans Alice, setting redact_events: false
  6. Again due to propagation delays, or because Alice rejoined, Alice's third message arrives. It is not redacted.

The messages in steps 1 and 4 remain redacted even after step 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikjohnston Looking at it I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc), but would still need to pull all of the user's event ids (but not full events) out of the db because they are needed to invalidate the cache.

I don't have a sense of the time difference to pull event ids out of the database vs pulling the full events so it is unclear to me whether having to still pull the ids will negate the benefit of the new table - right now the PR does both so it is an obvious improvement but is it enough or does another approach need to be found?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc

Wouldn't removing the row from the table be enough to stop redacting new events?

I don't have a sense of the time difference to pull event ids out of the database vs pulling the full events so it is unclear to me whether having to still pull the ids will negate the benefit of the new table

Pulling out only event IDs will certainly be much less data than full events, which should contribute directly to time taken executing the query. By how much depends on how large a user's events typically are. But you're right in that we'll still need to do the hard work of querying for every event a user has sent in a room.


The overview of this PR is:

The redactions table is updated to change the unique constraint from event_id to (event_id, redacts). This is so one event (a kick/ban membership) can redact multiple other events.

When a new event comes in over federation, add redacted_because to its unsigned and add a redaction to the local DB, then invalidate the event cache for that event.

/ban and /kick are updated with a org.matrix.msc4293.redact_events JSON body parameter. If provided, that field is added to the content of the ban/kick membership event.

When any event is persisted (local or over federation), all event IDs that a user has sent in a room are pulled out and entries in redactions are created for them. Each event has redacted_because added to it. The get_event cache is invalidated for each of these events.


What's more concerning to me is that the query to lookup all events a user has sent in a room happens is blocking during processing of a membership event. Perhaps that should be moved to a background task?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but would still need to pull all of the user's event ids (but not full events) out of the db because they are needed to invalidate the cache.

I think you can invalidate those caches by room as well as by event ID?

Looking at it I think a room_redactions table might work (with an added end_ts column or something to indicate the end of a redaction application in the case that the ban is rescinded, etc)

We would probably want to do this by some sort of stream ordering or whatever. We might do this as a two step:

  1. On receipt of ban, add room ID / target to table and record the range of stream orderings that should be redacted.
  2. On receipt of new events (e.g. via backfill), check if they match the ban and if so record them as redacted.

or something

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add support for [MSC4293](https://github.com/matrix-org/matrix-spec-proposals/pull/4293) - Redact on Kick/Ban.
3 changes: 3 additions & 0 deletions synapse/config/experimental.py
Original file line number Diff line number Diff line change
Expand Up @@ -569,3 +569,6 @@ def read_config(

# MSC4155: Invite filtering
self.msc4155_enabled: bool = experimental.get("msc4155_enabled", False)

# MSC4293: Redact on Kick/Ban
self.msc4293_enabled: bool = experimental.get("msc4293_enabled", False)
32 changes: 20 additions & 12 deletions synapse/rest/client/room.py
Original file line number Diff line number Diff line change
Expand Up @@ -1088,6 +1088,7 @@ def __init__(self, hs: "HomeServer"):
super().__init__(hs)
self.room_member_handler = hs.get_room_member_handler()
self.auth = hs.get_auth()
self.config = hs.config

def register(self, http_server: HttpServer) -> None:
# /rooms/$roomid/[join|invite|leave|ban|unban|kick]
Expand All @@ -1111,12 +1112,12 @@ async def _do(
}:
raise AuthError(403, "Guest access not allowed")

content = parse_json_object_from_request(request, allow_empty_body=True)
request_body = parse_json_object_from_request(request, allow_empty_body=True)

if membership_action == "invite" and all(
key in content for key in ("medium", "address")
key in request_body for key in ("medium", "address")
):
if not all(key in content for key in ("id_server", "id_access_token")):
if not all(key in request_body for key in ("id_server", "id_access_token")):
raise SynapseError(
HTTPStatus.BAD_REQUEST,
"`id_server` and `id_access_token` are required when doing 3pid invite",
Expand All @@ -1127,12 +1128,12 @@ async def _do(
await self.room_member_handler.do_3pid_invite(
room_id,
requester.user,
content["medium"],
content["address"],
content["id_server"],
request_body["medium"],
request_body["address"],
request_body["id_server"],
requester,
txn_id,
content["id_access_token"],
request_body["id_access_token"],
)
except ShadowBanError:
# Pretend the request succeeded.
Expand All @@ -1141,12 +1142,19 @@ async def _do(

target = requester.user
if membership_action in ["invite", "ban", "unban", "kick"]:
assert_params_in_dict(content, ["user_id"])
target = UserID.from_string(content["user_id"])
assert_params_in_dict(request_body, ["user_id"])
target = UserID.from_string(request_body["user_id"])

event_content = None
if "reason" in content:
event_content = {"reason": content["reason"]}
if "reason" in request_body:
event_content = {"reason": request_body["reason"]}
if self.config.experimental.msc4293_enabled:
if "org.matrix.msc4293.redact_events" in request_body:
if event_content is None:
event_content = {}
event_content["org.matrix.msc4293.redact_events"] = request_body[
"org.matrix.msc4293.redact_events"
]

try:
await self.room_member_handler.update_membership(
Expand All @@ -1155,7 +1163,7 @@ async def _do(
room_id=room_id,
action=membership_action,
txn_id=txn_id,
third_party_signed=content.get("third_party_signed", None),
third_party_signed=request_body.get("third_party_signed", None),
content=event_content,
)
except ShadowBanError:
Expand Down
96 changes: 96 additions & 0 deletions synapse/storage/databases/main/events.py
Original file line number Diff line number Diff line change
Expand Up @@ -376,11 +376,107 @@ async def _persist_events_and_state_updates(

event_counter.labels(event.type, origin_type, origin_entity).inc()

if (
not self.hs.config.experimental.msc4293_enabled
or event.type != EventTypes.Member
or event.state_key is None
):
continue

# check if this is an unban/join that will undo a ban/kick redaction for
# a user in the room
if event.membership in [Membership.LEAVE, Membership.JOIN]:
if (
event.membership == Membership.LEAVE
and event.sender == event.state_key
):
# self-leave, ignore
continue

# if there is an existing ban/leave causing redactions for
# this user/room combination update the entry with the stream
# ordering when the redactions should stop
await self.db_pool.simple_update(
"room_ban_redactions",
{"room_id": event.room_id, "user_id": event.state_key},
{
"redact_end_ordering": event.internal_metadata.stream_ordering
},
desc="room_ban_redactions update redact_end_ordering",
)

# check for msc4293 redact_events flag and apply if found
if event.membership not in [Membership.LEAVE, Membership.BAN]:
continue
redact = event.content.get("org.matrix.msc4293.redact_events", False)
if not redact or not isinstance(redact, bool):
continue
# self-bans currently are not authorized so we don't check for that
# case
if (
event.membership == Membership.BAN
and event.sender == event.state_key
):
continue

# check that sender can redact
redact_allowed = await self._can_sender_redact(event)
if redact_allowed:
await self.db_pool.simple_upsert(
"room_ban_redactions",
{"room_id": event.room_id, "user_id": event.state_key},
{
"redacting_event_id": event.event_id,
"redact_end_ordering": None,
},
{
"room_id": event.room_id,
"user_id": event.state_key,
"redacting_event_id": event.event_id,
"redact_end_ordering": None,
},
)

# normally the cache entry for a redacted event would be invalidated
# by an arriving redaction event, but since we are not creating redaction
# events we invalidate manually
self.store._invalidate_local_get_event_cache_room_id(event.room_id)

self.store._invalidate_async_get_event_cache_room_id(event.room_id)

if new_forward_extremities:
self.store.get_latest_event_ids_in_room.prefill(
(room_id,), frozenset(new_forward_extremities)
)

async def _can_sender_redact(self, event: EventBase) -> bool:
state_filter = StateFilter.from_types([(EventTypes.PowerLevels, "")])
state = await self.store.get_partial_filtered_current_state_ids(
event.room_id, state_filter
)
pl_id = state[(EventTypes.PowerLevels, "")]
pl_event = await self.store.get_event(pl_id)
if pl_event:
sender_level = pl_event.content.get("users", {}).get(event.sender)
if sender_level is None:
sender_level = pl_event.content.get("users_default", 0)

redact_level = pl_event.content.get("redact")
if not redact_level:
redact_level = pl_event.content.get("events_default", 0)

room_redaction_level = pl_event.content.get("events", {}).get(
"m.room.redaction"
)
if room_redaction_level:
if sender_level < room_redaction_level:
return False

if sender_level >= redact_level:
return True

return False

async def _calculate_sliding_sync_table_changes(
self,
room_id: str,
Expand Down
44 changes: 43 additions & 1 deletion synapse/storage/databases/main/events_worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
# [This file includes modifications made by New Vector Limited]
#
#

import json
import logging
import threading
import weakref
Expand Down Expand Up @@ -961,6 +961,13 @@ def _invalidate_local_get_event_cache_room_id(self, room_id: str) -> None:
self._event_ref.clear()
self._current_event_fetches.clear()

def _invalidate_async_get_event_cache_room_id(self, room_id: str) -> None:
"""
Clears the async get_event cache for a room. Currently a no-op until
an async get_event cache is implemented - see https://github.com/matrix-org/synapse/pull/13242
for preliminary work.
"""

async def _get_events_from_cache(
self, events: Iterable[str], update_metrics: bool = True
) -> Dict[str, EventCacheEntry]:
Expand Down Expand Up @@ -1558,6 +1565,41 @@ def _fetch_event_rows(
if d:
d.redactions.append(redacter)

# check for MSC4932 redactions
to_check = []
events: List[_EventRow] = []
for e in evs:
event = event_dict.get(e)
if not event:
continue
events.append(event)
event_json = json.loads(event.json)
room_id = event_json.get("room_id")
user_id = event_json.get("sender")
to_check.append((room_id, user_id))

# likely that some of these events may be for the same room/user combo, in
# which case we don't need to do redundant queries
to_check_set = set(to_check)
for room_and_user in to_check_set:
room_redactions_sql = "SELECT redacting_event_id, redact_end_ordering FROM room_ban_redactions WHERE room_id = ? and user_id = ?"
txn.execute(room_redactions_sql, room_and_user)

res = txn.fetchone()
# we have a redaction for a room, user_id combo - apply it to matching events
if not res:
continue
for e_row in events:
e_json = json.loads(e_row.json)
room_id = e_json.get("room_id")
user_id = e_json.get("sender")
if room_and_user != (room_id, user_id):
continue
redacting_event_id, redact_end_ordering = res
if redact_end_ordering:
if e_row.stream_ordering > redact_end_ordering:
continue
e_row.redactions.append(redacting_event_id)
return event_dict

def _maybe_redact_event_row(
Expand Down
21 changes: 21 additions & 0 deletions synapse/storage/schema/main/delta/92/08_room_ban_redactions.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
--
-- This file is licensed under the Affero General Public License (AGPL) version 3.
--
-- Copyright (C) 2025 New Vector, Ltd
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU Affero General Public License as
-- published by the Free Software Foundation, either version 3 of the
-- License, or (at your option) any later version.
--
-- See the GNU Affero General Public License for more details:
-- <https://www.gnu.org/licenses/agpl-3.0.html>.

CREATE TABLE room_ban_redactions(
room_id text NOT NULL,
user_id text NOT NULL,
redacting_event_id text NOT NULL,
redact_end_ordering bigint DEFAULT NULL, -- stream ordering after which redactions are not applied
CONSTRAINT room_ban_redaction_uniqueness UNIQUE (room_id, user_id)
);

Loading
Loading