91 changes: 91 additions & 0 deletions proposals/4282-interactive-room-message.md
Member


Implementation requirements:

  • Client
  • Server

@@ -0,0 +1,91 @@
# MSC4282: Hint that a /rooms/{room_id}/messages request is interactive

The endpoint [/rooms/{room_id}/messages](https://spec.matrix.org/latest/client-server-api/#get_matrixclientv3roomsroomidmessages)
is used by clients to retrieve older events from a homeserver when the direction is set to
backwards (an operation called "back-pagination" throughout this MSC). This can be useful in a
few contexts:

- after a gappy sync (i.e. one whose response set the `limited` flag), so as to retrieve the events
in the gap, that is, all the events that were sent to the homeserver since the client last synced
but were not included in the last sync response. This applies both to sync v2 and simplified
sliding sync.
- as an out-of-sync mechanism to go through all the events in a room from the end to the start, so
as to apply some mass operation on them, like indexing them for a search engine.

In fact, this mechanism is crucial in the context of [simplified sliding sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186).
This sync mechanism indeed generates thin server responses including a minimal set of events
(controlled by the `timeline_limit` request parameter), so as to provide better initial sync times
and ultimately more responsive clients. The client is then expected to use the
`/rooms/{room_id}/messages` endpoint to retrieve the previous events of a room.

As a result, clients should be able to expect this endpoint to be *fast*, when the user session is
interactive (i.e. a user is waiting for these events to be retrieved). While it's hard to define
*how* fast, it's expected that this endpoint would return in a matter of seconds, in the worst
cases. Otherwise, the user experience on the clients may be severely degraded.

However, some server implementations, including
[Synapse](https://github.com/element-hq/synapse/blob/5c84f258095535aaa2a4a04c850f439fd00735cc/synapse/handlers/pagination.py#L575-L584),
[Conduit](https://gitlab.com/famedly/conduit/-/blob/a7e6f60b41122761422df2b7bcc0c192416f9a28/src/api/client_server/message.rs#L201)
and
[Conduwuit](https://github.com/girlbossceo/conduwuit/blob/0f81c1e1ccdcb0c5c6d5a27e82f16eb37b1e61c8/src/api/client/message.rs#L94-L101),
may generate, under some implementation-specific conditions, federation requests to
[backfill](https://spec.matrix.org/v1.14/server-server-api/#backfilling-and-retrieving-missing-events)
the room timeline, and fetch more events from other servers. This delays the response to the
client, since the request is now blocked on the server waiting for the federation responses
to come back. Moreover, the time spent retrieving those responses is theoretically unbounded, so the
homeserver and the clients may have to wait forever for such requests to complete.

We need a more responsive way to fetch older events from the server, without having to wait for
federation responses to come back. This is the point of this MSC.

## Proposal

It is proposed that the `/rooms/{room_id}/messages` endpoint be modified to allow clients to
specify a new boolean query parameter `interactive`, which indicates that the client is interested
Contributor


I think you also need to define the unstable prefix to use by implementations while this MSC has not been accepted. It's typically done in a section at the very end of the proposal.

in getting the response *quickly*.

If the parameter is missing, then it's considered to be `false` by default. Thus, this does not
change existing semantics: the server behavior remains the same if the query parameter
hasn't been set.
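
As an illustration only, a client could build such a request as sketched below. The helper name
`messages_url` and the `https://example.org` homeserver are hypothetical, and the sketch assumes
the parameter is sent unprefixed as `interactive=true`, since this MSC does not define an unstable
prefix. Omitting the parameter when it is `false` relies on the default described above.

```python
from urllib.parse import quote, urlencode


def messages_url(homeserver, room_id, from_token=None, limit=20, interactive=False):
    """Build a back-pagination URL for /rooms/{room_id}/messages (hypothetical helper)."""
    params = {"dir": "b", "limit": str(limit)}
    if from_token is not None:
        params["from"] = from_token
    if interactive:
        # Assumption: the parameter name proposed by this MSC, without unstable prefix.
        params["interactive"] = "true"
    # Room IDs contain "!" and ":", so percent-encode them in the path.
    path = f"/_matrix/client/v3/rooms/{quote(room_id, safe='')}/messages"
    return f"{homeserver}{path}?{urlencode(params)}"


print(messages_url("https://example.org", "!abc:example.org", from_token="t1", interactive=True))
```

Because an absent parameter means `false`, a client that never sets it sends exactly the same
request as today.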

When the query parameter is set to `true`, then the server is expected to make a best-effort attempt
to provide a response *in a reasonably short time*. Implementations may use one of the following
strategies to achieve this:

- avoid blocking on a backfill request to other homeservers, by not starting such requests at all,
or by starting them in the background in a non-blocking way.
- start the backfill request, and race between waiting for its completion and timing out after a
short amount of time. This can be a nice tradeoff in case backfill requests resolve quickly.
- not do anything differently. This doesn't solve the problem, but the query parameter really is a
hint that the response is expected to come in quickly, not a strong requirement.
- do something completely different, not mentioned in this MSC, that achieves the same goal.
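
The second strategy (racing the backfill against a short timeout) could look roughly like the
sketch below. This is not taken from any existing homeserver: `get_messages`, `read_local`,
`backfill` and the 0.5 second default are all hypothetical names and values chosen for
illustration, with `read_local` and `backfill` standing in for the server's own storage and
federation code.

```python
import asyncio

_background = set()  # strong references, so unfinished backfill tasks are not garbage-collected


def _forget(task):
    _background.discard(task)
    if not task.cancelled():
        task.exception()  # observe failures so asyncio does not warn about them


async def get_messages(read_local, backfill, interactive, timeout=0.5):
    """Serve a /messages request; `read_local` and `backfill` are async callables."""
    if not interactive:
        # Behavior without the hint: wait for the backfill, however long it takes.
        await backfill()
        return await read_local()

    task = asyncio.create_task(backfill())
    _background.add(task)
    task.add_done_callback(_forget)
    # Race: wait for the backfill, but only for `timeout` seconds.
    # On timeout the task is not cancelled and keeps running in the background.
    await asyncio.wait({task}, timeout=timeout)
    return await read_local()
```

If the backfill finishes within the timeout, the response includes the newly fetched events;
otherwise the client gets local history immediately and a later request can pick up whatever the
background task stored.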

## Potential issues

Before this MSC, it was already possible for clients to miss events in a room, because they back-paginated
through it using `/messages`, and the server received new events after a netsplit, at a position that
the client had already paginated through. This would result in the client not receiving those
events, or receiving them through sync but in a non-topological ordering (i.e. an ordering that
would be different from the one they would've observed by paginating with `/messages`).

This MSC doesn't resolve this problem. On the contrary, it may make it more apparent, if *all*
`/messages` requests end up *not* causing any federation backfill. The most likely consequence of
this is that events might be more frequently misordered across clients.

## Alternatives

Instead of adding a query parameter, this MSC could mandate that a quick, best-effort response become
the expected behavior of all implementations. This would be an implicit breaking change, and it may inhibit
use cases where clients might prefer a perfectly backfilled room over a quick response time.
Comment on lines +90 to +92
Contributor

@MadLittleMods MadLittleMods Jun 23, 2025


Related MSC for indicating gaps in the timeline: MSC3871

If we indicated gaps in the /messages, we could respond quickly always as a default and clients can handle displaying and filling in the gaps as necessary.

Contributor


This was explored further during the recent Element Hackathon ("How Hard Could It Be?") with an initial implementation in Synapse -> element-hq/synapse#18873

For the hack day, I'll be tackling the backfill problem with this strategy:

  • Default to fast responses with gaps: As a default, we can always respond quickly and indicate gaps (MSC3871) for clients to paginate at their leisure.
  • Fast back-pagination: Clients back-paginate with /messages?dir=b&backfill=false, and Synapse skips backfilling entirely, returning only local history with gaps as necessary.
  • Explicit gap filling: To fill in gaps, clients use /messages?dir=b&backfill=true which works just like today to do a best effort backfill.

This allows the client to back-paginate the history we already have without delay. And can fill in the gaps as they see fit.

Gaps can be represented in the UI with a message like "We failed to get some messages in this gap, try again 🗘.", giving users clear feedback. Regardless of clients trying to fill in gaps automatically, I would still suggest to display gaps so people can tell what's happening.

This is basically a simplified version of MSC4282 leveraging MSC3871: Gappy timelines to get proper client feedback to indicate where the gaps are so we can skip backfill without worrying. For reference, skipping backfill without letting clients know where the gaps are just means they won't ever know that they are missing messages.

-- @MadLittleMods, https://github.com/element-hq/how-hard-can-it-be-2025/issues/47#issuecomment-3234339497


Since this problem is more frequent with simplified sliding sync, one could imagine that a client
would find a simplified-sliding-sync specific solution. For instance, it could increase the
`timeline_limit` window to get more and more events from the end of the room, up to the previous
latest event they knew about, and thus *not* cause backfill requests. This is a workaround that
would work, but not be optimal in terms of bandwidth and server CPU activity, as it would mean
including lots of events the client has already seen before (viz., the increasing tail of the
room's timeline).

We could also have a new separate paginated endpoint to retrieve the previous events in the *sync*
ordering, thus not causing any backfill requests. It would be strictly more work to implement, and
it is unclear that it would achieve more than the current proposal.