@jevolk (Contributor) commented Oct 21, 2025

@jevolk force-pushed the jevolk/eliminate-transactions branch from 0504cee to 77a12b2 on October 21, 2025 11:26
@jevolk force-pushed the jevolk/eliminate-transactions branch from 77a12b2 to 8f7cb02 on October 21, 2025 11:28
@turt2live added the proposal, s2s, kind:maintenance, needs-implementation, and hacktoberfest-accepted labels on Oct 21, 2025
Member

Implementation requirements:

  • Server (sending)
  • Server (receiving)

transmitting EDUs indiscriminately will have to be considered and some additional sequencing will
likely be necessary in their payloads.

### Security Considerations

Member

Needs more thoughts. The obvious ones:

  • DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.
  • What happens if an event is sent which has a different id to the one in the path?

Contributor Author

> DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.

Indeed, rate limiting should be specified. It doesn't have to be much different either: the same 50 PDU + 100 EDU limits can apply, only this time as a measure of channels (or requests) rather than body content.
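To make the "measure of channels" idea concrete, here is a minimal sketch of a receiving-side limiter. It assumes the existing per-transaction caps (50 PDUs, 100 EDUs) are simply reused as caps on concurrent in-flight requests per origin; the class name, the `"pdu"`/`"edu"` keys, and the per-origin scope are hypothetical and not part of the proposal.

```python
from collections import defaultdict

# Assumption: the per-transaction caps are reused as caps on concurrent
# in-flight requests per origin server. Nothing in the MSC fixes these yet.
LIMITS = {"pdu": 50, "edu": 100}


class InFlightLimiter:
    """Counts concurrent send requests per (origin, kind)."""

    def __init__(self, limits=LIMITS):
        self.limits = dict(limits)
        self.in_flight = defaultdict(int)

    def try_acquire(self, origin: str, kind: str) -> bool:
        """Reserve a slot; return False if the origin is at its cap."""
        key = (origin, kind)
        if self.in_flight[key] >= self.limits[kind]:
            return False  # the caller would reject or delay the request
        self.in_flight[key] += 1
        return True

    def release(self, origin: str, kind: str) -> None:
        """Free a slot once the request has been fully handled."""
        key = (origin, kind)
        if self.in_flight[key] > 0:
            self.in_flight[key] -= 1
```

A handler would call `try_acquire` on entry and `release` on completion; how a rejected request is signalled to the sender is left open here, as it is in the thread.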

> What happens if an event is sent which has a different id to the one in the path?

Honestly, I regret choosing the arbitrary-string (anti-)pattern; I think the EduId should just be a content hash. Along with PDUs, that would cryptographically cover every input to the endpoint.
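A content-hash EduId could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `edu_id`, the choice of SHA-256, the unpadded URL-safe base64 encoding, and the use of `json.dumps` with sorted keys as a stand-in for Matrix canonical JSON (which has stricter rules than this approximation).

```python
import base64
import hashlib
import json


def edu_id(edu: dict) -> str:
    """Derive a hypothetical content-addressed EduId from an EDU body."""
    # Approximation of canonical JSON: sorted keys, no extra whitespace, UTF-8.
    encoded = json.dumps(
        edu, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    # URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_', so the result
    # can never begin with '$' and cannot be mistaken for an EventId.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```

With an ID derived this way, a receiver can recompute the hash from the body and reject a request whose path does not match, which addresses the mismatched-id question for EDUs.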

### Proposal

We specify `PUT /_matrix/federation/v2/send/{ EventId | EduId }` where events are sent
indiscriminately. An `EduId` is an arbitrary string which MUST NOT be prefixed by `$`.
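
As a rough illustration of how a receiver might tell the two identifier kinds apart under the `$`-prefix rule above, consider the following sketch. The function name and the `"pdu"`/`"edu"` return labels are hypothetical; the proposal text only states the prefix rule.

```python
from urllib.parse import unquote


def classify_send_id(path_segment: str) -> tuple[str, str]:
    """Classify the final path segment of a send request.

    Returns ("pdu", id) when the decoded identifier starts with '$',
    otherwise ("edu", id).
    """
    ident = unquote(path_segment)  # '$' may arrive percent-encoded as %24
    if not ident:
        raise ValueError("empty identifier")
    kind = "pdu" if ident.startswith("$") else "edu"
    return kind, ident
```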

Member

No response format?

### Discussion

When used over modern HTTP/2 only a single connection is required to conduct an arbitrary number of
concurrent transmissions. HTTP/1 systems can very safely utilize pipelining considering the

Member

Steady on :) There's a reason HTTP/3 exists. Whilst you are no longer blocked at the HTTP level, you are still head-of-line blocked at the TCP level. This matters because it means /send requests can still interfere with each other (large events sent before smaller events can delay the smaller ones). HTTP/2 does help processing latency though, which you'd hope is the biggest latency contributor, but it depends on the network.

Contributor Author

Yes, but the prime mover for HTTP/3 was the mobile space, where even a slightly unreliable link can interfere on all channels without graceful degradation, as you mentioned. HTTP/3 now allows for a smooth linear degradation.

In our space the most common interference comes from the hosts themselves rather than from the links (which are actually quite superb given the bias toward datacenter hosting (not even residential self-hosting!) in practice for Matrix).

I'm not certain but I do believe HTTP/2 has some tunable parameters for multiplexing. In any case, we're restricted by the 64 KiB event limit for PDUs and it would probably be quite awful if any EDUs are out there which are larger.


##### Unstable Prefix

`PUT /_matrix/federation/unstable/net.zemos.send/{ EventId | EduId }`

Member

I dislike the lack of batching here. It adds overhead when there needn't be any. I'd be tempted to batch per room as we often do want to process a bundle of events in the same room, and it helps servers enforce QoS per-room.

Contributor Author

As far as I can tell the optimal application shape for h2 (and h3 for that matter) would involve keeping payloads succinct and granular. Somewhere between RFC 7540 and RFC 9113 (it's been ages since I've reviewed this stuff) the basic unit of exchange is specified as a "frame" which has a 9-byte header and defaults to a max length of 16 KiB, negotiable up to 16 MiB. The RFC actually has this gem I stumbled on:

> Endpoints are not obligated to use all available space in a frame. Responsiveness can be improved by using frames that are smaller than the permitted maximum size. Sending large frames can result in delays in sending time-sensitive frames (such as RST_STREAM, WINDOW_UPDATE, or PRIORITY), which, if blocked by the transmission of a large frame, could affect performance.

My understanding is that batching would inhibit the multiplexer's degrees of freedom rather than provide any quality advantage. If the link is busy, a batch of PDUs is linearized in competition with other channels (other batches of PDUs). If streams at the API level aren't in use, the first PDU in a batch won't be available until the last PDU has arrived. It might risk trading the head-of-line blocking problem for the tail-latency problem.
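
For a sense of scale, the frame arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it counts DATA frames at the default 16 KiB frame size and ignores HEADERS frames, flow control, and the JSON wrapping a batch would add. The constant and function names are illustrative.

```python
import math

FRAME_HEADER = 9            # octets per HTTP/2 frame header
DEFAULT_MAX_FRAME = 16_384  # default maximum frame payload (16 KiB)
MAX_PDU = 65_536            # 64 KiB event size limit


def data_frames(body_len: int, max_frame: int = DEFAULT_MAX_FRAME) -> int:
    """Minimum number of DATA frames needed to carry body_len octets."""
    return math.ceil(body_len / max_frame)


def framing_overhead(body_len: int, max_frame: int = DEFAULT_MAX_FRAME) -> int:
    """Octets of frame headers spent on carrying body_len octets."""
    return data_frames(body_len, max_frame) * FRAME_HEADER
```

Under these assumptions a single maximum-size PDU spans 4 DATA frames with 36 octets of frame headers, while a batch of 50 such PDUs spans at least 200 frames, all of which must arrive before a non-streaming receiver can act on the first PDU.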

If PDUs and EDUs are instead sent individually, the only risks and efforts required on our part deal with sequence and ordering -- a very manageable (perhaps even enjoyable!) problem-space to engineer for. There are cases where we perhaps don't want an EDU for a read receipt to arrive before the PDU it refers to (this is actually a real problem today). Such issues would have to be contemplated because of the freedom granted by a granular approach; better than a problem-space with no freedom to navigate it.
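
One way a receiver might handle the receipt-before-PDU case is to hold receipts until the referenced event has been seen. The sketch below is only an illustration of that idea; the class and method names are hypothetical, and the proposal does not specify any such mechanism.

```python
from collections import defaultdict


class ReceiptHold:
    """Holds receipts until the PDU they reference has been seen."""

    def __init__(self):
        self.seen = set()                 # event IDs of PDUs already received
        self.pending = defaultdict(list)  # event ID -> receipts waiting on it

    def on_receipt(self, event_id, receipt):
        """Return the receipts deliverable now (empty if the PDU is unknown)."""
        if event_id in self.seen:
            return [receipt]
        self.pending[event_id].append(receipt)
        return []

    def on_pdu(self, event_id):
        """Record the PDU and return any receipts that were waiting on it."""
        self.seen.add(event_id)
        return self.pending.pop(event_id, [])
```

A real implementation would also need to bound or expire `pending`, since a receipt may reference a PDU that never arrives.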

@@ -0,0 +1,69 @@
# MSC4371: On the elimination of federation transactions

Member

I agree with the sentiment. However, I think there is probably a better API shape out there.

Conceptually I see federation transactions as a pubsub layer, where the topic is the room id. This would provide necessary batching without unduly blocking events in unrelated rooms.
