proposals/4371-eliminate-transactions.md

Member:

Implementation requirements:

- Server (sending)
- Server (receiving)

# MSC4371: On the elimination of federation transactions

Member:

I agree with the sentiment. However, I think there is probably a better API shape out there.

Conceptually I see federation transactions as a pubsub layer, where the topic is the room id. This
would provide necessary batching without unduly blocking events in unrelated rooms.


The Server Specification [v1.16 § 4](https://spec.matrix.org/v1.16/server-server-api/#transactions)
(including all prior versions) defines an envelope structure, and an accompanying protocol for common
message transport between servers, referred to as "transactions." A transaction collects the messages
an origin has queued for a destination; the batch is transmitted, acknowledged by the destination, and
the process repeats with whatever messages the origin has queued in the interim.
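
For concreteness, the exchange described above looks roughly like the following. This is an
illustrative sketch only, assuming Python with the `httpx` package; request signing, error handling,
and the queue (`queue.drain` is a hypothetical helper) are omitted or invented for brevity.

```python
import time

import httpx


def transaction_loop(origin: str, destination: str, queue) -> None:
    """Illustrative sketch of the current v1 transaction exchange: batches of
    queued messages are sent one transaction at a time, each acknowledged
    before the next begins."""
    with httpx.Client(base_url=f"https://{destination}") as client:
        while True:
            # `queue.drain` is a hypothetical helper returning queued messages,
            # capped at the spec's per-transaction limits (50 PDUs, 100 EDUs).
            pdus, edus = queue.drain(max_pdus=50, max_edus=100)
            if not pdus and not edus:
                time.sleep(0.1)
                continue
            txn_id = str(time.time_ns())
            body = {
                "origin": origin,
                "origin_server_ts": int(time.time() * 1000),
                "pdus": pdus,  # may mix events from many unrelated rooms
                "edus": edus,
            }
            # Federation request signing (X-Matrix Authorization) omitted.
            resp = client.put(f"/_matrix/federation/v1/send/{txn_id}", json=body)
            # The full round trip must complete before the next batch moves.
            resp.raise_for_status()
```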

Transactions have existed since the early protocol (circa 2014), when HTTP/1.1 was the common
transport standard. In HTTP/1 requests are processed sequentially within each connection. Multiple
connections may be used for concurrent processing, but a federation server is already communicating
with many destinations, so minimizing the number of connections between hosts is essential. Pipelining
can also hide latency, but it is poorly supported in practice and brings many complications; protocol
designers instead lean toward other solutions. Federation transactions arose from this environment.

Ironically, transactions succumb to the same shortcomings as HTTP/1 itself. The Matrix protocol
specifies that only one transaction can be in flight at a time. The round-trip time for a successful
acknowledgement must be paid before new information even begins to transmit. This introduces a
head-of-line blocking effect, often paralyzing communication for any number of reasons: implementation
errors, denial-of-service exploitation, or ordinary processing in which slow network requests are
required before a message can be accepted. During such events messages continue to queue on the
origin. Eventually the queue exceeds the limits of a single transaction, requiring multiple rounds of
transactions. These queuing events have been known to take days to resolve.

Messages bundled in these tranches often have no dependency on each other. For example, the primary
context division in Matrix is the room, and rooms have no specified interdependency: "transacting"
messages from different rooms at the same time serves no purpose; it is purely a hazard. Worse, the
primary unit of messaging for a room, the PDU, contains its own sequencing and reliability mechanism,
allowing it to exist fully independent of any transaction, as it virtually always does in every other
context where PDUs are found. Sequencing PDUs through separate transactions is simply unnecessary; it
is purely a hazard.
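
To make the point concrete, an abridged, hand-written PDU is sketched below (the exact shape varies by
room version); the `prev_events`/`auth_events` references, depth, hashes, and signatures are what give
a PDU its own ordering and integrity, independent of any transaction envelope.

```python
# Abridged, hand-written PDU purely for illustration (field shapes vary by room version).
pdu = {
    "room_id": "!example:origin.example",
    "type": "m.room.message",
    "sender": "@alice:origin.example",
    "origin_server_ts": 1_700_000_000_000,
    "prev_events": ["$parent_event_id"],          # DAG ordering: prior events this one follows
    "auth_events": ["$create", "$power_levels"],  # authorization chain
    "depth": 42,
    "hashes": {"sha256": "<content hash>"},
    "signatures": {"origin.example": {"ed25519:key1": "<signature>"}},
    "content": {"msgtype": "m.text", "body": "hello"},
}
```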

The specification states: "A Transaction is meaningful only to the pair of homeservers that exchanged
it; they are not globally-meaningful." This limited scope and isolation eases the task of reducing or
eliminating transactions entirely.

### Proposal

We specify `PUT /_matrix/federation/v2/send/{ EventId | EduId }`, where PDUs and EDUs are sent
individually and indiscriminately. An `EduId` is an arbitrary string which MUST NOT be prefixed by `$`.
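
As a minimal sketch of the sending side (assuming Python with `httpx` and its `http2` extra): the body
and response formats are left unspecified by this proposal, so the bare-PDU payload below is purely an
assumption for illustration.

```python
import httpx


def send_event(destination: str, event_id: str, pdu: dict) -> None:
    """Illustrative sketch: one event per named, idempotent PUT.
    Assumes the body is the bare PDU; this proposal does not yet specify
    body or response formats. Federation request signing omitted."""
    with httpx.Client(base_url=f"https://{destination}", http2=True) as client:
        resp = client.put(f"/_matrix/federation/v2/send/{event_id}", json=pdu)
        resp.raise_for_status()
```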

Member:

No response format?


##### Unstable Prefix

`PUT /_matrix/federation/unstable/net.zemos.send/{ EventId | EduId }`

Member:

I dislike the lack of batching here. It adds overhead when there needn't be any. I'd be tempted to
batch per room, as we often do want to process a bundle of events in the same room, and it helps
servers enforce QoS per room.

Contributor Author:

As far as I can tell, the optimal application shape for h2 (and h3, for that matter) would involve
keeping payloads succinct and granular. Somewhere between RFC 7540 and RFC 9113 (it's been ages since
I've reviewed this stuff) the basic unit of exchange is specified as a "frame," which has a 9-byte
header and defaults to a maximum length of 16 KiB, negotiable up to 16 MiB. The RFC actually has this
gem I stumbled on:

> Endpoints are not obligated to use all available space in a frame. Responsiveness can be improved by
> using frames that are smaller than the permitted maximum size. Sending large frames can result in
> delays in sending time-sensitive frames (such as RST_STREAM, WINDOW_UPDATE, or PRIORITY), which, if
> blocked by the transmission of a large frame, could affect performance.

My understanding is that batching would instead inhibit the multiplexer's degrees of freedom rather
than provide any quality advantage. If the link is busy, a batch of PDUs is linearized in competition
with other channels (other batches of PDUs). If streams at the API level aren't in use, the first PDU
in a batch won't be available until the last PDU has arrived. It might risk trading the head-of-line
blocking problem for a tail-latency problem.

If PDUs and EDUs are instead sent individually, the only risks and efforts required on our part deal
with sequence and ordering, a very manageable (perhaps even enjoyable!) problem space to engineer for.
There are cases where perhaps we don't want the EDU for a read receipt to arrive before the PDU it
refers to (this is a real problem today, actually). Such issues would have to be contemplated because
of the freedom granted by a granular approach; better than a problem space with no freedom to navigate
it.
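
A sketch of what this granular approach could look like on the wire, assuming Python with `httpx` (and
its `http2` extra): each PDU/EDU becomes its own stream over a single connection, leaving interleaving
decisions to the HTTP/2 multiplexer rather than to an application-level batch. Payload and response
formats remain assumptions, as above.

```python
import asyncio

import httpx


async def send_individually(destination: str, events: dict[str, dict]) -> None:
    """Illustrative sketch: one named PUT per PDU/EDU, multiplexed concurrently
    over a single HTTP/2 connection instead of linearized into a batch."""
    async with httpx.AsyncClient(base_url=f"https://{destination}", http2=True) as client:

        async def put_one(event_id: str, body: dict) -> None:
            # Federation request signing omitted for brevity.
            resp = await client.put(f"/_matrix/federation/v2/send/{event_id}", json=body)
            resp.raise_for_status()

        await asyncio.gather(*(put_one(eid, body) for eid, body in events.items()))
```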


### Discussion

When used over modern HTTP/2 only a single connection is required to conduct an arbitrary number of
concurrent transmissions. HTTP/1 systems can very safely utilize pipelining considering the
idempotency of named PUT requests.

Member:

Steady on :) there's a reason HTTP/3 exists. Whilst you are no longer blocked at the HTTP level, you
are still head-of-line blocked at the TCP level. This matters because it means /send requests can
still interfere with each other (large events sent before smaller events can impact the time until the
smaller event is sent). HTTP/2 does help processing latency, though, which you'd hope is the biggest
latency contributor, but it depends on the network.

Contributor Author:

Yes, but the prime mover for HTTP/3 was the mobile space, where even a slightly unreliable link can
interfere with all channels without graceful degradation, as you mentioned. HTTP/3 now allows for a
smooth, linear degradation.

In our space the most common interference comes from the hosts themselves rather than from the links
(which are actually quite superb given the bias toward datacenter hosting (not even residential
self-hosting!) in practice for Matrix).

I'm not certain, but I do believe HTTP/2 has some tunable parameters for multiplexing. In any case,
we're restricted by the 64 KiB event limit for PDUs, and it would probably be quite awful if any EDUs
out there are larger.

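To make the idempotency point concrete, a failed or ambiguous attempt can simply be repeated against
the same named path; the sketch below (Python with `httpx`, hypothetical retry policy) assumes the
destination treats a duplicate PUT of the same event as a no-op.

```python
import time

import httpx


def put_with_retry(client: httpx.Client, event_id: str, body: dict, attempts: int = 5) -> None:
    """Illustrative sketch: retrying is safe because each event is PUT to a
    stable, named path, so the destination can de-duplicate repeats."""
    for attempt in range(attempts):
        try:
            resp = client.put(f"/_matrix/federation/v2/send/{event_id}", json=body)
            resp.raise_for_status()
            return
        except httpx.HTTPError:
            time.sleep(2**attempt)  # simple exponential backoff, purely illustrative
    raise RuntimeError(f"giving up on {event_id}")
```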


### Alternatives

A possible alternative would be to keep the transaction structure while amending the protocol
semantics to allow the concurrency expected in the modern age. Nevertheless, the transaction structure
has defects that work against optimal network software. For example, network software benefits from
transmitting the same message to multiple destinations without recrafting a specific version for each
destination.

### Potential Issues

Some EDUs can exist naturally outside of transactions, such as read receipts, which target a specific
`event_id`, can be replayed, and can be received in any order. Nevertheless, a wider analysis of
transmitting EDUs indiscriminately will be required, and some additional sequencing will likely be
necessary in their payloads.

### Security Considerations

Member:

Needs more thought. The obvious ones:

- DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.
- What happens if an event is sent which has a different id to the one in the path?

Contributor Author:

> DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.

Indeed, rate limiting should be specified. It doesn't have to be much different either: the same
50-PDU + 100-EDU limit can apply, this time as a measure of channels (or requests) rather than body
content.

> What happens if an event is sent which has a different id to the one in the path?

Honestly, I regret choosing the arbitrary-string (anti-)pattern; I think the `EduId` should just be a
content hash. Along with PDUs, that cryptographically covers every input to the endpoint.
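
For illustration, one possible construction of such a content-hash `EduId`, assuming Matrix-style
canonical JSON and URL-safe unpadded base64; none of this is normative.

```python
import base64
import hashlib
import json


def edu_id(edu: dict) -> str:
    """Illustrative sketch: derive the EduId from a hash of the EDU's canonical
    JSON, so the path name covers the payload (as PDU event IDs already do).
    URL-safe unpadded base64, never prefixed with "$", per the proposal."""
    canonical = json.dumps(edu, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```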