# MSC4371: On the elimination of federation transactions

> **Review comment:** I agree with the sentiment. However, I think there is probably a better API shape out there. Conceptually I see federation transactions as a pubsub layer, where the topic is the room id. This would provide necessary batching without unduly blocking events in unrelated rooms.
Server Specification [v1.16 § 4](https://spec.matrix.org/v1.16/server-server-api/#transactions)
(including all prior versions) defines an envelope structure, together with a protocol for common
message transport between servers, referred to as "transactions." These structures collect the
messages queued by an origin for a destination; the batch is transmitted, acknowledged by the
destination, and the process is then repeated with the new messages queued by the origin in the
interim.
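
For context, a minimal sketch of the existing flow this proposal seeks to eliminate is shown below. It assumes the standard `/v1/send/{txnId}` body of `origin`, `origin_server_ts`, `pdus`, and `edus`; request signing (the `X-Matrix` Authorization header), retries, and the hostnames are omitted or placeholders, and `requests` is used purely for illustration.

```python
# Illustrative sketch of the existing /v1/send transaction flow.
# Signing and error handling are omitted; hostnames are placeholders.
import time
import requests

def send_transaction(destination: str, txn_id: str,
                     pdus: list, edus: list) -> dict:
    """Bundle queued messages into one transaction and wait for the ack."""
    body = {
        "origin": "origin.example.org",
        "origin_server_ts": int(time.time() * 1000),
        "pdus": pdus,   # at most 50 per transaction
        "edus": edus,   # at most 100 per transaction
    }
    # Only one transaction may be in flight per destination at a time, so the
    # full round trip must be paid before the next batch can even begin.
    resp = requests.put(
        f"https://{destination}/_matrix/federation/v1/send/{txn_id}",
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # per-PDU results from the destination
```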
Transactions have existed since the early protocol (circa 2014), when HTTP/1.1 was the common
standard of transport. In HTTP/1.1, requests are processed sequentially within each connection.
Multiple connections may be used for concurrent processing, but a federation server is already
communicating with many destinations; minimizing connections between hosts is essential. Pipelining
may also be used to hide latency, but without explicit support in HTTP/1.1 it brings many
complications; protocol designers instead lean toward other solutions. Federation transactions arose
from this environment.
Ironically, transactions succumb to the same shortcomings as HTTP/1 itself. The Matrix protocol
specifies that only one transaction can be in flight at a time. The round-trip time for a successful
acknowledgement must be paid before new information even begins to transmit. This introduces a
head-of-line blocking effect, often paralyzing communication for any number of reasons such as
implementation errors, denial-of-service exploitation, or ordinary processing in which slow network
requests are often required before a message can be accepted. During these events messages continue
to queue on the origin. Eventually this queue exceeds the limits of a single transaction, requiring
multiple rounds of transactions. These queuing events have been known to take days to resolve.
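
The resulting per-destination sender is, in effect, a serialized loop. The sketch below (building on the hypothetical `send_transaction` helper above) illustrates how a single stalled batch blocks everything queued behind it:

```python
# Sketch of the serialized per-destination sender implied by the current model:
# one transaction in flight at a time, so at least one round trip is paid per
# batch and a single stalled batch blocks everything queued behind it.
import itertools
from typing import Deque

def sender_loop(destination: str, outbox: Deque[dict]) -> None:
    for txn_counter in itertools.count():
        if not outbox:
            break  # a real sender would instead wait for new messages
        # Drain at most one transaction's worth of PDUs (limit of 50).
        batch = [outbox.popleft() for _ in range(min(50, len(outbox)))]
        # Nothing else queued for this destination can be transmitted until
        # this call is acknowledged; if the destination is struggling, the
        # backlog here can take days to drain.
        send_transaction(destination, f"txn-{txn_counter}", pdus=batch, edus=[])
```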
Many messages bundled into these tranches often have no dependency on each other. For example, the
primary context division in Matrix is the room, and rooms have no specified interdependency:
"transacting" messages from different rooms at the same time serves no purpose. It is purely a
hazard. Worse, the primary unit of messaging for a room, the PDU, contains its own sequencing and
reliability mechanism, allowing it to exist fully independent of any transaction, as it virtually
always does in every other context where PDUs are found. Sequencing PDUs in separate transactions is
simply not necessary; it is purely a hazard.
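
To make the self-sequencing point concrete, here is an abbreviated, illustrative PDU skeleton (fields unrelated to ordering are omitted; identifiers are placeholders):

```python
# Abbreviated, illustrative PDU skeleton. The prev_events references give each
# PDU its place in the room's event graph, independent of whichever transaction
# happened to carry it to the destination.
example_pdu = {
    "room_id": "!abc123:origin.example.org",
    "type": "m.room.message",
    "sender": "@alice:origin.example.org",
    "origin_server_ts": 1700000000000,
    "depth": 42,
    "prev_events": ["$previous_event_id"],          # DAG parents: intra-room ordering
    "auth_events": ["$create", "$power_levels", "$member"],
    "content": {"msgtype": "m.text", "body": "hello"},
    # hashes and signatures omitted for brevity
}
```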
The specification states: "A Transaction is meaningful only to the pair of homeservers that exchanged
it; they are not globally-meaningful." This limited use and isolation eases our task of reducing or
eliminating transactions entirely.
### Proposal
We specify `PUT /_matrix/federation/v2/send/{ EventId | EduId }` where events are sent
indiscriminately. An `EduId` is an arbitrary string which MUST NOT be prefixed by `$`.

> **Review comment:** No response format?
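
Since the MSC does not pin down the request or response bodies (as the review comment notes), the following is only a hypothetical sketch, assuming the PDU itself is sent as the JSON body and the path component is its event ID:

```python
# Hypothetical sketch of sending a single PDU over the proposed endpoint.
# The body and response formats are NOT specified by this MSC; this simply
# assumes the PDU is the JSON body, keyed by its event ID in the path.
import requests  # any HTTP client works; signing headers omitted

def send_pdu_v2(destination: str, event_id: str, pdu: dict) -> None:
    resp = requests.put(
        f"https://{destination}/_matrix/federation/v2/send/{event_id}",
        json=pdu,
        timeout=30,
    )
    # A named PUT is idempotent: retrying with the same event_id is safe.
    resp.raise_for_status()
```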
##### Unstable Prefix
`PUT /_matrix/federation/unstable/net.zemos.send/{ EventId | EduId }`

> **Review comment:** I dislike the lack of batching here. It adds overhead when there needn't be any. I'd be tempted to batch per room as we often do want to process a bundle of events in the same room, and it helps servers enforce QoS per-room.

> **Reply:** As far as I can tell the optimal application shape for h2 (and h3 for that matter) would involve keeping payloads succinct and granular. Somewhere between RFC 7540 and RFC 9113 (it's been ages since I've reviewed this stuff) the basic unit of exchange is specified as a "frame", which has a 9-byte header and defaults to a max length of 16 KiB, negotiable up to 16 MiB. The RFC actually has this gem I stumbled on here:
>
> My understanding is that batching would instead inhibit the multiplexer's degrees of freedom rather than provide any quality advantage. If the link is busy, a batch of PDUs is linearized in competition with other channels (other batches of PDUs). If streams at the API level aren't in use, the first PDU in a batch won't be available until the last PDU has arrived. It might risk trading the head-of-line blocking problem for a tail-latency problem. If PDUs and EDUs are instead sent individually, the only risks and efforts required on our part deal with sequence and ordering -- a very manageable (perhaps even enjoyable!) problem space to engineer for. There are cases where perhaps we don't want an EDU for a read receipt to arrive before the PDU it refers to (this is a real problem today, actually). Such issues would have to be contemplated because of the freedom granted by a granular approach; better than a problem space with no freedom to navigate it.
### Discussion
When used over modern HTTP/2, only a single connection is required to conduct an arbitrary number of
concurrent transmissions. HTTP/1 systems can very safely utilize pipelining considering the
idempotency of named PUT requests.

> **Review comment:** Steady on :) there's a reason HTTP/3 exists. Whilst you are no longer blocked at the HTTP level, you are still head-of-line blocked at the TCP level. This matters because it means /send requests can still interfere with each other (large events sent before smaller events can impact the time until the smaller event is sent). HTTP/2 does help processing latency though, which you'd hope is the biggest latency contributor, but it depends on the network.

> **Reply:** Yes, but the prime mover for HTTP/3 was the mobile space, where even a slightly unreliable link can interfere on all channels without graceful degradation, as you mentioned. HTTP/3 now allows for a smooth linear degradation. In our space the most common interference comes from the hosts themselves rather than from the links (which are actually quite superb given the bias toward datacenter hosting (not even residential self-hosting!) in practice for Matrix). I'm not certain but I do believe HTTP/2 has some tunable parameters for multiplexing. In any case, we're restricted by the 64 KiB event limit for PDUs, and it would probably be quite awful if any EDUs out there are larger.
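
As a rough illustration of the concurrency claim, the sketch below multiplexes many PUTs to the hypothetical `/v2/send` endpoint over a single HTTP/2 connection. It uses `httpx` (with its optional HTTP/2 support) purely for illustration; signing headers and the body format are omitted or assumed as above.

```python
# Minimal sketch: many concurrent /v2/send PUTs multiplexed over one HTTP/2
# connection. Requires httpx installed with its h2 extra.
import asyncio
import httpx

async def send_all(destination: str, events: dict[str, dict]) -> None:
    async with httpx.AsyncClient(http2=True, timeout=30) as client:
        async def put_one(event_id: str, pdu: dict) -> None:
            resp = await client.put(
                f"https://{destination}/_matrix/federation/v2/send/{event_id}",
                json=pdu,
            )
            resp.raise_for_status()

        # Each event travels on its own stream; a slow event no longer delays
        # the rest at the HTTP level (TCP-level head-of-line blocking remains).
        await asyncio.gather(*(put_one(eid, pdu) for eid, pdu in events.items()))
```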
### Alternatives
A possible alternative would be to keep the transaction structure while amending the protocol
semantics for the requisite concurrency in the modern age. Nevertheless, the transaction structure
has some defects for optimal network software. For example, network software benefits from
transmitting the same message to multiple destinations without recrafting a specific version for
each destination.
### Potential Issues
Some EDUs can exist naturally outside of transactions, such as read receipts, which target a specific
`event_id`, can be replayed, and can be received in any order. Nevertheless, a wider analysis of
transmitting EDUs indiscriminately will have to be conducted, and some additional sequencing will
likely be necessary in their payloads.
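
As a purely hypothetical illustration of what per-payload sequencing could look like (the `origin_seq` field and the simplified EDU shape below are not part of this proposal or the current specification), a receiver might drop any update older than the newest one already seen:

```python
# Purely hypothetical illustration: an EDU payload carrying an origin-assigned
# sequence number so a stale update arriving out of order can be discarded.
last_seen: dict[tuple[str, str], int] = {}  # (origin, edu_type) -> highest seq

def accept_edu(origin: str, edu: dict) -> bool:
    """Return True if this EDU should be applied, False if it is stale."""
    key = (origin, edu["edu_type"])
    seq = edu["content"].get("origin_seq", 0)  # hypothetical field
    if seq <= last_seen.get(key, -1):
        return False  # an older (or duplicate) update: safe to drop
    last_seen[key] = seq
    return True

# Example: a typing notification arriving after a newer one is ignored.
accept_edu("origin.example.org",
           {"edu_type": "m.typing", "content": {"origin_seq": 7, "typing": True}})
accept_edu("origin.example.org",
           {"edu_type": "m.typing", "content": {"origin_seq": 6, "typing": False}})  # -> False
```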
### Security Considerations
> **Review comment:** Needs more thoughts. The obvious ones:

> **Reply:** Indeed, rate-limiting should be specified. It doesn't have to be much different either; the same 50 PDU + 100 EDU limit can apply, this time as a measure of channels (or requests) rather than body content.
>
> I regret choosing the arbitrary-string (anti-)pattern honestly, I think the

> **Review comment:** Implementation requirements: