MSC4371: On the elimination of federation transactions. #4371
Signed-off-by: Jason Volk <[email protected]>
Implementation requirements:
- Server (sending)
- Server (receiving)
transmitting EDU's indiscriminately will have to be considered and some additional sequencing will
likely be necessary in their payloads.

### Security Considerations
Needs more thoughts. The obvious ones:
- DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.
- What happens if an event is sent which has a different id to the one in the path?
> DoS risk by enabling concurrency without an appropriate recommendation for rate limiting.
Indeed, rate limiting should be specified. It doesn't have to be much different either: the same limits of 50 PDUs and 100 EDUs can apply, except that they now count channels (or requests) rather than items in a body.
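To make that concrete, here is a minimal sketch of a receiving-side cap on concurrent `/send` requests. Everything in it is an assumption for illustration: the class name `InFlightLimiter`, the per-origin keying, and the choice to reuse the 50/100 numbers as concurrency caps are not part of the MSC.

```python
from collections import defaultdict

# Assumed caps, borrowed from the existing per-transaction limits (50 PDUs, 100 EDUs).
MAX_IN_FLIGHT = {"pdu": 50, "edu": 100}


class InFlightLimiter:
    """Hypothetical cap on concurrent /send requests, per origin server and kind."""

    def __init__(self, caps=MAX_IN_FLIGHT):
        self.caps = dict(caps)
        self.in_flight = defaultdict(int)  # (origin, kind) -> open requests

    def try_acquire(self, origin: str, kind: str) -> bool:
        """Reserve a slot; False means the caller should reject the request."""
        key = (origin, kind)
        if self.in_flight[key] >= self.caps[kind]:
            return False
        self.in_flight[key] += 1
        return True

    def release(self, origin: str, kind: str) -> None:
        """Free the slot once the request has been fully handled."""
        key = (origin, kind)
        if self.in_flight[key] > 0:
            self.in_flight[key] -= 1
```

A handler would call `try_acquire` before processing and `release` afterwards; how a rejection is signalled to the sender is left open here, as it is in the proposal.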
> What happens if an event is sent which has a different id to the one in the path?
Honestly, I regret choosing the arbitrary-string (anti-)pattern; I think the `EduId` should just be a content hash. Together with PDUs, that cryptographically covers every input to the endpoint.
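A rough sketch of what a content-hash `EduId` could look like, and how a receiver might check the path against the body. The helper names are hypothetical, and the hashing recipe (SHA-256 over sorted-key compact JSON, unpadded URL-safe base64) is an assumption standing in for whatever the MSC would actually specify; it is only loosely modelled on how event IDs are derived in recent room versions.

```python
import base64
import hashlib
import json


def content_hash_id(body: dict) -> str:
    """Unpadded URL-safe base64 of SHA-256 over a canonical-ish JSON encoding."""
    # Assumption: sorted keys + compact separators stand in for Matrix canonical JSON.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(encoded.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def path_matches_body(path_id: str, body: dict) -> bool:
    """For an EDU, accept the request only if the path ID is the hash of the body."""
    if path_id.startswith("$"):
        # Event IDs belong to PDUs and are checked by the PDU rules instead.
        raise ValueError("not an EduId")
    return path_id == content_hash_id(body)
```

The URL-safe base64 alphabet never produces `$`, so an ID built this way cannot collide with the `$` prefix reserved for event IDs.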
### Proposal

We specify `PUT /_matrix/federation/v2/send/{ EventId | EduId }` where events are sent
indiscriminately. An `EduId` is an arbitrary string which MUST NOT be prefixed by `$`.
No response format?
### Discussion

When used over modern HTTP/2 only a single connection is required to conduct an arbitrary number of
concurrent transmissions. HTTP/1 systems can very safely utilize pipelining considering the
Steady on :) there's a reason HTTP/3 exists. Whilst you are no longer blocked at the HTTP level, you are still head-of-line blocked at the TCP level. This matters because it means /send requests can still interfere with each other (a large event sent before a smaller one can delay the smaller one). HTTP/2 does help processing latency though, which you'd hope is the biggest latency contributor, but it depends on the network.
Yes, but the prime mover for HTTP/3 was the mobile space, where even a slightly unreliable link can interfere with all channels without graceful degradation, as you mentioned. HTTP/3 now allows for a smooth, linear degradation.
In our space the most common interference comes from the hosts themselves rather than from the links, which in practice are quite superb for Matrix given the bias toward datacenter hosting (not even residential self-hosting!).
I'm not certain, but I do believe HTTP/2 has some tunable parameters for multiplexing. In any case, we're restricted by the 64 KiB event limit for PDUs, and it would probably be quite awful if there are any EDUs out there which are larger.
##### Unstable Prefix

`PUT /_matrix/federation/unstable/net.zemos.send/{ EventId | EduId }`
I dislike the lack of batching here. It adds overhead when there needn't be any. I'd be tempted to batch per room as we often do want to process a bundle of events in the same room, and it helps servers enforce QoS per-room.
As far as I can tell, the optimal application shape for h2 (and h3 for that matter) involves keeping payloads succinct and granular. Somewhere between RFC 7540 and RFC 9113 (it's been ages since I've reviewed this stuff) the basic unit of exchange is specified as a "frame", which has a 9-byte header and a maximum payload length that defaults to 16 KiB, negotiable up to 16 MiB. The RFC actually has this gem I stumbled on here:
> Endpoints are not obligated to use all available space in a frame. Responsiveness can be improved by using frames that are smaller than the permitted maximum size. Sending large frames can result in delays in sending time-sensitive frames (such as RST_STREAM, WINDOW_UPDATE, or PRIORITY), which, if blocked by the transmission of a large frame, could affect performance.
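For a sense of scale, a back-of-the-envelope sketch of how a maximum-size PDU maps onto frames at the default frame size. The function names are made up for illustration, and the count covers only the frames carrying the request body, not the request headers.

```python
import math

FRAME_HEADER = 9            # bytes, fixed HTTP/2 frame header
DEFAULT_MAX_FRAME = 16_384  # bytes, default maximum frame payload
MAX_PDU = 65_536            # bytes, the 64 KiB event limit


def data_frames(body_len: int, max_frame: int = DEFAULT_MAX_FRAME) -> int:
    """Minimum number of frames needed to carry a body of body_len bytes."""
    return math.ceil(body_len / max_frame)


def framing_overhead(body_len: int, max_frame: int = DEFAULT_MAX_FRAME) -> int:
    """Bytes of frame headers spent on carrying that body."""
    return data_frames(body_len, max_frame) * FRAME_HEADER
```

With the defaults, `data_frames(MAX_PDU)` is 4 and `framing_overhead(MAX_PDU)` is 36 bytes, so even the largest PDU yields to other streams after at most 16 KiB on the wire.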
My understanding is that batching would inhibit the multiplexer's degrees of freedom rather than provide any quality advantage. If the link is busy, a batch of PDUs is linearized in competition with other channels (other batches of PDUs). If streams at the API level aren't in use, the first PDU in a batch won't be available until the last PDU has arrived. It might risk trading the head-of-line blocking problem for the tail-latency problem.
If PDUs and EDUs are instead sent individually, the only risks and efforts required on our part deal with sequence and ordering: a very manageable (perhaps even enjoyable!) problem-space to engineer for. There are cases where we perhaps don't want the EDU for a read receipt to arrive before the PDU it refers to (this is actually a real problem today). Such issues would have to be contemplated because of the freedom granted by a granular approach, which is still better than a problem-space with no freedom to navigate it.
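As one illustration of the ordering work this implies, here is a minimal receiver-side sketch that parks a receipt until the PDU it references has been seen. The class and method names are hypothetical, and whether ordering is solved on the receiver at all (rather than by sequencing in the payloads) is an open question in this thread.

```python
from collections import defaultdict


class ReceiptHoldback:
    """Delays receipt EDUs until the PDU they reference has been seen."""

    def __init__(self):
        self.seen_events = set()
        self.waiting = defaultdict(list)  # event_id -> receipts parked on it

    def on_receipt(self, event_id: str, receipt: dict) -> list:
        """Return receipts that may be processed now (possibly none)."""
        if event_id in self.seen_events:
            return [receipt]
        self.waiting[event_id].append(receipt)
        return []

    def on_pdu(self, event_id: str) -> list:
        """Record the PDU and release any receipts that were waiting for it."""
        self.seen_events.add(event_id)
        return self.waiting.pop(event_id, [])
```

A real implementation would also need a bound or timeout on `waiting`, since a PDU may never arrive.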
@@ -0,0 +1,69 @@
# MSC4371: On the elimination of federation transactions
I agree with the sentiment. However, I think there is probably a better API shape out there.
Conceptually I see federation transactions as a pubsub layer, where the topic is the room id. This would provide necessary batching without unduly blocking events in unrelated rooms.
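A minimal sketch of the per-room grouping this implies on the sending side, assuming each queued event carries a `room_id` field. The function name and the batch size of 50 are placeholders, not a proposed API.

```python
from collections import defaultdict


def batch_by_room(events: list, max_batch: int = 50) -> dict:
    """Group queued events by room_id, preserving order, in chunks of max_batch."""
    per_room = defaultdict(list)
    for event in events:
        per_room[event["room_id"]].append(event)
    return {
        room_id: [evs[i:i + max_batch] for i in range(0, len(evs), max_batch)]
        for room_id, evs in per_room.items()
    }
```

Each room's batches can then be sent on their own stream, so a busy room delays only its own events.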