MSC4354: Sticky Events #4354
Conversation
It wasn't particularly useful for clients, and doesn't help much with equivocation.
Implementation requirements:
- Client (usage)
- Client (receive/handle)
- Server
The client implementations do not yet implement the latest version of this MSC. This is currently in progress.
@Half-Shot which parts are those specifically? A review of the implementations appears to show them setting things up in a mostly-correct way. (I have no context on what transpired on this MSC between proto-MSC and now.)
There were changes from 4-tuples to 3-tuples in the key mapping, and the requirement for the key mapping was actually removed. This is now implemented in matrix-org/matrix-js-sdk#5028. I'm happier that the SDK side is plausible now.
We have tested local calls with this MSC and it seems to work fine, but not federated calls. I don't actually see the need to block on federated calls myself; the application layer should be happy.
The 3-tuples have become 4-tuples again, so I'll need to check that this all still works.
Failed to check in earlier; this still works.
Co-authored-by: Johannes Marbach <[email protected]>
Team member @mscbot has proposed to merge this. The next step is review by the rest of the tagged people.
Concerns:
Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for information about what commands tagged team members can give me.
Co-authored-by: Travis Ralston <[email protected]>
@mscbot resolve Unclear if addendum is normative for spec process purposes
Have split the comments into threads (#4354 (comment))
> To implement these properties, servers MUST:
>
> * Attempt to send their own[^origin] sticky events to all joined servers, whilst respecting per-server backoff times.
Moving from #4354 (comment)
The lack of atomicity in `/send` means clients may flicker RTC member state (update to old values, then immediately to newer values). This happens today too with state events, but less often.
In Synapse this will be especially slow, as when we process each sticky event we go and fetch the previous 10 events and then query the state (assuming a large enough gap). This doesn't happen for state: we'll get the last event, calculate the state for that chunk, and atomically persist it. State flickering can happen if the server receives a chunk of events that contains a bunch of state changes, though empirically this is fairly rare.
> This doesn't happen for state: we'll get the last event, calculate the state for that chunk, and atomically persist it.
I don't follow this. If I send 50 PDUs all in the same room, a nice linear chain with no forks, we:
- treat all 50 PDUs as live (so will send them down /sync)
- calculate the state before each PDU (only the earliest incurring a state query hit)
- process each PDU atomically, but not the batch of 50.
So you will see flickering?
I think flickering of ephemeral per-user state is inevitable if we wish to hide the key we're modifying in the map from the server. It's definitely a security/UX tradeoff to make, though we've increasingly leant on the side of security for quite some time now. What would the implications be for flickering live-location shares or flickering RTC members? The former likely means the location is updated gradually as the server/client catch up. I think RTC members are reasonably static (they don't change mid-call), so flickering RTC members could make it appear that older members are joined to the call who then leave the call a few milliseconds later? Is this a problem for the call state machine? cc @toger5
Obviously if someone sends 50 sticky events in short succession then that will cause "flickering" as things come down live, but that is reflecting the reality that that state is flickering. That's totally fine.
However, if those 50 events happened over the course of an hour and you see a flickering of state changes, then that is a different thing. We have previously made efforts to avoid much flickering on clients.
> I think flickering of ephemeral per-user state is inevitable if we wish to hide the key we're modifying in the map from the server
Don't some of the encrypted state proposals allow encrypting the state key as well? Or you could potentially have a pointer to previous sticky events that get superseded and are pulled in automatically (and if the server pulls them in, then it knows not to treat them as "live")?
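
As a side note (not from the MSC text): one client-side mitigation for out-of-order flicker is to reduce each `/sync` batch to a single winner per `sticky_key` before touching the rendered map, using the tie-break rules quoted later in this thread. A minimal TypeScript sketch, with illustrative field names:

```typescript
// Illustrative field names; not the MSC's exact schema.
interface StickyEvent {
  event_id: string;
  origin_server_ts: number;
  sticky_key: string;
}

// True if `a` beats `b` under the tie-break quoted later in this thread:
// highest origin_server_ts, then lexicographically highest event ID (A < Z).
function beats(a: StickyEvent, b: StickyEvent): boolean {
  if (a.origin_server_ts !== b.origin_server_ts) {
    return a.origin_server_ts > b.origin_server_ts;
  }
  return a.event_id > b.event_id;
}

function applyBatch(map: Map<string, StickyEvent>, batch: StickyEvent[]): void {
  // Fold the batch down to one candidate per sticky_key, so intermediate
  // values inside the batch are never rendered...
  const winners = new Map<string, StickyEvent>();
  for (const ev of batch) {
    const cur = winners.get(ev.sticky_key);
    if (!cur || beats(ev, cur)) winners.set(ev.sticky_key, ev);
  }
  // ...then apply each winner only if it also beats the existing entry,
  // so stale catch-up traffic cannot regress the map.
  for (const [key, ev] of winners) {
    const cur = map.get(key);
    if (!cur || beats(ev, cur)) map.set(key, ev);
  }
}
```

This only hides flicker within a single batch; genuinely stale events arriving in later batches are still rejected by the final `beats` check.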
> To implement these properties, servers MUST:
>
> * Attempt to send their own[^origin] sticky events to all joined servers, whilst respecting per-server backoff times.
Moving from #4354 (comment)
> how does MatrixRTC handle push notifications for incoming calls? (tangential to this MSC but whatever)
The question is: do we want to use sticky events for MatrixRTC notifications, and if so will that make the flickering problem much more noticeable/problematic?
Naively to me it feels odd to not use sticky events for call notifications, e.g. I'd have thought you would want to be notified for all calls in a DM. If you don't use sticky events you could end up in the situation where you see the call in the UI but not be notified about it.
> To implement these properties, servers MUST:
>
> * Attempt to send their own[^origin] sticky events to all joined servers, whilst respecting per-server backoff times.
Moving from #4354 (comment)
> we will accumulate more forward extremities when catching up as we are now including sticky events in the initial batch of events when catching up. This is a concern, but having to deal with lots of forward extremities isn't a new concern.
One potential security concern here is that it makes it easier for users on one server to generate lots of extremities on another server, which can lead to performance issues in very large rooms. This only works when the connection between the two servers is down (e.g. the remote server is down).
> it makes it easier for users on one server to generate lots of extremities on another server
This is true today via message events surely? Like, I can trivially make lots of events and trickle every Nth message to cause forward extremity accumulation?
You can't as a user on the server, but yes, the server can.
> users sending multiple events with the same `sticky_key`. To deterministically tie-break, clients which
> implement this behaviour MUST[^maporder]:
>
> - pick the one with the highest `origin_server_ts`,
With the text below we do try to mitigate any possible client desynchronization. It might be easier to just define the sticky map as: last to expire wins.
This way we actually prohibit diverging clients, and don't motivate client implementations to add additional checks for the expiration on top of the `origin_server_ts` ordering.
If a client wants to update the sticky map, it is now forced to use the same (minus time passed) or greater expiration time; otherwise its event will not update other clients' local sticky maps.
This might help to reduce the text in the following section as well.
Suggested change:
− pick the one with the highest `origin_server_ts`,
+ pick the one with the highest `origin_server_ts + sticky.duration_ms`,
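
For illustration, a minimal TypeScript sketch contrasting the two orderings (field names are hypothetical; `sticky.duration_ms` comes from the suggestion above, not the current MSC text):

```typescript
// Illustrative field names; `sticky.duration_ms` is from the suggestion
// above, not from the current MSC text.
interface StickyEvent {
  event_id: string;
  origin_server_ts: number;
  sticky: { duration_ms: number };
}

// Current proposal: the freshest send wins.
const sentAt = (ev: StickyEvent): number => ev.origin_server_ts;

// Suggested alternative: the last to expire wins.
const expiresAt = (ev: StickyEvent): number =>
  ev.origin_server_ts + ev.sticky.duration_ms;

// Under "last to expire wins", an update sent later but with a much shorter
// duration loses to an earlier long-lived event, which is what forces
// updaters to reuse the same (minus time passed) or greater expiration time.
function winner(a: StickyEvent, b: StickyEvent): StickyEvent {
  if (expiresAt(a) !== expiresAt(b)) {
    return expiresAt(a) > expiresAt(b) ? a : b;
  }
  // Tie break on the lexicographically highest event ID, as elsewhere.
  return a.event_id > b.event_id ? a : b;
}
```

(`sentAt` is kept only to make the contrast with `expiresAt` explicit.)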
> If a client sends two sticky events in the same millisecond, the 2nd event may be replaced by the 1st if
> the event ID of the 1st event has a higher lexicographical event ID. To protect against this, clients should
> ensure that they wait at least 1 millisecond between sending sticky events.
This section should at some point mention "with the same `sticky_key`".
(This information can be guessed, because "the 2nd event may be replaced by the 1st" is only the case if they have the same `sticky_key`, but the conclusion should include it explicitly imo.)
Suggested change:
> If a client sends two sticky events in the same millisecond, the 2nd event may be replaced by the 1st if
> the event ID of the 1st event has a higher lexicographical event ID. To protect against this, clients should
> ensure that they wait at least 1 millisecond between sending sticky events with the same `sticky_key`.
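
A minimal TypeScript sketch of the guard the quoted text asks clients to implement, assuming the per-key wording suggested above (the `send` callback is a hypothetical stand-in for an SDK send call):

```typescript
// Track when we last sent a sticky event for each sticky_key.
const lastSentAt = new Map<string, number>();

async function sendSticky(
  stickyKey: string,
  send: () => Promise<void>,
): Promise<void> {
  const prev = lastSentAt.get(stickyKey);
  if (prev !== undefined && Date.now() - prev < 1) {
    // Same millisecond as the previous send for this sticky_key: wait it out
    // so the later event cannot lose the lexicographical event-ID tie-break.
    await new Promise((resolve) => setTimeout(resolve, 1));
  }
  lastSentAt.set(stickyKey, Date.now());
  await send();
}
```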
> - pick the one with the highest `origin_server_ts`,
> - tie break on the one with the highest lexicographical event ID (A < Z).
Redaction behaviour needs specifying.
> Sticky events are expected to be encrypted and so there is no "state filter" equivalent provided for sticky events
What does "state filter" refer to? I don't see that phrase anywhere in the C-S spec. Is it referring to https://spec.matrix.org/unstable/client-server-api/#filtering ?
On the topic of filtering, should events from ignored users be dropped?
Rendered
SCT Stuff:
- FCP tickyboxes
- MSC checklist