# MSC3401: Native Group VoIP signalling

## Problem

VoIP signalling in Matrix is currently conducted via timeline events in a 1:1 room.
This has some limitations, especially if you try to broaden the approach to multiparty VoIP calls:

* VoIP signalling can generate a lot of events as candidates are incrementally discovered, and for rapid call setup these need to be relayed as rapidly as possible.
  * Putting these into the room timeline means that if the client has a gappy sync, for VoIP to be reliable it will need to go back and fill in the gap before it can process any VoIP events, slowing things down badly.
  * Timeline events are (currently) subject to harsh rate limiting, as they are assumed to be a spam vector.
* VoIP signalling leaks IP addresses. There is no reason to keep these around for posterity, and they should only be exposed to the devices which care about them.
* Candidates are ephemeral data, and there is no reason to keep them around for posterity - they're just clogging up the DAG.

Meanwhile we have no native signalling for group calls at all, forcing you to instead embed a separate system such as Jitsi, which has its own dependencies and doesn't directly leverage any of Matrix's encryption, decentralisation, access control or data model.
## Proposal

This proposal provides a signalling framework using to-device messages which can be applied to native Matrix 1:1 calls, full-mesh calls, SFU calls, cascaded SFU calls and, in future, MCU calls and hybrid SFU/MCU approaches. It replaces the early flawed sketch at [MSC2359](https://github.com/matrix-org/matrix-doc/pull/2359).

This does not immediately replace the current 1:1 call signalling, but may in future provide a migration path to unified signalling for 1:1 and group calls.

Diagrammatically, this looks like:

1:1:
```
A -------- B
```

Full mesh between clients:
```
A -------- B
 \        /
  \      /
   \    /
    \  /
     C
```

SFU (aka Focus):
```
A __    __ B
    \  /
     F
     |
     |
     C

Where F is an SFU focus
```

Cascaded decentralised SFU:
```
A1 --.           .-- B1
A2 ---Fa ----- Fb--- B2
       \       /
        \     /
         \   /
          \ /
          Fc
         |  |
        C1  C2

Where Fa, Fb and Fc are SFU foci, one per homeserver, each with two clients.
```

### m.call state event

The user who wants to initiate a call sends an `m.call` state event into the room to inform the room participants that a call is happening in the room. This effectively becomes the placeholder event in the timeline which clients would use to display the call in their scrollback (including duration and termination reason using `m.terminated`). Its body has the following fields:

* `m.intent` to describe the intended UX for handling the call. One of:
  * `m.ring` if the call is meant to cause the room participants' devices to ring (e.g. a 1:1 call or group call)
  * `m.conference` if the call should be presented as a conference call which users in the room may connect to
  * `m.room` if the call should be presented as a voice/video channel in which the user is immediately immersed on selecting the room.

* `m.type` to say whether the initial type of call is voice only (`m.voice`) or video (`m.video`). This signals the intent of the user when placing the call to the participants (i.e. "I want to have a voice call with you" or "I want to have a video call with you"), warns the receiver whether they may be expected to view video or not, and provides suitable initial UX for displaying that type of call... even if it later gets upgraded to a video call.

* `m.terminated` if this event indicates that the call in question has finished, including the reason why. (A voice/video room will never terminate.) (Do we need a duration, or can we figure that out from the previous state event?) A terminated example is shown below, after the first one.

* `m.name` as an optional human-visible label for the call (e.g. "Conference call").

* `m.foci` as an optional list of recommended SFUs that the call initiator can recommend to users who do not want to use their own SFU (because they don't have one, or because they spot they would be the only person on their SFU for their call, and so choose to connect direct to save bandwidth).

* The state key is a unique ID for that call. (We can't use the event ID, given `m.type` and `m.terminated` are mutable.) If there are multiple non-terminated conf ID state events in the room, the client should display the most recently edited event.

For instance:

```jsonc
{
    "type": "m.call",
    "state_key": "cvsiu2893",
    "content": {
        "m.intent": "m.room",
        "m.type": "m.voice",
        "m.name": "Voice room",
        "m.foci": [
            "@sfu-lon:matrix.org",
            "@sfu-nyc:matrix.org"
        ]
    }
}
```
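
A call which has ended might look like the following. This is a sketch: the proposal does not yet enumerate the `m.terminated` reason strings, so the `"call_ended"` value here is purely illustrative.

```jsonc
{
    "type": "m.call",
    "state_key": "cvsiu2893",
    "content": {
        "m.intent": "m.conference",
        "m.type": "m.voice",
        "m.name": "Conference call",
        "m.terminated": "call_ended" // hypothetical reason value; not yet specified
    }
}
```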

We mandate at most one call per room at any given point to avoid UX nightmares - if you want the user to participate in multiple parallel calls, you should simply create multiple rooms, each with one call.

### Call participation

Users who want to participate in the call declare this by publishing an `m.call.member` state event using their Matrix ID as the state key (thus ensuring other users cannot edit it). The event contains an `m.calls` array of objects describing which calls the user is participating in within that room. This array must contain exactly one item (for now).

The fields within each item of `m.calls` are:

* `m.call_id` - the ID of the conference the user is claiming to participate in. If this doesn't match an unterminated `m.call` event, it should be ignored.
* `m.foci` - Optionally, if the user wants to be contacted via an SFU rather than called directly (either 1:1 or full mesh), the user can also specify the SFUs their client(s) are connecting to.
* `m.sources` - Optionally, the user can list the various combinations of media streams they are able to send. This is important if connecting to an SFU, as it lets the SFU know what simulcast resolutions the sender can send. In theory the offered SDP should include this, but if we are multiplexing all streams into the same SDP it seems likely that this will get lost, hence publishing it here. If the conference has no SFU, this list defines the devices which other devices should connect to full-mesh in order to participate.

For instance:

```jsonc
{
    "type": "m.call.member",
    "state_key": "@matthew:matrix.org",
    "content": {
        "m.calls": [
            {
                "m.call_id": "cvsiu2893",
                "m.foci": [
                    "@sfu-lon:matrix.org",
                    "@sfu-nyc:matrix.org"
                ],
                "m.sources": [
                    {
                        "id": "qegwy64121wqw",
                        "name": "Webcam", // optional, just to help users understand what multiple streams from the same person mean
                        "device_id": "ASDUHDGFYUW", // just in case people end up dialling this directly for full mesh or 1:1
                        "voice": [
                            { "id": "zbhsbdhwe", "format": { "channels": 2, "rate": 48000, "maxbr": 32000 } }
                        ],
                        "video": [
                            { "id": "zbhsbdhzs", "format": { "res": { "width": 1280, "height": 720 }, "fps": 30, "maxbr": 512000 } },
                            { "id": "zbhsbdhzx", "format": { "res": { "width": 320, "height": 240 }, "fps": 15, "maxbr": 48000 } }
                        ],
                        "mosaic": {} // for composited video streams?
                    },
                    {
                        "id": "suigv372y8378",
                        "name": "Screenshare", // optional
                        "device_id": "ASDUHDGFYUW",
                        "video": [
                            { "id": "xhsbdhzs", "format": { "res": { "width": 1280, "height": 720 }, "fps": 30, "maxbr": 512000 } },
                            { "id": "xbhsbdhzx", "format": { "res": { "width": 320, "height": 240 }, "fps": 15, "maxbr": 48000 } }
                        ]
                    }
                ]
            }
        ]
    }
}
```

XXX: properly specify the formats here (WebRTC constraints, perhaps?)

It's acceptable to advertise rigid formats here rather than dynamically negotiating resolution, bitrate etc., as in a group call we should just pick plausible desirable formats rather than try to please everyone.

If a device loses connectivity, it is not particularly problematic that the membership data will be stale: all that will happen is that calls to the disconnected device will fail due to media or data-channel keepalive timeouts, and then subsequent attempts to call that device will fail. Therefore (unlike the earlier demos) we don't need to spot timeouts by constantly re-posting the state event.

### Call setup

Call setup then uses the normal `m.call.*` events, except they are sent over to-device messages to the relevant devices (encrypted via Olm). This means:

* When initiating a 1:1 call, the `m.call.invite` is sent to `*` devices of the intended target user.
  * Once the user answers the call from a device, the sender should rescind the other pending to-device messages, ensuring that other devices don't get spammed about long-obsolete 1:1 calls. XXX: We will need a way to rescind pending to-device messages.
  * Subsequent candidates and other events are sent only to the device which answered.
  * XXX: do we still need MSC2746's `party_id` and `m.call.select_answer`?
  * We will need to include the `m.call_id` and `room_id` so that peers can map the call to the right room (see the sketch below this list).
  * However, especially for 1:1 calls, we might want to let the to-device messages flow and cause the client to ring even before the `m.call` event propagates, to minimise latency. Therefore we'll need to include an `m.intent` on the `m.call.invite` too.
* When initiating a group call, we need to decide which devices to actually talk to.
  * If the client has no SFU configured, we try to use the `m.foci` in the `m.call` event.
    * If there are multiple `m.foci`, we select the closest one based on latency, e.g. by trying to connect to all of them simultaneously and discarding all but the first call to answer.
    * If there are no `m.foci` in the `m.call` event, then we look at which foci are already in use by existing participants (per their `m.call.member` events), and select the most common one. (If that focus is overloaded it can reject us, and we should then try the next most populous one, etc.)
    * If there are no `m.foci` in the participants' `m.call.member` events either, then we connect full mesh.
    * If `m.foci` are subsequently introduced into the conference, then we should transfer the call to them (effectively doing a 1:1 -> group call upgrade).
  * If the client does have an SFU configured, then we decide whether to use it.
    * If other conference participants are already using it, then we use it.
    * If there are other users from our homeserver in the conference, then we use it (as presumably they should be using it too).
    * If there are no other `m.foci` (either in the `m.call` event or in the participant state) then we use it.
    * Otherwise, we save bandwidth on our SFU by not cascading and instead behaving as if we had no SFU configured.
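
As an illustration, a to-device `m.call.invite` for a group call might look like the following. This is a hypothetical sketch: the `offer` and `version` fields mirror today's `m.call.invite` timeline events (MSC2746), while `m.call_id`, `room_id` and `m.intent` are the additions described above; the exact field names are not yet finalised.

```jsonc
{
    "type": "m.call.invite",  // sent as an Olm-encrypted to-device message
    "content": {
        "m.call_id": "cvsiu2893",        // matches the state key of the m.call state event
        "room_id": "!conf:matrix.org",   // hypothetical room ID, so the peer can map the call to the right room
        "m.intent": "m.conference",      // included so recipients can show the right UX before the m.call event arrives
        "offer": {
            "type": "offer",
            "sdp": "..."                 // elided SDP payload
        },
        "version": "1"
    }
}
```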

TODO: spec how clients discover their homeserver's preferred SFU foci

Originally this proposal suggested that foci should be identified by their `(user_id, device_id)` rather than just their `user_id`, in order to ensure convergence on the same device. In practice, this is an unnecessary complication if we make it the SFU implementor's problem to ensure that either only one device is logged in per SFU user - or instead that the SFU's devices for the same user are clustered together. It's important to note that when calling an SFU you should call `*` devices.

### SFU control

SFUs are Selective Forwarding Units - servers which forward WebRTC streams between peers (which could be clients or SFUs or both). To make use of them effectively, peers need to be able to tell the SFU which streams they want to receive, and the SFU must tell the peers which streams it wants to be sent. We also need a way of telling SFUs which other SFUs to connect ("cascade") to.

The client does this by establishing an optional data channel connection to the SFU using a normal `m.call.invite`, in order to perform low-latency signalling to rapidly select streams.

To select a stream over this channel, the peer sends:

```jsonc
{
    "op": "select",
    "conf_id": "cvsiu2893",
    "start": [
        "zbhsbdhwe",
        "zbhsbdhzs"
    ],
    "stop": [
        "zbhsbdhxz"
    ]
}
```

Rather than sending arrays, one can send `"all"` as the value of `start` or `stop` to start or stop all streams.
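
For instance, a newly joined peer subscribing to everything at once might send the following (a sketch, reusing the `select` op defined above):

```jsonc
{
    "op": "select",
    "conf_id": "cvsiu2893",
    "start": "all"  // subscribe to every stream in the conference
}
```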

All streams are sent within a single media session (rather than us having multiple sessions or calls), and there is no difference between a peer sending simulcast streams from a webcam and two SFUs trunking together.

If no data channel is established, then 1:1 calls should send all streams without prompting, and SFUs should send no streams by default.

If you are using your SFU in a call, it will need to know how to connect to other SFUs present in order to participate in the full mesh of SFU traffic (if any). One option here is for SFUs to act as an AS and sniff the `m.call.member` traffic of their associated server, and automatically call any other `m.foci` which appear. (They don't need to make outbound calls to clients, as clients always dial in.) Otherwise, we could consider an `"op": "connect"` command sent by clients, but then you have the problem of deciding which client(s) are responsible for reminding the SFU to connect to other SFUs. Much better to trust the server.

Also, in order to authenticate that only legitimate users are allowed to subscribe to a given `conf_id` on an SFU, it would make sense for the SFU to act as an AS and sniff the `m.call` events on their associated server, and only act on to-device `m.call.*` events which come from a user who is confirmed to be in the room for that `m.call`. (In practice, if the conference is E2EE then it's of limited use to connect to the SFU without having the keys to decrypt the traffic, but this feature is desirable for non-E2EE conferences and to stop bandwidth DoS.)

Finally, the data channel transport is also used to detect connectivity timeouts more rapidly than WebRTC's media timeout would allow, while avoiding clogging up the homeserver with keepalive traffic. This is done by each side sending an `"op": "ping"` packet every few seconds, and timing out the call if an `"op": "pong"` doesn't arrive within 5 seconds.
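
A minimal sketch of that keepalive exchange over the data channel (assuming these ops carry the same `conf_id` correlation field as `select`, which this proposal does not yet pin down):

```jsonc
// sender -> peer, every few seconds
{ "op": "ping", "conf_id": "cvsiu2893" }

// peer -> sender; if this doesn't arrive within 5 seconds, the call is timed out
{ "op": "pong", "conf_id": "cvsiu2893" }
```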

XXX: define how these data channel messages mux with other traffic, and consider what message encoding to actually use.

TODO: spell out how this works with active speaker detection & associated signalling

## Encryption

We get E2EE for 1:1 and full-mesh calls automatically in this model.

However, when SFUs are on the media path, the SFU will necessarily terminate the SRTP traffic from the peer, breaking E2EE. To address this, we apply an additional end-to-end layer of encryption to the media using [WebRTC Encoded Transform](https://github.com/w3c/webrtc-encoded-transform/blob/main/explainer.md) (formerly Insertable Streams) via [SFrame](https://datatracker.ietf.org/doc/draft-omara-sframe/).

In order to provide PFS, the symmetric key used for these streams from a given participating device is a megolm key. Unlike a normal megolm key, this is shared via `m.room_key` over Olm to the devices participating in the conference, including an `m.call_id` and `m.room_id` field on the key to correlate it to the conference traffic, rather than using the `session_id` event field to correlate (given the encrypted traffic is SRTP rather than events, and we don't want to have to send fake events from all senders every time the megolm session is replaced).
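
Such a key share might look like the following to-device event. This is a sketch: the standard fields follow today's `m.room_key` megolm key sharing, while the `m.call_id`/`m.room_id` correlation fields are the additions this proposal describes; the exact shape is still to be confirmed.

```jsonc
{
    "type": "m.room_key",  // sent Olm-encrypted to each participating device
    "content": {
        "algorithm": "m.megolm.v1.aes-sha2",
        "m.call_id": "cvsiu2893",          // correlates the key to the conference traffic...
        "m.room_id": "!conf:matrix.org",   // ...and to the room it lives in (hypothetical room ID)
        "session_id": "X3OzBcTWkxyFyhWk",  // illustrative megolm session ID
        "session_key": "AgAAAA..."         // elided megolm session key, exported at the current ratchet index
    }
}
```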

The megolm key is ratcheted forward for every SFrame, and shared with new participants at the current index via `m.room_key` over Olm as per above. When participants leave, a new megolm session is created and shared with all participants over Olm. The new session is only used once all participants have received it.

## Potential issues

To-device messages are point-to-point between servers, whereas today's `m.call.*` messages can transitively traverse servers via the room DAG, thus working around federation problems. In practice, if you are relying on that behaviour, you're already in a bad place.

The SFUs participating in a conference end up in a full mesh. Rather than inventing our own spanning-tree system for SFUs, however, we should fix this for Matrix as a whole (as is happening in the LB work) and use a Pinecone tree or similar to decide what better-than-full-mesh topology to use. In practice, full-mesh cascading between SFUs is probably not that bad (especially if SFUs only request the streams over the trunk that their clients care about) - and on aggregate it will be less obnoxious than all the clients hitting a single SFU.

SFrame currently mandates its own ratchet, which is almost the same as megolm but not quite. Switching it out for megolm seems reasonable right now (at least until MLS comes along).

## Alternatives

There are many, many different ways to do this. The main alternative considered was not to use state events to track membership, but instead to gossip it via either to-device or data channel messages between participants. This fell apart, however, due to trust: you effectively end up reinventing large parts of Matrix layered on top of to-device or data channel messages. So you might as well publish and distribute the participation data in the DAG rather than reinvent the wheel.

Another option is to treat 1:1 (and full mesh) entirely differently to SFU-based calling, rather than trying to unify them. Also, it's debatable whether supporting full mesh is useful at all. In the end, it feels like unifying 1:1 and SFU calling is for the best, as it then gives you the ability to trivially upgrade 1:1 calls to group calls and vice versa, and avoids maintaining two separate hunks of spec. It also forces 1:1 calls to take multi-stream calls seriously, which is useful for more exotic capture devices (stereo cameras; 3D cameras; surround sound; audio fields etc.).

An alternative to to-device messages is to use DMs. You still risk gappy sync problems, though, due to lots of traffic, as well as the hassle of creating DMs and requiring canonical DMs to set up the calls. It does make debugging easier, though, rather than having to track encrypted ephemeral to-device messages.

## Security considerations

Malicious users could try to DoS SFUs by specifying them as their foci.

SFrame E2EE may go horribly wrong if we can't send the new megolm session fast enough to all the participants when a participant leaves (and meanwhile, if we keep using the old session, we're technically leaking call media to the departed participant until we manage to rotate).

We need to ensure there's no scope for media forwarding loops through SFUs.

Malicious users in a room could try to sabotage a conference by overwriting the `m.call` state event of the current ongoing call.

Too many foci will chew bandwidth due to the full mesh between them. In the worst case, if every user is on their own homeserver and picks a different focus, it degenerates into a full-mesh call (just server-side rather than client-side). Hopefully this shouldn't happen, as users will converge on the SFU with the most clients, but we need to check how this works in practice.

## Unstable prefix

...