147 changes: 147 additions & 0 deletions proposals/4099-server-participation.md
@@ -0,0 +1,147 @@
# MSC4099: Participation based authorization for servers in the Matrix DAG

This is a proposal for the representation of servers and their basic responsibilities in the Matrix
DAG. This MSC does not define or amend a state resolution algorithm, since there are several possible
routes that can be explored with other MSCs.

The key merits of this proposal are:
Member

I'm a bit unclear on what the actual problem being solved is here. I agree that because servers can spoof user events, you could model membership of DAGs via servers rather than users. However, shouldn't we be moving towards membership of rooms being tied more tightly to actual users & devices (i.e. cryptographically constrained group membership) rather than giving up and modelling participation just by servers?

In terms of the merits here:

> The ability to deny servers from adding events to the DAG.

Do we not get this via ACLs?

> The ability for clients and bots to examine joining servers before accepting any PDU from them into the room.

Hm, so this would let an AS on a server sniff traffic and somehow authorise it before it reaches other users in the room on that server?

> Arbitrary knock logic for servers.

I'm failing to follow how this works. Is the idea that joins can be delegated to a client on a target server so the client can decide whether the joiner server is allowed to join or not?

So is the idea here: "Provide a way for clients to subscribe to all events attempting to federate with the server, and authorise them before they enter the room DAG?". If so, I wonder if a less invasive mechanism could be used - effectively a standardised API to inspect events before federation rules kick in, rather than changing the entire concept of membership?

(This also reminds me a bit of pseudousers in #1777 and whatever travis is up to in #4049, in terms of letting servers act as a first-class citizen rather than requiring traffic to be linked to users. Historically this has not gone anywhere, though: the accountability of actually linking traffic to users rather than servers seems desirable).

Contributor Author

@Gnuxie Gnuxie Feb 12, 2024

> However, shouldn't we be moving towards membership of rooms being tied more tightly to actual users & devices (i.e. cryptographically constrained group membership) rather than giving up and modelling participation just by servers?

I do not think this MSC is incompatible with cryptographically constraining user membership, since it can still be used in a DAG that has user membership in addition to server participation. There is another conversation to be had over whether Matrix should be going in this direction, but it is not relevant to this MSC; that discussion is more relevant to #3953. So as a digression: while I'm not prepared to argue in the opposite direction right now, I would at least like to explore the alternative, even if it is opposed to the goal of eliminating metadata etc. For large public Matrix rooms, it's not yet clear to me how device-centric membership, pseudo-identity etc. are going to protect against data farming/mining by joining lots of public rooms, at least not in isolation and when most people are going to have the same profile information in all public rooms. For this to be effective, the way people are using Matrix would also have to change. From what I have seen so far, with these proposals you are designing a protocol that suits a use case that is entirely different to the way Matrix is being used by the community today. Again, I see this as a digression. I concede that I could be grossly misinformed about this, and I'm yet to develop my thoughts more concretely here. This is not the purpose of the MSC.

> Do we not get this via ACLs?

No, and you know this as well as I do. Server ACL has been accidentally bypassed before by server implementations not implementing it properly, and with wider adoption of Matrix this will happen again. Additionally, there is no protection against a server deliberately leaking events between normal and malicious servers (in both directions), and no tooling to detect these leaks. There is also a limit to the number of servers that can be added to server ACL (around 512?). It really needs replacing. It requires all participating servers to be using the room in good faith, and that is not going to be a reality forever.

> Hm, so this would let an AS on a server sniff traffic and somehow authorise it before it reaches other users in the room on that server?

No, this was specific to sniffing the knock EDU during the revised server join handshake.

> I'm failing to follow how this works. Is the idea that joins can be delegated to a client on a target server so the client can decide whether the joiner server is allowed to join or not?

Yes, the knock EDU is used as a way to get the clients to see that a server is wanting to join the room. Granted, there could be a better way to do this.

> So is the idea here: "Provide a way for clients to subscribe to all events attempting to federate with the server, and authorise them before they enter the room DAG?"

No, again, the proposal is to make servers send an EDU to the room when they want to join, so that clients (and therefore admins, or at least their tooling) can vet the servers before allowing them to join and be authorized to send any events. Once they have been vetted, all authorization is done via auth rules. The idea is that there isn't a way in the auth rules for a joining server to create a valid PDU until a special event `m.server.participation` exists that references their server name. This event has to be created by a room admin once they know the joining server exists. They can also pre-empt the existence of joining servers and set up `m.server.participation` events for them in advance. I guess now that I have explained it this way, it does look a lot like an allow list, but with a mechanism for it to be automatically updated quickly enough not to cause too much disruption.

Contributor Author

@Gnuxie Gnuxie Feb 12, 2024

I've updated the introduction to the proposal since I think you read that and it misled you. It was probably too abstract, my bad.

Contributor Author

(Those proposals are interesting though, I'll give them a look)

Contributor Author

@Gnuxie Gnuxie Feb 15, 2024

> Do we not get this via ACLs?

The specification does not recommend that servers must (or even should) soft fail events from servers matching `m.room.server_acl`'s `deny`. So you can't even rely on leaks to be mitigated that way. The specification likely can't make that recommendation either, because you'd have to think about what that would imply for a new server joining the room that goes on to replicate the room's history afresh.

- The ability to deny servers from adding events to the DAG.
- The ability for clients and bots to examine joining servers before accepting any PDU from them into the room.
- Arbitrary knock logic for servers.

Additional merits that can be explored as an indirect result of this proposal:
- A way for servers to preemptively load and cache rooms that their users are likely to join.
- A way for servers to advertise to other servers about rooms that their users are likely to join,
so that these rooms can be optionally preloaded and cached.

This is a more specific component and redesign of the general idea of [MSC3953: Server capability DAG](https://github.com/Gnuxie/matrix-doc/blob/gnuxie/capability-dag/proposals/3953-capability-dag.md).
Member

does this replace #3953?

Contributor Author

I don't think so, and I don't know yet. Is that alright?


## Context

### The role of the existing `m.room.member` for a user

In order to develop ideas about how to represent server membership in the room DAG,
which is NOT what this proposal does, we need to understand the responsibilities that `m.room.member`
already has:

- A representation of the desire for the user's server to be informed of new events.
- The capability for the server to participate in the room on behalf of the user,
being used to authorize the user's events.
- The capability for the user to backfill in relation to visibility.
  + It is unclear to me whether `m.room.history_visibility` restricts a server's ability to backfill or not.
- A representation of the user's profile and participation information: who they are, why they are in the room, avatar, and displayname.

## Proposal

### Considerations for amending the make_join handshake

When a joining server is instructed to join a room, the joining server sends an EDU `m.server.knock`
to any available resident servers that the joining server is aware of.

The server then waits until it receives an `m.server.participation` event with the state_key
containing the joining server's name from any resident server that is participating in the room.

When `m.server.participation`'s `participation` field has the value `permitted`, then
the joining server can use `make_join` and `send_join`. However, `send_join` could be amended
in another MSC so that a server is able to produce an `m.server.subscription` configuration event,
rather than an `m.room.member` event for a specific user. This is so that a server can begin the
process of joining the room in advance of a user accepting or joining the room via a client,
in order to improve the response time.
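
A minimal sketch of the joining server's side of this handshake, assuming events are plain dictionaries
shaped like the ones in this proposal. The function name `next_join_step` and the step names it returns are
hypothetical, and giving up on `denied` is an assumption rather than something this MSC specifies.

```python
from typing import Optional


def next_join_step(joining_server: str, participation_event: Optional[dict]) -> str:
    """Decide what the joining server does after sending m.server.knock.

    Returns one of the hypothetical step names "wait", "make_join" or "abort".
    """
    if participation_event is None:
        # The knock has been sent but no m.server.participation has arrived yet.
        return "wait"
    if participation_event.get("type") != "m.server.participation":
        return "wait"
    if participation_event.get("state_key") != joining_server:
        # The event concerns some other server.
        return "wait"

    participation = participation_event.get("content", {}).get("participation")
    if participation == "permitted":
        # The server may now proceed with make_join and send_join.
        return "make_join"
    if participation == "denied":
        # Assumption: a denied server gives up instead of retrying.
        return "abort"
    # Unknown values are treated as "not yet permitted" (assumption).
    return "wait"
```

Keeping the decision in one pure function makes it easy to re-run whenever a new event for the room arrives.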

### The `m.server.knock` EDU

`m.server.knock` is an EDU to make a client in a resident server aware of the joining server's intent to join
the room. A client can then arbitrarily research the reputation of the joining server before deciding
whether resident servers of the room should accept any PDU whatsoever from the joining server.
Currently, in room version 11 and below, it is not possible for room operators to stop a new server from
sending multiple PDUs to a room without first knowing of, and anticipating, a malicious server's existence.
This has already caused major problems in Matrix's history.

This proposal does not just aim to remove the risk of spam joins for members from the same server,
but also spam joins from many servers at the same time. While it is seen as technically difficult
to acquire user accounts from a large number of Matrix homeservers, it is still possible and
has happened before. For example, servers can be compromised via a common exploit in server
implementations, or existing servers that have weak registration requirements can be exploited,
and this has happened already in Matrix's history.

Having an EDU allows us to accept a knock arbitrarily with clients or, more accurately, with automated bots
like Draupnir. We can then arbitrarily research the reputation of the server before deciding
to accept. This also conveniently keeps auth_rules around restricted join rules clean and simple,
because all logic can be deferred to clients.

The `m.server.knock` EDU can be treated as idempotent by the receiver, although the effect should probably
expire after some subjective (to the receiver) duration.

```
{
  "content": {
    "room_id": "!example:example.com"
  },
  "edu_type": "m.server.knock"
}
```
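
A rough sketch of how a receiver might treat the knock as idempotent with an expiring effect. The class name
`KnockRegistry`, the default one hour duration, and the choice not to extend the expiry on repeated knocks
are all assumptions; this MSC leaves the duration up to the receiver.

```python
class KnockRegistry:
    """Tracks m.server.knock EDUs so that repeats within a time window have no further effect."""

    def __init__(self, ttl_seconds: float = 3600.0):
        # Assumption: the duration is chosen by the receiver; one hour is arbitrary.
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # (room_id, origin server) -> time at which the knock lapses

    def receive(self, room_id: str, origin: str, now: float) -> bool:
        """Record a knock. Returns True if it is new and clients should be notified."""
        key = (room_id, origin)
        expires_at = self._expiry.get(key)
        if expires_at is not None and now < expires_at:
            # Same knock seen again while still in effect: idempotent, nothing to do.
            return False
        self._expiry[key] = now + self.ttl_seconds
        return True
```

For example, with `ttl_seconds=10`, a knock at `now=0` is reported, a repeat at `now=5` is ignored, and a
knock at `now=10` is reported again because the earlier one has lapsed.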

### The `m.server.participation` event, `state_key: ${serverName}`

This is a capability that allows the state_key'd server to send `m.server.subscription`. It is sent
to accept the `m.server.knock` EDU. The event can also be used to make a server aware of a room's
existence, so that it can optionally preload and cache a room before the server's users discover it.

`participation` can be one of `permitted` or `denied`.
When `participation` is `permitted`, the server is able to join the room.
When `participation` is `denied`, the server is not allowed to send any PDUs into the room.
The denied server must not be sent the denied event unless it is already present within the room,
or it has attempted to knock. This is to prevent malicious servers being made aware of rooms
that they have not yet discovered.

A `reason` field can be present alongside `participation` in order to explain the reason why
a server has been `denied`. This reason is to be shown to the knocking or previously present
server, so that they can understand what has happened.
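
A minimal sketch of the event shape and the disclosure rule for denials, assuming the denying value is
spelled `denied` and that events are plain dictionaries. The helper names `participation_event` and
`may_send_to_target`, and the two sets of server names passed in, are hypothetical.

```python
from typing import Optional, Set


def participation_event(server_name: str, participation: str, reason: Optional[str] = None) -> dict:
    """Build the type, state_key and content of an m.server.participation event."""
    if participation not in ("permitted", "denied"):
        raise ValueError(f"unknown participation value: {participation!r}")
    content = {"participation": participation}
    if reason is not None:
        content["reason"] = reason
    return {"type": "m.server.participation", "state_key": server_name, "content": content}


def may_send_to_target(event: dict, servers_in_room: Set[str], servers_that_knocked: Set[str]) -> bool:
    """Whether the event may be sent to the server named in its state_key."""
    target = event["state_key"]
    if event["content"]["participation"] != "denied":
        # A permitted event may be used to make the server aware of the room.
        return True
    # A denial is only disclosed to servers that already know about the room.
    return target in servers_in_room or target in servers_that_knocked
```

The second helper is what stops a denial from advertising the room to a server that has not discovered it.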

### The `m.server.subscription` event, `state_key: ${serverName}`

This is a configuration event that uses the `m.server.participation` capability to manage
the server's subscription to the event stream. This is NOT an authorization event.

This is distinct from `m.server.participation` because this event is exclusively controlled
by the participating server, and other servers cannot modify this event[^spec-discussion].
This allows the server to have exclusive control over whether it is to be sent events (where
its participation is still `permitted`). We specifically do not want to merge this with
`participation` to avoid having to specialise state resolution for write conflicts,
or "force joining" servers back into rooms. This allows a server to remain permitted to participate,
but opt out of receiving further events from this room, and can then optionally stop replicating the
room and delete all persistent data relating to it (should all clients have also forgotten the room).

### Considerations for event authorization

All events that a server can send need to be authorized by an `m.server.participation` event
with the field `participation` with a value of `permitted`.
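
A sketch of this check, assuming room state is available as a mapping from `(type, state_key)` to event
and that the sending server is taken from the domain part of the event's `sender`. The function names are
hypothetical, and the sketch ignores how the room creator is bootstrapped as well as every other auth rule.

```python
def server_of(user_id: str) -> str:
    """Return the server name of a user ID such as "@alice:example.com:8448"."""
    # The localpart cannot contain ":", so everything after the first ":" is the server name.
    _, sep, server = user_id.partition(":")
    if not sep or not server:
        raise ValueError(f"not a valid user ID: {user_id!r}")
    return server


def is_permitted_to_send(event: dict, room_state: dict) -> bool:
    """True if the sender's server has m.server.participation set to `permitted`."""
    server = server_of(event["sender"])
    participation = room_state.get(("m.server.participation", server))
    if participation is None:
        # No participation event for this server means nothing it sends can be authorized.
        return False
    return participation.get("content", {}).get("participation") == "permitted"
```

A real implementation would evaluate this against the state referenced by the event's auth events, not the
current room state.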

## Potential issues

### Permitting, then denying a malicious server

The property that a malicious server can never send a PDU into the room can be worked around if
the server manages to have their `participation` `permitted`. Once they are later denied, they can
still create PDUs that reference this stale state, and all the other participating servers have no
option but to soft fail these events (ignoring that we don't block them at the network level).
This is still a huge improvement over the existing situation, but we need suggestions for how
to stop this at the event authorization level. I'm begging for advice.

### Unclear if a joining server can receive a PDU from a room that it is not joined to

The amendments to the join handshake described in this MSC mean that a server has to wait
for a PDU, `m.server.participation`, before it has attempted to join the room beyond sending an EDU.
It's not clear to me whether this is currently possible or changes are required to federation send.

## Alternatives

## Security considerations

## Unstable prefix

## Dependencies

None.

[^spec-discussion]: This was derived from the following spec discussion: https://matrix.to/#/!NasysSDfxKxZBzJJoE:matrix.org/$0pv9JVVKzuRE6mVBUGQMq44vNTZ1-l19yFcKgqt8Zl8?via=matrix.org&via=envs.net&via=element.io