@Gnuxie Gnuxie commented Sep 8, 2025

Rendered

Signed-off-by: Gnuxie [email protected]

@Gnuxie Gnuxie changed the title from "MSC0000: Server key identity and room membership" to "MSC4345: Server key identity and room membership" Sep 8, 2025
@tulir tulir added the labels requires-room-version (An idea which will require a bump in room version), proposal (A matrix spec change proposal), room-spec (Something to do with the room version specifications), unassigned-room-version (Remove this label when things get versioned.), kind:core (MSC which is critical to the protocol's success), needs-implementation (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) Sep 8, 2025
Member

@tulir tulir Sep 8, 2025

Implementation requirements:

  • Server (preferably multiple)
  • Client (preferably multiple)
  • Complement tests

This MSC is inspired by the work of @kegsay in
[MSC4243](https://github.com/matrix-org/matrix-spec-proposals/pull/4243).

## Proposal
Contributor Author

I should explain that we allow anyone with invite to set participation to permitted. But only people with ban can modify the participation to denied once a server has set their own participation from permitted to accepted.

Contributor Author

We do this so that the proposal works well for private rooms and public rooms.

Comment on lines 132 to 144
2. If `participation` is `denied`:
1. If the `sender`'s power level is greater than or equal to the _ban level_ and
is greater than or equal to the target server's ambient power level, allow.
2. Otherwise, reject.
3. If `participation` is `permitted`:
1. If the _target server_'s current participation state is `accepted`, reject.
4. If the _target server_'s current participation state is `denied`:
1. If the origin of the current participation state is the target key, reject[^revocation].
2. If the `sender`'s power level is less than the _ban
level_ or is less than the target server's ambient power
level, reject.
5. If the `sender`'s power level is greater than or equal to
the _invite level_, allow.
3. Otherwise, reject.
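
One possible reading of the quoted rules, as a minimal Python sketch. The nesting of the excerpt is ambiguous (the thread below notes this), so the structure here is an assumption rather than the MSC's normative text. The function name, the `CurrentParticipation` type and all parameter names are hypothetical, and only the `denied` and `permitted` cases shown in the excerpt are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CurrentParticipation:
    """Hypothetical view of the target server's current participation state."""
    state: str               # "permitted", "accepted" or "denied"
    set_by_target_key: bool  # True if the target key itself sent the current state


def is_participation_change_allowed(
    new_state: str,
    sender_pl: int,
    ban_level: int,
    invite_level: int,
    target_ambient_pl: int,
    current: Optional[CurrentParticipation],
) -> bool:
    if new_state == "denied":
        # Denying needs the ban level and at least the target's ambient power level.
        return sender_pl >= ban_level and sender_pl >= target_ambient_pl

    if new_state == "permitted":
        if current is not None:
            if current.state == "accepted":
                return False
            if current.state == "denied":
                # A denial issued by the target key itself is treated as a revocation.
                if current.set_by_target_key:
                    return False
                if sender_pl < ban_level or sender_pl < target_ambient_pl:
                    return False
        # Otherwise the invite level is enough to permit a server key.
        return sender_pl >= invite_level

    # States outside the quoted excerpt are deliberately not modelled here.
    raise ValueError(f"state not covered by this sketch: {new_state!r}")
```

Under this reading, a sender at the invite level can permit a key that has no prior state, while lifting a `denied` state needs the ban level, and a self-issued denial cannot be lifted at all.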
Contributor Author

Reusing the invite and ban permission could be inappropriate since this is equivalent to managing server ACLs in Matrix classic.

Contributor Author

invite is probably ok to reuse since those are equivalent... ban isn't though

Contributor Author

I don't know about this actually.. it might not matter. ACL was always more risky because it can have irreversible effects

Contributor Author

@Gnuxie Gnuxie Sep 16, 2025

denied has now been changed to be equivalent to revocation. So there will be disruptive effects for any room version that doesn't incorporate MSC4348 or something like it. I.e., in a version where users are resident to a server key (rather than serverless), those users will all have to be rejoined to a room.

Contributor Author

Needs reworking anyways, see #4345 (comment)


Please suggest specific algorithms to make this consistent.

## Potential issues
Contributor Author

Server implementers should only allow server admins to revoke keys, and shouldn't let any user of theirs in a room to do it.

override this, and even if server admins set the server to deny,
the key owner can still revoke the key.

### The `/request_participation` endpoint
Contributor Author

@Gnuxie Gnuxie Sep 9, 2025

The subtext for this endpoint is that the requested server needs to create the participation event with one of its users. There may be objections to this, but in terms of client UI we're close enough with restricted join already in that the event says "bob joined via alice's server". So i'm not sure that argument alone can stand.

Contributor Author

Needs reworking see #4345 (comment)

reproducibility and preemptive access control for servers without the
use of a policy server.

### The `m.server.participation` state event, `state_key: ${origin_server_key}`
Contributor Author

We still need to steal more prose from MSC4243 to describe the exact format for the server key and then use that consistently.

Member

@kegsay kegsay left a comment

  • I found this proposal quite hard to parse due to the addition of unfamiliar terminology which is inadequately described. Some terminology seems to not be described at all e.g "target server's ambient power level" where said terminology appears in auth rules.
  • I found the auth rule changes hard to parse due to what I suspect is incorrect indentation which reads to me as dangling if statements.
  • I think the proposal is actually two proposals, one to handle soft-failure in a more consistent manner, and one to make it the room admins job to verify domains. I'm unsure why the author is combining them; it just increases the chance of the proposal being slowed down / rejected due to trying to do too much. One part of the proposal could have approval but because the other part doesn't, the whole thing gets blocked.
  • It's unclear what purpose advertised_domain serves, and how servers and clients are supposed to use it. I believe the intention is that participating servers verify the server key by talking to that advertised_domain, but this isn't clear from the proposal.

EDIT: this comment previously mentioned that domain-to-key mappings were controlled either by any server or only by privileged server, and thus it either was useless (any server could make the mappings) or centralised (only admins could), neither of which are desirable. It turns out that this proposal makes no domain-to-key mappings at all, and allows any server to admit server public keys if they are already joined to the room. As a result of lacking domain-to-key mappings, I'm unsure how this provides any real traceability guarantees, given this is mentioned as a key differentiator with MSC4243.

Related concerns:

`denied`. `participation` is protected from redaction.

A denied server must not be sent a `m.server.participation` event
unless the targeted server is already present within the room. This is
Member

I think you're just stating existing behaviour (we don't tell servers who aren't in the room that we're talking about them)? If so, it's really confusing because it makes it seem like this is a really important part of the property, instead of just some edge case handling?

Contributor Author

I wasn't aware that this was existing behaviour. Even if it is, it seems like it is a really important part of the property right?

Contributor Author

Gnuxie commented Sep 9, 2025

Update: The MSC has been clarified substantially and more concepts that were described implicitly in the authorization rules have been explained explicitly. There is still work to be done to fix the auth rules, figure out what to do with the soft failure semantics and the introduced terminology for that, and probably a few other concerns.

Matrix rooms. Whereas MSC4243 only does so for individual user
accounts.

However, critically this MSC provides traceability to the origin of
Member

Is this actually true? Nothing is verified in this proposal, so as per my sleeper server example you still have to play whack-a-mole with any malicious server. The only advantage this proposal provides is the implicit invite to public rooms, so you could feasibly do some investigation to see which server is letting in these other servers. It's probably not safe to actually take action against those servers though because that could be weaponised (e.g I make a domain and valid server, I join via the victim server, then nuke the domain and spam freely, such that when investigators see who invited me they think the victim server is colluding when it isn't).

The exciting part of this proposal to me was the idea that this would provide better traceability to the origin of users, but sadly I just don't see how this proposal does this.

Contributor Author

@Gnuxie Gnuxie Sep 10, 2025

So let's be specific. The sleeper server would have to have the invite permission. So the rooms where this can happen are going to be ones set to invite only AND where it is a choice to have the invite be the default power level (this is true for rooms created with the private preset I believe). If someone does this, their key can be revoked, and all the keys that they added / others added. So it is a game of whack-a-mole until you change the power level for invite when you figure out you are being attacked. Traceability exists because we can still see which servers added which keys and deny them. This is not possible at all currently for rooms that do not use join-restricted to ensure that each join event is signed by an existing participant. Whether anyone has a valid domain/invalid/attempts impersonation is not strictly relevant because the denies happen with relation to the server keys and not a domain. And a domain isn't going to be shown to anyone unless they themselves have verified ownership.
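
To illustrate the traceability argument, here is a minimal Python sketch of finding every key that entered the room through a given key, directly or transitively, so that all of them can be denied together. The `admitted_by` mapping (server key to the key whose participation event admitted it) and the function name are hypothetical; how that mapping is extracted from room state is not specified here.

```python
from collections import deque
from typing import Dict, List, Set


def keys_admitted_via(admitted_by: Dict[str, str], root_key: str) -> Set[str]:
    """Return every key admitted, directly or transitively, via root_key."""
    # Invert the mapping: sponsor key -> keys it admitted.
    admitted: Dict[str, List[str]] = {}
    for key, sponsor in admitted_by.items():
        admitted.setdefault(sponsor, []).append(key)

    found: Set[str] = set()
    queue = deque([root_key])
    while queue:
        sponsor = queue.popleft()
        for key in admitted.get(sponsor, []):
            # Skip keys already seen so malformed cyclic input cannot loop forever.
            if key not in found and key != root_key:
                found.add(key)
                queue.append(key)
    return found
```

A moderator who identifies one sleeper key could deny that key plus everything returned by `keys_admitted_via(admitted_by, sleeper_key)`, without needing to know or trust any domain name.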

Contributor Author

Perhaps we should break away from the invite power level though and require each participant to be explicitly granted this permission by a room admin.

Contributor Author

I don't think you'd hold the same standard for the invite power level in private rooms though? Since currently this same attack can happen in all matrix rooms where invite is default?

Member

> (e.g I make a domain and valid server, I join via the victim server, then nuke the domain and spam freely, such that when investigators see who invited me they think the victim server is colluding when it isn't).

@kegsay I don't understand this bit. Why would investigators conclude that spam events are originating from the victim server? What evidence would they use to conclude that?

Member

I never said originating, I said colluding. The victim server "let in" the spam server.


This allows both public and private rooms to benefit from DAG
reproducibility and preemptive access control for servers without the
use of a policy server.
Member

@kegsay kegsay Sep 10, 2025

> preemptive access control

I don't think this is true.

How can it be preemptive when any server can admit malicious servers without performing proper domain checks? In other words, how are the following two scenarios materially any different?

  • There's a public room with honest servers.
  • Many malicious servers join the room.

vs:

  • There's a public room with honest servers.
  • A malicious server joins the room who behaves correctly.
  • Many malicious servers join the room via the first malicious server.

It certainly allows reactive access control as you can find out who let in the bad guys and kick them out, but this is after-the-fact.

Only policy servers can provide preemptive access control because all servers in the room have to trust it to perform the correct authorisation checks.

Contributor Author

> How can it be preemptive when any server can admit malicious servers without performing proper domain checks? In other words, how are the following two scenarios materially any different?

They can't, they need the invite power level under the current proposal... please stop reading my proposal in bad faith.

Member

This is unfortunately meta, but: I don't think your proposal is being read in bad faith, @Gnuxie. It's best not to assume bad faith upon mere questions. These things are hard to reason about as it stands and it's quite possible that one may not be able to immediately draw all conclusions that follow from a set of premises. (I'm thinking of myself when I say this, for one.) Please just answer the questions in good faith and let's move on.

In this instance, I don't think Kegan literally meant any server, so we can modify the claim to "any server with an invite power level" without removing the concern. I believe what Kegan is trying to say is that you can still have a situation where such a server deliberately doesn't check the domain -> server key mapping and invites the server key into the room.

You can then notice this happened after the fact, but it contradicts the ability to preemptively deny access to the room for a particular domain. So in other words, this is not preemptive access control wrt to the domain. Is this wrong?

Contributor Author

@Gnuxie Gnuxie Sep 11, 2025

> I believe what Kegan is trying to say is that you can still have a situation where such a server deliberately doesn't check the domain -> server key mapping and invites the server key into the room.

That is fine though. The advertised_domain that is provided is not an attestation or proof of verification that the domain has been verified to anyone. At any time. And we don't provide any mechanism for that. That's not what the proposal does or intends to do.

Separate to the idea of servers as keys, we do propose local (to each server) verification of domain ownership in order to allow clients to show domains in UI rather than server keys. The only information taken from the DAG for this is the advertised_domain of accepted participations: each server tests it locally, attempting to verify domain ownership by checking that the domain advertises a public key corresponding to the private key that signed the participation event.

This process is described here https://github.com/Gnuxie/matrix-doc/blob/gnuxie/server-key-identity-and-room-membership/proposals/4345-server-key-identity-and-room-membership.md#changes-to-_matrixkeyv3query.
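
As a rough illustration only (the linked section of the proposal is the authoritative description), the local check could look like the Python sketch below. `fetch_advertised_keys` is a hypothetical callable standing in for whatever key lookup a homeserver performs against the domain; the function and parameter names are assumptions, and signature verification of the participation event itself is assumed to have happened already.

```python
from typing import Callable, Optional, Set


def displayable_domain(
    participation_key: str,
    advertised_domain: str,
    fetch_advertised_keys: Callable[[str], Set[str]],
) -> Optional[str]:
    """Return the domain if this server could verify it locally, else None.

    None means clients on this server should keep showing the server key.
    """
    try:
        advertised = fetch_advertised_keys(advertised_domain)
    except Exception:
        # An unreachable or misbehaving domain simply stays unverified.
        return None
    # The domain counts as verified only if it advertises the signing key.
    return advertised_domain if participation_key in advertised else None
```

The result is purely local: nothing is written back to the room, so two servers may disagree about whether a domain is verified without affecting the deny or permit decisions, which are keyed on the server key.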

Contributor Author

> You can then notice this happened after the fact, but it contradicts the ability to preemptively deny access to the room for a particular domain. So in other words, this is not preemptive access control wrt to the domain. Is this wrong?

We don't provide it in regards to a domain, we provide it in regards to a server. Do we still allow political crisis within rooms because people with the relevant permissions who are trusted to be responsible for the room's administration can be malicious? Yes.

Contributor Author

Although I don't know about this. Annoyingly this does mean that room admins can't actually invite a server key, and it's not clear that they would be able to do that anyway given that they probably won't know which key a homeserver wants to use in this room. So the idea of permitting a homeserver to participate pre-emptively cannot be expressed through the m.server.participation event but through other means (like inviting a user). And this signals that when the homeserver requests participation with a key it should be permitted.

Contributor Author

The solution of this problem needs to be worked out through exploration of MSC4348 or a compromising MSC that works closely to the current room model (if such a compromising MSC can even exist under the terms of this proposal, which might have already been written off).

Contributor Author

@Gnuxie Gnuxie Oct 16, 2025

Hm, i'm not sure we need to do all this. An invitation on the member level is enough. Then combined with a new power level for permit we don't actually give anyone with invite permission the ability to add an infinite number of keys. The only issue is that we probably still need a request step so that the joining server has total control over its unverified_domain property (from the start, and not just since accept). It means we can also remove the accept step.

Contributor Author

@Gnuxie Gnuxie Oct 16, 2025

The requested state then needs to be traceable to the requesting server and the receiver of the request.

Contributor Author

So we'd have to modify /request_participation to include something like restricted join's join_authorised_via_users_server for the requested server's key. And we'd probably need the whole make_request, send_request handshake because for invite only rooms these will need to be authorized via an invitation.

@@ -0,0 +1,429 @@
# MSC4345: Server key identity and room membership
Member

There is a fundamental fork-in-the-road choice here, when comparing this MSC to ones like MSC4243 (per-user keys). Both MSCs turn the DAG into a self-verifying data structure which is not reliant on DNS. The key difference is how they do this.

  • This MSC does this at the server level, introducing new state events to effectively represent "is this server key in the room?".
  • MSC4243 does this at the user level, which doesn't need new state events.

Neither MSC verifies the claimed domain for each server key: this is allowed to split brain which results in ugly user IDs appearing on clients (@alice:server-key for this MSC or @user-key:invalid for MSC4243) which need to be patched up when the domain is verified, so both end up looking like @alice:domain.

The fork in the road is ultimately which direction the protocol wants to go:

  • If we want to double down on "Matrix is a federation protocol, users must have some chaperone server when sending events" then this proposal fits that ideal better than MSC4243 because it bakes that "chaperone server" semantics explicitly into the protocol with their own set of keys.
  • If we want to double down on "Matrix is a peer-to-peer protocol, which happens to currently be between servers but could be directly between users" then MSC4243 fits that ideal better because it promotes user keys to being the only key required to send valid events.

Note that this proposal can support P2P, by adding another layer of keys on top of the server keys (which is what MSC4348 does).

Federation protocols fundamentally need the server to know more information about their users, so it leaks more metadata than peer-to-peer protocols. For example, whilst this proposal can support P2P via MSC4348, it needs to know the PLs of all users on each server to know how to enforce the server participation event (the ambient power level in this MSC). Events also need to expose their sending node information in order to check the signatures on the "chaperone server/node". In contrast, P2P protocols can end up turning servers into store-and-forward nodes for encrypted data and that's it.

Contributor Author

@Gnuxie Gnuxie Sep 10, 2025

> If we want to double down on "Matrix is a federation protocol, users must have some chaperone server when sending events" then this proposal fits that ideal better than MSC4243 because it bakes that "chaperone server" semantics explicitly into the protocol with their own set of keys.

This is not true. Please stop misrepresenting this proposal. #4348 shows how P2P matrix would work within this proposal. And it is a cleaner solution that does not tie accounts to any domain or server, MSC4243 does.

Contributor Author

I won't need to mislead anyone if i add my own commentary of the differences on MSC4243 in this manner. Because i have taken the time to understand MSC4243 through conversation with you. Please stop.

Contributor Author

@Gnuxie Gnuxie Sep 10, 2025

Please also consider that you have had months to develop 4243 behind a placeholder in private. With support and consultation from others. I've written this proposal very quickly solo. Without praise or encouragement.

Contributor Author

I still extend an invitation to you to develop this MSC with me collaboratively.

events through their users in both MSCs and we don't intend to change
that in future MSCs in this series either.

### Terminiology
Member

Suggested change
### Terminiology
### Terminology


Contributor Author

Gnuxie commented Sep 12, 2025

Update: auth rules have been cleaned up and references to MSC4349 to explain causal barriers have been made.

This proposal encodes a special auth rule for `denied` participation to
avoid soft failure and the problems discussed in MSC4104.

## Security considerations
Contributor Author

Please allow keys to be scoped to a number of events. E.g. 1000 before rotation is required. See also #4353

Contributor Author

Probably worth requiring that authorization events have a separate limit to normal events.

Contributor Author

@Gnuxie Gnuxie Sep 15, 2025

Probably worth doing the hard work of specifying keys as capabilities omg.

Contributor Author

This is the awesome secure Matrix you can have! But you don't!

Contributor Author

@Gnuxie Gnuxie Sep 16, 2025

This is probably scope creep and should be a follow up MSC, see the-draupnir-project/planning#51

Contributor Author

@Gnuxie Gnuxie Sep 16, 2025

We should add a note in alternatives or issues before closing the thread.

We can't allow room admins to "unban" servers because it might allow
them to make use of stolen keys. See security considerations.
We're not confident in any other solutions to this.
context of statements is lost. Where authorization rules are
inconsistent this text takes precedence.

- Reminder: In this MSC _Server_ refers to the controller of a ed25519
Contributor Author

we should make this more consistent by using server key controller and homeserver deployment.
