|
| 1 | +# MSC2758: Common grammar for textual identifiers |
| 2 | + |
| 3 | +The Matrix specification uses textual identifiers for a wide range of |
| 4 | +concepts. Examples include "event types" and "room versions". |
| 5 | + |
| 6 | +In the past, these identifiers have often lacked a formal grammar, leaving |
| 7 | +servers and clients to make assumptions about questions such as which |
| 8 | +characters are permitted, minimum and maximum lengths, etc. |
| 9 | + |
| 10 | +This proposal suggests a common grammar which can be used as a basis for |
| 11 | +*future* identifier types, reducing the effort involved in future |
| 12 | +specification work. |
| 13 | + |
| 14 | +No attempt is made here to bring existing identifiers into line; however, |
| 15 | +examples of identifiers which might have benefitted from such a grammar in the |
| 16 | +past include: |
| 17 | + |
| 18 | + * [`capabilities`](https://matrix.org/docs/spec/client_server/r0.6.0#get-matrix-client-r0-capabilities) |
| 19 | + identifiers. |
| 20 | + * authentication types for the [User-Interactive Authentication mechanism](https://matrix.org/docs/spec/client_server/r0.6.0#user-interactive-authentication-api). |
| 21 | + * login types for [`/_matrix/client/r0/login`](https://matrix.org/docs/spec/client_server/r0.6.0#post-matrix-client-r0-login). |
| 22 | + * event types. |
| 23 | + * [`m.room.message` `msgtypes`](https://matrix.org/docs/spec/client_server/r0.6.0#m-room-message-msgtypes). |
| 24 | + * `app_id` for [`POST /_matrix/client/r0/pushers/set`](https://matrix.org/docs/spec/client_server/r0.6.0#post-matrix-client-r0-pushers-set). |
| 25 | + * `rule_ids`, `actions` and `tweaks` for [push rules](https://matrix.org/docs/spec/client_server/r0.6.0#push-rules). |
| 26 | + * [E2E messaging algorithm names](https://matrix.org/docs/spec/client_server/r0.6.0#messaging-algorithm-names). |
| 27 | + |
| 28 | +## Proposal |
| 29 | + |
| 30 | +We define a "common namespaced identifier grammar". This can then be referenced |
| 31 | +by other parts of the specification, in much the same way as [Unpadded |
| 32 | +Base64](https://matrix.org/docs/spec/appendices#unpadded-base64) is defined |
| 33 | +today. |
| 34 | + |
| 35 | +The grammar is defined as follows: |
| 36 | + |
| 37 | + * An identifier may not be less than one character or more than 255 characters |
| 38 | + in length. |
| 39 | + * Identifiers must start with one of the characters `[a-z]`, and be entirely |
| 40 | + composed of the characters `[a-z]`, `[0-9]`, `-`, `_` and `.`. |
| 41 | + * Identifiers starting with the characters `m.` are reserved for use by the |
| 42 | + formal Matrix specification. |
| 43 | + * Implementations wishing to use identifiers not defined by the specification |
| 44 | + should follow the Java package naming convention of starting with a reversed |
| 45 | + domain name (with a dot after the domain name part). For example, for the |
| 46 | + organisation `example.com`, a valid identifier would be |
| 47 | + `com.example.identifier`. |
| 48 | + |
| 49 | +This grammar is intended for use entirely by internal identifiers, and *not* |
| 50 | +for user-visible strings. |
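
As a rough illustration, the rules above could be checked with something like the following Python sketch. It is not part of the proposal: the names `MAX_IDENTIFIER_LENGTH`, `is_valid_identifier` and `is_reserved_identifier` are hypothetical, and the sketch assumes length is counted in characters (which, for this ASCII-only alphabet, is the same as the byte count).

```python
import re

# Hypothetical names for illustration only; the proposal defines no
# reference implementation.
MAX_IDENTIFIER_LENGTH = 255

# Must start with [a-z]; remaining characters are [a-z], [0-9], '-', '_' or '.'.
_IDENTIFIER_RE = re.compile(r"[a-z][a-z0-9._-]*")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if `identifier` satisfies the common grammar."""
    if not 1 <= len(identifier) <= MAX_IDENTIFIER_LENGTH:
        return False
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other stray character is rejected.
    return _IDENTIFIER_RE.fullmatch(identifier) is not None


def is_reserved_identifier(identifier: str) -> bool:
    """Return True if `identifier` is valid and falls in the reserved `m.` namespace."""
    return is_valid_identifier(identifier) and identifier.startswith("m.")
```

Under these assumptions, `is_valid_identifier("com.example.identifier")` is accepted, while `"Com.example"` (upper-case) and `"1abc"` (leading digit) are rejected. The sketch does not try to verify that a non-`m.` identifier really begins with a reversed domain name, since that rule is a convention ("should") rather than something a parser can decide.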
| 51 | + |
| 52 | +### Rationale |
| 53 | + |
| 54 | + * Avoiding non-ASCII characters sidesteps any issues with homoglyphs or |
| 55 | + alternative encodings of the same characters. |
| 56 | + * Avoiding upper-case characters sidesteps any concerns over case-sensitivity. |