# MSC2191: Markup for mathematical messages

Some people write using an odd language that has strange symbols. No, I'm not
talking about computer programmers; I'm talking about mathematicians. To help
these people communicate, Matrix should define a standard way of including
mathematical notation in messages.

This proposal presents a format using LaTeX, in contrast with a [previous
proposal](https://github.com/matrix-org/matrix-doc/pull/1722/) that used
MathML.

See also:

- https://github.com/vector-im/riot-web/issues/1945


## Proposal

A new attribute `data-mx-maths` will be added for use in `<span>` or `<div>`
elements. Its value will be mathematical notation in LaTeX format. `<span>`
is used for inline math, and `<div>` for display math. The contents of the
`<span>` or `<div>` will be a fallback representation of the desired notation
for clients that do not support mathematical display, or that are unable to
render the entire `data-mx-maths` attribute. The fallback representation is
left up to the sending client and could be, for example, an image, an HTML
approximation, or the raw LaTeX source. When using an image as a fallback, the
sending client should be aware of issues that may arise from the receiving
client using a different background colour (for example, dark glyphs on a
transparent background become unreadable under a dark theme).
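
As a rough sketch of the sending side, the following Python builds the element described above. It assumes the fallback HTML has already been produced and sanitised by the sending client; `maths_element` is a hypothetical helper name, not part of any Matrix SDK. The main point it shows is that the LaTeX must be escaped for use inside a double-quoted HTML attribute: backslashes need no HTML escaping, but `"`, `&`, `<`, and `>` do.

```python
import html


def maths_element(latex: str, fallback_html: str, display: bool = False) -> str:
    """Wrap LaTeX in a <span> (inline) or <div> (display) with data-mx-maths.

    `fallback_html` is assumed to be HTML the sender has already built and
    sanitised; only the LaTeX is escaped here.
    """
    tag = "div" if display else "span"
    # Escape &, <, >, " and ' so the LaTeX is safe inside a double-quoted
    # attribute. Backslashes have no special meaning in HTML and are kept.
    attr = html.escape(latex, quote=True)
    return f'<{tag} data-mx-maths="{attr}">{fallback_html}</{tag}>'


# Using the raw LaTeX source as the fallback also needs escaping.
src = "a<b"
print(maths_element(src, html.escape(src), display=True))
# <div data-mx-maths="a&lt;b">a&lt;b</div>
```

When this HTML is placed in `formatted_body` and the event is serialised as JSON, the JSON encoder doubles each backslash and escapes the attribute's quotes, which is why the example shows `\"\\sin(x)...\"`.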

Example (with line breaks and indentation added to `formatted_body` for clarity):

```json
{
  "content": {
    "body": "This is an equation: sin(x)=a/b",
    "format": "org.matrix.custom.html",
    "formatted_body": "This is an equation:
      <span data-mx-maths=\"\\sin(x)=\\frac{a}{b}\">
        sin(<i>x</i>)=<sup><i>a</i></sup>/<sub><i>b</i></sub>
      </span>",
    "msgtype": "m.text"
  },
  "event_id": "$eventid:example.com",
  "origin_server_ts": 1234567890,
  "sender": "@alice:example.com",
  "type": "m.room.message",
  "room_id": "!someroom:example.com"
}
```
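
On the receiving side, a client needs to find the `data-mx-maths` attributes in `formatted_body` and tell inline from display maths by the element name, as in the example above. Below is a minimal sketch using Python's standard `html.parser`; `MathsExtractor` and `extract_maths` are hypothetical names, and a real client would more likely do this inside its existing HTML sanitisation pass than in a separate parse.

```python
from html.parser import HTMLParser


class MathsExtractor(HTMLParser):
    """Collect (mode, latex) pairs from elements carrying data-mx-maths."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Per the proposal, <span> is inline maths and <div> is display maths.
        if tag not in ("span", "div"):
            return
        for name, value in attrs:
            # HTMLParser lower-cases attribute names and unescapes values,
            # so "&lt;" in the attribute comes back as "<".
            if name == "data-mx-maths" and value is not None:
                mode = "display" if tag == "div" else "inline"
                self.found.append((mode, value))


def extract_maths(formatted_body: str) -> list[tuple[str, str]]:
    parser = MathsExtractor()
    parser.feed(formatted_body)
    parser.close()
    return parser.found
```

The fallback contents of each element are left untouched, so a client that decides not to render a particular formula can simply display the HTML as-is.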


## Other solutions

[MSC1722](https://github.com/matrix-org/matrix-doc/pull/1722/) proposes using
MathML as the format for transporting mathematical notation. It also summarizes
some other solutions in its "Other Solutions" section.

In comparison with MathML, LaTeX has several advantages and disadvantages.

The most obvious advantage is that LaTeX is much less verbose and more readable
than MathML: the fraction written in LaTeX as `\frac{a}{b}` is
`<mfrac><mi>a</mi><mi>b</mi></mfrac>` in MathML. In many cases, the LaTeX code
is a suitable fallback for the rendered notation.

LaTeX is a suitable input method for many people, and so converting from a
user's input to the message format would be a no-op.

However, balanced against these advantages, LaTeX has several disadvantages as
a message format. Some of these are covered in the "Potential issues" and
"Security considerations" sections below.


## Potential issues

### "LaTeX" as a format is poorly defined

There are several extensions to LaTeX that are commonly used, such as
AMS-LaTeX. It is unclear which extensions should be supported, and which
should not be supported. Different LaTeX-rendering libraries support different
sets of commands.

This proposal suggests that the receiving client should render the LaTeX
version if possible, but if it contains unsupported commands, then it should
display the fallback. Thus, it is up to the receiving client to decide what
commands it will support, rather than dictating what commands must be
supported. This comes at the cost of possible inconsistency between clients,
but is somewhat mitigated by the use of a fallback. Clients should, however,
aim to support, at minimum, the basic LaTeX2e maths commands and the TeX maths
commands, with the possible exception of commands that could be security risks
(see below).
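
The render-or-fallback rule could look roughly like the sketch below. The renderer is a toy stand-in that only checks which commands appear; a real client would call its rendering library and catch that library's own error type instead. `UnsupportedCommand`, `make_toy_renderer`, and `display_maths` are hypothetical names, and the set of supported commands is whatever the client chooses.

```python
import html
import re
from typing import Callable

# A control word (\frac) or a control symbol (\{, \,, \\).
COMMAND = re.compile(r"\\(?:[A-Za-z]+|.)")


class UnsupportedCommand(Exception):
    """Raised by the (hypothetical) renderer for a command it cannot handle."""


def make_toy_renderer(supported: set[str]) -> Callable[[str], str]:
    """Stand-in for a real rendering library; it only checks commands."""

    def render(latex: str) -> str:
        for cmd in COMMAND.findall(latex):
            if cmd not in supported:
                raise UnsupportedCommand(cmd)
        # A real renderer would return typeset HTML or MathML here.
        return f"<rendered>{html.escape(latex)}</rendered>"

    return render


def display_maths(latex: str, fallback_html: str,
                  render: Callable[[str], str]) -> str:
    """Render the LaTeX if possible, otherwise show the sender's fallback."""
    try:
        return render(latex)
    except UnsupportedCommand:
        return fallback_html


toy_render = make_toy_renderer({r"\sin", r"\frac"})
print(display_maths(r"\sin(x)=\frac{a}{b}", "sin(x)=a/b", toy_render))
print(display_maths(r"\dfrac{a}{b}", "a/b", toy_render))  # falls back
```

Falling back for the whole formula, rather than rendering the parts the client understands, keeps the displayed notation consistent with what the sender intended.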

To improve compatibility, the sender's client may warn the sender if they are
using a command that comes from another package, such as AMS-LaTeX.
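
A sender-side check of this kind might scan the LaTeX for commands from AMS packages before sending. The `AMS_COMMANDS` set below is a small illustrative sample (commands from `amsmath` or `amsfonts`/`amssymb`), not a complete list, and `ams_warnings` is a hypothetical helper.

```python
import re

# A control word (\frac) or a control symbol (\{, \,, \\).
COMMAND = re.compile(r"\\(?:[A-Za-z]+|.)")

# Illustrative sample only: commands provided by amsmath or amsfonts/amssymb
# rather than by base LaTeX2e. A real client would need a fuller list.
AMS_COMMANDS = {
    r"\dfrac", r"\tfrac", r"\binom", r"\text", r"\operatorname", r"\mathbb",
}


def ams_warnings(latex: str) -> list[str]:
    """Return AMS-only commands used in `latex`, in order of first use."""
    found: list[str] = []
    for cmd in COMMAND.findall(latex):
        if cmd in AMS_COMMANDS and cmd not in found:
            found.append(cmd)
    return found


for cmd in ams_warnings(r"\dfrac{1}{2} \in \mathbb{Q}"):
    print(f"Warning: {cmd} comes from an AMS package and may not render everywhere")
```

A warning, rather than a hard block, leaves the choice with the sender, which fits the proposal's approach of letting each client decide what it supports.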

### Lack of libraries for displaying mathematics

See the corresponding section in [MSC1722](https://github.com/matrix-org/matrix-spec-proposals/pull/1722/files#diff-4a271297299040dbfa622bfc6d2aab02f9bc82be0b28b2a92ce30b14c5621f94R148-R164).


## Security considerations

LaTeX is a [Turing complete programming
language](https://web.archive.org/web/20160110102145/http://en.literateprograms.org/Turing_machine_simulator_%28LaTeX%29);
it is possible to write a LaTeX document that contains an infinite loop, or
that will require large amounts of memory. While it may be fun to write a
[LaTeX file that can control a Mars
Rover](https://wiki.haskell.org/wikiupload/8/85/TMR-Issue13.pdf#chapter.2), it
is not desirable for a mathematical formula embedded in a Matrix message to
control a Mars Rover. Clients should take precautions when rendering LaTeX.
Clients that use a rendering library should only use one that can process the
LaTeX safely.
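
One cheap precaution, sketched below, is to reject input that is too long or nests groups too deeply before handing it to a renderer. This is only a pre-filter: a short input can still expand without bound through macros, so it does not replace limits enforced inside the renderer itself (KaTeX, for example, has a `maxExpand` option that caps macro expansions). `within_limits`, `MAX_LENGTH`, and `MAX_DEPTH` are hypothetical names, and the limit values are assumptions chosen for illustration.

```python
MAX_LENGTH = 2000  # hypothetical limit on source length, in characters
MAX_DEPTH = 20     # hypothetical limit on {...} nesting depth


def within_limits(latex: str, max_length: int = MAX_LENGTH,
                  max_depth: int = MAX_DEPTH) -> bool:
    """Cheap pre-check: reject over-long, over-nested or unbalanced input."""
    if len(latex) > max_length:
        return False
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the backslash and the next character, so escaped braces
            # like \{ and \} are not counted as grouping.
            i += 2
            continue
        if ch == "{":
            depth += 1
            if depth > max_depth:
                return False
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closing brace without an opening one
        i += 1
    return depth == 0
```

Input that fails the check can go straight to the fallback representation, so the message is still readable.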

Clients should not render mathematics by calling the `latex` executable without
proper sandboxing, as the `latex` executable was not written to handle
untrusted input (see, for example, <https://hovav.net/ucsd/dist/texhack.pdf>,
<https://0day.work/hacking-with-latex/>, and
<https://hovav.net/ucsd/dist/tex-login.pdf>). Some LaTeX rendering libraries
are better suited for processing untrusted input.

Certain commands, such as [those that can create
macros](https://katex.org/docs/supported#macros), are potentially dangerous;
clients should either decline to process those commands, or should take care to
ensure that they are handled in safe ways (such as by limiting recursion). In
general, LaTeX commands should be filtered by allowing known-good commands
rather than forbidding known-bad commands. Some LaTeX libraries may have
options for doing this.
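
To illustrate why an allow list is safer, the sketch below compares a naive deny list with a small allow list. Both lists are hypothetical and far from complete. The input `\gdef\x{\x}\x` defines a macro that expands to itself forever; it slips past a deny list that only names `\def`, `\newcommand`, and `\renewcommand`, but is rejected by the allow list because `\gdef` and `\x` are not known-good commands.

```python
import re

# A control word (\frac) or a control symbol (\{, \,, \\).
COMMAND = re.compile(r"\\(?:[A-Za-z]+|.)")

# Naive deny list: TeX has many other ways to define macros
# (\gdef, \edef, \xdef, \let, ...), so this is easy to get wrong.
DENY = {r"\def", r"\newcommand", r"\renewcommand"}

# Hypothetical, deliberately small allow list of known-good commands.
ALLOW = {r"\frac", r"\sqrt", r"\sin", r"\cos", r"\alpha", r"\beta", r"\cdot"}


def passes_denylist(latex: str) -> bool:
    return all(cmd not in DENY for cmd in COMMAND.findall(latex))


def passes_allowlist(latex: str) -> bool:
    return all(cmd in ALLOW for cmd in COMMAND.findall(latex))


# \gdef defines \x to expand to itself, so using \x loops forever.
evil = r"\gdef\x{\x}\x"
print(passes_denylist(evil))   # True: the deny list misses \gdef
print(passes_allowlist(evil))  # False: \gdef is not known-good
```

The cost of the allow list is that legitimate but unlisted commands are refused too; under this proposal such messages simply show the fallback.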

In general, LaTeX places a heavy burden on client authors to ensure that it is
processed safely. Some LaTeX rendering libraries provide security advice, for
example, <https://github.com/KaTeX/KaTeX/blob/main/docs/security.md>.


## Conclusion

Math(s) is hard, but LaTeX makes it easier to write mathematical notation.
However, using LaTeX as a format for including mathematics in Matrix messages
has some serious downsides. Nevertheless, if clients handle the LaTeX
carefully, or rely on the fallback representation, the concerns can be
addressed.