
Commit ba00632

uhoreg, turt2live, and anoadragon453 committed
MSC2191: Markup for mathematical messages (#2191)
* add proposal for using LaTeX for maths display
* rename to match MSC number
* change title
* update based on feedback
* up to clients how to deal with potentially-dangerous commands
* fix typo

Co-authored-by: Travis Ralston <[email protected]>

* small typo fix

---------

Co-authored-by: Travis Ralston <[email protected]>
Co-authored-by: Andrew Morgan <[email protected]>
1 parent 03cc208 commit ba00632

1 file changed, +138 -0 lines changed

proposals/2191-maths.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# MSC2191: Markup for mathematical messages

Some people write using an odd language that has strange symbols. No, I'm not
talking about computer programmers; I'm talking about mathematicians. In order
to aid these people in communicating, Matrix should define a standard way of
including mathematical notation in messages.

This proposal presents a format using LaTeX, in contrast with a [previous
proposal](https://github.com/matrix-org/matrix-doc/pull/1722/) that used
MathML.

See also:

- https://github.com/vector-im/riot-web/issues/1945


## Proposal

A new attribute `data-mx-maths` will be added for use in `<span>` or `<div>`
elements. Its value will be mathematical notation in LaTeX format. `<span>`
is used for inline math, and `<div>` for display math. The contents of the
`<span>` or `<div>` will be a fallback representation of the desired notation
for clients that do not support mathematical display, or that are unable to
render the entire `data-mx-maths` attribute. The fallback representation is
left up to the sending client and could be, for example, an image, an HTML
approximation, or the raw LaTeX source. When using an image as a fallback, the
sending client should be aware of issues that may arise from the receiving
client using a different background colour.

Example (with line breaks and indentation added to `formatted_body` for clarity):

```json
{
    "content": {
        "body": "This is an equation: sin(x)=a/b",
        "format": "org.matrix.custom.html",
        "formatted_body": "This is an equation:
            <span data-mx-maths=\"\\sin(x)=\\frac{a}{b}\">
                sin(<i>x</i>)=<sup><i>a</i></sup>/<sub><i>b</i></sub>
            </span>",
        "msgtype": "m.text"
    },
    "event_id": "$eventid:example.com",
    "origin_server_ts": 1234567890,
    "sender": "@alice:example.com",
    "type": "m.room.message",
    "room_id": "!someroom:example.com"
}
```
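
As a rough illustration of how a sending client might assemble such an event, the Python sketch below builds the `content` object from a LaTeX string and a fallback. The helper names `maths_span` and `maths_message` are hypothetical and not part of this proposal; the main point is that the LaTeX source should be HTML-escaped before it is placed in the `data-mx-maths` attribute.

```python
import html
import json


def maths_span(latex: str, fallback_html: str) -> str:
    """Wrap inline maths in a <span> carrying the LaTeX source in data-mx-maths.

    Hypothetical helper: `fallback_html` is assumed to be already-safe HTML.
    """
    # quote=True also escapes quote characters, so the value is safe
    # inside a double-quoted HTML attribute.
    attr = html.escape(latex, quote=True)
    return f'<span data-mx-maths="{attr}">{fallback_html}</span>'


def maths_message(prefix: str, latex: str, fallback_html: str, fallback_text: str) -> dict:
    """Build the `content` of an m.room.message event with one inline formula."""
    return {
        "body": f"{prefix}{fallback_text}",
        "format": "org.matrix.custom.html",
        "formatted_body": html.escape(prefix) + maths_span(latex, fallback_html),
        "msgtype": "m.text",
    }


content = maths_message(
    "This is an equation: ",
    r"\sin(x)=\frac{a}{b}",
    "sin(<i>x</i>)=<sup><i>a</i></sup>/<sub><i>b</i></sub>",
    "sin(x)=a/b",
)
# json.dumps takes care of the JSON-level escaping (backslashes and quotes).
print(json.dumps(content, indent=2))
```

Passing the fallback in as ready-made HTML leaves the choice of fallback (image, HTML approximation, or raw LaTeX) with the caller, which matches the proposal's intent that the sending client decides.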


## Other solutions

[MSC1722](https://github.com/matrix-org/matrix-doc/pull/1722/) proposes using
MathML as the format for transporting mathematical notation. It also summarizes
some other solutions in its "Other Solutions" section.

In comparison with MathML, LaTeX has several advantages and disadvantages.

The first advantage, which is quite obvious, is that LaTeX is much less verbose
and more readable than MathML. In many cases, the LaTeX code is a suitable
fallback for the rendered notation.

The second is that LaTeX is a familiar input method for many people, so
converting from a user's input to the message format would be a no-op.

However, balanced against these advantages, LaTeX has several disadvantages as
a message format. Some of these are covered in the "Potential issues" and
"Security considerations" sections.


## Potential issues

### "LaTeX" as a format is poorly defined

There are several extensions to LaTeX that are commonly used, such as
AMS-LaTeX. It is unclear which extensions should be supported and which
should not. Different LaTeX-rendering libraries support different sets of
commands.

This proposal suggests that the receiving client should render the LaTeX
version if possible, but if it contains unsupported commands, then it should
display the fallback. Thus, it is up to the receiving client to decide what
commands it will support, rather than the specification dictating what
commands must be supported. This comes at the cost of possible inconsistency
between clients, but is somewhat mitigated by the use of a fallback. Clients
should, however, aim to support, at minimum, the basic LaTeX2e maths commands
and the TeX maths commands, with the possible exception of commands that could
be security risks (see "Security considerations" below).

To improve compatibility, the sender's client may warn the sender if they are
using a command that comes from another package, such as AMS-LaTeX.
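
The render-or-fall-back rule described in this section could be sketched on the receiving side roughly as follows, in Python. `MathsExtractor`, `display_choices`, and the `can_render` callback are hypothetical names chosen for this illustration. The sketch assumes the fallback is plain text or simple inline HTML, and it ignores image fallbacks.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "wbr"}


class MathsExtractor(HTMLParser):
    """Collect (latex, fallback_text) pairs from <span>/<div data-mx-maths>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.found: List[Tuple[str, str]] = []
        self._latex: Optional[str] = None  # LaTeX of the element being read
        self._depth = 0                    # nesting depth inside that element
        self._text: List[str] = []         # fallback text collected so far

    def handle_starttag(self, tag, attrs):
        if self._latex is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        latex = dict(attrs).get("data-mx-maths")
        if tag in ("span", "div") and latex is not None:
            self._latex, self._depth, self._text = latex, 1, []

    def handle_endtag(self, tag):
        if self._latex is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            self.found.append((self._latex, "".join(self._text).strip()))
            self._latex = None

    def handle_data(self, data):
        if self._latex is not None:
            self._text.append(data)


def display_choices(formatted_body: str, can_render: Callable[[str], bool]):
    """Return ("render", latex) or ("fallback", text) for each maths element.

    `can_render` stands in for whatever check the client's renderer offers.
    """
    parser = MathsExtractor()
    parser.feed(formatted_body)
    parser.close()
    return [
        ("render", latex) if can_render(latex) else ("fallback", text)
        for latex, text in parser.found
    ]
```

Keeping the decision behind a `can_render` callback mirrors the proposal: each client decides which commands it supports, and everything else degrades to the sender's fallback.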

### Lack of libraries for displaying mathematics

See the corresponding section in [MSC1722](https://github.com/matrix-org/matrix-spec-proposals/pull/1722/files#diff-4a271297299040dbfa622bfc6d2aab02f9bc82be0b28b2a92ce30b14c5621f94R148-R164).


## Security considerations

LaTeX is a [Turing-complete programming
language](https://web.archive.org/web/20160110102145/http://en.literateprograms.org/Turing_machine_simulator_%28LaTeX%29);
it is possible to write a LaTeX document that contains an infinite loop, or
that will require large amounts of memory. While it may be fun to write a
[LaTeX file that can control a Mars
Rover](https://wiki.haskell.org/wikiupload/8/85/TMR-Issue13.pdf#chapter.2), it
is not desirable for a mathematical formula embedded in a Matrix message to
control a Mars Rover. Clients should take precautions when rendering LaTeX.
Clients that use a rendering library should only use one that can process the
LaTeX safely.

Clients should not render mathematics by calling the `latex` executable without
proper sandboxing, as the `latex` executable was not written to handle
untrusted input. (See, for example, <https://hovav.net/ucsd/dist/texhack.pdf>,
<https://0day.work/hacking-with-latex/>, and
<https://hovav.net/ucsd/dist/tex-login.pdf>.) Some LaTeX rendering libraries
are better suited for processing untrusted input.

Certain commands, such as [those that can create
macros](https://katex.org/docs/supported#macros), are potentially dangerous;
clients should either decline to process those commands, or should take care to
ensure that they are handled in safe ways (such as by limiting recursion). In
general, LaTeX commands should be filtered by allowing known-good commands
rather than forbidding known-bad commands. Some LaTeX libraries may have
options for doing this.

In general, LaTeX places a heavy burden on client authors to ensure that it is
processed safely. Some LaTeX rendering libraries provide security advice, for
example, <https://github.com/KaTeX/KaTeX/blob/main/docs/security.md>.
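
The allowlist approach recommended in this section might look something like the following Python sketch, which a client could run before handing the source to its renderer. Everything named here is an assumption made for illustration: the contents of `ALLOWED_COMMANDS` and `ALLOWED_SYMBOLS`, the `MAX_LENGTH` and `MAX_DEPTH` limits, and the function `is_safe_latex` itself. It is a pre-filter only and does not replace a renderer that is itself safe on untrusted input.

```python
import re

# Hypothetical allowlist; a real client would derive it from its renderer.
ALLOWED_COMMANDS = frozenset(
    {"sin", "cos", "frac", "sqrt", "alpha", "beta", "sum", "int", "left", "right"}
)
# Control symbols: a backslash followed by a single non-letter character.
ALLOWED_SYMBOLS = frozenset({"\\", "{", "}", ",", ";", "!", " "})
MAX_LENGTH = 2000  # assumed cap on the size of the LaTeX source
MAX_DEPTH = 32     # assumed cap on brace nesting

# Matches a control word, a control symbol (possibly a lone trailing
# backslash, captured as an empty string), or an unescaped brace.
_TOKEN = re.compile(r"\\([A-Za-z]+)|\\(.?)|([{}])", re.DOTALL)


def is_safe_latex(source: str) -> bool:
    """Return True only if every command is allowlisted and limits are met."""
    if len(source) > MAX_LENGTH:
        return False
    # TeX's ^^ notation can spell arbitrary characters, including a
    # backslash, so this sketch rejects it outright.
    if "^^" in source:
        return False
    depth = 0
    for match in _TOKEN.finditer(source):
        word, symbol, brace = match.groups()
        if word is not None:
            # Macro-defining commands such as \def or \newcommand are
            # rejected simply because they are not on the allowlist.
            if word not in ALLOWED_COMMANDS:
                return False
        elif symbol is not None:
            if symbol not in ALLOWED_SYMBOLS:
                return False
        elif brace == "{":
            depth += 1
            if depth > MAX_DEPTH:
                return False
        else:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

A client using a check like this would display the fallback whenever `is_safe_latex` returns `False`, so rejected input still degrades gracefully instead of being dropped.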
130+
131+
132+
## Conclusion
133+
134+
Math(s) is hard, but LaTeX makes it easier to write mathematical notation.
135+
However, using LaTeX as a format for including mathematics in Matrix messages
136+
has some serious downsides. Nevertheless, if clients handle the LaTeX
137+
carefully, or rely on the fallback representation, the concerns can be
138+
addressed.
