25 changes: 21 additions & 4 deletions text/0000-compat-math-identifiers.md
@@ -84,11 +84,14 @@ The Unicode sets [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/l

The characters 1) - 4) are added to the set of characters allowed in Rust identifiers.
[UAX31](https://www.unicode.org/reports/tr31/#Standard_Profiles) notes that "supporting these characters is recommended for some computer languages because they can be beneficial in some applications".
These characters will have no syntactic meaning; they are only added to the set of characters allowed in identifiers, following the recommendations of [UAX31-R3b](https://www.unicode.org/reports/tr31/#R3b).

In other words, this RFC proposes to adopt the "Mathematical Compatibility Notation Profile", which, in accordance with [UAX31-R3b](https://www.unicode.org/reports/tr31/#R3b), allows these characters in identifiers and in turn precludes their syntactic use.
For example, `let a = 2.0; let b = a²;` will naturally produce a compiler error stating that `a²` is an unknown identifier; it will not be interpreted as `let b = a * a;`.
Similarly, `let a = [2, 0]; let b = a₁;` will naturally produce a compiler error stating that `a₁` is an unknown identifier; it will not be interpreted as `let b = a[0];`.
`∞` will simply be a character usable in identifiers, not a synonym for constants such as `f32::INFINITY`.
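To make the effect on tokenization concrete, here is a minimal sketch of a greedy identifier scanner. It is not rustc's lexer: `is_start`/`is_continue` approximate `XID_Start`/`XID_Continue` with `char::is_alphabetic`, ASCII digits and `_`, and `COMPAT_MATH_START`/`COMPAT_MATH_CONTINUE` are hypothetical, illustrative subsets of the Unicode `ID_Compat_Math_*` sets rather than the complete lists.

```rust
// Hypothetical, illustrative subsets of the profile's characters (not the full sets).
const COMPAT_MATH_START: &[char] = &['∂', '∇', '∞'];
const COMPAT_MATH_CONTINUE: &[char] = &['¹', '²', '³', '₀', '₁', '₂'];

/// Rough stand-in for XID_Start (an assumption, not the exact Unicode property).
fn is_start(c: char, compat_math: bool) -> bool {
    c.is_alphabetic() || c == '_' || (compat_math && COMPAT_MATH_START.contains(&c))
}

/// Rough stand-in for XID_Continue; start characters may also continue.
fn is_continue(c: char, compat_math: bool) -> bool {
    c.is_alphabetic()
        || c.is_ascii_digit()
        || c == '_'
        || (compat_math
            && (COMPAT_MATH_START.contains(&c) || COMPAT_MATH_CONTINUE.contains(&c)))
}

/// Returns the longest identifier at the start of `src`, or `None` if there is none.
fn scan_ident(src: &str, compat_math: bool) -> Option<&str> {
    let mut chars = src.char_indices();
    let (_, first) = chars.next()?;
    if !is_start(first, compat_math) {
        return None;
    }
    // The identifier ends at the first character that cannot continue it.
    let end = chars
        .find(|&(_, c)| !is_continue(c, compat_math))
        .map_or(src.len(), |(i, _)| i);
    Some(&src[..end])
}

fn main() {
    // Without the profile, `²` is not an identifier character, so scanning stops after `a`.
    assert_eq!(scan_ident("a² + 1", false), Some("a"));
    // With the profile, `a²` is a single identifier and is never read as `a * a`.
    assert_eq!(scan_ident("a² + 1", true), Some("a²"));
    assert_eq!(scan_ident("a₁;", true), Some("a₁"));
    assert_eq!(scan_ident("∞", true), Some("∞"));
}
```

Because the whole run `a²` becomes one token, name resolution simply fails to find it, which is exactly the "unknown identifier" error described above.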

The characters 5) are added to the set of characters allowed in Rust identifiers, but will trigger an NFKC or `uncommon_codepoints` warning when used, depending on their Unicode classification.
For example, using `𝛁` in an identifier will trigger:
```
warning: identifier contains a non normalized (NFKC) character: '𝛁'
```
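The NFKC warning can be read as "normalizing this identifier would change it". The sketch below is hypothetical and is not how rustc implements the check: the standard library has no NFKC routine, so `nfkc_fold` hardcodes only two compatibility mappings (MATHEMATICAL BOLD NABLA `𝛁` U+1D6C1 to `∇` U+2207, and MATHEMATICAL BOLD PARTIAL DIFFERENTIAL `𝛛` U+1D6DB to `∂` U+2202), and `nfkc_warning` merely mimics the message format shown above. A real implementation would need full Unicode normalization data.

```rust
/// Hypothetical stand-in for NFKC: it knows only two compatibility mappings
/// for styled math symbols and leaves every other character unchanged.
fn nfkc_fold(c: char) -> char {
    match c {
        '\u{1D6C1}' => '\u{2207}', // 𝛁 MATHEMATICAL BOLD NABLA -> ∇ NABLA
        '\u{1D6DB}' => '\u{2202}', // 𝛛 MATHEMATICAL BOLD PARTIAL DIFFERENTIAL -> ∂
        other => other,
    }
}

/// Folds a whole identifier with `nfkc_fold`.
fn nfkc_fold_str(ident: &str) -> String {
    ident.chars().map(nfkc_fold).collect()
}

/// Returns a warning (mimicking the message above) for the first character
/// that the folding would change, or `None` if the identifier is unchanged.
fn nfkc_warning(ident: &str) -> Option<String> {
    ident
        .chars()
        .find(|&c| nfkc_fold(c) != c)
        .map(|c| format!("warning: identifier contains a non normalized (NFKC) character: '{c}'"))
}

fn main() {
    // Plain ∇ is unchanged by the folding, so no warning is produced.
    assert_eq!(nfkc_warning("\u{2207}f"), None);
    // Bold 𝛁 folds to ∇, so it is flagged...
    if let Some(w) = nfkc_warning("\u{1D6C1}f") {
        println!("{w}");
    }
    // ...and after folding it is the same identifier as the plain spelling.
    assert_eq!(nfkc_fold_str("\u{1D6C1}f"), nfkc_fold_str("\u{2207}f"));
}
```

The last assertion shows why such characters deserve a warning: two visually similar spellings collapse to the same identifier under NFKC.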
@@ -98,6 +101,7 @@ In particular the characters in 5) are identified as "Not_NFKC", i.e. characters

Note that Unicode specifically [mentions Rust as a positive industry example](https://www.unicode.org/reports/tr55/#General-Security-Profile) that follows the recommendations from the General Security Profile.


# Drawbacks
[drawbacks]: #drawbacks

@@ -111,15 +115,26 @@ Note that Unicode specifically [mentions Rust as a positive industry example](ht

* Some people might find it difficult to read superscript and subscript letters on lower resolution screens or when using small font sizes.

* Rust might decide in the future to give certain superscripts and subscripts syntactic meaning, for example interpreting `a²` as `a * a` or `a₁` as `a[0]`. The latter seems especially unlikely, though, given the general disagreement about 0-based versus 1-based indexing.

# Rationale and alternatives
@Noratrieb (Member), Jul 17, 2025

Rust currently just follows Unicode's recommendation on what should be allowed as a programming language identifier: https://rust-lang.github.io/rfcs/2457-non-ascii-idents.html (Annex 31).

This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

It would be very good to have a description here of why Annex 31 does not contain these symbols, if such discussion can be found anywhere, to ensure that we are not missing something important and are sure about our choice to deviate from the recommendation.

Member

> It would be very good to have a description here of why Annex 31 does not contain these symbols

UAX 31 does contain these symbols, that's what this profile comes from: https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile

For the question "why are they not in the default profile", the answer is basically to leave room for languages that want to do custom operators, or use these as builtin operators.

It's also just caution in expanding the set to include new meanings: while the XID set expands with each Unicode release as new characters get added, it would not be good for new types of characters to get included. If a programming language cared only about linguistic content in identifiers, it would perhaps be surprised if mathematical subscripts entered the fray. This separate profile allows for explicit choice.

> This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

The mathematical profile is included in UAX 31, the identifiers standard: that is the Unicode Consortium making a Unicode decision that these are acceptable in identifiers. It's a choice from a menu that programming languages may choose from. Rust is currently following Unicode's recommendation, and this RFC would have Rust continue to follow Unicode's recommendation.

Author

Changed formulation related to UAX31 a bit.

[rationale-and-alternatives]: #rationale-and-alternatives

If this RFC is not implemented, authors of scientific code will have to keep spelling such identifiers out in ASCII, for example `gradient_energy` or `a_12`.

The impact of not implementing it should be fairly small, but implementing it could attract more scientifically oriented people to the Rust language and make it easier for them to implement complex concepts.

Alternatively, Rust could decide to give the proposed characters syntactic meaning.

Superscript characters could be interpreted as exponentiation, for example `let a = 2; let b = a²;` could be a synonym for `let a = 2; let b = a * a;`.
This would open up a host of questions and potential issues, like:
- Should `a²⁻³` be interpreted as `1/a`?
- There is no superscript character for multiplication `*`.
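To illustrate why these questions are not trivial, here is a hypothetical sketch of what a superscript-as-exponent desugaring would have to do: read the superscript run `²⁻³` as the integer expression 2 - 3 = -1 and then compute `a.powi(-1)`, i.e. `1/a`. Nothing like this exists in Rust; `superscript_digit` and `eval_superscript` are illustrative names only, and left-to-right addition/subtraction is an assumed evaluation rule.

```rust
/// Superscript digits ⁰¹²³⁴⁵⁶⁷⁸⁹; they are not contiguous in Unicode.
const SUPERSCRIPT_DIGITS: [char; 10] = [
    '\u{2070}', '\u{00B9}', '\u{00B2}', '\u{00B3}', '\u{2074}',
    '\u{2075}', '\u{2076}', '\u{2077}', '\u{2078}', '\u{2079}',
];
const SUPERSCRIPT_PLUS: char = '\u{207A}'; // ⁺
const SUPERSCRIPT_MINUS: char = '\u{207B}'; // ⁻

/// Hypothetical helper: the value of a superscript digit.
fn superscript_digit(c: char) -> Option<i32> {
    SUPERSCRIPT_DIGITS.iter().position(|&d| d == c).map(|i| i as i32)
}

/// Hypothetical desugaring: evaluates a superscript run such as "²⁻³" as the
/// integer expression 2 - 3, left to right. Anything else (for example a
/// multiplication, which has no superscript form) is rejected with `None`.
fn eval_superscript(s: &str) -> Option<i32> {
    let mut total = 0;
    let mut sign = 1;
    let mut current: Option<i32> = None;
    for (i, c) in s.chars().enumerate() {
        if let Some(d) = superscript_digit(c) {
            current = Some(current.unwrap_or(0).checked_mul(10)?.checked_add(d)?);
        } else if c == SUPERSCRIPT_PLUS || c == SUPERSCRIPT_MINUS {
            match current.take() {
                Some(n) => total += sign * n,
                None if i == 0 => {} // leading sign, as in "⁻¹"
                None => return None, // two signs in a row
            }
            sign = if c == SUPERSCRIPT_PLUS { 1 } else { -1 };
        } else {
            return None;
        }
    }
    current.map(|n| total + sign * n)
}

fn main() {
    let a: f64 = 2.0;
    // `a²` would mean a.powi(2), i.e. a * a ...
    assert_eq!(a.powi(eval_superscript("²").unwrap()), a * a);
    // ... and `a²⁻³` would mean a.powi(2 - 3) = a.powi(-1) = 1/a.
    let e = eval_superscript("²⁻³").unwrap();
    assert_eq!(e, -1);
    assert_eq!(a.powi(e), 1.0 / a);
}
```

Even this toy version has to pick an evaluation rule and reject operators that have no superscript form, which is the kind of design surface this RFC deliberately avoids.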

`∞` could be a synonym for or replacement of `f32::INFINITY`; however, there is no precedent for using non-ASCII characters in `core`/`std`, and this would likely meet considerable opposition.

Derivatives could be added as a language feature via automatic differentiation techniques, thus giving `∇` and `∂` syntactic meaning; however, there is no precedent for this in other languages, and similar features are usually provided by libraries.

Subscript characters could be given syntactic meaning, for example `a₁` could be a synonym for `a[1]`. However, this would be highly contentious and error-prone given the general disagreement between 0-based and 1-based indexing, and it would suffer from problems similar to those of using superscripts for exponentiation.
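The indexing ambiguity can be made concrete with a hypothetical desugaring: read the subscript digits of `a₁` as the number 1, then index either zero-based (`a[1]`) or one-based (`a[0]`). `subscript_number` is an illustrative helper, not a proposed API; the point is only that the same source text selects different elements under the two readings.

```rust
/// Hypothetical helper: parses subscript digits ₀..₉ (U+2080..U+2089, which
/// are contiguous) into a number; returns `None` for anything else.
fn subscript_number(s: &str) -> Option<usize> {
    if s.is_empty() {
        return None;
    }
    s.chars().try_fold(0usize, |acc, c| {
        // Offset from ₀; characters below U+2080 or above U+2089 are rejected.
        let d = (c as u32).checked_sub(0x2080).filter(|&d| d <= 9)?;
        acc.checked_mul(10)?.checked_add(d as usize)
    })
}

fn main() {
    let a = [10, 20, 30];
    let n = subscript_number("₁").unwrap();
    // Zero-based reading: `a₁` means `a[1]`.
    let zero_based = a[n];
    // One-based (mathematical) reading: `a₁` means `a[0]`.
    let one_based = a[n - 1];
    // Same source text, different element.
    assert_eq!((zero_based, one_based), (20, 10));
}
```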

# Prior art
[prior-art]: #prior-art

@@ -171,6 +186,8 @@ fn main() {
}
```

[C++ P3658R0](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3658r0.pdf) is a similar proposal with similar reasoning for the C++ language. In particular, it states that the characters suggested in this RFC were allowed in C++11 to C++20 as originally published.

# Unresolved questions
[unresolved-questions]: #unresolved-questions
