RFC: ID_Compat_Math characters allowed in identifiers #3840

Danvil · 2025-07-16T19:17:40Z

This RFC extends the set of Unicode characters that can be used in identifiers with ID_Compat_Math_Start and ID_Compat_Math_Continue, most notably ∇, ∂, ∞, superscripts ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ and subscripts ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎.

This can be a boon to implementers of scientific concepts, as they can write, for example, `let ∇E₁₂ = 0.5;`.
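As a rough illustration of what the extended rule amounts to, here is a minimal sketch of an identifier check. It is not the compiler's lexer: the function names are hypothetical, only the characters named above are covered (not the full Unicode property data), and ASCII letters, digits and `_` stand in for XID_Start/XID_Continue to keep the sketch dependency-free.

```rust
/// Hand-written subset of ID_Compat_Math_Start: ∂, ∇ and ∞ only.
fn is_math_start(c: char) -> bool {
    matches!(c, '\u{2202}' | '\u{2207}' | '\u{221E}')
}

/// Hand-written subset of ID_Compat_Math_Continue: the start symbols plus
/// the superscripts and subscripts listed above.
fn is_math_continue(c: char) -> bool {
    is_math_start(c)
        || matches!(
            c,
            '\u{00B9}' | '\u{00B2}' | '\u{00B3}' // ¹ ² ³
                | '\u{2070}'                     // ⁰
                | '\u{2074}'..='\u{207E}'        // ⁴..⁹ ⁺ ⁻ ⁼ ⁽ ⁾
                | '\u{2080}'..='\u{208E}'        // ₀..₉ ₊ ₋ ₌ ₍ ₎
        )
}

/// Simplified identifier check (assumption: ASCII stands in for XID_*).
fn is_identifier_sketch(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || is_math_start(c) => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || is_math_continue(c))
}

fn main() {
    for candidate in ["∇E₁₂", "x⁺", "₁x", "x*"] {
        println!("{candidate}: {}", is_identifier_sketch(candidate));
    }
}
```

Under these assumptions a subscript may continue an identifier but not start one, which mirrors the Start/Continue split of the two properties.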

Rendered

clarfonthey · 2025-07-16T22:11:21Z

While I mostly sympathise with this and think that it's probably fine to do this, I think that an RFC suggesting this should at minimum:

* Reference the actual section of UAX 31 that defines these groups of characters: https://www.unicode.org/reports/tr31/#Standard_Profiles
* Reference the section of UTS 55 linked in the above section that explains why you might not want to use these groups of characters, which currently cites Rust as an existing example: https://www.unicode.org/reports/tr55/#General-Security-Profile
* Reference the section of UTS 39 linked in the above section that explains the exact mechanisms by which the above can be made safe: https://www.unicode.org/reports/tr39/#General_Security_Profile

Note that your reference to NFKC is technically correct: Not_NFKC is one of the restricted security profile cases that is covered by UTS 39, but it's not the only one, and it's worth discussing whether Rust's handling would need to be expanded because of this case.

FWIW, I very much sympathise with both the desire to have more scientific characters in variables and the desire to hand-wave away the issues as being already solved. It's also harder than ever before to do proper research online due to the shift of focus toward crystal-ball-based decision-making. I mostly want to clarify where you can find the relevant Unicode resources discussing this issue, and I think that the RFC should be updated to directly reference them so that we don't try to reinvent the wheel and redo all their hard work.

Also, I think it's pretty great that Rust is explicitly mentioned in the Unicode standard as someone who does this right! I didn't know this was the case until now.

text/0000-compat-math-identifiers.md

* Added links to UAX31 and others as requested in CR
* Fixed typos as requested in CR
* Extended the drawbacks section
* Other improvements

Danvil · 2025-07-16T23:23:10Z

@clarfonthey Thanks for the review! I made the requested changes and added more links to the Unicode resources and expanded some sections.
@programmerjake Thanks for the review - typos are fixed.

Noratrieb · 2025-07-17T10:57:06Z

text/0000-compat-math-identifiers.md

+
+* Rust might want to decide in the future to give certain superscripts and subscripts syntactic meaning. For example they might want to interpret `a²` as `a * a` or `a₁` as `a[0]`. The latter sounds espcially unlikely though due to the general disagreement of 0-based vs 1-based indexing.
+
+# Rationale and alternatives


Rust currently just follows Unicode's recommendation on what should be allowed as a programming language identifier: https://rust-lang.github.io/rfcs/2457-non-ascii-idents.html (Annex 31).

This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

It would be very good to have a description here of why Annex 31 does not contain these symbols, if such discussion can be found anywhere, to ensure that we are not missing something important and are sure about our choice to deviate from the recommendation.

Changed formulation related to UAX31 a bit.

Noratrieb · 2025-07-17T10:58:20Z

cc @Manishearth as our Unicode person

Manishearth

Overall seems fine to me. I didn't include this in the original RFC since IIRC the mathematical profile was still being worked on, and I didn't wish to have this facet be another thing that needed to be discussed.

Manishearth · 2025-07-17T14:25:09Z

> It would be very good to have a description here of why Annex 31 does not contain these symbols

UAX 31 does contain these symbols, that's what this profile comes from: https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile

For the question "why are they not in the default profile", the answer is basically to leave room for languages that want to do custom operators, or use these as builtin operators.

It's also just caution in expanding the set to include new meanings: while the XID set expands with each Unicode release as new characters get added, it would not be good for new types of characters to get included. If a programming language cared only about linguistic content in identifiers, it would perhaps be surprised if mathematical subscripts entered the fray. This separate profile allows for explicit choice.

> This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

The mathematical profile is included in UAX 31, the identifiers standard: that is the Unicode consortium making a Unicode decision that these are acceptable in identifiers. It's a choice from a menu that programming languages may choose from. Rust is currently following Unicode's recommendation, but this RFC would have Rust continuing to follow Unicode's recommendation.

text/0000-compat-math-identifiers.md

* Clarified choice between syntactic and identifier use
* Added link to a similar C++ proposal
* Expanded the alternatives section discussing how characters could be given syntactic meaning instead

Danvil · 2025-07-17T19:42:45Z

@Manishearth @kennytm @Noratrieb thank you for the review! I added your suggestions and comments to the draft.

clarfonthey · 2025-07-20T06:30:25Z

text/3840-compat-math-identifiers.md

+Similarly `let a = [2, 0]; let b = a₁;` will naturally give a compiler error that `a₁` is an unknown identifier and not be interpreted as `let b = a[0];`.
+`∞` will just be a character usable in identifiers and not be a synonym to the likes of `f32::INFINITY`.
+
+The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC or `uncommon_codepoints` warning when used depending on their Unicode classification.


FWIW, it's worth noting that actually the characters from 3) and 4) would also be included in the NFKC warning; if you look at the definition of NFKC, superscripts and subscripts are an explicit example: https://unicode.org/reports/tr15/#Compatibility_Composite_Figure

So, it's worth noting that only three characters from this wouldn't trigger the warning. These are still three good characters to include, but it's worth noting for accuracy.

Another side note to mention, also to clarify the above, is that NFKC effectively removes super/subscripts when normalizing, and I personally think it's kind of weird that this results in some normalized characters which are not normally allowed in identifiers. (for example, parentheses and +/- signs)

I think it's particularly strange that these are included in the definition at all, but I guess it kind of makes sense.
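To make the NFKC point concrete, here is a small sketch that applies the compatibility mapping to these characters. It is a hand-written stand-in for real NFKC that covers only the superscripts and subscripts under discussion; `compat_fold` and `fold_identifier` are hypothetical names, and a real implementation would use the full Unicode normalization data.

```rust
/// Compatibility mapping for the superscripts and subscripts discussed here.
/// Everything else is returned unchanged (an assumption of this sketch).
fn compat_fold(c: char) -> char {
    match c {
        '\u{00B9}' => '1',
        '\u{00B2}' => '2',
        '\u{00B3}' => '3',
        '\u{2070}' => '0',
        // ⁴..⁹ and ₀..₉ map to the ASCII digit at the same offset.
        '\u{2074}'..='\u{2079}' => char::from_digit(c as u32 - 0x2070, 10).unwrap(),
        '\u{2080}'..='\u{2089}' => char::from_digit(c as u32 - 0x2080, 10).unwrap(),
        '\u{207A}' | '\u{208A}' => '+',
        '\u{207B}' | '\u{208B}' => '\u{2212}', // MINUS SIGN, not ASCII '-'
        '\u{207C}' | '\u{208C}' => '=',
        '\u{207D}' | '\u{208D}' => '(',
        '\u{207E}' | '\u{208E}' => ')',
        other => other,
    }
}

/// Folds every character of an identifier through the mapping above.
fn fold_identifier(ident: &str) -> String {
    ident.chars().map(compat_fold).collect()
}

fn main() {
    for ident in ["x⁽³⁾", "x⁺", "x₃", "∇E"] {
        let folded = fold_identifier(ident);
        println!("{ident} -> {folded} (changed: {})", folded != ident);
    }
}
```

The folded form of `x⁽³⁾` contains parentheses, which are not identifier characters, and `∇E` is left untouched because ∇ has no compatibility mapping.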

In my personal experience x⁽³⁾, x⁺, x⁻, x₃ all appear readily in scientific texts. Unfortunately x*, which is also quite common, is not possible, as there is no superscript * and x* itself is of course an invalid identifier.

joshtriplett · 2025-07-27T08:46:55Z

@Manishearth Do the lists of confusables have appropriate entries for these new symbols? For instance, are the bold/italic/bold-italic/sans-serif variations of Nabla already marked as confusable with each other?

teor2345

Some grammar and readability suggestions

clarfonthey · 2025-07-29T02:11:59Z

> @Manishearth Do the lists of confusables have appropriate entries for these new symbols? For instance, are the bold/italic/bold-italic/sans-serif variations of Nabla already marked as confusable with each other?

Yes. (Not Manish, but I also read all the docs and this is correct.)

Also, as mentioned in this comment thread all but the infinity, nabla, and partial differential symbols are not NFKC and would thus show up as confusable for that reason as well.
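For illustration, a confusable check over the styled variants could look like the sketch below. It assumes, as stated above, that the styled nabla and partial differential symbols fold to the plain ones; the table is written by hand from the Mathematical Alphanumeric Symbols code points rather than read from the real confusables data, and all function names are hypothetical.

```rust
/// Folds the styled variants of nabla and the partial differential onto the
/// plain symbols. Hand-written table, not a lookup in the real confusables data.
fn fold_styled_math(c: char) -> char {
    match c {
        // bold, italic, bold italic, sans-serif bold, sans-serif bold italic nabla
        '\u{1D6C1}' | '\u{1D6FB}' | '\u{1D735}' | '\u{1D76F}' | '\u{1D7A9}' => '\u{2207}',
        // the same five styles of the partial differential
        '\u{1D6DB}' | '\u{1D715}' | '\u{1D74F}' | '\u{1D789}' | '\u{1D7C3}' => '\u{2202}',
        other => other,
    }
}

/// Simplified "skeleton": the identifier with every styled variant folded.
fn skeleton(ident: &str) -> String {
    ident.chars().map(fold_styled_math).collect()
}

/// Two distinct identifiers are confusable if their skeletons coincide.
fn are_confusable(a: &str, b: &str) -> bool {
    a != b && skeleton(a) == skeleton(b)
}

fn main() {
    let plain = "\u{2207}E"; // plain nabla
    let bold = "\u{1D6C1}E"; // mathematical bold nabla
    println!("confusable: {}", are_confusable(plain, bold));
}
```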

Manishearth · 2025-07-29T02:32:42Z

Right, we have multiple reasons as to why the existing lints would fire on these things.

(There's a reason why I spent so much time trying to figure out the underlying concerns on the original non-ASCII issue: this way we were able to design lints that directly addressed those concerns, which is more future-proof.)

joshtriplett · 2025-07-29T16:27:06Z

Given appropriate confusables tables, the ID_Compat_Math_Start symbols seem entirely fine.

For ID_Compat_Math_Continue, most of them seem fine. I think it's unlikely that we're going to add functionality that would make e.g. ² raise something to the second power. The operators (⁺⁻⁼₊₋₌) seem a little more odd, but hopefully people will use them reasonably and we will have lints helping people avoid confusion.

What are the expected use cases for those operators in superscripts/subscripts? In particular, I'm wondering if there'd be any value in a lint for "characters that usually shouldn't be the last character in an identifier". Because, for instance, x₌ seems like a much more confusing variable name than something with ₌ in the middle of the name.
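To make the idea concrete, the check behind such a lint might look like this sketch. No such lint exists today; the function names and the choice to flag exactly the six operators ⁺⁻⁼₊₋₌ are assumptions for illustration.

```rust
/// The superscript and subscript operators singled out above: ⁺ ⁻ ⁼ ₊ ₋ ₌.
fn is_script_operator(c: char) -> bool {
    matches!(c, '\u{207A}'..='\u{207C}' | '\u{208A}'..='\u{208C}')
}

/// Hypothetical lint check: true if the identifier's last character is one of
/// those operators. Operators in the middle of a name are not flagged.
fn ends_with_script_operator(ident: &str) -> bool {
    ident.chars().next_back().map_or(false, is_script_operator)
}

fn main() {
    for ident in ["x₌", "x₌y", "x₁"] {
        println!("{ident}: flagged = {}", ends_with_script_operator(ident));
    }
}
```

Whether flagging every trailing operator is desirable is an open question, since names such as x₊ and x₋ may well be intentional.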

Jules-Bertholet · 2025-07-29T16:39:44Z

I could totally see x₌ being used to denote, e.g., “the quantity of X necessary to keep Y constant”. x₊ and x₋ also seem useful. (Though ₋ is potentially confusable with _.)

RFC: ID_Compat_Math characters allowed in identifiers

32f0414

programmerjake reviewed Jul 16, 2025

Links to Unicode and various improvements

60dabca

ehuss added the T-lang label (relevant to the language team, which will review and decide on the RFC) Jul 16, 2025

programmerjake reviewed Jul 16, 2025

Danvil added 2 commits July 16, 2025 17:24

Improved grammar of a sentence

b52ddab

Added another, longer motivating example

fe3d007

kennytm reviewed Jul 17, 2025

Noratrieb reviewed Jul 17, 2025

Manishearth reviewed Jul 17, 2025

Changes suggested by CR and expanded alternatives

e6d4ec2

programmerjake reviewed Jul 17, 2025

Danvil added 2 commits July 17, 2025 14:01

Added link to a Rust autodiff experiment and improved some wording

c1fe8b4

Updated file name with issue number

b58cbb6

Jules-Bertholet reviewed Jul 19, 2025

Annotated code blocks with ```rust to enable syntax highlighting

1113b8a

clarfonthey reviewed Jul 20, 2025

teor2345 reviewed Jul 27, 2025

ZuseZ4 reviewed Jul 29, 2025

Rawk mentioned this pull request Aug 2, 2025

Rust doesn't support x₁ and x₂ identifiers while C++ does. #3402

Open

Fixed grammar and reference to autodiff feature

5556c69

