Commit b8c7422

Move digit separators to accepted. (#4105)
1 parent 708d50b commit b8c7422

2 files changed

+201
-196
lines changed

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
# Digit Separators

Author: Lasse Nielsen, Sam Rawlins

Status: In-progress

Version 1.0

## Motivation

To make long number literals more readable, allow authors to inject [digit
group separators][] inside numbers. Examples with different possible separators:

```none
100 000 000 000 000 000 000  // space
100,000,000,000,000,000,000  // comma
100.000.000.000.000.000.000  // period
100'000'000'000'000'000'000  // apostrophe (C++)
100_000_000_000_000_000_000  // underscore (many programming languages).
```

## Proposal

### Digit separators in number literals

Allow one or more `_`s between any two otherwise adjacent _digits_ of a NUMBER
or HEX\_NUMBER token. The following are not digits: the leading `0x` or `0X` in
HEX\_NUMBER, and any `.`, `e`, `E`, `+` or `-` in NUMBER.

That means only allowing `_`s between two `0-9` digits in NUMBER and between
two `0-9`,`a-f`,`A-F` digits in HEX\_NUMBER.

The grammar changes `<DIGIT>+` to `<DIGITS>`, which is a sequence of `<DIGIT>`s
with optional `_`s between them, and likewise for hex digits:

```bnf
<NUMBER> ::= <DIGITS> (`.' <DIGITS>)? <EXPONENT>?
  \alt `.' <DIGITS> <EXPONENT>?

<EXPONENT> ::= (`e' | `E') (`+' | `-')? <DIGITS>

<DIGITS> ::= <DIGIT> (`_'* <DIGIT>)*

<HEX\_NUMBER> ::= `0x' <HEX\_DIGITS>
  \alt `0X' <HEX\_DIGITS>

<HEX\_DIGIT> ::= `a' .. `f'
  \alt `A' .. `F'
  \alt <DIGIT>

<HEX\_DIGITS> ::= <HEX\_DIGIT> (`_'* <HEX\_DIGIT>)*
```
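
As a rough illustration, the grammar above can be transcribed into regular expressions and tried against candidate literals. The sketch below uses Python purely as a modeling language; the names `DIGITS`, `HEX_DIGITS`, `NUMBER`, `HEX_NUMBER` and `is_valid_literal` are hypothetical helpers, not part of any Dart tool, and the patterns only mirror the productions shown here.

```python
import re

# <DIGITS> ::= <DIGIT> (`_'* <DIGIT>)*
DIGITS = r"[0-9](?:_*[0-9])*"
# <HEX_DIGITS> ::= <HEX_DIGIT> (`_'* <HEX_DIGIT>)*
HEX_DIGITS = r"[0-9a-fA-F](?:_*[0-9a-fA-F])*"
# <EXPONENT> ::= (`e' | `E') (`+' | `-')? <DIGITS>
EXPONENT = rf"[eE][+-]?{DIGITS}"

# <NUMBER> ::= <DIGITS> (`.' <DIGITS>)? <EXPONENT>?  |  `.' <DIGITS> <EXPONENT>?
NUMBER = re.compile(rf"(?:{DIGITS}(?:\.{DIGITS})?|\.{DIGITS})(?:{EXPONENT})?")
# <HEX_NUMBER> ::= `0x' <HEX_DIGITS>  |  `0X' <HEX_DIGITS>
HEX_NUMBER = re.compile(rf"0[xX]{HEX_DIGITS}")


def is_valid_literal(text: str) -> bool:
    """True if the whole of `text` matches NUMBER or HEX_NUMBER as sketched above."""
    return bool(NUMBER.fullmatch(text) or HEX_NUMBER.fullmatch(text))
```

Because every `_*` sits between two digit patterns, a separator can never appear at the start or end of a digit run, next to `.`, or directly after `0x` or `e`.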

### Examples

```none
100__000_000__000_000__000_000  // one hundred million million millions!
0x4000_0000_0000_0000
0.000_000_000_01
0x00_14_22_01_23_45  // MAC address
555_123_4567  // US Phone number
```
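
The separators are presumably meant to be purely visual, so a literal should denote the same value as the same text with every `_` removed. The following sketch works under that assumption; `literal_value` is a hypothetical helper written in Python for illustration, and it expects an already valid literal.

```python
from typing import Union


def literal_value(text: str) -> Union[int, float]:
    """Value of a valid literal, assuming separators do not affect the value."""
    plain = text.replace("_", "")
    # Check the hex prefix first: hex digits may contain `e`/`E`.
    if plain[:2] in ("0x", "0X"):
        return int(plain[2:], 16)
    if any(ch in plain for ch in ".eE"):
        return float(plain)
    return int(plain)
```

For instance, `literal_value("555_123_4567")` and `literal_value("5551234567")` give the same integer.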

**Invalid** literals:

```none
100_
0x_00_14_22_01_23_45
0._000_000_000_1
100_.1
1.2e_3
```

An identifier like `_100` is a valid identifier, and `_100._100` is a valid
member access. If users learn the "separator only between digits" rule quickly,
this will likely not be an issue.

### Why choose underscores

The syntax must work even with just a single separator, so it can't be anything
that can already validly separate two expressions (excludes all infix operators
and comma) and must not already be part of a number literal (excludes decimal
point).

So, the comma and decimal point are probably never going to work, even if they
are already the standard "thousands separator" in text in different parts of
the world.

Space separation is dangerous because it's hard to see whether it's just a
space or an accidental tab character. If we allow spacing, should we allow
arbitrary whitespace, including line terminators? If so, then this suddenly
becomes quite dangerous. Forget a comma at the end of a line in a multiline
list, and two adjacent integers are automatically combined (we already have
that problem with strings). So, probably not a good choice, even if it is the
preferred formatting for print text.

The apostrophe is also the string single-quote character. We don't currently
allow adjacent numbers and strings, but if we ever do, then this syntax becomes
ambiguous. It's still possible (we disambiguate by assuming it's a digit
separator). It is currently used by C++14 as a digit group separator, so it is
definitely possible.

That leaves underscore, which could be the start of an identifier. Currently
`100_000` would be tokenized as "integer literal 100" followed by "identifier
`_000`". However, users would never write an identifier adjacent to another
token that contains identifier-valid characters (unlike strings, which have
clear delimiters that do not occur anywhere else), so this is unlikely to happen
in practice. Underscore is already used by a large number of programming
languages including Java, Swift, and Python.
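
The tokenization change described above can be illustrated with a toy scanner. This is only a sketch in Python: it knows just decimal integers and identifiers, and `OLD_INT`, `NEW_INT`, `IDENT` and `tokenize` are hypothetical names, not the real Dart scanner.

```python
import re

OLD_INT = re.compile(r"[0-9]+")              # digits only
NEW_INT = re.compile(r"[0-9](?:_*[0-9])*")   # digits with `_` allowed between them
IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def tokenize(source, int_pattern):
    """Split `source` into (kind, text) tokens, trying integers before identifiers."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in (("int", int_pattern), ("identifier", IDENT)):
            match = pattern.match(source, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
    return tokens
```

With `OLD_INT`, `100_000` splits into an integer `100` and an identifier `_000`; with `NEW_INT`, it is a single integer token. An input such as `_100` is an identifier under both patterns.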

We also want to allow multiple separators for higher-level grouping, e.g.:

```none
100__000_000_000__000_000_000
```

For this purpose, the underscore extends gracefully. So does space, but it has
the disadvantage that it collapses when inserted into HTML, whereas `''` looks
odd.

### Related work

* [Java digit separators](https://docs.oracle.com/javase/8/docs/technotes/guides/language/underscores-literals.html)
* [Python PEP 515 - underscores in numeric literals](https://peps.python.org/pep-0515/)

### Possible new lint rules

There are some possible new lint rule considerations, but none of these are
considered vital to the usability or general success of the feature.

The feature is designed to help the readability of long numbers. But a
developer can still make a mistake about where to place separators. For example:

```
var one =   1_000_000;
var two =   2_000_000;
var three = 3_000_000;
var four =  4_0000_000; // Whoops!
```

If a developer uses the Dart formatter to format their code, they cannot try to
vertically align the numbers with whitespace (extra space characters are
removed by the formatter). So we could offer a lint rule to only place
separators every three digits of a decimal number. Also possibly a similar rule
for hexadecimal numbers. If a developer ever uses digit separators for a
different purpose (as in separating the digits of a phone number), the rule may
not prove useful.
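
A "separators every three digits" check for decimal integer literals could look roughly like the sketch below. It is written in Python for illustration only; `has_thousands_grouping` is a hypothetical name, and the choice to accept literals without any separator is an assumption.

```python
import re

# One to three leading digits, then groups of exactly three digits,
# each preceded by a single underscore.
THOUSANDS = re.compile(r"[0-9]{1,3}(?:_[0-9]{3})*")


def has_thousands_grouping(literal: str) -> bool:
    """True if a decimal integer literal has no separators or groups by thousands."""
    if "_" not in literal:
        return True  # assumption: unseparated literals are not flagged
    return bool(THOUSANDS.fullmatch(literal))
```

This accepts `1_000_000` and rejects `4_0000_000`. It also rejects `555_123_4567`, which shows why such a rule may not suit literals grouped for other purposes.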

A separate lint rule could encourage _consistent_ digit separators, which
triggers if the digit groups do not have the same size (except the most
significant one, which can be shorter). If there are any `__` separators, the
number of `_`-separated groups between them should also be the same, and
repeatedly for higher numbers of `_`s.
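
One possible reading of this consistency rule is sketched below in Python. All names (`shape`, `uniform`, `leading_ok`, `consistent`, `is_consistent`) are hypothetical, and the input is assumed to be a single valid run of digits and underscores, without `0x`, `.` or an exponent. The idea is to split on the longest underscore runs first, describe each piece by its nested group sizes, and require all pieces except the most significant one to have the same shape.

```python
import re


def shape(text, level):
    """Nested group sizes of `text`, split on runs of exactly `level` underscores."""
    if level == 0:
        return len(text)
    parts = re.split(r"(?<!_)_{%d}(?!_)" % level, text)
    return tuple(shape(part, level - 1) for part in parts)


def uniform(s):
    """True if every group in shape `s` has the same shape, at every level."""
    if isinstance(s, int):
        return True
    return all(e == s[0] for e in s) and uniform(s[0])


def leading_ok(first, full):
    """True if `first` equals `full` or is a shorter most-significant version of it."""
    if isinstance(full, int):
        return first <= full
    return (len(first) <= len(full)
            and all(e == full[0] for e in first[1:])
            and leading_ok(first[0], full[0]))


def consistent(s):
    if isinstance(s, int):
        return True
    if len(s) == 1:
        return consistent(s[0])
    rest = s[1:]
    return (all(e == rest[0] for e in rest)
            and uniform(rest[0])
            and leading_ok(s[0], rest[0]))


def is_consistent(digits):
    """Apply the check, starting from the longest underscore run in `digits`."""
    longest = max((len(run) for run in re.findall(r"_+", digits)), default=0)
    return consistent(shape(digits, longest))
```

Under this reading, `100__000_000_000__000_000_000` passes, while `4_0000_000` and `100__000_000__000` do not.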

### Possible new quick fixes

There are some possible new automated fix ("quick fix") considerations, but
none of these are considered vital to the usability or general success of the
feature.

#### Unexpected underscores

With the digit-separators feature, separators can be added between _digits_ of
a number literal, but nowhere else. In most error cases, the unexpected
underscore can be detected as such, and we can offer quick fixes to remove
the unexpected underscores (for example, `100_`, `100_e1.2`, `100._00`). In a
few cases, the intention is not as straightforward, such as `100._100`, where
`_100` can be a legal name of an extension member (though the presence of such
a private extension member can be detected).
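
A quick fix of this kind could, in the simplest case, delete every underscore run that is not surrounded by digits on both sides. The Python sketch below illustrates that idea on the literal text alone; `remove_stray_underscores` is a hypothetical name, and the sketch does not attempt the member-access check mentioned above, so it would blindly rewrite `100._100` as well.

```python
import re


def remove_stray_underscores(text: str) -> str:
    """Drop underscore runs that do not sit between two digits of the literal."""
    # Hex literals count a-f/A-F as digits; decimal literals only 0-9.
    digit = "0-9a-fA-F" if text[:2] in ("0x", "0X") else "0-9"
    # A run is stray if the character before it, or the character after it,
    # is not a digit. Including `_` in both classes keeps the pattern from
    # matching only part of a run such as the first `_` in `100__000`.
    stray = re.compile(rf"(?<![{digit}_])_+|_+(?![{digit}_])")
    return stray.sub("", text)
```

Valid separators, including doubled ones such as in `100__000`, are left untouched.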

#### Unexpected commas

The only legal digit separator that is introduced with this feature is the
underscore character. If a developer attempts to use another character, for
example commas, as a separator, we may be able to detect this, and offer a
quick fix to convert the commas to underscores.

### Non-breaking change

This change is strictly non-breaking. The feature can be thought of as a single
change from previous Dart syntax: some syntax which was previously illegal
(producing compile-time errors) becomes legal.

(The feature is still introduced with a [Dart language version][], so that
packages that start using the feature declare that they require some new lower
bound of the Dart SDK.)

### Formatting

As any number literal remains a single token, there are no formatting
considerations.

## Changelog

### 1.0

- Initial version

[digit group separators]: https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping
[Dart language version]: https://github.com/dart-lang/language/blob/main/accepted/2.8/language-versioning/feature-specification.md
