Commit c8cf274

osa1 authored and Commit Queue committed
[dart2wasm] Refactor int parsing
This refactors dart2wasm's `int.parse` and `int.tryParse` implementations.

The current implementation is copied from the VM, which supports 63-bit "smi" integers and 32-bit integers. In dart2wasm we only support 64-bit integers. This change updates the int parsing to handle 64-bit ints.

The changes can be summarized as refactorings plus one change.

Refactorings:

- Remove all mentions of "smi"s and 32-bit integer parsing support.
- Move the patched public members `parse` and `tryParse` to the beginning of the patched class, to allow top-down reading of the file and to separate entry points from internal functions.
- Refactor the `last` (inclusive last character index) parameter of `_tryParseSmi` (renamed to `_tryParseIntRadix10` in this CL) as `end` (exclusive end index), to be consistent with the rest of the functions in the same file.
- Remove redundant `null` checks from the pre-sound-null-safe days.
- Remove 32-bit constants from the `_PARSE_LIMITS` table.

Change:

When the input string has more digits than the maximum number of digits that fit into the `int` type, the current code parses one "block" at a time and then combines the blocks. (A block is a substring of the input that can be parsed as `int` without overflows.) This makes the code very complicated (a lazily generated "overflow limits" table, complicated logic to combine blocks while checking for overflows) just to handle the few digits that can follow a block.

With this change we do something simpler: first we skip all leading zeros. This part is new; the current code does not skip leading zeros and handles them as a part of a block. After the zeros we parse one block as usual. After the block, we can parse at most two more digits without an overflow (or underflow if the number is negative).

Handling of these two digits does not need to be optimized with special checks and table lookups, because the amount of work done for the digits is small, and branching and the cost of table lookups followed by more efficient code will probably be slower than just handling the digits in a simple way.

This change is done in `_parseRadix`. The rest of the changes in the file are refactorings, as described above.

The `_PARSE_LIMITS` table, with the max. number of digits that fit into an `int`, is updated using this program:

```
void main() {
  final maxI64 = 9223372036854775807;
  for (int radix = 2; radix <= 36; radix += 1) {
    final str = maxI64.toRadixString(radix);
    print("Max I64 in radix $radix = $str, num digits = ${str.length}");
  }
}
```

For example, the max. 64-bit signed integer in radix 20 is "5cbfjia3fh26ja7", which has 15 digits. Unless all of the digits are the largest digit of the radix, we need to subtract one, because not every 15-digit string in that radix fits. So the max. number of digits for radix 20 is 14.

The only radixes where all digits are the largest digit are 2 and 8. In these cases we can handle 63 and 21 digits respectively (instead of 62 and 20).

In all other bases the table entry is the number of digits printed by the program above, minus one.

# Benchmarks

Golem reports a 61% improvement in the benchmark `Int.parse.0032.bits`.

Golem also reports an 18% slowdown in Utf8Encode.sk.10M; however, the Wasms for that benchmark before and after this change are identical, so it must be noise.

Change-Id: Ia35a50a0328e680be2d494405e13caaded1b7ad9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/372281
Commit-Queue: Ömer Ağacan <[email protected]>
Reviewed-by: Martin Kustermann <[email protected]>
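The rule for the `_PARSE_LIMITS` entries (digit count of the max. 64-bit value, minus one unless every digit is the largest digit of the radix) can be sketched as a small program. This is only an illustration: `maxDigitsFor` is a hypothetical helper that is not part of the SDK, and the sketch assumes a platform with 64-bit `int` (the Dart VM or dart2wasm, not dart2js).

```dart
// Hypothetical helper, not part of the SDK: number of digits in `radix`
// that are guaranteed to fit into a signed 64-bit integer.
int maxDigitsFor(int radix) {
  const maxI64 = 9223372036854775807;
  final str = maxI64.toRadixString(radix);
  final largestDigit = (radix - 1).toRadixString(radix);
  // If every digit of the max value is the largest digit of the radix,
  // then every string of that length fits; otherwise drop one digit.
  final allLargest = str.split('').every((d) => d == largestDigit);
  return allLargest ? str.length : str.length - 1;
}

void main() {
  for (int radix = 2; radix <= 36; radix += 1) {
    print("radix $radix: ${maxDigitsFor(radix)} digits");
  }
}
```

Under these assumptions the output should agree with the values in the new table, for example 63 for radix 2, 21 for radix 8, and 14 for radix 20.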
1 parent b121353 commit c8cf274

1 file changed: +139 -182 lines changed

sdk/lib/_internal/wasm/lib/int_patch.dart

Lines changed: 139 additions & 182 deletions
@@ -2,55 +2,41 @@
 // for details. All rights reserved. Use of this source code is governed by a
 // BSD-style license that can be found in the LICENSE file.

-import "dart:_internal" show has63BitSmis, patch, unsafeCast;
+import "dart:_internal" show patch;
 import "dart:_string" show StringUncheckedOperations;
-import "dart:typed_data" show Int64List;
+import "dart:_wasm";

 @patch
 class int {
-  static int? _tryParseSmi(String str, int first, int last) {
-    assert(first <= last);
-    var ix = first;
-    var sign = 1;
-    var c = str.codeUnitAtUnchecked(ix);
-    // Check for leading '+' or '-'.
-    if ((c == 0x2b) || (c == 0x2d)) {
-      ix++;
-      sign = 0x2c - c; // -1 for '-', +1 for '+'.
-      if (ix > last) {
-        return null; // Empty.
-      }
-    }
-    var smiLimit = has63BitSmis ? 18 : 9;
-    if ((last - ix) >= smiLimit) {
-      return null; // May not fit into a Smi.
+  @patch
+  static int? tryParse(String source, {int? radix}) {
+    if (source.isEmpty) {
+      return null;
     }
-    var result = 0;
-    for (int i = ix; i <= last; i++) {
-      var c = 0x30 ^ str.codeUnitAtUnchecked(i);
-      if (9 < c) {
-        return null;
-      }
-      result = 10 * result + c;
+    if (radix == null || radix == 10) {
+      // Try parsing immediately, without trimming whitespace.
+      int? result = _tryParseIntRadix10(source, 0, source.length);
+      if (result != null) return result;
+    } else if ((radix - 2).gtU(34)) {
+      throw RangeError("Radix $radix not in range 2..36");
     }
-    return sign * result;
+    return _parse(source, radix, _kNull);
   }

   @patch
   static int parse(String source, {int? radix, int onError(String source)?}) {
-    if (source == null) throw ArgumentError("The source must not be null");
     if (source.isEmpty) {
       return _handleFormatError(onError, source, 0, radix, null) as int;
     }
     if (radix == null || radix == 10) {
       // Try parsing immediately, without trimming whitespace.
-      int? result = _tryParseSmi(source, 0, source.length - 1);
+      int? result = _tryParseIntRadix10(source, 0, source.length);
       if (result != null) return result;
-    } else if (radix < 2 || radix > 36) {
+    } else if ((radix - 2).gtU(34)) {
       throw RangeError("Radix $radix not in range 2..36");
     }
     // Split here so improve odds of parse being inlined and the checks omitted.
-    return _parse(source, radix, onError) as int;
+    return _parse(source, radix, onError)!;
   }

   static int? _parse(
@@ -91,19 +77,6 @@ class int {
     return _parseRadix(source, radix, start, end, sign, false, onError);
   }

-  @patch
-  static int? tryParse(String source, {int? radix}) {
-    if (source.isEmpty) return null;
-    if (radix == null || radix == 10) {
-      // Try parsing immediately, without trimming whitespace.
-      int? result = _tryParseSmi(source, 0, source.length - 1);
-      if (result != null) return result;
-    } else if (radix < 2 || radix > 36) {
-      throw RangeError("Radix $radix not in range 2..36");
-    }
-    return _parse(source, radix, _kNull);
-  }
-
   static Null _kNull(_) => null;

   static int? _handleFormatError(int? Function(String)? onError, String source,
@@ -119,87 +92,82 @@ class int {
   }

   static int? _parseRadix(String source, int radix, int start, int end,
-      int sign, bool allowU64, int? Function(String)? onError) {
-    int tableIndex = (radix - 2) * 4 + (has63BitSmis ? 2 : 0);
-    int blockSize = _PARSE_LIMITS[tableIndex];
-    int length = end - start;
-    if (length <= blockSize) {
-      int? smi = _parseBlock(source, radix, start, end);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
-      }
-      return sign * smi;
+      int sign, bool allowOverflow, int? Function(String)? onError) {
+    // Skip leading zeroes.
+    while (start < end && source.codeUnitAtUnchecked(start) == 0x30 /* 0 */) {
+      start += 1;
     }

-    // Often cheaper than: int smallBlockSize = length % blockSize;
-    // because digit count generally tends towards smaller. rather
-    // than larger.
-    int smallBlockSize = length;
-    while (smallBlockSize >= blockSize) smallBlockSize -= blockSize;
-    int result = 0;
-    if (smallBlockSize > 0) {
-      int blockEnd = start + smallBlockSize;
-      int? smi = _parseBlock(source, radix, start, blockEnd);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
-      }
-      result = sign * smi;
-      start = blockEnd;
+    final blockSize = _PARSE_LIMITS[radix].toInt();
+    final length = end - start;
+
+    // Parse at most `blockSize` characters without overflows.
+    final parseBlockLength = length < blockSize ? length : blockSize;
+    int? blockResult =
+        _parseBlock(source, radix, start, start + parseBlockLength);
+    if (blockResult == null) {
+      return _handleFormatError(onError, source, start, radix, null);
     }
-    int multiplier = _PARSE_LIMITS[tableIndex + 1];
-    int positiveOverflowLimit = 0;
-    int negativeOverflowLimit = 0;
-    tableIndex = tableIndex << 1; // pre-multiply by 2 for simpler indexing
-    positiveOverflowLimit = _int64OverflowLimits[tableIndex];
-    if (positiveOverflowLimit == 0) {
-      positiveOverflowLimit = _initInt64OverflowLimits(tableIndex, multiplier);
+
+    int result = sign * blockResult;
+
+    if (parseBlockLength < blockSize) {
+      // Overflow is not possible.
+      return result;
     }
-    negativeOverflowLimit = _int64OverflowLimits[tableIndex + 1];
-    int blockEnd = start + blockSize;
-    do {
-      int? smi = _parseBlock(source, radix, start, blockEnd);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
+
+    // Check overflows on the next digits. We can scan at most two digits before an overflow.
+    start += parseBlockLength;
+
+    for (int i = start; i < end; i++) {
+      int char = source.codeUnitAtUnchecked(i);
+      int digit = char ^ 0x30;
+      if (digit > 9) {
+        digit = (char | 0x20) - (0x61 - 10);
+        if (digit < 10 || digit >= radix) {
+          return _handleFormatError(onError, source, start, radix, null);
+        }
       }
-      if (result >= positiveOverflowLimit) {
-        if ((result > positiveOverflowLimit) ||
-            (smi > _int64OverflowLimits[tableIndex + 2])) {
-          // Although the unsigned overflow limits do not depend on the
-          // platform, the multiplier and block size, which are used to
-          // compute it, do.
-          int X = has63BitSmis ? 1 : 0;
-          if (allowU64 &&
-              !(result >= _int64UnsignedOverflowLimits[X] &&
-                  (result > _int64UnsignedOverflowLimits[X] ||
-                      smi > _int64UnsignedSmiOverflowLimits[X])) &&
-              blockEnd + blockSize > end) {
-            return (result * multiplier) + smi;
-          }
+
+      if (sign > 0) {
+        const max = 9223372036854775807;
+
+        if (!allowOverflow && (result > (max - digit) ~/ radix)) {
           return _handleFormatError(onError, source, null, radix,
               "Positive input exceeds the limit of integer");
         }
-      } else if (result <= negativeOverflowLimit) {
-        if ((result < negativeOverflowLimit) ||
-            (smi > _int64OverflowLimits[tableIndex + 3])) {
+
+        result = (radix * result) + digit;
+      } else {
+        const min = -9223372036854775808;
+
+        // We don't need to check `allowOverflow` as overflows are only allowed
+        // in positive numbers.
+        if (result < (min + digit) ~/ radix) {
           return _handleFormatError(onError, source, null, radix,
               "Negative input exceeds the limit of integer");
         }
+
+        result = (radix * result) - digit;
       }
-      result = (result * multiplier) + (sign * smi);
-      start = blockEnd;
-      blockEnd = start + blockSize;
-    } while (blockEnd <= end);
+    }
+
     return result;
   }

-  // Parse block of digits into a Smi.
-  static _Smi? _parseBlock(String source, int radix, int start, int end) {
-    _Smi result = unsafeCast<_Smi>(0);
+  /// Parse digits in [source] range from [start] to [end].
+  ///
+  /// Returns `null` if a character is not valid in radix [radix].
+  ///
+  /// Does not check for overflows, assumes that the number of digits in the
+  /// range will fit into an [int].
+  static int? _parseBlock(String source, int radix, int start, int end) {
+    int result = 0;
     if (radix <= 10) {
       for (int i = start; i < end; i++) {
         int digit = source.codeUnitAtUnchecked(i) ^ 0x30;
         if (digit >= radix) return null;
-        result = (radix * result + digit) as _Smi;
+        result = (radix * result) + digit;
       }
     } else {
       for (int i = start; i < end; i++) {
@@ -209,87 +177,76 @@ class int {
           digit = (char | 0x20) - (0x61 - 10);
           if (digit < 10 || digit >= radix) return null;
         }
-        result = (radix * result + digit) as _Smi;
+        result = (radix * result) + digit;
       }
     }
     return result;
   }

-  // For each radix, 2-36, how many digits are guaranteed to fit in a smi,
-  // and magnitude of such a block (radix ** digit-count).
-  // 32-bit limit/multiplier at (radix - 2)*4, 64-bit limit at (radix-2)*4+2
-  static const _PARSE_LIMITS = const [
-    30, 1073741824, 62, 4611686018427387904, // radix: 2
-    18, 387420489, 39, 4052555153018976267,
-    15, 1073741824, 30, 1152921504606846976,
-    12, 244140625, 26, 1490116119384765625, // radix: 5
-    11, 362797056, 23, 789730223053602816,
-    10, 282475249, 22, 3909821048582988049,
-    10, 1073741824, 20, 1152921504606846976,
-    9, 387420489, 19, 1350851717672992089,
-    9, 1000000000, 18, 1000000000000000000, // radix: 10
-    8, 214358881, 17, 505447028499293771,
-    8, 429981696, 17, 2218611106740436992,
-    8, 815730721, 16, 665416609183179841,
-    7, 105413504, 16, 2177953337809371136,
-    7, 170859375, 15, 437893890380859375, // radix: 15
-    7, 268435456, 15, 1152921504606846976,
-    7, 410338673, 15, 2862423051509815793,
-    7, 612220032, 14, 374813367582081024,
-    7, 893871739, 14, 799006685782884121,
-    6, 64000000, 14, 1638400000000000000, // radix: 20
-    6, 85766121, 14, 3243919932521508681,
-    6, 113379904, 13, 282810057883082752,
-    6, 148035889, 13, 504036361936467383,
-    6, 191102976, 13, 876488338465357824,
-    6, 244140625, 13, 1490116119384765625, // radix: 25
-    6, 308915776, 13, 2481152873203736576,
-    6, 387420489, 13, 4052555153018976267,
-    6, 481890304, 12, 232218265089212416,
-    6, 594823321, 12, 353814783205469041,
-    6, 729000000, 12, 531441000000000000, // radix: 30
-    6, 887503681, 12, 787662783788549761,
-    6, 1073741824, 12, 1152921504606846976,
-    5, 39135393, 12, 1667889514952984961,
-    5, 45435424, 12, 2386420683693101056,
-    5, 52521875, 12, 3379220508056640625, // radix: 35
-    5, 60466176, 11, 131621703842267136,
-  ];
-
-  static const _maxInt64 = 0x7fffffffffffffff;
-  static const _minInt64 = -0x8000000000000000;
-
-  static const _int64UnsignedOverflowLimits = const [0xfffffffff, 0xf];
-  static const _int64UnsignedSmiOverflowLimits = const [
-    0xfffffff,
-    0xfffffffffffffff
-  ];
-
-  /// Calculation of the expression
-  ///
-  /// result = (result * multiplier) + (sign * smi)
-  ///
-  /// in `_parseRadix()` may overflow 64-bit integers. In such case,
-  /// `int.parse()` should stop with an error.
-  ///
-  /// This table is lazily filled with int64 overflow limits for result and smi.
-  /// For each multiplier from `_PARSE_LIMITS[tableIndex + 1]` this table
-  /// contains
-  ///
-  /// * `[tableIndex*2]` = positive limit for result
-  /// * `[tableIndex*2 + 1]` = negative limit for result
-  /// * `[tableIndex*2 + 2]` = limit for smi if result is exactly at positive limit
-  /// * `[tableIndex*2 + 3]` = limit for smi if result is exactly at negative limit
-  static final Int64List _int64OverflowLimits =
-      Int64List(_PARSE_LIMITS.length * 2);
-
-  static int _initInt64OverflowLimits(int tableIndex, int multiplier) {
-    _int64OverflowLimits[tableIndex] = _maxInt64 ~/ multiplier;
-    _int64OverflowLimits[tableIndex + 1] = _minInt64 ~/ multiplier;
-    _int64OverflowLimits[tableIndex + 2] =
-        unsafeCast<int>(_maxInt64.remainder(multiplier));
-    _int64OverflowLimits[tableIndex + 3] =
-        -unsafeCast<int>(_minInt64.remainder(multiplier));
-    return _int64OverflowLimits[tableIndex];
+  static int? _tryParseIntRadix10(String str, int start, int end) {
+    int ix = start;
+    int sign = 1;
+    int c = str.codeUnitAtUnchecked(ix);
+    // Check for leading '+' or '-'.
+    if ((c == 0x2b) || (c == 0x2d)) {
+      ix++;
+      sign = 0x2c - c; // -1 for '-', +1 for '+'.
+      if (ix == end) {
+        return null; // Empty.
+      }
+    }
+    if (end - ix > 18) {
+      return null; // May not fit into an `int`.
+    }
+    int result = 0;
+    for (int i = ix; i < end; i++) {
+      int c = 0x30 ^ str.codeUnitAtUnchecked(i);
+      if (9 < c) {
+        return null;
+      }
+      result = (10 * result) + c;
+    }
+    return sign * result;
   }
+
+  // For each radix, 2-36, how many digits are guaranteed to fit in an `int`.
+  static const _PARSE_LIMITS = const WasmArray<WasmI64>.literal([
+    0, // unused
+    0, // unused
+    63, // radix: 2
+    39,
+    31,
+    27, // radix: 5
+    24,
+    22,
+    21,
+    19,
+    18, // radix: 10
+    18,
+    17,
+    17,
+    16,
+    16, // radix: 15
+    15,
+    15,
+    15,
+    14,
+    14, // radix: 20
+    14,
+    14,
+    13,
+    13,
+    13, // radix: 25
+    13,
+    13,
+    13,
+    12,
+    12, // radix: 30
+    12,
+    12,
+    12,
+    12,
+    12, // radix: 35
+    12,
+  ]);
 }
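
The overflow checks that the new `_parseRadix` applies to each digit after the first block can be read in isolation. The following is a minimal sketch under stated assumptions, not the SDK code: `appendDigit` is a hypothetical helper, it returns `null` instead of calling an error handler, it leaves out the `allowOverflow` case, and it assumes 64-bit `int` (Dart VM or dart2wasm). It relies on `~/` truncating toward zero.

```dart
// Hypothetical helper (not in the SDK): append one digit to `result`,
// returning null if the new value would not fit into a signed 64-bit int.
int? appendDigit(int result, int digit, int radix, {required bool negative}) {
  const max = 9223372036854775807;
  const min = -9223372036854775808;
  if (!negative) {
    // radix * result + digit <= max  <=>  result <= (max - digit) ~/ radix
    if (result > (max - digit) ~/ radix) return null;
    return (radix * result) + digit;
  } else {
    // radix * result - digit >= min  <=>  result >= (min + digit) ~/ radix
    // (min + digit is negative, so truncation rounds toward zero, i.e. up.)
    if (result < (min + digit) ~/ radix) return null;
    return (radix * result) - digit;
  }
}

void main() {
  // 922337203685477580 is the max. 64-bit value without its last digit.
  print(appendDigit(922337203685477580, 7, 10, negative: false)); // 9223372036854775807
  print(appendDigit(922337203685477580, 8, 10, negative: false)); // null
  print(appendDigit(-922337203685477580, 8, 10, negative: true)); // -9223372036854775808
  print(appendDigit(-922337203685477580, 9, 10, negative: true)); // null
}
```

Comparing against a quotient computed before the multiplication means the check itself never overflows.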
