[dart2wasm] Refactor int parsing

osa1 · Commit Queue · commit c8cf274cffcb · 2024-08-05T12:25:18.000Z
This refactors dart2wasm's `int.parse` and `int.tryParse` implementations.

The current implementation is copied from the VM, which supports 63-bit "smi" integers and 32-bit integers. In dart2wasm we only support 64-bit integers. This change updates the int parsing to handle 64-bit ints.

The changes can be summarized as refactorings plus one change.

Refactorings:

- Remove all mentions of "smi"s and 32-bit integer parsing support.

- Move patched public members `parse` and `tryParse` to the beginning of the patched class, to allow top-down reading of the file and separate entry points from internal functions.

- Refactor the `last` (inclusive last character index) parameter of `_tryParseSmi` (renamed to `_tryParseIntRadix10` in this CL) as `end` (exclusive last character index), to be consistent with the rest of the functions in the same file.

- Remove redundant `null` checks from the pre-sound-null-safe days.

- Remove 32-bit constants from the `_PARSE_LIMITS` table.

Change:

The current code, when the input string is larger than the max. number of digits that fit into the `int` type, parses one "block" at a time, then combines the blocks. (A block is a substring of the input that can be parsed as `int` without overflows.)

This makes the code very complicated (with a lazily generated "overflow limits" table and complicated logic to combine blocks while checking for overflows) to handle just one digit after a block.

With this change we do something simpler: first we skip all leading zeros. This part is new: the current code does not skip leading zeros and handles them as part of a block.

After the zeros we parse one block as usual.

After the block, we can parse at most two more digits without an overflow (or underflow if the number is negative).
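As a rough illustration of the steps described so far (skip leading zeros, parse one block, then check the remaining digits one by one), here is a minimal sketch in plain Dart. It is not the code from this CL: `parseDigitsSketch`, `digitValue`, the `negative` flag and the reduced `blockSizes` map are hypothetical names used only for this example. The sketch assumes that sign and whitespace handling happen in the caller and that `int` is a native 64-bit integer.

```dart
// Hypothetical, reduced version of the digit-count table: max. number of
// digits that always fit into a signed 64-bit int, for a few radixes only.
const Map<int, int> blockSizes = {2: 63, 8: 21, 10: 18, 16: 15, 36: 12};

/// Returns the value of [char] as a digit in [radix], or `null` if invalid.
int? digitValue(int char, int radix) {
  int digit = char ^ 0x30; // '0'..'9' map to 0..9.
  if (digit > 9) {
    digit = (char | 0x20) - (0x61 - 10); // 'a'/'A' map to 10, and so on.
    if (digit < 10) return null;
  }
  return digit < radix ? digit : null;
}

/// Parses the digits in [source] (no sign, no whitespace).
///
/// Returns `null` on an invalid digit or if the value does not fit in 64 bits.
int? parseDigitsSketch(String source, int radix, {bool negative = false}) {
  final blockSize = blockSizes[radix];
  if (blockSize == null) throw ArgumentError('radix $radix not in sketch');
  if (source.isEmpty) return null;

  final end = source.length;
  int start = 0;

  // Step 1: skip leading zeros.
  while (start < end && source.codeUnitAt(start) == 0x30) {
    start++;
  }

  // Step 2: parse one block; this many digits cannot overflow.
  final blockEnd = (end - start < blockSize) ? end : start + blockSize;
  int block = 0;
  for (int i = start; i < blockEnd; i++) {
    final digit = digitValue(source.codeUnitAt(i), radix);
    if (digit == null) return null;
    block = radix * block + digit;
  }
  int result = negative ? -block : block;

  // Step 3: handle any remaining digits with an explicit overflow check.
  const max = 9223372036854775807;
  const min = -9223372036854775808;
  for (int i = blockEnd; i < end; i++) {
    final digit = digitValue(source.codeUnitAt(i), radix);
    if (digit == null) return null;
    if (negative) {
      if (result < (min + digit) ~/ radix) return null; // Would underflow.
      result = radix * result - digit;
    } else {
      if (result > (max - digit) ~/ radix) return null; // Would overflow.
      result = radix * result + digit;
    }
  }
  return result;
}
```

For instance, `parseDigitsSketch('000ff', 16)` evaluates to 255, while `parseDigitsSketch('9223372036854775808', 10)` returns `null` because appending the 19th digit would overflow.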
Handling of these two digits does not need to be optimized with special checks and table lookups: the amount of work done for the digits is small, and the branching and table lookups needed to reach more efficient code would probably cost more than just handling the digits in a simple way.

This change is done in `_parseRadix`. The rest of the changes in the file are refactorings, as described above.

The `_PARSE_LIMITS` table with the max. number of digits that fit into an `int` is updated using this program:

```
void main() {
  final maxI64 = 9223372036854775807;
  for (int radix = 2; radix <= 36; radix += 1) {
    final str = maxI64.toRadixString(radix);
    print("Max I64 in radix $radix = $str, num digits = ${str.length}");
  }
}
```

For example, the max. 64-bit signed integer in radix 20 is "5cbfjia3fh26ja7", which has 15 digits. Unless all of the digits are the largest digit of the radix, we need to subtract one. So the max. number of digits for radix 20 is 14.

The only radixes where all digits are the largest digit are 2 and 8. In these cases we can handle 63 and 21 digits respectively (instead of 62 and 20). In all other bases the table entry is the number of digits printed by the program above minus one.

# Benchmarks

Golem reports a 61% improvement in the benchmark `Int.parse.0032.bits`.

Golem also reports an 18% slowdown in Utf8Encode.sk.10M; however, the Wasms for that benchmark before and after this change are identical, so it must be noise.

Change-Id: Ia35a50a0328e680be2d494405e13caaded1b7ad9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/372281
Commit-Queue: Ömer Ağacan <omersa@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
diff --git a/sdk/lib/_internal/wasm/lib/int_patch.dart b/sdk/lib/_internal/wasm/lib/int_patch.dart
@@ -2,55 +2,41 @@
 // for details. All rights reserved. Use of this source code is governed by a
 // BSD-style license that can be found in the LICENSE file.
 
-import "dart:_internal" show has63BitSmis, patch, unsafeCast;
+import "dart:_internal" show patch;
 import "dart:_string" show StringUncheckedOperations;
-import "dart:typed_data" show Int64List;
+import "dart:_wasm";
 
 @patch
 class int {
-  static int? _tryParseSmi(String str, int first, int last) {
-    assert(first <= last);
-    var ix = first;
-    var sign = 1;
-    var c = str.codeUnitAtUnchecked(ix);
-    // Check for leading '+' or '-'.
-    if ((c == 0x2b) || (c == 0x2d)) {
-      ix++;
-      sign = 0x2c - c; // -1 for '-', +1 for '+'.
-      if (ix > last) {
-        return null; // Empty.
-      }
-    }
-    var smiLimit = has63BitSmis ? 18 : 9;
-    if ((last - ix) >= smiLimit) {
-      return null; // May not fit into a Smi.
+  @patch
+  static int? tryParse(String source, {int? radix}) {
+    if (source.isEmpty) {
+      return null;
     }
-    var result = 0;
-    for (int i = ix; i <= last; i++) {
-      var c = 0x30 ^ str.codeUnitAtUnchecked(i);
-      if (9 < c) {
-        return null;
-      }
-      result = 10 * result + c;
+    if (radix == null || radix == 10) {
+      // Try parsing immediately, without trimming whitespace.
+      int? result = _tryParseIntRadix10(source, 0, source.length);
+      if (result != null) return result;
+    } else if ((radix - 2).gtU(34)) {
+      throw RangeError("Radix $radix not in range 2..36");
     }
-    return sign * result;
+    return _parse(source, radix, _kNull);
   }
 
   @patch
   static int parse(String source, {int? radix, int onError(String source)?}) {
-    if (source == null) throw ArgumentError("The source must not be null");
     if (source.isEmpty) {
       return _handleFormatError(onError, source, 0, radix, null) as int;
     }
     if (radix == null || radix == 10) {
       // Try parsing immediately, without trimming whitespace.
-      int? result = _tryParseSmi(source, 0, source.length - 1);
+      int? result = _tryParseIntRadix10(source, 0, source.length);
       if (result != null) return result;
-    } else if (radix < 2 || radix > 36) {
+    } else if ((radix - 2).gtU(34)) {
       throw RangeError("Radix $radix not in range 2..36");
     }
     // Split here so improve odds of parse being inlined and the checks omitted.
-    return _parse(source, radix, onError) as int;
+    return _parse(source, radix, onError)!;
   }
 
   static int? _parse(
@@ -91,19 +77,6 @@ class int {
     return _parseRadix(source, radix, start, end, sign, false, onError);
   }
 
-  @patch
-  static int? tryParse(String source, {int? radix}) {
-    if (source.isEmpty) return null;
-    if (radix == null || radix == 10) {
-      // Try parsing immediately, without trimming whitespace.
-      int? result = _tryParseSmi(source, 0, source.length - 1);
-      if (result != null) return result;
-    } else if (radix < 2 || radix > 36) {
-      throw RangeError("Radix $radix not in range 2..36");
-    }
-    return _parse(source, radix, _kNull);
-  }
-
   static Null _kNull(_) => null;
 
   static int? _handleFormatError(int? Function(String)? onError, String source,
@@ -119,87 +92,82 @@ class int {
   }
 
   static int? _parseRadix(String source, int radix, int start, int end,
-      int sign, bool allowU64, int? Function(String)? onError) {
-    int tableIndex = (radix - 2) * 4 + (has63BitSmis ? 2 : 0);
-    int blockSize = _PARSE_LIMITS[tableIndex];
-    int length = end - start;
-    if (length <= blockSize) {
-      int? smi = _parseBlock(source, radix, start, end);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
-      }
-      return sign * smi;
+      int sign, bool allowOverflow, int? Function(String)? onError) {
+    // Skip leading zeroes.
+    while (start < end && source.codeUnitAtUnchecked(start) == 0x30 /* 0 */) {
+      start += 1;
     }
 
-    // Often cheaper than: int smallBlockSize = length % blockSize;
-    // because digit count generally tends towards smaller. rather
-    // than larger.
-    int smallBlockSize = length;
-    while (smallBlockSize >= blockSize) smallBlockSize -= blockSize;
-    int result = 0;
-    if (smallBlockSize > 0) {
-      int blockEnd = start + smallBlockSize;
-      int? smi = _parseBlock(source, radix, start, blockEnd);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
-      }
-      result = sign * smi;
-      start = blockEnd;
+    final blockSize = _PARSE_LIMITS[radix].toInt();
+    final length = end - start;
+
+    // Parse at most `blockSize` characters without overflows.
+    final parseBlockLength = length < blockSize ? length : blockSize;
+    int? blockResult =
+        _parseBlock(source, radix, start, start + parseBlockLength);
+    if (blockResult == null) {
+      return _handleFormatError(onError, source, start, radix, null);
     }
-    int multiplier = _PARSE_LIMITS[tableIndex + 1];
-    int positiveOverflowLimit = 0;
-    int negativeOverflowLimit = 0;
-    tableIndex = tableIndex << 1; // pre-multiply by 2 for simpler indexing
-    positiveOverflowLimit = _int64OverflowLimits[tableIndex];
-    if (positiveOverflowLimit == 0) {
-      positiveOverflowLimit = _initInt64OverflowLimits(tableIndex, multiplier);
+
+    int result = sign * blockResult;
+
+    if (parseBlockLength < blockSize) {
+      // Overflow is not possible.
+      return result;
     }
-    negativeOverflowLimit = _int64OverflowLimits[tableIndex + 1];
-    int blockEnd = start + blockSize;
-    do {
-      int? smi = _parseBlock(source, radix, start, blockEnd);
-      if (smi == null) {
-        return _handleFormatError(onError, source, start, radix, null);
+
+    // Check overflows on the next digits. We can scan at most two digits before an overflow.
+    start += parseBlockLength;
+
+    for (int i = start; i < end; i++) {
+      int char = source.codeUnitAtUnchecked(i);
+      int digit = char ^ 0x30;
+      if (digit > 9) {
+        digit = (char | 0x20) - (0x61 - 10);
+        if (digit < 10 || digit >= radix) {
+          return _handleFormatError(onError, source, start, radix, null);
+        }
       }
-      if (result >= positiveOverflowLimit) {
-        if ((result > positiveOverflowLimit) ||
-            (smi > _int64OverflowLimits[tableIndex + 2])) {
-          // Although the unsigned overflow limits do not depend on the
-          // platform, the multiplier and block size, which are used to
-          // compute it, do.
-          int X = has63BitSmis ? 1 : 0;
-          if (allowU64 &&
-              !(result >= _int64UnsignedOverflowLimits[X] &&
-                  (result > _int64UnsignedOverflowLimits[X] ||
-                      smi > _int64UnsignedSmiOverflowLimits[X])) &&
-              blockEnd + blockSize > end) {
-            return (result * multiplier) + smi;
-          }
+
+      if (sign > 0) {
+        const max = 9223372036854775807;
+
+        if (!allowOverflow && (result > (max - digit) ~/ radix)) {
           return _handleFormatError(onError, source, null, radix,
               "Positive input exceeds the limit of integer");
         }
-      } else if (result <= negativeOverflowLimit) {
-        if ((result < negativeOverflowLimit) ||
-            (smi > _int64OverflowLimits[tableIndex + 3])) {
+
+        result = (radix * result) + digit;
+      } else {
+        const min = -9223372036854775808;
+
+        // We don't need to check `allowOverflow` as overflows are only allowed
+        // in positive numbers.
+        if (result < (min + digit) ~/ radix) {
           return _handleFormatError(onError, source, null, radix,
               "Negative input exceeds the limit of integer");
         }
+
+        result = (radix * result) - digit;
       }
-      result = (result * multiplier) + (sign * smi);
-      start = blockEnd;
-      blockEnd = start + blockSize;
-    } while (blockEnd <= end);
+    }
+
     return result;
   }
 
-  // Parse block of digits into a Smi.
-  static _Smi? _parseBlock(String source, int radix, int start, int end) {
-    _Smi result = unsafeCast<_Smi>(0);
+  /// Parse digits in [source] range from [start] to [end].
+  ///
+  /// Returns `null` if a character is not valid in radix [radix].
+  ///
+  /// Does not check for overflows, assumes that the number of digits in the
+  /// range will fit into an [int].
+  static int? _parseBlock(String source, int radix, int start, int end) {
+    int result = 0;
     if (radix <= 10) {
       for (int i = start; i < end; i++) {
         int digit = source.codeUnitAtUnchecked(i) ^ 0x30;
         if (digit >= radix) return null;
-        result = (radix * result + digit) as _Smi;
+        result = (radix * result) + digit;
       }
     } else {
       for (int i = start; i < end; i++) {
@@ -209,87 +177,76 @@ class int {
           digit = (char | 0x20) - (0x61 - 10);
           if (digit < 10 || digit >= radix) return null;
         }
-        result = (radix * result + digit) as _Smi;
+        result = (radix * result) + digit;
       }
     }
     return result;
   }
 
-  // For each radix, 2-36, how many digits are guaranteed to fit in a smi,
-  // and magnitude of such a block (radix ** digit-count).
-  // 32-bit limit/multiplier at (radix - 2)*4, 64-bit limit at (radix-2)*4+2
-  static const _PARSE_LIMITS = const [
-    30, 1073741824, 62, 4611686018427387904, // radix: 2
-    18, 387420489, 39, 4052555153018976267,
-    15, 1073741824, 30, 1152921504606846976,
-    12, 244140625, 26, 1490116119384765625, //  radix: 5
-    11, 362797056, 23, 789730223053602816,
-    10, 282475249, 22, 3909821048582988049,
-    10, 1073741824, 20, 1152921504606846976,
-    9, 387420489, 19, 1350851717672992089,
-    9, 1000000000, 18, 1000000000000000000, //  radix: 10
-    8, 214358881, 17, 505447028499293771,
-    8, 429981696, 17, 2218611106740436992,
-    8, 815730721, 16, 665416609183179841,
-    7, 105413504, 16, 2177953337809371136,
-    7, 170859375, 15, 437893890380859375, //    radix: 15
-    7, 268435456, 15, 1152921504606846976,
-    7, 410338673, 15, 2862423051509815793,
-    7, 612220032, 14, 374813367582081024,
-    7, 893871739, 14, 799006685782884121,
-    6, 64000000, 14, 1638400000000000000, //    radix: 20
-    6, 85766121, 14, 3243919932521508681,
-    6, 113379904, 13, 282810057883082752,
-    6, 148035889, 13, 504036361936467383,
-    6, 191102976, 13, 876488338465357824,
-    6, 244140625, 13, 1490116119384765625, //   radix: 25
-    6, 308915776, 13, 2481152873203736576,
-    6, 387420489, 13, 4052555153018976267,
-    6, 481890304, 12, 232218265089212416,
-    6, 594823321, 12, 353814783205469041,
-    6, 729000000, 12, 531441000000000000, //    radix: 30
-    6, 887503681, 12, 787662783788549761,
-    6, 1073741824, 12, 1152921504606846976,
-    5, 39135393, 12, 1667889514952984961,
-    5, 45435424, 12, 2386420683693101056,
-    5, 52521875, 12, 3379220508056640625, //    radix: 35
-    5, 60466176, 11, 131621703842267136,
-  ];
-
-  static const _maxInt64 = 0x7fffffffffffffff;
-  static const _minInt64 = -0x8000000000000000;
-
-  static const _int64UnsignedOverflowLimits = const [0xfffffffff, 0xf];
-  static const _int64UnsignedSmiOverflowLimits = const [
-    0xfffffff,
-    0xfffffffffffffff
-  ];
-
-  /// Calculation of the expression
-  ///
-  ///   result = (result * multiplier) + (sign * smi)
-  ///
-  /// in `_parseRadix()` may overflow 64-bit integers. In such case,
-  /// `int.parse()` should stop with an error.
-  ///
-  /// This table is lazily filled with int64 overflow limits for result and smi.
-  /// For each multiplier from `_PARSE_LIMITS[tableIndex + 1]` this table
-  /// contains
-  ///
-  /// * `[tableIndex*2]` = positive limit for result
-  /// * `[tableIndex*2 + 1]` = negative limit for result
-  /// * `[tableIndex*2 + 2]` = limit for smi if result is exactly at positive limit
-  /// * `[tableIndex*2 + 3]` = limit for smi if result is exactly at negative limit
-  static final Int64List _int64OverflowLimits =
-      Int64List(_PARSE_LIMITS.length * 2);
-
-  static int _initInt64OverflowLimits(int tableIndex, int multiplier) {
-    _int64OverflowLimits[tableIndex] = _maxInt64 ~/ multiplier;
-    _int64OverflowLimits[tableIndex + 1] = _minInt64 ~/ multiplier;
-    _int64OverflowLimits[tableIndex + 2] =
-        unsafeCast<int>(_maxInt64.remainder(multiplier));
-    _int64OverflowLimits[tableIndex + 3] =
-        -unsafeCast<int>(_minInt64.remainder(multiplier));
-    return _int64OverflowLimits[tableIndex];
+  static int? _tryParseIntRadix10(String str, int start, int end) {
+    int ix = start;
+    int sign = 1;
+    int c = str.codeUnitAtUnchecked(ix);
+    // Check for leading '+' or '-'.
+    if ((c == 0x2b) || (c == 0x2d)) {
+      ix++;
+      sign = 0x2c - c; // -1 for '-', +1 for '+'.
+      if (ix == end) {
+        return null; // Empty.
+      }
+    }
+    if (end - ix > 18) {
+      return null; // May not fit into an `int`.
+    }
+    int result = 0;
+    for (int i = ix; i < end; i++) {
+      int c = 0x30 ^ str.codeUnitAtUnchecked(i);
+      if (9 < c) {
+        return null;
+      }
+      result = (10 * result) + c;
+    }
+    return sign * result;
   }
+
+  // For each radix, 2-36, how many digits are guaranteed to fit in an `int`.
+  static const _PARSE_LIMITS = const WasmArray<WasmI64>.literal([
+    0, // unused
+    0, // unused
+    63, // radix: 2
+    39,
+    31,
+    27, // radix: 5
+    24,
+    22,
+    21,
+    19,
+    18, // radix: 10
+    18,
+    17,
+    17,
+    16,
+    16, // radix: 15
+    15,
+    15,
+    15,
+    14,
+    14, // radix: 20
+    14,
+    14,
+    13,
+    13,
+    13, // radix: 25
+    13,
+    13,
+    13,
+    12,
+    12, // radix: 30
+    12,
+    12,
+    12,
+    12,
+    12, // radix: 35
+    12,
+  ]);
 }