Skip to content

VM inlining curiosity #56975

@jensjoha

Description

@jensjoha

Before https://dart-review.googlesource.com/c/sdk/+/385722 a compile of the CFE (an aot compile of pkg/front_end/tool/_fasta/compile.dart is what I'll be using below) includes two scanners: The Utf8BytesScanner and StringScanner. After the CL landed it it only included one.

Before, or indeed now if I include it again, say by applying this patch (here done on top of 329760d):

diff --git a/pkg/front_end/lib/src/kernel_generator_impl.dart b/pkg/front_end/lib/src/kernel_generator_impl.dart
index 2daf9ed7002..86673e57c33 100644
--- a/pkg/front_end/lib/src/kernel_generator_impl.dart
+++ b/pkg/front_end/lib/src/kernel_generator_impl.dart
@@ -5,6 +5,7 @@
 /// Defines the front-end API for converting source code to Dart Kernel objects.
 library front_end.kernel_generator_impl;
 
+import 'package:_fe_analyzer_shared/src/scanner/scanner.dart';
 import 'package:_fe_analyzer_shared/src/messages/severity.dart' show Severity;
 import 'package:kernel/ast.dart';
 import 'package:kernel/class_hierarchy.dart';
@@ -77,6 +78,11 @@ Future<InternalCompilerResult> generateKernelInternal(
   ProcessedOptions options = compilerContext.options;
   assert(options.haveBeenValidated, "Options have not been validated");
 
+  if (1+1==3) {
+    ScannerResult scannerResult = scanString("void main(){}");
+    print(scannerResult);
+  }
+
   options.reportNullSafetyCompilationModeInfo();
   FileSystem fs = options.fileSystem;

looking at an AOT compile on the function AbstractScanner.select at what happens to advance we can see that it checks the class id and then inlines both of the two versions:

[...]

903547  v91 <- LoadClassId(v2) [2073, 2074] int64
   0x00000000006337ea <+34>:    mov    -0x1(%rdx),%ebx
   0x00000000006337ed <+37>:    shr    $0xc,%ebx

903548  ParallelMove fp[-1] <- rbx
   0x00000000006337f0 <+40>:    mov    %rbx,-0x8(%rbp)

903549  Branch if EqualityCompare:10(v91 == v52) goto (17, 27)
   0x00000000006337f4 <+44>:    cmp    $0x819,%rbx
   0x00000000006337fb <+51>:    jne    0x633855 <AbstractScanner.select+141>

903550  B17
903551  v58 <- LoadField(v2 T{Utf8BytesScanner} . byteOffset) [-9223372036854775808, 9223372036854775807] int64
   0x0000000000633801 <+57>:    mov    0x9f(%rdx),%rdi

[...]

903564  B27
903565  v74 <- LoadField(v2 T{StringScanner} . scanOffset) [-9223372036854775808, 9223372036854775807] int64
   0x0000000000633855 <+141>:   mov    0x9f(%rdx),%rdi

[...]

With only one included, though, it no longer inlines it, and the method disassembly looks like this:

[...]
901662  v6 <- StaticCall:10( advance<0> v2, using unchecked entrypoint, result_type = T{_Smi}) [-4611686018427387904, 4611686018427387903] int64
   0x0000000000631559 <+49>:    call   0x62fe60 <Utf8BytesScanner.advance>
[...]

I can ask the VM nicely for it to be inlined by applying @pragma("vm:prefer-inline"), making it in fact inline it:

diff --git a/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart b/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
index 2ea201ed8f5..bedc0edc015 100644
--- a/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
+++ b/pkg/_fe_analyzer_shared/lib/src/scanner/utf8_bytes_scanner.dart
@@ -126,6 +126,7 @@ class Utf8BytesScanner extends AbstractScanner {
 
   @override
   @pragma('vm:unsafe:no-bounds-checks')
+  @pragma("vm:prefer-inline")
   int advance() {
     // Always increment so byteOffset goes past the end.
     ++byteOffset;
[...]

901884  v32 <- LoadField(v2 T{Utf8BytesScanner} . byteOffset) [-9223372036854775808, 9223372036854775807] int64
   0x00000000006318ea <+34>:    mov    0x9f(%rdx),%rbx

901885  ParallelMove rbx <- rbx
901886  v34 <- BinaryInt64Op(+ [tr], v32, v72 T{_Smi}) [-9223372036854775808, 9223372036854775807] int64
   0x00000000006318f1 <+41>:    add    $0x1,%rbx

901887  StoreField(v2 T{Utf8BytesScanner} . byteOffset = v34 T{int} <int64>)
   0x00000000006318f5 <+45>:    mov    %rbx,0x9f(%rdx)

[...]

In fact inlining it is (slightly) faster --- here the difference having it inlined when compiling a fixed version of the CFE with itself:

instructions:u: -0.1007% +/- 0.0006% (-21314554.00 +/- 122170.23)

Knowing that @mraleph for instance is sceptical about the instruction count, looking instead on the scanner benchmark pkg/_fe_analyzer_shared/test/scanner_benchmark.dart this too has two live scanners, and they're inlined by default (.1 below). Out-commenting the string one (leaving only one) does StaticCalls (.2 below). Asking nicely for it to be inlines and it is (.3 below) compiled like out/ReleaseX64/dart-sdk/bin/dart compile aot-snapshot pkg/_fe_analyzer_shared/test/scanner_benchmark.dart (and moved to .1 etc), run like e.g. out/ReleaseX64/dart-sdk/bin/dartaotruntime pkg/_fe_analyzer_shared/test/scanner_benchmark.aot.1 pkg/kernel/lib/src/equivalence.dart --bytes gives these runtimes:
.1: ~103.1 bytes per microsecond
.2: ~108.7 bytes per microsecond
.3: ~112.9 bytes per microsecond

So having only one non-inlined one processes 5.47075% +/- 1.30016% more bytes per time unit compared to having two inlined ones. So far so good. Inlining advance makes it process 3.8906% +/- 0.812575% more bytes still per time unit though.

I guess my questions are this:

  1. Why did it go from being inlined to not being inlined? That's quite a surprising result and as we've seen it would have been faster had it still been inlined.
    Depending on the answer to that,
  2. why wasn't the before a class-id check and two (specific) StaticCalls instead? (probably that would be slower, which is why it depends on the answer to 1 above).
    and also
  3. If - when in the situation where we have two options and it inlines by default - I mark both as @pragma("vm:never-inline") it doesn't check class id's and instead does a DispatchTableCall which makes sense. But if I only mark one of them with @pragma("vm:never-inline") it inlines the non-marked on and does a DispatchTableCall on the other one --- which seems to me like it could have been a StaticCall considering it should now know exactly what is left. Why doesn't it?

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-vmUse area-vm for VM related issues, including code coverage, and the AOT and JIT backends.type-questionA question about expected behavior or functionality

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions