AOT compiled access of several Uint8List indices (sometimes) does a LoadField per index #60001

@jensjoha

If I have this file:

import 'dart:typed_data';

void main() {
  Foo foo = new Foo(Uint8List.fromList([1, 2, 3, 4]));
  print(foo.doStuff());
}

class Foo {
  final Uint8List _bytes;

  Foo(this._bytes);

  @pragma("vm:never-inline")
  @pragma('vm:unsafe:no-bounds-checks')
  int doStuff() {
    if (_bytes.length >= 4) {
      return _bytes[0] + _bytes[1] + _bytes[2] + _bytes[3];
    }
    return 0;
  }
}

the AOT-compiled doStuff method contains code like this:

41038   v57 <- LoadIndexed:24([_Uint8List] v3 T{_Uint8List}, v56 T{_Smi}) [0, 255] int64
   0x000000000006bb59 <+21>:    movzbq 0x17(%rcx),%rdx

41039   v59 <- LoadIndexed:28([_Uint8List] v3 T{_Uint8List}, v58 T{_Smi}) [0, 255] int64
   0x000000000006bb5e <+26>:    movzbq 0x18(%rcx),%rbx

41040   ParallelMove rdx <- rdx
41041   v14 <- BinaryInt64Op(+ [tr], v57 T{_Smi}, v59 T{_Smi}) [0, 510] int64
   0x000000000006bb63 <+31>:    add    %rbx,%rdx

41042   v61 <- LoadIndexed:34([_Uint8List] v3 T{_Uint8List}, v60 T{_Smi}) [0, 255] int64
   0x000000000006bb66 <+34>:    movzbq 0x19(%rcx),%rbx

41043   ParallelMove rdx <- rdx
41044   v18 <- BinaryInt64Op(+ [tr], v14, v61 T{_Smi}) [0, 765] int64
   0x000000000006bb6b <+39>:    add    %rbx,%rdx

41045   v63 <- LoadIndexed:40([_Uint8List] v3 T{_Uint8List}, v62 T{_Smi}) [0, 255] int64
   0x000000000006bb6e <+42>:    movzbq 0x1a(%rcx),%rbx

41046   ParallelMove rdx <- rdx
41047   v22 <- BinaryInt64Op(+ [tr], v18, v63 T{_Smi}) [0, 1020] int64
   0x000000000006bb73 <+47>:    add    %rbx,%rdx

So it basically loads, loads, adds, loads, adds, loads and adds.

If I instead have this:

import 'dart:io';
import 'dart:typed_data';

void main() {
  File f = File.fromUri(Platform.script);
  Foo foo = new Foo(f.readAsBytesSync());
  print(foo.doStuff());
}

class Foo {
  final Uint8List _bytes;

  Foo(this._bytes);

  @pragma("vm:never-inline")
  @pragma('vm:unsafe:no-bounds-checks')
  int doStuff() {
    if (_bytes.length >= 4) {
      return _bytes[0] + _bytes[1] + _bytes[2] + _bytes[3];
    }
    return 0;
  }
}

where the only difference is where the Uint8List comes from, it instead generates:

44448   v68 <- LoadField(v3 . PointerBase.data, MayLoadInnerPointer) untagged
   0x000000000006fb95 <+21>:    mov    0x7(%rcx),%rdx

44449   v57 <- LoadIndexed:24([_Uint8List] v68 T{Uint8List}, v56 T{_Smi}) [0, 255] int64
   0x000000000006fb99 <+25>:    movzbq (%rdx),%rbx

44450   v69 <- LoadField(v3 T{Uint8List} . PointerBase.data, MayLoadInnerPointer) untagged
   0x000000000006fb9d <+29>:    mov    0x7(%rcx),%rdx

44451   v59 <- LoadIndexed:28([_Uint8List] v69 T{Uint8List}, v58 T{_Smi}) [0, 255] int64
   0x000000000006fba1 <+33>:    movzbq 0x1(%rdx),%rsi

44452   ParallelMove rbx <- rbx
44453   v14 <- BinaryInt64Op(+ [tr], v57 T{_Smi}, v59 T{_Smi}) [0, 510] int64
   0x000000000006fba6 <+38>:    add    %rsi,%rbx

44454   v70 <- LoadField(v3 T{Uint8List} . PointerBase.data, MayLoadInnerPointer) untagged
   0x000000000006fba9 <+41>:    mov    0x7(%rcx),%rdx

44455   v61 <- LoadIndexed:34([_Uint8List] v70 T{Uint8List}, v60 T{_Smi}) [0, 255] int64
   0x000000000006fbad <+45>:    movzbq 0x2(%rdx),%rsi

44456   ParallelMove rbx <- rbx
44457   v18 <- BinaryInt64Op(+ [tr], v14, v61 T{_Smi}) [0, 765] int64
   0x000000000006fbb2 <+50>:    add    %rsi,%rbx

44458   v71 <- LoadField(v3 T{Uint8List} . PointerBase.data, MayLoadInnerPointer) untagged
   0x000000000006fbb5 <+53>:    mov    0x7(%rcx),%rdx

44459   v63 <- LoadIndexed:40([_Uint8List] v71 T{Uint8List}, v62 T{_Smi}) [0, 255] int64
   0x000000000006fbb9 <+57>:    movzbq 0x3(%rdx),%rcx

44460   ParallelMove rbx <- rbx
44461   v22 <- BinaryInt64Op(+ [tr], v18, v63 T{_Smi}) [0, 1020] int64
   0x000000000006fbbe <+62>:    add    %rcx,%rbx

i.e. it now loads the PointerBase.data field before each indexed load.

I don't understand why it does that now when it didn't before, and I certainly don't understand why it has to do it every time. Isn't the pointer still in %rdx, so it could just reuse it?

So: Why does it do that and can I make it not do that?
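
One experiment the comparison above suggests, as a sketch only: make sure the only value ever stored in _bytes comes from Uint8List.fromList, as in the first program, by copying the incoming bytes. The named constructor Foo.copying is hypothetical, and it is an untested assumption that the compiler then treats the field the way it did in the first listing. The copy also costs time and memory proportional to the input.

```dart
import 'dart:typed_data';

class Foo {
  final Uint8List _bytes;

  // Hypothetical constructor: the only place _bytes is assigned, and it
  // always stores a fresh copy made by Uint8List.fromList.
  // Assumption (not verified): this gives the same codegen as the first program.
  Foo.copying(Uint8List source) : _bytes = Uint8List.fromList(source);

  @pragma('vm:never-inline')
  @pragma('vm:unsafe:no-bounds-checks')
  int doStuff() {
    if (_bytes.length >= 4) {
      return _bytes[0] + _bytes[1] + _bytes[2] + _bytes[3];
    }
    return 0;
  }
}

void main() {
  // A view stands in for bytes from an arbitrary source,
  // such as the result of readAsBytesSync().
  final source = Uint8List.sublistView(Uint8List.fromList([9, 1, 2, 3, 4]), 1);
  print(Foo.copying(source).doStuff()); // prints 10
}
```

Comparing the AOT output of this variant with the two listings above would show whether the per-index LoadField goes away.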

/cc @mraleph

Labels: area-vm, type-performance
