Understanding how the compiler optimizes structs mutation #135

enitrat · 2023-08-18T12:46:02Z

enitrat
Aug 18, 2023
Maintainer

How the compiler optimizes structs modifications

Introduction

This report highlights how the compiler optimizes structs that are modified inside a function, in order to minimize the number of memory cells written each time a struct field is updated. This work was conducted using scarb 0.6.2 and cairo 2.1.1.

Methodology

We define a file with a multi-field struct MyStruct, a main function that creates an instance of MyStruct, a function update_one that updates a single field of the struct, and a function update_many that updates multiple fields of the struct. We then compile this file to Sierra and analyze the Sierra code to see how the compiler optimizes the struct modifications.
In the middle of the calling_bar and calling_bar_inlined functions, we add a call to a function that takes MyStruct as a parameter to show how function inlining affects the optimization.

The following file is the source file used for this investigation:

#[derive(Drop, Copy)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252,
    d: felt252,
    e: felt252
}

fn main() {
    let my_struct = MyStruct { a: 1, b: 2, c: 3, d: 4, e: 5 };
}

fn update_one(ref my_struct: MyStruct) {
    my_struct.a = 1;
}

fn update_many(ref my_struct: MyStruct) {
    my_struct.a = 1;
    my_struct.b = 2;
    my_struct.c = 3;
    my_struct.d = 4;
    my_struct.e = 5;
}

fn calling_bar(ref my_struct: MyStruct) {
    my_struct.a = 1;
    my_struct.b = 2;
    bar(my_struct);
    my_struct.c = 3;
    my_struct.d = 4;
    my_struct.e = 5;
}

fn calling_bar_inlined(ref my_struct: MyStruct) {
    my_struct.a = 1;
    my_struct.b = 2;
    bar_inlined(my_struct);
    my_struct.c = 3;
    my_struct.d = 4;
    my_struct.e = 5;
}


fn bar(my_struct: MyStruct) {
    my_struct.a + my_struct.b + my_struct.c + my_struct.d + my_struct.e;
}

#[inline(always)]
fn bar_inlined(my_struct: MyStruct) {
    my_struct.a + my_struct.b + my_struct.c + my_struct.d + my_struct.e;
}

Results

Let's start by analyzing the behavior of each function one by one, looking at Sierra code snippets.

update_one:
Nothing particular here, but let's highlight the important parts of this code to make it easier to understand what's happening in the next snippets.
- The first instruction deconstructs the struct into multiple variables
- The variable with id [1] is dropped, and a new variable with id [6] is created, whose value is 1
- The struct is reconstructed using the new value of [6] for the first member, keeping the previous values for the other members
```
struct_deconstruct<compiler_struct_opti::MyStruct>([0]) -> ([1], [2], [3], [4], [5]);
drop<felt252>([1]) -> ();
felt252_const<1>() -> ([6]);
struct_construct<compiler_struct_opti::MyStruct>([6], [2], [3], [4], [5]) -> ([7]);
struct_construct<Unit>() -> ([8]);
store_temp<compiler_struct_opti::MyStruct>([7]) -> ([9]);
store_temp<Unit>([8]) -> ([10]);
return([9], [10]);
```
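To make the mapping back to source code more concrete, here is a minimal sketch of what these Sierra steps would look like if written by hand in Cairo. The function name update_one_explicit is hypothetical and not part of the original file; it takes the struct by value and returns it instead of using ref, and it is only an illustration of the deconstruct / drop / construct sequence, not what the compiler literally generates.

```cairo
#[derive(Drop, Copy)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252,
    d: felt252,
    e: felt252
}

// Hypothetical helper (not in the original file): a by-hand version of update_one.
fn update_one_explicit(my_struct: MyStruct) -> MyStruct {
    // Mirrors struct_deconstruct: split the struct into one variable per member.
    let MyStruct{a, b, c, d, e } = my_struct;
    // `a` is never used again, which is what drop<felt252>([1]) expresses in Sierra.
    // Mirrors felt252_const<1> followed by struct_construct: only the first
    // member gets a new value, the other members are reused as they are.
    MyStruct { a: 1, b, c, d, e }
}
```

Calling update_one_explicit(MyStruct { a: 9, b: 2, c: 3, d: 4, e: 5 }) returns a struct whose a is 1 and whose other members are unchanged. The compiler may warn that a is unused, which is the point of the example.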

update_many

The concept is the same as for the function above, except that here, instead of reconstructing the struct after each update, we perform all the updates and then reconstruct the struct once, before returning from the function. Since every field is overwritten, the original struct is simply dropped instead of being deconstructed.

  drop<compiler_struct_opti::MyStruct>([0]) -> ();
  felt252_const<1>() -> ([1]);
  felt252_const<2>() -> ([2]);
  felt252_const<3>() -> ([3]);
  felt252_const<4>() -> ([4]);
  felt252_const<5>() -> ([5]);
  struct_construct<compiler_struct_opti::MyStruct>([1], [2], [3], [4], [5]) -> ([6]);
  struct_construct<Unit>() -> ([7]);
  store_temp<compiler_struct_opti::MyStruct>([6]) -> ([8]);
  store_temp<Unit>([7]) -> ([9]);
  return([8], [9]);

calling_bar

Something interesting here is that since we're calling bar, which takes MyStruct as a parameter, we need to reconstruct the struct before calling bar, and then deconstruct it again after the call to perform the remaining modifications. This is because bar expects a single MyStruct value: the members updated so far must be packed back into a struct, which is then duplicated with dup so that one copy can be passed to bar while the other is deconstructed afterwards. This is why we have the following instructions:

  struct_deconstruct<compiler_struct_opti::MyStruct>([0]) -> ([1], [2], [3], [4], [5]);
  drop<felt252>([1]) -> ();
  drop<felt252>([2]) -> ();
  felt252_const<1>() -> ([6]);
  felt252_const<2>() -> ([7]);
  struct_construct<compiler_struct_opti::MyStruct>([6], [7], [3], [4], [5]) -> ([8]);
  store_temp<compiler_struct_opti::MyStruct>([8]) -> ([10]);
  dup<compiler_struct_opti::MyStruct>([10]) -> ([10], [8]);
  function_call<user@compiler_struct_opti::bar>([10]) -> ([9]);
  drop<Unit>([9]) -> ();
  struct_deconstruct<compiler_struct_opti::MyStruct>([8]) -> ([11], [12], [13], [14], [15]);
  drop<felt252>([13]) -> ();
  drop<felt252>([14]) -> ();
  drop<felt252>([15]) -> ();
  felt252_const<3>() -> ([16]);
  felt252_const<4>() -> ([17]);
  felt252_const<5>() -> ([18]);
  struct_construct<compiler_struct_opti::MyStruct>([11], [12], [16], [17], [18]) -> ([19]);
  struct_construct<Unit>() -> ([20]);
  store_temp<compiler_struct_opti::MyStruct>([19]) -> ([21]);
  store_temp<Unit>([20]) -> ([22]);
  return([21], [22]);

calling_bar_inlined

Even though the function bar_inlined is inlined, the compiler still needs to reconstruct the struct before the call to bar_inlined and deconstruct it after the call to bar_inlined. This could probably be optimized by the compiler - because we're just calling struct_construct to call struct_deconstruct right after - but it's not the case for now.
Another interesting point is that in the inlined version of the function, calling store_temp (which supposedly stores the struct in memory prior to the function call) returns the same variable ID, meaning that this is actually a no-op.

  struct_deconstruct<compiler_struct_opti::MyStruct>([0]) -> ([1], [2], [3], [4], [5]);
  drop<felt252>([1]) -> ();
  drop<felt252>([2]) -> ();
  felt252_const<1>() -> ([6]);
  felt252_const<2>() -> ([7]);
  struct_construct<compiler_struct_opti::MyStruct>([6], [7], [3], [4], [5]) -> ([8]);
  store_temp<compiler_struct_opti::MyStruct>([8]) -> ([8]);
  dup<compiler_struct_opti::MyStruct>([8]) -> ([8], [9]);
  struct_deconstruct<compiler_struct_opti::MyStruct>([9]) -> ([10], [11], [12], [13], [14]);
  felt252_add([10], [11]) -> ([15]);
  store_temp<felt252>([15]) -> ([15]);
  felt252_add([15], [12]) -> ([16]);
  store_temp<felt252>([16]) -> ([16]);
  felt252_add([16], [13]) -> ([17]);
  store_temp<felt252>([17]) -> ([17]);
  felt252_add([17], [14]) -> ([18]);
  drop<felt252>([18]) -> ();
  struct_deconstruct<compiler_struct_opti::MyStruct>([8]) -> ([19], [20], [21], [22], [23]);
  drop<felt252>([21]) -> ();
  drop<felt252>([22]) -> ();
  drop<felt252>([23]) -> ();
  felt252_const<3>() -> ([24]);
  felt252_const<4>() -> ([25]);
  felt252_const<5>() -> ([26]);
  struct_construct<compiler_struct_opti::MyStruct>([19], [20], [24], [25], [26]) -> ([27]);
  struct_construct<Unit>() -> ([28]);
  store_temp<compiler_struct_opti::MyStruct>([27]) -> ([29]);
  store_temp<Unit>([28]) -> ([30]);
  return([29], [30]);

Overall, the compiler optimizes struct modifications as we would expect a performant compiler to. It's a very welcome feature compared to the previous way of doing things in Cairo 0, where we had to explicitly define functions to update structs that would manually reconstruct the struct after each update.

What if we looked at Cairo Assembly?

Sierra is an intermediate representation of Cairo code, and it's not the code that is actually executed by the VM. The VM executes Cairo Assembly, which is a lower-level representation of Cairo code. Let's look at the Cairo Assembly generated by the compiler for the same file as above.

Remember that Cairo only has three registers: ap, fp, and pc, and a single memory. Structs are a high-level representation of packed data, but they have no representation in Cairo Assembly. Instead, struct data is represented as a sequence of cells in memory, and the compiler keeps track of the offset of each field of the struct in memory.
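As a small illustration of the "sequence of cells" idea, here is a hedged sketch with a two-field struct. Point and sum are hypothetical names that are not part of the file studied here, and the cell offsets mentioned in the comments are an expectation based on the offsets visible in the listings of this post, not a measured result.

```cairo
#[derive(Drop, Copy)]
struct Point {
    x: felt252,
    y: felt252
}

// Hypothetical function: takes the struct by value and reads both fields.
// Assumption: with the calling convention seen in the listings of this post,
// the last argument cell sits at [fp + -3], so we would expect `p.x` to live
// at [fp + -4] and `p.y` at [fp + -3], one memory cell per felt252 field.
fn sum(p: Point) -> felt252 {
    p.x + p.y
}
```

Compiling this with the same toolchain and reading the generated CASM is the way to confirm where each field actually ends up.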

Inlined functions

While not the primary objective of this study, it's interesting to see how the compiler optimizes the code when the function is inlined. Let's look at the Cairo Assembly generated for both calling_bar and calling_bar_inlined:

calling_bar

// Note: calling_bar function
[ap + 0] = 1, ap++;
[ap + 0] = 2, ap++;
// Note: the following instructions are just copying the
// struct fields passed as function argument to memory
// in order to prepare the `bar` call
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -4], ap++;
[ap + 0] = [fp + -3], ap++;
call rel 11;
// Note: the following 2 instructions are just copying the previously updated struct values to memory in order to prepare the return of the function
[ap + 0] = [ap + -10], ap++;
[ap + 0] = [ap + -10], ap++;
// This is where `c, d, e` are updated
[ap + 0] = 3, ap++;
[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
ret;
// Note: bar function
[ap + 0] = [fp + -7] + [fp + -6], ap++;
[ap + 0] = [ap + -1] + [fp + -5], ap++;
[ap + 0] = [ap + -1] + [fp + -4], ap++;
ret;

calling_bar_inlined

// Note: calling_bar_inlined function
[ap + 0] = 1, ap++;
[ap + 0] = 2, ap++;
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -4], ap++;
[ap + 0] = [fp + -3], ap++;
[ap + 0] = [ap + -5] + [ap + -4], ap++;
[ap + 0] = [ap + -1] + [ap + -4], ap++;
[ap + 0] = [ap + -1] + [ap + -4], ap++;
[ap + 0] = [ap + -8], ap++;
[ap + 0] = [ap + -8], ap++;
[ap + 0] = 3, ap++;
[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
ret;
// Note: bar_inlined function
[ap + 0] = [fp + -7] + [fp + -6], ap++;
[ap + 0] = [ap + -1] + [fp + -5], ap++;
[ap + 0] = [ap + -1] + [fp + -4], ap++;
ret;

The first noticeable thing here is that the non-inlined version only executes two extra instructions: the call rel 11 and the ret at the end of bar. Otherwise, both versions run the same instructions. In the inlined version, all the fields of my_struct are written to memory before the body of bar_inlined executes. The normal version is similar, except that it uses a call instruction to jump to the non-inlined bar function. Note that the code of bar_inlined is still emitted in the CASM, even though nothing calls it.

Back to understanding how structs are handled

Let's look at the Cairo Assembly generated for the update_many and calling_bar functions. The compiled code for calling_bar is available above.

update_many
```
[ap + 0] = 1, ap++;
[ap + 0] = 2, ap++;
[ap + 0] = 3, ap++;
[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
ret;
```
This one is pretty straightforward - we just write the values of the struct fields to memory, and since no other operation is performed in between, we can just return them as is (as our function returns my_struct passed by ref).

calling_bar

The main difference here is that we need to store the struct in memory before calling bar, and then store it in memory again after the call to bar to return it from the calling_bar function.

Basically, what happens here is that we write the values of the struct fields to memory (in order to call the function bar), and once bar returns we still have to update the last three fields. Because my_struct is passed as ref, it is returned from the function - therefore, the last instructions in our function write all the fields of my_struct to memory. Given that we already updated the values of a and b, we just copy the values stored in memory previously for these fields, and we write the new values for c, d and e to memory.

Why you should minimize the amount of functions called

Every time you make a function call, you need to write the arguments to memory, and if that function returns something, you need to write it to memory again before returning.

To illustrate this more accurately, let's focus on this specific case: using multiple setters instead of updating the struct directly. Let's look at the following code:

struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252
}

#[generate_trait]
impl MyStructImpl of MyStructTrait {
    // The first parameter is named `self` so that the setters can be
    // called with method syntax, e.g. `my_struct.set_a(4)`.
    fn set_a(ref self: MyStruct, a: felt252) {
        self.a = a;
    }

    fn set_b(ref self: MyStruct, b: felt252) {
        self.b = b;
    }

    fn set_c(ref self: MyStruct, c: felt252) {
        self.c = c;
    }
}

Let's see how the following compiles to CASM:

fn main() -> MyStruct {
    let mut my_struct = MyStruct { a: 1, b: 2, c: 3 };
    my_struct.set_a(4);
    my_struct.set_b(5);
    my_struct.set_c(6);
    my_struct
}

[ap + 0] = 1, ap++;
[ap + 0] = 2, ap++;
[ap + 0] = 3, ap++;
[ap + 0] = 4, ap++;
call rel 20;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = 5, ap++;
call rel 17;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = 6, ap++;
call rel 14;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
[ap + 0] = [ap + -3], ap++;
ret;
[ap + 0] = [fp + -3], ap++;
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -4], ap++;
ret;
[ap + 0] = [fp + -6], ap++;
[ap + 0] = [fp + -3], ap++;
[ap + 0] = [fp + -4], ap++;
ret;
[ap + 0] = [fp + -6], ap++;
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -3], ap++;
ret;

Look at how the following is way more optimized:

fn main() -> MyStruct{
    let mut my_struct = MyStruct { a: 1, b: 2, c: 3 };
    my_struct.a = 4;
    my_struct.b = 5;
    my_struct.c = 6;
    my_struct
}

[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
[ap + 0] = 6, ap++;
ret;

Good programming practices would expect you to use getters and setters to modify/access struct members, as it makes refactoring easier. But if you're looking into optimizations, it might not be the best idea...

One last thing: What if our getters/setters were inlined?

Let's look at the following code:

#[derive(Copy, Drop)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252
}

#[generate_trait]
impl MyStructImpl of MyStructTrait {
    #[inline(always)]
    fn set_a(ref self: MyStruct, a: felt252) {
        self.a = a;
    }

    #[inline(always)]
    fn set_b(ref self: MyStruct, b: felt252) {
        self.b = b;
    }

    #[inline(always)]
    fn set_c(ref self: MyStruct, c: felt252) {
        self.c = c;
    }
}

fn main() -> MyStruct {
    let mut my_struct = MyStruct { a: 1, b: 2, c: 3 };
    my_struct.set_a(4);
    my_struct.set_b(5);
    my_struct.set_c(6);
    my_struct
}

Surely, we would expect the compiler to behave the same way as what we noticed before, right? Well, not exactly. Let's look at the CASM generated for this code:

[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
[ap + 0] = 6, ap++;
ret;

It's strictly equivalent to the code generated when we directly update the struct members. So the solution might actually be to use inlined methods!

One weird edge case: inlining free functions

#[derive(Copy, Drop)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252
}

#[inline(always)]
fn set_a(ref self: MyStruct, a: felt252) {
    self.a = a;
}

#[inline(always)]
fn set_b(ref self: MyStruct, b: felt252) {
    self.b = b;
}

#[inline(always)]
fn set_c(ref self: MyStruct, c: felt252) {
    self.c = c;
}

fn main() -> MyStruct {
    let mut my_struct = MyStruct { a: 1, b: 2, c: 3 };
    set_a(ref my_struct, 4);
    set_b(ref my_struct, 5);
    set_c(ref my_struct, 6);
    my_struct
}

We would expect the same result as with methods, but that's not quite the case! However, the difference is actually not that big: the main function (the last block in the output below, which is what the program really executes) is the same. The only difference is that the CASM also contains the code of the inlined functions, even though they're never called in this specific case.

Here's the output:

[ap + 0] = [fp + -3], ap++;
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -4], ap++;
ret;
[ap + 0] = [fp + -6], ap++;
[ap + 0] = [fp + -3], ap++;
[ap + 0] = [fp + -4], ap++;
ret;
[ap + 0] = [fp + -6], ap++;
[ap + 0] = [fp + -5], ap++;
[ap + 0] = [fp + -3], ap++;
ret;
[ap + 0] = 4, ap++;
[ap + 0] = 5, ap++;
[ap + 0] = 6, ap++;
ret;

Conclusion

This study gave us more insights into how the compiler optimizes struct modifications. We can draw one very valuable conclusion: what is actually expensive is to have multiple function calls that pass the struct around as an argument, which requires writing the struct to memory before each function call and copying it again from memory after each call.

We also highlighted that methods and free functions are not treated exactly the same way by the compiler: when inlined, both produce the same code for main, but the bodies of inlined free functions are still emitted in the CASM even though they are never called. Either way, inlining can be a good idea if you're looking for performance.

If you want to combine best programming practices and good optimizations, then consider defining methods on your type and inlining them. This way, you can still refactor your code easily, and you can benefit from the optimizations of the compiler.
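As a minimal sketch of that recommendation, the snippet below adds an inlined getter next to an inlined setter. The getter get_a is an assumption added for illustration: this study only measured setters, so whether the getter compiles down as tightly should be checked by reading the generated CASM.

```cairo
#[derive(Copy, Drop)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252
}

#[generate_trait]
impl MyStructImpl of MyStructTrait {
    // Hypothetical getter (not measured in this study): reads `a` through a snapshot.
    #[inline(always)]
    fn get_a(self: @MyStruct) -> felt252 {
        *self.a
    }

    // Same inlined setter as in the example studied above.
    #[inline(always)]
    fn set_a(ref self: MyStruct, a: felt252) {
        self.a = a;
    }
}

fn main() -> felt252 {
    let mut my_struct = MyStruct { a: 1, b: 2, c: 3 };
    my_struct.set_a(4);
    // Callers only go through the trait, so the struct layout can change later.
    my_struct.get_a()
}
```

Callers never touch the fields directly, so the refactoring benefit of accessors is kept while the inline attribute asks the compiler to avoid the call overhead described above.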

loothero · 2023-08-24T06:15:20Z

loothero
Aug 24, 2023

Thanks for performing and sharing this great analysis @enitrat. I was able to shed a notable amount of gas from Loot Survivor using this knowledge: BibliothecaDAO/loot-survivor@d9be998

I appreciate you 🙏

0 replies

ClementWalter · 2023-08-25T10:22:32Z

ClementWalter
Aug 25, 2023
Maintainer

I'm wondering in the sierra code if update_many as it is (which is update_all) would be the same for a update_some (not all)

1 reply

enitrat Aug 25, 2023
Maintainer Author

It would be a bit different: we would start by deconstructing the struct into multiple variables, dropping the ones that are not reused (those being updated), and the rest would be similar.
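For anyone who wants to check this, here is a hedged sketch of such a function. update_some is a hypothetical name and its Sierra output was not captured in this thread; based on the update_one and calling_bar listings above, one would expect a struct_deconstruct, a drop for each overwritten member, and a single struct_construct at the end.

```cairo
#[derive(Drop, Copy)]
struct MyStruct {
    a: felt252,
    b: felt252,
    c: felt252,
    d: felt252,
    e: felt252
}

// Hypothetical function: updates only some of the fields.
// Expectation (not measured): `a` and `c` are dropped after the deconstruct,
// while `b`, `d` and `e` are reused when the struct is reconstructed.
fn update_some(ref my_struct: MyStruct) {
    my_struct.a = 1;
    my_struct.c = 3;
}
```

Compiling it to Sierra with the same scarb version and comparing it against the update_many listing would confirm or refute that expectation.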
