Skip to content

Conversation

gustavo-grieco
Copy link
Collaborator

@gustavo-grieco gustavo-grieco commented Jun 28, 2025

Description

This PR was created just to receive comments regarding a potential usage of the first Futamura projection.
Right now, it has only a couple of opcodes implemented. Implementing opcode in this context means taking the code from src/EVM.hs and copy them into a new function with essentially now changes at all, so it is a fast process, compared to the actual implementation from scratch.

In order to do some quick testing, I added a new flag in the hevm exec called --use-futamura that will do the concrete executing with the corresponding Futamura projection. A small demo:

$ hevm exec --code 0x5F5F0100FD
"Return: 0x"
$ hevm exec --code 0x5F5F0100FD --use-futamura
Generating Haskell code for EVM specialization in: /tmp/evmjit-e235cc1552380e54/Generated.hs
...
[1 of 1] Compiling Generated        ( /tmp/evmjit-e235cc1552380e54/Generated.hs, /tmp/evmjit-e235cc1552380e54/Generated.o )
"Return: 0x"

This example shows the execution of an add instruction, using two zeros as arguments. Finally, the program stops. There is an additional revert opcode, but it has no effect (as intended). We can peek the generated code:

runSpecialized :: StateT (VM Concrete s) (ST s) ()
runSpecialized = do
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodePush0
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodePush0
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeAdd
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeStop
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeRevert
     Just r -> return ()

The use of of the op implicit parameter needs to be fixed but I hardcoded it for a quick test. Despite the code is not as efficient as it should be, the execution looks good. Is it faster than concrete execution as we have it today? It is hard to say at this point, we need to benchmark to see this.

There is a non-trivial amount of work as it needs a few important pieces more:

  • Take the bytecodes, splits it in basic blocks (with external calls considered as the end of a basic block as well), and save each futamura projection. This needs to be done for every contract involved.
  • Write a top level function, similar to run (EVM/Exec.hs) which fetches the current futamura projection function and runs it. It should find the next basic block and repeat until the transaction is over.
  • Keep adding opcodes
  • Add multi-contract support (right now, only executing the target contract is supported)

The final implementation will require a large refactoring of the opcode implementation (but it should not be too hard, since all the code is there).

@gustavo-grieco gustavo-grieco changed the title RFC: Futamura projection proof-of-concept RFC: The path to Futumura 🗻 Jul 1, 2025
@gustavo-grieco gustavo-grieco changed the title RFC: The path to Futumura 🗻 RFC: The path to Futamura 🗻 Jul 1, 2025
@msooseth
Copy link
Collaborator

msooseth commented Jul 2, 2025

Oh wow, this is really cool! I am very busy with writing a presentation for a 4h seminar I will be giving on smart contract verification next week, but I will be back full speed on this and other PRs in about 1-1.5 weeks! This actually looks like a great place to start :) Thanks for this, I really needed this initial kick-off, I think I will be able to do some things from here on!

@d-xo
Copy link
Collaborator

d-xo commented Jul 2, 2025

Very exciting! I'm curious if you had considered at all generating llvm directly instead of haskell?

@gustavo-grieco
Copy link
Collaborator Author

Current state of this PR:

  • Basic blocks are fully produced (but probably the code to split them is not 100% correct)
  • Basic blocks should be properly switch when the PC jumps into them
  • Code produced is not very efficient.

Missing components:

  • Only a few opcodes implemented. Jumps and calls are missing so we don't know if the basic block switching really works.
  • No support for handling more than one contract, nor deploying new ones.

Demo of the code produced:

$ hevm exec --code 0x5F5F9000005FFD  --use-futamura
Splitting into basic blocks: [((0,4),[OpPush0,OpPush0,OpSwap 1,OpStop]),((4,5),[OpStop]),((5,7),[OpPush0,OpRevert])]
...

The code produced contains this one:

runBasicBlock_0_4 :: StateT (VM Concrete s) (ST s) ()
runBasicBlock_0_4 = do
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodePush0
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodePush0
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeSwap (1 :: Int)
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeStop
     Just r -> return ()
runBasicBlock_4_5 :: StateT (VM Concrete s) (ST s) ()
runBasicBlock_4_5 = do
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeStop
     Just r -> return ()
runBasicBlock_5_7 :: StateT (VM Concrete s) (ST s) ()
runBasicBlock_5_7 = do
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodePush0
     Just r -> return ()
 get >>= \vm ->
   case vm.result of
     Nothing -> let ?op = 1 in runOpcodeRevert
     Just r -> return ()

@gustavo-grieco
Copy link
Collaborator Author

I finally implemented enough opcodes to be able to benchmark a simple loop, with some operations and different revert conditions. The contract I'm using for benchmark looks like this:

contract C {
	fallback() payable external {
        uint256 z = 1;
        for (int i = 0; i < (10 ** 6) * 8; i++) {
        	z = 2 * z + 1;
		z = z + 1;
                z = z / 5;
                z = z + 1;
                z = z * 6;
                z = (z * 5) / 10;
        }
        if ((z % 1) == 0) 
	        revert("one");
        if ((z % 2) == 0) 
	        revert("two");
        if ((z % 3) == 0) 
	        revert("three");
        if ((z % 4) == 0) 
	        revert("four");
        }
}

Currently, the basic blocks are stored in a map, and then called when need it, using the dispatcherLoop function.
The prototype contains everything needed to run the contract and compare it to the non-futamura execution (e.g. using the interpreter). Unfortunately, this iteration is around 5% to 10% slower slower than the interpreter for this contract. I believe there are a few reasons for it:

  1. The use of the stack is hard to optimize: it is immutable and requires to be copied every time it is used. There are different options here, either we do stack unboxing where the top of the stack is moved into another data structure and updated when a basic block has finished executing, or we remove the stack completely. Removing the stack means to have a way to compute pure expressions from every basic block, which what they did the authors of EFCF.

  2. The dispatcher loop calls the compiled basic block code as pointers. This means that GHC cannot see the complete program during the compilation process, and therefore, cannot apply optimizations beyond the basic block level. It seems that basic block optimizations are not enough, and if it is able to reasoning about the complete program, including loops, it will be able to know which parts of the VM state are changed. In order to workaround this, we will need to generate the complete program as Haskell code, not only basic block.
    To get that, we will need to know how basic block are connected so we can chain then with if conditions like this:

runBasicBlock_0_13 :: StateT (VM Concrete s) (ST s) ()
runBasicBlock_0_13 = do
 runOpcodePush 1
 ...
 runOpcodeJumpi
  if cond /= 0
    then runBasicBlock_25_48
    else runBasicBlock_48_49 

@gustavo-grieco
Copy link
Collaborator Author

I need to discuss with the rest of the team regarding this prototype, but my impression on what needs to be done to unlock full speed here:

  1. Process gas and PC at the basic block level, so it will only get updated once. This will mean that, if the gas is not enough to execute the complete basic block, then we can abort the execute before it is finishes. For the PC is easier, since we know the value of it at each point of the basic block.
  2. Compute stack items (e.g. last three elements) at the basic block level, this could allow to only execute pure expressions at the end of each basic block instead of the intermediate steps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants