yacovm
Contributor

@yacovm yacovm commented Oct 5, 2025

Why this should be merged

Currently, whenever we ask the VM to wait for an event, it checks whether the mempool contains transactions and, if so, returns a "pending transactions" event.

Whether the mempool contains transactions depends on the block that the VM is configured to build on top of.

When the VM is told to change the preferred block on which to build the next block, the mempool is re-organized asynchronously in the background to reflect the new preference.

Because the re-organization runs asynchronously, when the VM changes its preference and is immediately asked to build a new block, the mempool may still contain transactions that the re-organization is about to remove. This leads to a false positive: the mempool may be empty once the re-organization completes, yet the WaitForEvent API call reports pending transactions, which results in building a block that shouldn't have been built.
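
The false positive described above can be reproduced with a toy model. The sketch below is only an illustration: toyPool, reorgAsync and staleThenFresh are hypothetical names, not the real tx pool API, and the start channel exists solely to make the interleaving deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// toyPool is a hypothetical stand-in for a mempool (not the real tx pool).
type toyPool struct {
	mu  sync.Mutex
	txs map[string]struct{}
}

// hasTxs reports whether the pool currently holds any transaction.
func (p *toyPool) hasTxs() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.txs) > 0
}

// reorgAsync drops the transactions already included in the new preferred
// block. The work starts only once start is closed, so the example controls
// the interleaving. The returned channel is closed when the re-org is done.
func (p *toyPool) reorgAsync(included []string, start <-chan struct{}) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-start
		p.mu.Lock()
		defer p.mu.Unlock()
		for _, tx := range included {
			delete(p.txs, tx)
		}
	}()
	return done
}

// staleThenFresh returns what hasTxs reports before and after the re-org.
func staleThenFresh() (before, after bool) {
	p := &toyPool{txs: map[string]struct{}{"tx1": {}}}
	start := make(chan struct{})
	// The preference changes to a block that already contains tx1.
	done := p.reorgAsync([]string{"tx1"}, start)
	before = p.hasTxs() // the re-org has not run yet: stale answer
	close(start)
	<-done
	after = p.hasTxs() // the re-org has finished: the pool is empty
	return before, after
}

func main() {
	before, after := staleThenFresh()
	fmt.Printf("before re-org: %v, after re-org: %v\n", before, after)
}
```

Checking the pool before the re-org completes reports transactions that are gone once it finishes, which is the stale answer WaitForEvent can act on.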

This PR fixes this false positive, as explained below.

How this works

When we set the preference in the VM or accept a block, we mark that block's height as pending.
In parallel, we subscribe to updates from the mempool, which are sent only once the re-org corresponding to a block has finished. Upon such an update, we record that block's number as the latest.

When the VM checks whether to build a block, in addition to checking whether there are transactions in the mempool, we now also check whether the mempool is pending a re-org. A mempool is pending a re-org if the block number set when a block is accepted or the preference changes (the pending number) differs from the block number received in the latest subscription update from the mempool (the latest number).
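
The bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the code in this PR: reorgTracker, setPending and setLatest are hypothetical names, while pendingPoolUpdate and the pending/latest pair mirror the identifiers that appear in the review thread.

```go
package main

import (
	"fmt"
	"sync"
)

// reorgTracker is a hypothetical helper that remembers the last block number
// we asked the mempool to re-org to (pending) and the last block number the
// mempool reported as done (latest).
type reorgTracker struct {
	mu      sync.RWMutex
	pending uint64
	latest  uint64
}

// setPending is called when the preference changes or a block is accepted.
func (t *reorgTracker) setPending(blockNum uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = blockNum
}

// setLatest is called when the mempool announces a finished re-org.
func (t *reorgTracker) setLatest(blockNum uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.latest = blockNum
}

// pendingPoolUpdate reports whether a re-org is still outstanding.
func (t *reorgTracker) pendingPoolUpdate() bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.pending != t.latest
}

func main() {
	var t reorgTracker
	t.setPending(5)
	fmt.Println(t.pendingPoolUpdate()) // true: re-org to block 5 not finished
	t.setLatest(5)
	fmt.Println(t.pendingPoolUpdate()) // false: the mempool has caught up
}
```

Comparing two numbers, instead of keeping a boolean flag, means a late update for an older block cannot clear the pending state of a newer preference change.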

How this was tested

Rename TestWaitForEvent from the PR to testWaitForEvent and create TestWaitForEvent as follows:

func TestWaitForEvent(t *testing.T) {
	for i := 0; i < 1000; i++ {
		t.Run(fmt.Sprintf("run-%d", i), func(t *testing.T) {
			t.Parallel()
			testWaitForEvent(t)
		})
	}
}

The test passes on this branch but fails miserably on master.

Need to be documented?

No

Need to update RELEASES.md?

No

@yacovm yacovm requested a review from a team as a code owner October 5, 2025 17:54
@yacovm yacovm marked this pull request as draft October 5, 2025 17:55
@yacovm yacovm force-pushed the yacovm/1312 branch 2 times, most recently from da88c79 to 7bfe897 Compare October 5, 2025 19:25
@yacovm yacovm marked this pull request as ready for review October 5, 2025 19:32
@yacovm yacovm changed the title from "Wait for tx pool event loop" to "Wait for tx pool re-org when emitting pending transaction events" Oct 5, 2025
plugin/evm/vm.go Outdated
case <-vm.shutdownChan:
return
case event := <-events:
if event.Head == nil {
Contributor Author

can event.Head be nil or event.Head.Number be nil?


if b.ethBlock != nil {
vm.setPendingBlock(b.ethBlock.NumberU64())
} else {
Contributor Author

can this happen?

Collaborator

no, we do check this in verification

Collaborator

@ceyonur ceyonur left a comment

Few comments, I think we should move most of the logic to builder.

plugin/evm/vm.go Outdated
Comment on lines 259 to 261
blockNumLock sync.RWMutex
pendingBlockNum uint64
latestBlockNum uint64
Collaborator

should we just push these to builder?

Contributor Author

Moved it

plugin/evm/vm.go Outdated
Comment on lines 867 to 874
events := make(chan core.NewTxPoolReorgEvent)
sub := vm.txPool.SubscribeNewReorgEvent(events)
vm.shutdownWg.Add(1)
go func() {
vm.subscribeToTxPoolEvents(events)
sub.Unsubscribe()
vm.shutdownWg.Done()
}()
Collaborator

Can we have the builder subscribe to this? I think that, in general, the block builder should be a separate entity from the VM.

Contributor Author

Moved it

plugin/evm/vm.go Outdated
Comment on lines 1247 to 1250
vm.blockNumLock.Lock()
defer vm.blockNumLock.Unlock()
vm.pendingBlockNum = blockNum
}
Collaborator

Again, this can be part of the block builder.

Contributor Author

Moved it

Comment on lines 1065 to 1032
err = vm.blockChain.SetPreference(block.GetEthBlock())
if err == nil {
vm.setPendingBlock(block.GetEthBlock().NumberU64())
}
return err
}
Collaborator

Suggested change
- err = vm.blockChain.SetPreference(block.GetEthBlock())
- if err == nil {
- vm.setPendingBlock(block.GetEthBlock().NumberU64())
- }
- return err
- }
+ err = vm.blockChain.SetPreference(block.GetEthBlock())
+ if err != nil {
+ return err
+ }
+ vm.setPendingBlock(block.GetEthBlock().NumberU64())
+ return err
+ }

plugin/evm/vm.go Outdated
func (vm *VM) subscribeToTxPoolEvents(events <-chan core.NewTxPoolReorgEvent) {
for {
select {
case <-vm.shutdownChan:
Collaborator

can we use the ctx in onNormalOperationsStarted?

Contributor Author

Yes we can

"height", b.Height(),
)

if b.ethBlock != nil {
Collaborator

why do we set this here? I think setting this in SetPreference should be enough?

Contributor Author

Because we later call vm.blockchain.Accept() and then setPreference.

plugin/evm/vm.go Outdated
vm.builder = vm.NewBlockBuilder(vm.extensionConfig.ExtraMempool, func() bool {
vm.blockNumLock.RLock()
defer vm.blockNumLock.RUnlock()
return vm.pendingBlockNum != vm.latestBlockNum
Collaborator

I'm confused by this condition. Doesn't this mean b.pendingPoolUpdate will return true if they're not equal, and it will keep waiting?

Contributor Author

Exactly. If they are not equal, there is a pending mempool re-org, so we need to keep waiting.

If they are equal, then there is no pending mempool re-org and it is safe to return the pending event.

plugin/evm/vm.go Outdated
vm.blockNumLock.Lock()
vm.latestBlockNum = event.Head.Number.Uint64()
vm.blockNumLock.Unlock()
vm.builder.signalCanBuild()
Collaborator

Isn't it enough to check event.Head against vm.pendingBlockNum and then signal (or not signal) the builder? Why do we need vm.latestBlockNum?

Contributor Author

We wait on the condition variable:

	for !b.needToBuild() || b.pendingPoolUpdate() {
		if err := b.pendingSignal.Wait(ctx); err != nil {
			return time.Time{}, common.Hash{}, err
		}
	}

In other words, we wait on the condition variable if we either don't need to build a new block or there is a pending pool update. In particular, if we need to build a new block but there is a pending pool update, we still wait.

If we didn't check for a pending pool update when we need to build a block, we wouldn't wait on the condition variable; we would immediately try to build a block and might hit the false positive.

The signal wakes up the waiter so that it re-checks whether the mempool re-org has finished. Without the signal, we might wait on the condition variable and never wake up once the re-org has finished.
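
For readers unfamiliar with the pattern, here is a self-contained sketch of the wait-and-signal loop using the standard library's sync.Cond. It is an approximation under stated assumptions: the builder in this PR waits with a context (b.pendingSignal.Wait(ctx)), which sync.Cond does not support, and toyBuilder, waitForBuild, onReorgDone and demo are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyBuilder is a hypothetical, simplified block builder.
type toyBuilder struct {
	mu      sync.Mutex
	cond    *sync.Cond
	hasTxs  bool   // stands in for needToBuild()
	pending uint64 // block number of the requested re-org
	latest  uint64 // block number of the last finished re-org
}

func newToyBuilder() *toyBuilder {
	b := &toyBuilder{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// waitForBuild blocks while there is nothing to build or a re-org is pending.
func (b *toyBuilder) waitForBuild() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for !b.hasTxs || b.pending != b.latest {
		b.cond.Wait()
	}
}

// setPending records that a re-org to the given block has been requested.
func (b *toyBuilder) setPending(blockNum uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = blockNum
}

// onReorgDone records a finished re-org and wakes up waiters so that they
// re-check the condition.
func (b *toyBuilder) onReorgDone(blockNum uint64) {
	b.mu.Lock()
	b.latest = blockNum
	b.mu.Unlock()
	b.cond.Broadcast()
}

// demo returns true if the waiter stays blocked while the re-org is pending
// and is released once the re-org has finished.
func demo() bool {
	b := newToyBuilder()
	b.hasTxs = true
	b.setPending(7)

	woke := make(chan struct{})
	go func() {
		b.waitForBuild()
		close(woke)
	}()

	select {
	case <-woke:
		return false // must not return while the re-org is pending
	case <-time.After(20 * time.Millisecond):
	}

	b.onReorgDone(7)
	<-woke
	return true
}

func main() {
	fmt.Println("waiter released only after re-org:", demo())
}
```

Because the waiter re-evaluates the whole condition each time it wakes up, a signal that arrives before the waiter starts is not lost: the first check already sees pending equal to latest.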

@yacovm
Contributor Author

yacovm commented Oct 7, 2025

Few comments, I think we should move most of the logic to builder.

Thanks for the quick review, addressed your comments!

yacovm added 3 commits October 8, 2025 14:37
Signed-off-by: Yacov Manevich <[email protected]>