
Conversation

@bragaigor
Contributor

Refactor PruneExecutionDB, more specifically findImportantRoots, so that it depends only on executionDB and no longer on consensusDB. The previous implementation depended on consensusDB for two things:

  • latest validated block
    • used in Prune mode "validator"
    • fetched using staker.ReadLastValidatedInfo
  • latest finalized block
    • used in Prune mode "minimal"
    • fetched using MEL batch count + batch metadata

For the replacements we use:

  • latest validated block:
    • store the latest validated block hash in executionDB under the key ValidatedBlockHashKey
    • add a call to executionDB.Put(ValidatedBlockHashKey, ...) in SetFinalityData to persist that hash (see the sketch below)
  • latest finalized block:
    • no new logic was added for this
    • the existing rawdb.ReadFinalizedBlockHash and rawdb.ReadHeaderNumber are used to get the finalized block hash and number
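
For illustration, here is a minimal sketch of how findImportantRoots can derive both anchors from executionDB alone; the key constant ValidatedBlockHashKey is the one used in this PR's diff, while the helper name, signature and error handling are hypothetical:

// loadPruneTargets is a hypothetical helper: both prune anchors come from executionDB,
// so consensusDB is no longer needed.
func loadPruneTargets(db ethdb.Database) (validatedHash, finalizedHash common.Hash, finalizedNum *uint64) {
	// Latest validated block hash, persisted by SetFinalityData under ValidatedBlockHashKey.
	if raw, err := db.Get(ValidatedBlockHashKey); err == nil && len(raw) == common.HashLength {
		validatedHash = common.BytesToHash(raw)
	}
	// Latest finalized block via geth's existing rawdb accessors.
	finalizedHash = rawdb.ReadFinalizedBlockHash(db)
	if finalizedHash != (common.Hash{}) {
		finalizedNum = rawdb.ReadHeaderNumber(db, finalizedHash)
	}
	return validatedHash, finalizedHash, finalizedNum
}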

Closes NIT-4266

@codecov

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 0% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.87%. Comparing base (9a40a7e) to head (8e31889).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4217      +/-   ##
==========================================
- Coverage   32.90%   32.87%   -0.03%     
==========================================
  Files         471      476       +5     
  Lines       56576    56781     +205     
==========================================
+ Hits        18614    18666      +52     
- Misses      34707    34849     +142     
- Partials     3255     3266      +11     

@github-actions
Contributor

github-actions bot commented Jan 9, 2026

❌ 7 Tests Failed:

Tests completed  Failed  Passed  Skipped
4028             7       4021    0
View the top 3 failed tests by shortest run time
TestFIFOSequentialLocking
Stack Traces | 0.010s run time
=== RUN   TestFIFOSequentialLocking
    fifo_lock_test.go:34: 
        	Error Trace:	/home/runner/work/nitro/nitro/bold/protocol/sol/fifo_lock_test.go:34
        	Error:      	Not equal: 
        	            	expected: []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
        	            	actual  : []int{0, 1, 2, 3, 4, 6, 5, 7, 8, 9}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -6,4 +6,4 @@
        	            	  (int) 4,
        	            	+ (int) 6,
        	            	  (int) 5,
        	            	- (int) 6,
        	            	  (int) 7,
        	Test:       	TestFIFOSequentialLocking
--- FAIL: TestFIFOSequentialLocking (0.01s)
TestRedisProduceComplex/one_producer,_all_consumers_are_active
Stack Traces | 1.330s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
DEBUG[01-28|17:53:50.538] consumer: xdel                           cid=19e92712-0243-4a10-9e9b-d2a81658d2de messageId=1769622829408-12
DEBUG[01-28|17:53:50.538] consumer: xdel                           cid=2113d871-7a62-4cdf-bbba-daf26e6f4304 messageId=1769622829408-15
DEBUG[01-28|17:53:50.539] Redis stream consuming                   consumer_id=b8725f72-88dc-4433-84b2-10aeee4d938f message_id=1769622829411-0
DEBUG[01-28|17:53:50.539] consumer: setting result                 cid=b8725f72-88dc-4433-84b2-10aeee4d938f msgIdInStream=1769622829411-0 resultKeyInRedis=result-key:stream:18c73973-87fc-4297-93ca-e69f3dced9a5.1769622829411-0
DEBUG[01-28|17:53:50.539] Redis stream consuming                   consumer_id=fe07d597-118f-4699-a82d-868a7f72d5be message_id=1769622829419-0
DEBUG[01-28|17:53:50.539] consumer: setting result                 cid=fe07d597-118f-4699-a82d-868a7f72d5be msgIdInStream=1769622829419-0 resultKeyInRedis=result-key:stream:18c73973-87fc-4297-93ca-e69f3dced9a5.1769622829419-0
DEBUG[01-28|17:53:50.539] consumer: xdel                           cid=d4d9e9fa-fff8-4016-8125-3a31c420a1da messageId=1769622829408-16
WARN [01-28|17:53:50.539] XClaimJustID returned empty response when indicating heartbeat msgID=1769622829408-14
DEBUG[01-28|17:53:50.539] consumer: xack                           cid=fe07d597-118f-4699-a82d-868a7f72d5be messageId=1769622829419-0
DEBUG[01-28|17:53:50.539] consumer: xack                           cid=403fded5-4873-47d7-bbb4-b6855a62deeb messageId=1769622829409-2
DEBUG[01-28|17:53:50.539] consumer: xack                           cid=b8725f72-88dc-4433-84b2-10aeee4d938f messageId=1769622829411-0
DEBUG[01-28|17:53:50.540] consumer: xdel                           cid=b8725f72-88dc-4433-84b2-10aeee4d938f messageId=1769622829411-0
DEBUG[01-28|17:53:50.540] consumer: xdel                           cid=fe07d597-118f-4699-a82d-868a7f72d5be messageId=1769622829419-0
DEBUG[01-28|17:53:50.540] consumer: xdel                           cid=403fded5-4873-47d7-bbb4-b6855a62deeb messageId=1769622829409-2
DEBUG[01-28|17:53:50.567] checkResponses                           responded=70 errored=0 checked=99
DEBUG[01-28|17:53:50.572] redis producer: check responses starting
DEBUG[01-28|17:53:50.606] checkResponses                           responded=29 errored=0 checked=29
ERROR[01-28|17:53:50.607] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
DEBUG[01-28|17:53:50.701] Error destroying a stream group          error="dial tcp 127.0.0.1:38313: connect: connection refused"
--- FAIL: TestRedisProduceComplex/one_producer,_all_consumers_are_active (1.33s)
TestRedisProduceComplex
Stack Traces | 20.850s run time
=== RUN   TestRedisProduceComplex
=== PAUSE TestRedisProduceComplex
=== CONT  TestRedisProduceComplex
--- FAIL: TestRedisProduceComplex (20.85s)


@bragaigor bragaigor marked this pull request as ready for review January 9, 2026 15:32

if executionDB != nil && finalizedBlockHeader != nil {
finalizedBlockHash := finalizedBlockHeader.Hash()
err := executionDB.Put(ValidatedBlockHashKey, finalizedBlockHash.Bytes())
Member

that's a bit confusing: shouldn't we use something like FinalizedBlockHashKey instead of ValidatedBlockHashKey?

Contributor Author

@bragaigor bragaigor Jan 13, 2026

Good catch, we should be using validatedBlockHeader; I probably got confused by the names. So to get validatedBlockHeader we need something like getFinalityBlockHeader, but for validatedBlockHeader. My tentative implementation is:

func (s *SyncMonitor) getValidatedBlockHeader(
	validatedFinalityData *arbutil.FinalityData,
) (*types.Header, error) {
	if validatedFinalityData == nil {
		return nil, nil
	}

	validatedMsgIdx := validatedFinalityData.MsgIdx
	validatedBlockHash := validatedFinalityData.BlockHash

	validatedBlockNumber := s.exec.MessageIndexToBlockNumber(validatedMsgIdx)
	validatedBlock := s.exec.bc.GetBlockByNumber(validatedBlockNumber)
	if validatedBlock == nil {
		log.Debug("Validated block not found", "blockNumber", validatedBlockNumber)
		return nil, nil
	}
	if validatedBlock.Hash() != validatedBlockHash {
		errorMsg := fmt.Sprintf(
			"finality block hash mismatch, blockNumber=%v, block hash provided by consensus=%v, block hash from execution=%v",
			validatedBlockNumber,
			validatedBlockHash,
			validatedBlock.Hash(),
		)
		return nil, errors.New(errorMsg)
	}
	return validatedBlock.Header(), nil
}

then we can fetch it with:

   validatedBlockHeader, err := s.getValidatedBlockHeader(
		validatedFinalityData,
	)

and use that instead of finalizedBlockHeader. I'm just curious if there's a better way of fetching validatedBlockHeader?
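
For completeness, the storing side in SetFinalityData could then look roughly like this (a fragment mirroring the existing finalizedBlockHeader pattern from the diff; the error wrapping is only illustrative):

	validatedBlockHeader, err := s.getValidatedBlockHeader(validatedFinalityData)
	if err != nil {
		return err
	}
	if executionDB != nil && validatedBlockHeader != nil {
		validatedBlockHash := validatedBlockHeader.Hash()
		if err := executionDB.Put(ValidatedBlockHashKey, validatedBlockHash.Bytes()); err != nil {
			return fmt.Errorf("failed to store validated block hash: %w", err)
		}
	}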

Member

sounds reasonable, but I think we should summon @diegoximenes here for his expertise

Contributor Author

agreed

Contributor

For context, this part, which compares the block hash provided by Consensus through the SetFinality call with the block hash stored in the Execution DB, is a procedure that Tsahi suggested to make the node a bit more resistant to some reorg scenarios.

It is OK to have a function like you mentioned, so that all cases (safe/finalized/validated) are checked with this block hash matching procedure and we reuse code.
Something like this; the variables should be renamed, and the function could also get another name.

func checkBlockHashAndGetHeader(....) ... {
    finalityBlockNumber := s.exec.MessageIndexToBlockNumber(finalityMsgIdx)
	finalityBlock := s.exec.bc.GetBlockByNumber(finalityBlockNumber)
	if finalityBlock == nil {
		log.Debug("Finality block not found", "blockNumber", finalityBlockNumber)
		return nil, nil
	}
	if finalityBlock.Hash() != finalityBlockHash {
		errorMsg := fmt.Sprintf(
			"finality block hash mismatch, blockNumber=%v, block hash provided by consensus=%v, block hash from execution=%v",
			finalityBlockNumber,
			finalityBlockHash,
			finalityBlock.Hash(),
		)
		return nil, errors.New(errorMsg)
	}
	return finalityBlock.Header(), nil
}

This func can then be called by the existing getFinalityBlockHeader, and when storing the validated block hash in the DB.

WDYT?
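
To make the reuse concrete, a rough sketch of the two call sites; checkBlockHashAndGetHeader is the helper proposed above, while the other names and signatures are guesses rather than the actual code:

// Existing safe/finalized path delegates to the shared helper.
func (s *SyncMonitor) getFinalityBlockHeader(finalityData *arbutil.FinalityData) (*types.Header, error) {
	if finalityData == nil {
		return nil, nil
	}
	return s.checkBlockHashAndGetHeader(finalityData.MsgIdx, finalityData.BlockHash)
}

// Validated path: resolve the header the same way before persisting its hash.
func (s *SyncMonitor) storeValidatedBlockHash(executionDB ethdb.KeyValueWriter, validatedFinalityData *arbutil.FinalityData) error {
	if executionDB == nil || validatedFinalityData == nil {
		return nil
	}
	header, err := s.checkBlockHashAndGetHeader(validatedFinalityData.MsgIdx, validatedFinalityData.BlockHash)
	if err != nil || header == nil {
		return err
	}
	return executionDB.Put(ValidatedBlockHashKey, header.Hash().Bytes())
}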

Contributor Author

yep agreed, combining them into one function to reuse some code

if chainConfig == nil {
return nil, errors.New("database doesn't have a chain config (was this node initialized?)")
}
consensusDB, err := stack.OpenDatabaseWithOptions("arbitrumdata", node.DatabaseOptions{MetricsNamespace: "arbitrumdata/", ReadOnly: true, PebbleExtraOptions: persistentConfig.Pebble.ExtraOptions("arbitrumdata"), NoFreezer: true})
Contributor

nitpick: stack is an argument of findImportantRoots, and it is not used anymore, so it can be removed

Contributor Author

removing that along with cacheConfig and persistentConfig since none are used



Contributor

Some tests here should be extended to check that validated block hash is properly stored in the DB.

Contributor Author

Adding checks for both validated and finalized blocks

// PathScheme prunes the state trie by itself, so only HashScheme should be tested
builder.RequireScheme(t, rawdb.HashScheme)

// Needed to create safe blocks; hence forcing SetFinalityData call
Contributor

nitpick: safe blocks are not used by pruning today, only validated and finalized.

defer executionDB.Close()
executionDBEntriesBeforePruning := countStateEntries(executionDB)

// Since we're dealing with a new executionDB we store both validatedBlockHash and
Contributor

I don't get this.
builder.L2.cleanup() should guarantee that the validated and finalized block info is persisted to disk.
That info should already be available when opening the DB again here.
Have you noticed that this is not the case?

Since finality data is pushed to Execution asynchronously, you can check that the validated/finalized info is non-empty before calling builder.L2.cleanup().
And then check that the validated/finalized info retrieved through this DB here is equal to, or even more up to date than, the info that you retrieved before calling builder.L2.cleanup().
The validated/finalized info can be more up to date since, between its retrieval and calling builder.L2.cleanup(), SyncMonitor.SetFinalityData could have been called.
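
A sketch of that check; the package holding ValidatedBlockHashKey and the surrounding test variables are assumptions:

	// Before cleanup: finality data is pushed asynchronously, so assert it is already persisted.
	// Get errors out if the key is absent, so Require is the check itself.
	rawBefore, err := builder.L2.ExecNode.ExecutionDB.Get(gethexec.ValidatedBlockHashKey)
	Require(t, err)
	validatedBefore := common.BytesToHash(rawBefore)

	builder.L2.cleanup()

	// After reopening executionDB: the hash must still be there; it may differ from
	// validatedBefore only by being newer, since SetFinalityData may have run again before shutdown.
	rawAfter, err := executionDB.Get(gethexec.ValidatedBlockHashKey)
	Require(t, err)
	if validatedAfter := common.BytesToHash(rawAfter); validatedAfter != validatedBefore {
		t.Log("validated block hash advanced between retrieval and shutdown")
	}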

Contributor Author

I did observe something finicky, but most likely because I was checking the wrong executionDB or something. I'll add a check to make sure the data in the freshly opened executionDB matches the one from the builder.

@@ -179,56 +165,21 @@ func findImportantRoots(ctx context.Context, executionDB ethdb.Database, stack *
return nil, fmt.Errorf("unknown pruning mode: \"%v\"", initConfig.Prune)
}
if initConfig.Prune != "minimal" && l1Client != nil {
Contributor

l1Client check not needed anymore here.

Contributor

@diegoximenes diegoximenes Jan 13, 2026

Something important is not being tested today.
This is more a lack of a test that could have been introduced before, but this PR is changing some code that is not really being tested.

When using a pruning mode that is not "minimal" and not "validator", the trie nodes that are related to the finalized block and genesis should be kept in disk, and the rest of the trie nodes should be pruned.
We can possibly test that by:

  1. Transferring some eth to an account, this will happen in block B1.
  2. Transferring some eth to the same account, this will happen in block B2.
  3. Guarantee that block BF, where BF >= B2, is the latest finalized. There can be other blocks after BF.
  4. Guarantee that we are able to retrieve the account's balance at B1 and BF.
  5. Shutdown the node.
  6. Prune.
  7. Restart the node.
  8. Guarantee that we are able to retrieve the account's balance at BF, but it fails when retrieving it at B1.

The same reasoning can apply for the "validator" pruning mode, in which this PR modified code related to it.

This test makes sense?
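
A rough outline of those steps in system-test style; helper names such as TransferBalance, Require and GetAddress are taken on the assumption they exist in the test harness, and the finalization/prune/restart plumbing is elided:

	// Steps 1-2: two transfers to the same account, landing in blocks B1 and B2.
	builder.L2.TransferBalance(t, "Faucet", "User", big.NewInt(1e12), builder.L2Info)
	b1, err := builder.L2.Client.BlockNumber(ctx)
	Require(t, err)
	builder.L2.TransferBalance(t, "Faucet", "User", big.NewInt(1e12), builder.L2Info)
	b2, err := builder.L2.Client.BlockNumber(ctx)
	Require(t, err)
	userAddr := builder.L2Info.GetAddress("User")

	// Step 3: pick BF >= B2 and make sure it is actually reported as finalized (elided).
	bf := b2

	// Step 4: before pruning, the balance must be readable at both B1 and BF.
	_, err = builder.L2.Client.BalanceAt(ctx, userAddr, new(big.Int).SetUint64(b1))
	Require(t, err)
	_, err = builder.L2.Client.BalanceAt(ctx, userAddr, new(big.Int).SetUint64(bf))
	Require(t, err)

	// Steps 5-7: stop the node, prune, restart (elided).

	// Step 8: BF state survives pruning, B1 state does not.
	_, err = builder.L2.Client.BalanceAt(ctx, userAddr, new(big.Int).SetUint64(bf))
	Require(t, err)
	if _, err = builder.L2.Client.BalanceAt(ctx, userAddr, new(big.Int).SetUint64(b1)); err == nil {
		t.Fatal("expected a missing trie node error for pruned block B1")
	}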

Contributor Author

I was trying to add a test for that but I'm struggling with step 8. What kind of information should be missing for blocks older than BF? For instance, after restarting the node I'm not able to call builder.L2.Client.BalanceAt for any block, including the finalized block; on the other hand, I'm still able to retrieve block hashes through rawdb.ReadCanonicalHash(builder.L2.ExecNode.ExecutionDB, blockNum) from block 1 up to BF and subsequent blocks. What information would be missing for those B1, B2 blocks?

Contributor Author

Pushed the changes that add checks to TestFinalityDataWaitForBlockValidator and clean up TestPruning. So this seems to be the only missing piece.

Contributor

Being able to read block hashes is expected.
Pruning only removes MPT trie nodes, and block hashes are not stored in trie nodes.

The BalanceAt issue is unexpected.
Which error messages are you seeing?
Can you check running this test without the changes introduced in this PR?
And also add a commit with the implemented test, so I can play with it too?

A brief explanation of why it is unexpected.
With hash scheme, to retrieve a trie node, geth needs to retrieve an entry from the DB, in which the key is the hash of the trie node.
Account's balances are stored in the state trie.

Simplifying a lot here, and making some intentional "mistakes" to make it easier to explain, to retrieve "alice" account's balance:
t1. Retrieves hash(blockHash(B1).stateRootTrie) related node from the DB, lets call it node(t1).
t2. Gets the child hash, related to "a", from node(t1), lets call it hash(node(t1)->a)
t3. Retrieves hash(node(t1)->a) related node from DB, lets call it node(t3).
t4. Gets the child hash, related to "l", from node(t3), lets call it hash(node(t3)->l)
t5. etc

Let S1 be the set of observed node hashes when retrieving "alice" account balance in B1, and SF the set related to BF.
It is quite likely that the intersection of S1 and SF is empty.
Also, it is quite likely that S1 nodes will not be referenced by the tree rooted at blockHash(BF).stateRootTrie.

During pruning we will add blockHash(BF).stateRootTrie as an important root, meaning that nodes reached by it shouldn't be removed.
The other nodes, the ones that are reached by blockHash(B1).stateRootTrie and not by blockHash(BF).stateRootTrie, will be removed.

So we should be able to retrieve "alice" account's balance in BF but not in B1.

Makes sense?
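
One way to observe this directly in a test, assuming the hash scheme: each trie node is stored under a key equal to its own hash, so a plain key lookup tells whether a block's state root (and hence its trie) survived pruning. A minimal sketch:

// stateRootPresent reports whether the root node of a block's state trie is still in the DB.
// Under HashScheme the node is stored under key == header.Root, so Has is enough.
func stateRootPresent(db ethdb.KeyValueReader, root common.Hash) bool {
	has, err := db.Has(root.Bytes())
	return err == nil && has
}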

Contributor Author

@bragaigor bragaigor Jan 15, 2026

I see, so say for some weird reason there was an intersection between S1 and SF, would that mean that the pruning procedure would prune all those nodes, or would it leave them untouched because blockHash(BF).stateRootTrie is an important root?

The error I'm getting is:

    pruning_test.go:346:  [] missing trie node 449a81a33e76a7e1d94904d16948071799bde0a23ca1d4968ef838ac5c97fae2 (path ) state 0x449a81a33e76a7e1d94904d16948071799bde0a23ca1d4968ef838ac5c97fae2 is not available, not found 

I did test with current master and I'm getting the same behaviour. For instance, I'm only able to fetch the balance from the latest block and sometimes from the latest block - 1. In some runs the pruner deletes the second oldest block and sometimes it doesn't. For any older blocks I'm not able to retrieve balances. And that behaviour is the same for the validator, minimal and full modes. The only mode for which I was able to find balances for older blocks was the empty mode "", but I guess in that case we're not pruning the nodes?

Pushed the WIP test TestStateAfterPruning so you can take a look at what I'm testing. I'm wondering if it's related to how I'm setting the finality block through:

	err = builder.L2.ExecNode.SyncMonitor.SetFinalityData(builder.L2.ExecNode.ExecutionDB, &safeFinalityData, &finalizedFinalityData, &validatedFinalityData)

I also tried higher values for the finality block in:

	finalizedMsgIdx := arbutil.MessageIndex(96) // <<<<<<<<<<<<<
	finalizedMsgResult, err := builder.L2.ExecNode.ResultAtMessageIndex(finalizedMsgIdx).Await(ctx)
    // ...
	err = builder.L2.ExecNode.SyncMonitor.SetFinalityData(builder.L2.ExecNode.ExecutionDB, &safeFinalityData, &finalizedFinalityData, &validatedFinalityData)

but no dice

Contributor Author

I am not completely sure if opening a full node with a DB created by an archive node will work though. Maybe even restarting as an archive node is already enough, but not sure if something will fail while starting the node in this case. But this seems the most straightforward approach to try first.

yeah that part worked fine, the problem is that archival mode keeps every state. More info below.


There's an interesting behaviour when running the node in archive mode. Even though I see the expected blocks being added as important roots as part of the pruning procedure, BalanceAt calls still succeed for most of the blocks. This is what I'm doing:

  1. Start node in archive mode by setting builder.execConfig.Caching.Archive to true
  2. Generate 5,000 blocks
  3. Set last finalized block to 4850 and last validated block to 2550 (they're > 2000 blocks apart as per minRootDistance)
  4. Stop node
  5. Call pruning.PruneExecutionDB(...) with validator pruning mode
    a. I observe that blocks 0, 2550 and 4850 have been added as important roots
  6. Start another node with the same executionDB (no archival mode)
  7. Get L2 chain tip, call it l2_head which is at around block 9000
  8. Call BalanceAt for every block between 2000 - 9000 and check which succeed and which don't

Say l2_head is 9072

  • Blocks between [7484, 8945] failed with state recreation l2 gas depth limit exceeded
  • Blocks between [2550, 7483] succeeded
  • Blocks between [2521, 2549] failed with state recreation l2 gas depth limit exceeded
  • Blocks between [2000, 2520] succeeded

So it doesn't seem like the pruning procedure is able to prune blocks for a node in archive mode? I would expect that some blocks would return the error missing trie node. To test that theory, I conducted the same experiment, but instead of running the node in archival mode, I ran a regular node and added the following lines to commit blocks [2000, 5000] to the trieDB just before stopping the node:

	bc := builder.L2.ExecNode.Backend.ArbInterface().BlockChain()
	triedb := bc.StateCache().TrieDB()

	// Persist the state root of each of these blocks so the pruner has trie nodes to evaluate.
	for i := 2000; i < 5000; i++ {
		header := bc.GetHeaderByNumber(uint64(i))
		err = triedb.Commit(header.Root, false)
		Require(t, err)
	}

And that behaved as we expected, meaning the pruning procedure pruned all trie nodes between blocks [2000, 5000] except for the finalized block 4850 and the last validated block 2550. So something about spinning up a node in archival mode prevents the pruning procedure from pruning trieDB nodes the way we expect. With that in mind I tried to dig a bit deeper into how BalanceAt looks for such information:

  • when using node in archival mode
    • BalanceAt eventually calls StateAndHeaderFromHeader here
    • the interesting thing is how BalanceAt is able to find the state for the blocks I'm calling. For instance, for the blocks we mark as important roots which are the last validated and last finalized blocks we are able to find them here [1]
      		statedb, err := state.New(header.Root, db)
      		...
              if header.Root != (common.Hash{}) {
      			headerRoot := header.Root
      			return statedb, func() { db.TrieDB().Dereference(headerRoot) }, nil // <<<<< find state header here
      		}
    • and they are successfully returned here:
      liveState, liveStateRelease, err := stateFor(bc.StateCache(), bc.Snapshots())(header)
      if err == nil {
      	...
      	return liveState, header, nil // <<<<< return state header here
      }
  • now, the other blocks for the archival mode are found through a different code path. First they call this line:
	lastState, lastHeader, lastStateRelease, err := FindLastAvailableState(ctx, bc, stateFor(ephemeral, nil), header, nil, maxRecreateStateDepth)
  • that line calls stateFor multiple times in an attempt to find the last available state, calling stateFor with header - 1 until it returns something, which it eventually does. Meaning, BalanceAt is successfully recreating the state; probably because we're dealing with archival mode, enough state remains that BalanceAt is able to recreate such state.

  • when using node in regular mode
    • BalanceAt eventually calls StateAndHeaderFromHeader, that doesn't change.
    • for the blocks we mark as important roots which are the last validated and last finalized blocks, BalanceAt finds the state through the same code path as the above [1].
    • but for the other blocks we fail to find the header root here, which eventually calls newTrieReader:
      		statedb, err := state.New(header.Root, db)
      		if err != nil {
      			return nil, nil, err
      		}
    • to eventually return the error in this line. And unlike in archival mode, in regular mode we are never able to recreate the state for those other blocks, so they fail with missing trie node

  • So it seems we can't use a node in archival mode, unfortunately, if we're using BalanceAt to test, since it tries to recreate those states and for the most part it's successful.

Do what you are already doing, but instead of using BF here, after restarting the node, use the latest block available in the execution DB that is older than or equal to BF...just matching the addHeader behavior here.

  • we could do that, meaning, we can have a test running a node that's not in archival mode, where we don't commit the state manually, and check the blocks that get committed as part of the cleanup, which are HEAD, HEAD - 1, etc. Here we can't test whether the last validated and last finalized blocks are pruned, due to the committing issue.
  • or we could do that but also manually commit a few blocks (the last validated and last finalized blocks); that way we can check that only the genesis, last validated and last finalized blocks have not been pruned by the pruning procedure (see the sketch after this list).

Let me know what you think.
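
The second option could look roughly like this, combining the manual-commit snippet above with presence checks after pruning (block numbers are the example's; everything else is a sketch, not the actual test):

	bc := builder.L2.ExecNode.Backend.ArbInterface().BlockChain()
	triedb := bc.StateCache().TrieDB()

	// Commit three roots by hand: the two that pruning must keep and one it should drop.
	lastValidated, lastFinalized, droppable := uint64(2550), uint64(4850), uint64(3000)
	roots := map[uint64]common.Hash{}
	for _, blockNum := range []uint64{lastValidated, lastFinalized, droppable} {
		header := bc.GetHeaderByNumber(blockNum)
		Require(t, triedb.Commit(header.Root, false))
		roots[blockNum] = header.Root
	}

	// ... stop the node, run pruning.PruneExecutionDB in "validator" mode, reopen executionDB ...

	// Under HashScheme a committed state root survives pruning iff its node is still in the DB.
	for blockNum, root := range roots {
		has, err := executionDB.Has(root.Bytes())
		Require(t, err)
		shouldSurvive := blockNum == lastValidated || blockNum == lastFinalized
		if has != shouldSurvive {
			t.Fatalf("block %d root present=%v, expected %v", blockNum, has, shouldSurvive)
		}
	}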

Contributor

@diegoximenes diegoximenes Jan 27, 2026

So it seems we can't use a node in archival mode, unfortunately, if we're using BalanceAt to test, since it tries to recreate those states and for the most part it's successful.

Makes sense

or we could do that but also manually commit a few blocks (the last validated and last finalized blocks); that way we can check that only the genesis, last validated and last finalized blocks have not been pruned by the pruning procedure.

This approach is good 🙂

Contributor Author

I've added the test under TestStateAfterPruningValidator. My original idea was to call testStateAfterPruning for different modes, all of which work locally, similar to TestPruning. But something about the number of blocks present in this test is preventing me from having more than one at the same time. I think it would be best if I created a ticket to investigate how to add different modes for this test if we still want that going forward. Let me know what you think.

Contributor Author

Hmm, even with just one test it seems to be timing out this CI workflow. Trying once more, but if that is a problem then I think it would be best to bring the number of blocks down from 5000 to 3000. That means we won't be able to test the pruning procedure adding both the last validated and last finalized blocks, since they would be less than 2000 blocks apart; but that's probably better than nothing?

Contributor Author

It passed on the 3rd try; it could also be that the workflow is flaky, as I have had to restart it on different occasions in the past in other PRs.

Refactor PruneExecutionDB, more specifically findImportantRoots
so it does not depend on consensusDB, but only on executionDB

Signed-off-by: Igor Braga <[email protected]>
Signed-off-by: Igor Braga <[email protected]>
@bragaigor bragaigor force-pushed the braga/no-consensus-for-prune branch from a59d76b to 4965575 Compare January 15, 2026 15:58
@bragaigor bragaigor assigned diegoximenes and unassigned bragaigor Jan 28, 2026