Skip to content

Bug Fix: Make batch production non-optional during recovery. Add sequencer proof processing tests#2605

Open
preston-evans98 wants to merge 6 commits intodevfrom
preston/sequencer-proof-tests
Open

Bug Fix: Make batch production non-optional during recovery. Add sequencer proof processing tests#2605
preston-evans98 wants to merge 6 commits intodevfrom
preston/sequencer-proof-tests

Conversation

@preston-evans98
Copy link
Member

Description

This PR rewrites and expands the sequencer proof processing test suite to include resync and recovery, as well as handling of proofs that arrive while an open batch is in progress. This should fix the flaky test as well as improving coverage.

The new tests uncovered a pre-existing bug in recovery mode. The sequencer used produce_batch_if_convenient, but assumed that the requested batches were always produced. Trigger a failure was easier thanks to proofs (the blob sender is much more likely to be busy) which allowed the new tests to detect the issue. This PR fixes the issue by removing the "is_convenient" checks from batch production during recovery mode only.

  • I have updated CHANGELOG.md with a new entry if my PR makes any breaking changes or fixes a bug. If my PR removes a feature or changes its behavior, I provide help for users on how to migrate to the new behavior.
  • I have carefully reviewed all my Cargo.toml changes before opening the PRs. (Are all new dependencies necessary? Is any module dependency leaked into the full-node (hint: it shouldn't)?)

Linked Issues

Expose zk proof progress through a shared manager status and a controlled prover/blueprint path so sequencer tests can hold and release aggregate proof publication deterministically. Refactor proof progress tracking into a helper so status updates stay coupled to state changes.
// Update the next height to receive
self.stf_info_receiver
.inc_next_height_to_receive_by(num_proofs_to_create as u64);
self.proof_progress.finish_aggregate_proof_creation();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

begin_aggregate_proof_creation sets aggregate_proof_in_flight = true, and finish_aggregate_proof_creation clears it. However, if create_aggregate_proof_with_retries or publish_proof_blob_with_metadata fails in this code block, the flag is never reset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as the ZkProofManager just exits if this happens

/// background to test node APIs.
#[derive(Clone)]
pub struct RollupBuilder<R: FullNodeBlueprint<Native>> {
blueprint: Arc<R>,
Copy link
Member

@bkolad bkolad Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make RollupBuilder stateful, which means some state can persist across restarts.

builder.with_blueprint(..);
let rollup = builder.start()
let builder = rollup.shutdown() / rollup.restart()
rollup_after_restart = builder.star()

rollup_after_restart <- inherited r`eady_proof_count`

In this flow, rollup_after_restart may inherit state from the previous run. For example, in ManualProofPostingRtAgnosticBlueprint, we keep shared state such as ready_proof_count, available_releases, and is_open. That shared state would also be inherited by rollup_after_restart.

Would it be possible to add an Option<R> in start_test_rollup. So we would provide the bluerint only when TestRollup is created?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could jut add another method start_test_rollup_with_manual_proof_posting and call ManualProofPostingRtAgnosticBlueprint::<TestSpec, RT>::new_with_control() in it.

.trigger_batch_production_if_convenient_msg(
"recover_and_catch_up:dump_catchup_batches",
)
.trigger_batch_production_msg("recover_and_catch_up:dump_catchup_batches")
Copy link
Member

@bkolad bkolad Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the checks that is skipped now is:

if !self.seq_config.automatic_batch_production {
       warn!("Skipping batch production due to settings");
       return;
 }

This means that the sequencer will produce batches during recovery, although it is configured to never do it. This only matters in tests, but it can still be confusing. Let's log an error if we hit recovery and automatic_batch_production = true?

Copy link
Member

@bkolad bkolad Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option: we could panic when this happens, and cfg!(debug_assertions) = true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants