test: poa should be validated against parent blocks epoch snapshot, not current #822
craigmayhew wants to merge 20 commits into master from
Conversation
TRY 2 FAIL [ 6.738s] irys-chain::mod term_ledger_expiry::data_poa_boundary_acceptance::data_poa_boundary_acceptance

Note these flaky tests on the last run:
note more flakes: FLAKY (2/2) [ 48.225s] irys-chain::mod::block_production::basic_contract::heavy_test_erc20
note a fail and some flake: |
2baf880 to baf62e4
Note: current state of e2e test:
📝 Walkthrough

This PR introduces comprehensive test coverage for Proof of Availability (PoA) validation at epoch boundaries. It adds new test files covering deterministic and integration scenarios, provides utility functions for PoA testing workflows, and enables the irys-efficient-sampling workspace dependency for development.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
f8f2870 to be658ab
…ht 16 after 21 retries". Just prior to the noted panic.
14b259a to fef3c63
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/chain/tests/utils.rs (1)
3168-3242: solution_context_with_poa_chunk correctly builds deterministic PoA solutions; consider emphasizing its constraints

This helper cleanly:
- Starts/stops VDF internally and waits for a (step-1, step) pair with at least two steps.
- Computes checkpoints for step-1.
- Uses a fixed partition_hash = H256::zero and partition_chunk_offset = 0, and builds solution_hash = sha256(poa_chunk || offset_le || vdf_output).

Two suggestions:
- Document that recall-range alignment is intentionally ignored. The comment mentions "without recall-range dependency", but it's helpful to be explicit that this must only be used in tests where PoA verification is disabled, otherwise PoA will (correctly) fail.
- Guard against steps.len() < 2 defensively. You already check steps.len() >= 2 in the loop, but an additional assert before indexing keeps the function future-proof if get_steps semantics ever change.

Otherwise the construction is consistent with how solution_hash is used elsewhere.
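The defensive guard can be sketched in isolation. Here, checkpoint_pair is a hypothetical stand-in for the helper's (step-1, step) indexing, which in the real code reads VDF steps from the node:

```rust
// Hypothetical stand-in for the helper's (step-1, step) indexing; the real
// function reads VDF steps from the node. Returns an error instead of
// panicking when the slice is too short.
fn checkpoint_pair(steps: &[u64]) -> Result<(u64, u64), String> {
    if steps.len() < 2 {
        return Err(format!("need at least 2 VDF steps, got {}", steps.len()));
    }
    // Safe to index: length checked above.
    Ok((steps[steps.len() - 2], steps[steps.len() - 1]))
}

fn main() {
    assert!(checkpoint_pair(&[7]).is_err());
    assert_eq!(checkpoint_pair(&[6, 7]), Ok((6, 7)));
    println!("guard ok");
}
```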
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
crates/chain/Cargo.toml
crates/chain/tests/block_production/data_poa_boundary_acceptance.rs
crates/chain/tests/block_production/mod.rs
crates/chain/tests/utils.rs
crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs
crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs
crates/chain/tests/validation/mod.rs
🧰 Additional context used
🧬 Code graph analysis (4)
crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs (3)
- crates/actors/src/block_validation.rs (1): poa_is_valid (1063-1196)
- crates/types/src/block.rs (1): merklize_tx_root (619-635)
- crates/types/src/config/node.rs (2): consensus_config (658-674), miner_address (689-691)

crates/chain/tests/block_production/mod.rs (1)
- crates/chain/tests/block_production/data_poa_boundary_acceptance.rs (1): data_poa_boundary_acceptance (22-237)

crates/chain/tests/block_production/data_poa_boundary_acceptance.rs (1)
- crates/chain/tests/utils.rs (5): craft_data_poa_solution_from_tx (88-285), get_epoch_snapshot (288-299), read_block_from_state (3288-3343), submit_solution_to_block_producer (302-318), new_genesis (549-552)

crates/chain/tests/validation/mod.rs (2)
- crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs (1): deterministic_boundary_poa_crafted_snapshot (30-331)
- crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs (1): heavy_poa_parent_snapshot_integration (35-274)
🔇 Additional comments (4)
crates/chain/Cargo.toml (1)
72-88: Dev-dependency wiring for irys-efficient-sampling looks correct

The crate is correctly added as a workspace dev-dependency and matches the irys_efficient_sampling usage in tests; no further changes needed.

crates/chain/tests/block_production/mod.rs (1)
7-7: Module wiring for data_poa_boundary_acceptance is appropriate

Private module inclusion cleanly hooks the new e2e test into the block_production test suite.

crates/chain/tests/validation/mod.rs (1)
3-4: New PoA validation test modules correctly registered

The added mod deterministic_boundary_poa_crafted_snapshot; and mod heavy_poa_parent_snapshot_integration; correctly integrate the new PoA boundary tests into the validation suite without affecting public APIs.

crates/chain/tests/utils.rs (1)
1323-1327: Timeout message improvement is clearer and more actionable

The new message "Timed out waiting for {} ingress proof(s) within {} seconds" is more explicit than the previous wording and should make debugging test timeouts easier. No further changes needed here.
// Post TWO txs so data extends into slot 1 (which should have partitions).
let _tx1 = node
    .post_data_tx(anchor, vec![7_u8; DATA_SIZE], &user_signer)
    .await;
let _ = node.mine_block().await?;
let tx = node
    .post_data_tx(anchor, vec![9_u8; DATA_SIZE], &user_signer)
    .await;
let _ = node.mine_block().await?;
// tx2 starts at slot 1
let tx_ledger_offset: u64 = slot_size;
tx_ledger_offset = slot_size is a brittle assumption about ledger layout
Here you assume the second tx’s first chunk starts exactly at slot_size:
// tx2 starts at slot 1
let tx_ledger_offset: u64 = slot_size;

That's only guaranteed if:
- There is no prior Submit data occupying those chunks, and
- Each of the two txs contributes exactly one full partition’s worth of chunks in order, with no interleaving.
Given the amount of prior epoch activity (stake/pledge, capacity assignments, backfill, etc.), this coupling to implementation details of packing is fragile. If the packing or ledger layout changes (or other txs are present), the actual ledger offset for tx can differ, and the crafted PoA will be misaligned even when the parent snapshot selection is correct.
To make the test robust, consider:
- Deriving the actual ledger chunk offset for tx from BlockIndex once its inclusion block is known (e.g., via get_block_bounds or a small helper), then feeding that concrete offset into craft_data_poa_solution_from_tx, or
- Asserting (via introspection) that the computed offset really matches slot_size before crafting the PoA, and failing fast with a clear message if not.
This will decouple the regression check from low-level packing assumptions and should reduce false negatives.
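The layout assumption behind this concern can be checked in isolation. A minimal sketch with toy values, where slot_size stands for num_partitions_per_slot * num_chunks_in_partition:

```rust
// Toy illustration of the layout assumption under review: an offset maps to
// slot 1 only while it stays inside [slot_size, 2 * slot_size). Any extra
// chunks packed ahead of the tx shift its real offset and can silently
// invalidate the crafted PoA.
fn slot_for_ledger_offset(ledger_offset: u64, slot_size: u64) -> u64 {
    ledger_offset / slot_size
}

fn main() {
    let slot_size = 10u64; // assumed toy value
    let assumed_offset = slot_size; // the brittle `tx_ledger_offset = slot_size`
    assert_eq!(slot_for_ledger_offset(assumed_offset, slot_size), 1);
    // Last offset still inside slot 1:
    assert_eq!(slot_for_ledger_offset(2 * slot_size - 1, slot_size), 1);
    // One extra partition's worth of earlier data pushes the tx into slot 2:
    assert_eq!(slot_for_ledger_offset(assumed_offset + slot_size, slot_size), 2);
    println!("slot math ok");
}
```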
let miner_addr = node.node_ctx.config.node_config.reward_address;

// Verify partition slot alignment
{
    let pa = parent_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .expect("partition assignment must exist");
    let slot_index = pa
        .slot_index
        .expect("slot_index must exist for data partition") as u64;
    assert_eq!(
        slot_index as usize, target_slot_idx,
        "partition slot_index should match the tx's slot"
    );
}
Verify miner address used for PoA entropy matches the block’s miner address
The PoA chunk entropy is keyed off the miner address, and craft_data_poa_solution_from_tx uses the miner_addr you pass to derive the entropy:
let miner_addr = node.node_ctx.config.node_config.reward_address;
...
let sol_result = craft_data_poa_solution_from_tx(
&node,
&tx,
partition_hash,
miner_addr,
Some(tx_ledger_offset),
)Elsewhere (e.g., solution_context/capacity_chunk_solution) the miner address is taken from node_ctx.config.node_config.miner_address(). If reward_address and miner_address() ever diverge, the entropy used when crafting the PoA will not match the address passed into poa_is_valid during validation, and you’ll get PoAChunkHashMismatch even when everything else is aligned.
Please double-check that:
- SolutionContext.mining_address (filled by the helper) becomes the same address poa_is_valid uses as miner_address, and
- That this address is the one you intend to use here (likely miner_address() rather than reward_address, unless they're guaranteed equal in tests).
If they differ, switch to node.node_ctx.config.node_config.miner_address() for entropy and solution crafting to avoid subtle mismatches.
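The failure mode is easy to demonstrate in miniature. This sketch uses DefaultHasher as a stand-in for the real address-keyed packing entropy, which is an assumption for illustration only:

```rust
// Illustrative only: entropy keyed off an address diverges as soon as the
// address does. DefaultHasher stands in for the real packing-entropy
// function derived from the miner address.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn entropy_for(addr: &[u8; 20], chunk_offset: u32) -> u64 {
    let mut h = DefaultHasher::new();
    addr.hash(&mut h);
    chunk_offset.hash(&mut h);
    h.finish()
}

fn main() {
    let miner_address = [1u8; 20];
    let reward_address = [2u8; 20];
    // Crafting with one address and validating with the other mismatches:
    assert_ne!(entropy_for(&miner_address, 0), entropy_for(&reward_address, 0));
    // Matching addresses agree, which is what validation requires:
    assert_eq!(entropy_for(&miner_address, 0), entropy_for(&miner_address, 0));
    println!("entropy address sensitivity demonstrated");
}
```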
// Enable mining for VDF advancement while retrying solutions
node.node_ctx.start_mining()?;

let solution = {
    let max_attempts = 500_usize;
    let mut attempt = 0_usize;
    let delay_ms = 50_u64;
    let parent_block = node
        .get_block_by_height(node.get_canonical_chain_height().await)
        .await?;
    let parent_diff = parent_block.diff;

    loop {
        let sol_result = craft_data_poa_solution_from_tx(
            &node,
            &tx,
            partition_hash,
            miner_addr,
            Some(tx_ledger_offset),
        )
        .await;

        match sol_result {
            Ok(sol) => {
                let solution_diff = irys_types::u256_from_le_bytes(&sol.solution_hash.0);
                if solution_diff >= parent_diff {
                    tracing::debug!(attempt, "Found valid solution after attempts");
                    break sol;
                }
            }
            Err(e) => {
                let err_msg = e.to_string();
                if !err_msg.contains("recall range mismatch") {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        if attempt >= max_attempts {
            eyre::bail!(
                "Failed to craft solution after {} attempts (last error: recall range or difficulty mismatch)",
                attempt
            );
        }

        // Short constant delay to allow VDF advancement between attempts
        tokio::time::sleep(std::time::Duration::from_millis(delay_ms)).await;
    }
};
Solution loop lacks VDF progression verification despite existing mitigations
The test already calls start_mining() (which starts VDF) and sets difficulty to 10,000. However, the core flakiness risk remains unaddressed: there is no assertion or explicit monitoring that global_step actually advances between the 500 solution attempts.
Each iteration of craft_data_poa_solution_from_tx() reads the current VDF step independently, but if VDF advancement stalls, the same steps[1] seed will be used repeatedly, generating identical solution hashes and triggering "recall range mismatch" errors. With 50ms delays between attempts (500 × 50ms = 25 seconds), the loop can timeout before finding a valid solution.
To stabilize this test, add an explicit check that global_step increases across attempts, or fail fast if VDF is not progressing. Alternatively, increase the delay between attempts or further reduce difficulty to reduce reliance on VDF progression.
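The fail-fast idea can be sketched as a small stall tracker. The step values are hypothetical inputs here; in the real test they would be read from the node's VDF state each attempt:

```rust
// Sketch of a stall detector for the solution loop: reset the counter on
// progress, error out after too many attempts at the same VDF step.
struct StallDetector {
    last_step: u64,
    stalled_attempts: u32,
    max_stalled: u32,
}

impl StallDetector {
    fn new(max_stalled: u32) -> Self {
        Self { last_step: 0, stalled_attempts: 0, max_stalled }
    }

    fn observe(&mut self, step: u64) -> Result<(), String> {
        if step > self.last_step {
            self.last_step = step;
            self.stalled_attempts = 0;
            return Ok(());
        }
        self.stalled_attempts += 1;
        if self.stalled_attempts >= self.max_stalled {
            Err(format!(
                "VDF stuck at step {} for {} attempts",
                step, self.stalled_attempts
            ))
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut d = StallDetector::new(3);
    assert!(d.observe(1).is_ok()); // progress
    assert!(d.observe(1).is_ok()); // 1st stall
    assert!(d.observe(1).is_ok()); // 2nd stall
    assert!(d.observe(1).is_err()); // 3rd stall: fail fast
    let mut d2 = StallDetector::new(3);
    assert!(d2.observe(1).is_ok());
    assert!(d2.observe(2).is_ok()); // progress resets the counter
    println!("stall detector ok");
}
```

Calling observe once per loop iteration turns a silent 25-second spin into an immediate, diagnosable error.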
// reusable debug asserts helper
async fn debug_asserts_first_slots(
    node: &IrysNodeTest<IrysNodeCtx>,
    expected_submit_slots: usize,
    expected_submit_partitions: usize,
    expected_publish_slots: usize,
    expected_publish_partitions: usize,
) -> eyre::Result<()> {
    let snapshot = get_epoch_snapshot(node).await?;
    let submit_slots = snapshot.ledgers.get_slots(DataLedger::Submit);
    let publish_slots = snapshot.ledgers.get_slots(DataLedger::Publish);

    assert_eq!(
        expected_submit_slots,
        submit_slots.len(),
        "We expected {} Submit slot(s)",
        expected_submit_slots
    );
    assert_eq!(
        expected_submit_partitions,
        submit_slots
            .first()
            .map(|s| s.partitions.len())
            .unwrap_or_default(),
        "We expected {} Submit partition(s)",
        expected_submit_partitions
    );
    assert_eq!(
        expected_publish_slots,
        publish_slots.len(),
        "We expected {} publish slot(s)",
        expected_publish_slots
    );
    assert_eq!(
        expected_publish_partitions,
        publish_slots
            .first()
            .map(|s| s.partitions.len())
            .unwrap_or_default(),
        "We expected {} publish partition(s)",
        expected_publish_partitions
    );
    Ok(())
}
Snapshot-shape debug assertions are helpful but tightly coupled to config
debug_asserts_first_slots enforces exact numbers of slots and partitions:
assert_eq!(expected_submit_slots, submit_slots.len(), ...);
assert_eq!(expected_submit_partitions, submit_slots.first().map(|s| s.partitions.len()).unwrap_or_default(), ...);
...

With expected_* = 1 throughout the test, this effectively asserts that the first Submit/Publish slot always has a single partition. That's reasonable for the specific config you set up here, but quite sensitive to any changes in epoch/backfill or capacity partition logic.
Given this is primarily a regression test for snapshot selection, consider:
- Either relaxing these checks (e.g., assert “>= 1 partition” where that’s all you strictly need), or
- Adding a brief comment near the call sites noting that these asserts encode invariants of the current test config and will need adjustment if the consensus layout changes.
Not strictly wrong, but it’s an easy source of future breakage unrelated to the PoA regression itself.
🤖 Prompt for AI Agents
In crates/chain/tests/block_production/data_poa_boundary_acceptance.rs around
lines 281 to 324, the debug_asserts_first_slots helper rigidly asserts exact
slot/partition counts which couples the test to current epoch/partition layout;
change the equality assertions to more robust checks (e.g.,
assert!(submit_slots.len() >= expected_submit_slots) and
assert!(submit_slots.first().map(|s| s.partitions.len()).unwrap_or_default() >=
expected_submit_partitions) and similarly for publish slots/partitions) or, if
you prefer to keep exact counts, add a short comment at the call sites
explaining these asserts encode the current test configuration and must be
updated if epoch/backfill/partitioning logic changes; update the assertion
messages to reflect the new comparison (">=") so failure output remains clear.
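The relaxed form suggested above can be sketched on toy data; Slot here is an illustrative stand-in for the snapshot's slot type:

```rust
// Lower-bound variant of the shape assertions: tolerant of extra slots or
// partitions added by future epoch/backfill changes. `Slot` is a toy
// stand-in for the epoch snapshot's slot type.
struct Slot {
    partitions: Vec<u64>,
}

fn assert_at_least(slots: &[Slot], min_slots: usize, min_partitions: usize) {
    assert!(
        slots.len() >= min_slots,
        "expected >= {} slot(s), got {}",
        min_slots,
        slots.len()
    );
    let first = slots.first().map(|s| s.partitions.len()).unwrap_or_default();
    assert!(
        first >= min_partitions,
        "expected >= {} partition(s) in first slot, got {}",
        min_partitions,
        first
    );
}

fn main() {
    // Passes with the current single-partition layout...
    assert_at_least(&[Slot { partitions: vec![0] }], 1, 1);
    // ...and keeps passing if the layout grows.
    assert_at_least(
        &[Slot { partitions: vec![0, 1] }, Slot { partitions: vec![] }],
        1,
        1,
    );
    println!("relaxed asserts ok");
}
```

The ">=" messages keep failure output clear while decoupling the regression check from exact layout counts.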
pub async fn craft_data_poa_solution_from_tx(
    node: &IrysNodeTest<IrysNodeCtx>,
    tx: &DataTransaction,
    partition_hash: H256,
    miner_addr: IrysAddress,
    tx_ledger_offset: Option<u64>,
) -> eyre::Result<SolutionContext> {
    // Compute slot_index from parent epoch snapshot (prev block)
    let head_height = node.get_canonical_chain_height().await;
    let block = node.get_block_by_height(head_height).await?;
    let parent_epoch_snapshot = node
        .node_ctx
        .block_tree_guard
        .read()
        .get_epoch_snapshot(&block.block_hash)
        .ok_or_else(|| eyre!("parent epoch snapshot not found"))?;
    let pa = parent_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .ok_or_else(|| eyre!("partition assignment missing for {}", partition_hash))?;
    let slot_index =
        pa.slot_index
            .ok_or_else(|| eyre!("slot index missing for {}", partition_hash))? as u64;

    let slot_start = slot_index
        * node.node_ctx.config.consensus.num_partitions_per_slot
        * node.node_ctx.config.consensus.num_chunks_in_partition;

    let chunks_per_partition = node.node_ctx.config.consensus.num_chunks_in_partition;

    // Determine the ledger offset for the tx's first chunk:
    let chunk_ledger_offset =
        tx_ledger_offset.unwrap_or_else(|| block.data_ledgers[DataLedger::Submit].total_chunks);

    // Compute partition_chunk_offset from ledger_chunk_offset:
    // ledger_chunk_offset = slot_start + partition_chunk_offset
    // partition_chunk_offset = ledger_chunk_offset - slot_start
    //
    // IMPORTANT: The tx's position must be within this partition's slot range.
    let slot_end = slot_start + chunks_per_partition;
    eyre::ensure!(
        chunk_ledger_offset >= slot_start && chunk_ledger_offset < slot_end,
        "chunk ledger offset {} is outside partition slot range [{}, {}) for partition {} (slot_index={})",
        chunk_ledger_offset,
        slot_start,
        slot_end,
        partition_hash,
        slot_index
    );
    let partition_chunk_offset_u64 = chunk_ledger_offset - slot_start;
    let partition_chunk_offset: u32 = u32::try_from(partition_chunk_offset_u64).map_err(|_| {
        eyre!(
            "partition_chunk_offset {} doesn't fit u32",
            partition_chunk_offset_u64
        )
    })?;

    // tx_path for a single-tx block
Clarify/strengthen assumptions around tx_ledger_offset and slot window
craft_data_poa_solution_from_tx assumes that tx_ledger_offset (or block.data_ledgers[Submit].total_chunks when None) is exactly the ledger chunk offset of the tx’s first chunk within the chosen partition’s slot:
let slot_start = slot_index * num_partitions_per_slot * num_chunks_in_partition;
let chunk_ledger_offset =
tx_ledger_offset.unwrap_or_else(|| block.data_ledgers[DataLedger::Submit].total_chunks);
let slot_end = slot_start + chunks_per_partition;
ensure!(
chunk_ledger_offset >= slot_start && chunk_ledger_offset < slot_end,
"chunk ledger offset {} is outside partition slot range [...]",
...
);

Two points to tighten:
- Callers must pass the actual ledger offset. For mempool txs you're assuming they will be placed exactly at prev_total_chunks; if the packing/ordering logic changes (or other txs are inserted ahead), this invariant breaks and the helper silently fails. It's worth:
  - Documenting this requirement explicitly in the function docs, and/or
  - Adding a helper that derives the ledger offset from BlockIndex or the inclusion block post-factum, then calling this helper with a precise offset.
- Explicitly enforce partition_chunk_offset < num_chunks_in_partition. You currently rely on the inequality above plus subtraction, but a direct assert/ensure on the derived partition_chunk_offset_u64 being < chunks_per_partition makes the intent clear and protects against misconfigured tx_ledger_offset.
This will make failures easier to diagnose and decouple the helper from fragile assumptions about block-construction order.
🤖 Prompt for AI Agents
In crates/chain/tests/utils.rs around lines 88 to 145, the function assumes
tx_ledger_offset (or block.data_ledgers[Submit].total_chunks when None) exactly
equals the ledger chunk offset of the tx’s first chunk in the partition slot and
only implicitly checks the slot range; update the code and docs to (1) document
in the function comment that callers must supply the precise ledger chunk offset
for the tx (or call a new helper to derive it from a BlockIndex/inclusion block)
so callers don’t rely on the fragile "prev_total_chunks" assumption, and (2) add
an explicit eyre::ensure! check after computing partition_chunk_offset_u64 that
partition_chunk_offset_u64 < chunks_per_partition (with a clear error message)
before converting to u32 to make the intent and invariant explicit and fail fast
on misconfiguration.
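The offset derivation with both invariants made explicit reduces to a pure function that is easy to reason about. Names mirror the helper, but this is an illustrative sketch, not the repository's code:

```rust
// Pure-function sketch of the offset derivation: the ledger offset must
// land inside the slot, and the derived in-partition offset must be
// < chunks_per_partition before it is narrowed to u32.
fn partition_chunk_offset(
    ledger_offset: u64,
    slot_start: u64,
    chunks_per_partition: u64,
) -> Result<u32, String> {
    let slot_end = slot_start + chunks_per_partition;
    if ledger_offset < slot_start || ledger_offset >= slot_end {
        return Err(format!(
            "ledger offset {} outside slot range [{}, {})",
            ledger_offset, slot_start, slot_end
        ));
    }
    let off = ledger_offset - slot_start;
    // Redundant with the range check above, but states the invariant directly:
    if off >= chunks_per_partition {
        return Err(format!(
            "partition_chunk_offset {} >= chunks_per_partition {}",
            off, chunks_per_partition
        ));
    }
    u32::try_from(off).map_err(|_| format!("offset {} doesn't fit u32", off))
}

fn main() {
    assert_eq!(partition_chunk_offset(12, 10, 10), Ok(2));
    assert!(partition_chunk_offset(20, 10, 10).is_err()); // next slot
    assert!(partition_chunk_offset(9, 10, 10).is_err()); // previous slot
    println!("offset invariants ok");
}
```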
// Ensure recall range alignment for the current VDF step and partition.
let num_chunks_in_partition = node.node_ctx.config.consensus.num_chunks_in_partition;
let num_chunks_in_recall_range = node.node_ctx.config.consensus.num_chunks_in_recall_range;
let num_recall_ranges_in_partition =
    (num_chunks_in_partition / num_chunks_in_recall_range) as usize;
// Compute VDF reset window start and collect recall steps for reconstruction
let num_rr_u64 = num_chunks_in_partition.div_ceil(num_chunks_in_recall_range);
let recall_reset_step = if current_step == 0 {
    0
} else {
    ((current_step - 1) / num_rr_u64) * num_rr_u64 + 1
};
let recall_steps = vdf_steps_guard
    .read()
    .get_steps(ii(recall_reset_step, current_step))?;
let recall_index = (u64::from(partition_chunk_offset) / num_chunks_in_recall_range) as usize;

eyre::ensure!(
    recall_index < num_recall_ranges_in_partition,
    "recall_index {} out of bounds [0, {}) (partition_chunk_offset={}, num_chunks_in_recall_range={}, num_chunks_in_partition={})",
    recall_index,
    num_recall_ranges_in_partition,
    partition_chunk_offset,
    num_chunks_in_recall_range,
    num_chunks_in_partition
);

// Use the efficient-sampling library to compute the expected recall range
let expected_recall_index = irys_efficient_sampling::get_recall_range(
    num_recall_ranges_in_partition,
    &recall_steps,
    &partition_hash,
)?;

// Verify our chunk is in the expected recall range (can't snap - that would change which chunk we're proving)
eyre::ensure!(
    recall_index == expected_recall_index,
    "recall range mismatch: chunk at recall_index {} but VDF expects {} (partition_chunk_offset={}, vdf_step={})",
    recall_index,
    expected_recall_index,
    partition_chunk_offset,
    current_step
);
🧩 Analysis chain
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '180,250p'Repository: Irys-xyz/irys
Length of output: 3430
🏁 Script executed:
rg -n "solution_context_with_poa_chunk|solution_context|capacity_chunk_solution" crates/chain/tests/utils.rs -A 5Repository: Irys-xyz/irys
Length of output: 1472
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '3175,3220p'Repository: Irys-xyz/irys
Length of output: 2085
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '320,380p'Repository: Irys-xyz/irys
Length of output: 2633
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '378,450p'Repository: Irys-xyz/irys
Length of output: 3655
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '160,200p'Repository: Irys-xyz/irys
Length of output: 1917
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '120,165p'Repository: Irys-xyz/irys
Length of output: 2189
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '70,125p'Repository: Irys-xyz/irys
Length of output: 2749
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '3220,3280p'Repository: Irys-xyz/irys
Length of output: 2590
🏁 Script executed:
rg -n "craft_data_poa_solution_from_tx" crates/chain/tests/ -B 3 -A 3Repository: Irys-xyz/irys
Length of output: 1940
🏁 Script executed:
cat -n crates/chain/tests/block_production/data_poa_boundary_acceptance.rs | sed -n '140,180p'Repository: Irys-xyz/irys
Length of output: 1745
🏁 Script executed:
python3 << 'EOF'
# Simulate the scenario where num_chunks_in_partition is not an exact multiple
num_chunks_in_partition = 10
num_chunks_in_recall_range = 3
# Current code (floor division)
num_recall_ranges_floor = (num_chunks_in_partition // num_chunks_in_recall_range)
print(f"Floor division: {num_recall_ranges_floor}")
# Correct code (div_ceil)
import math
num_recall_ranges_ceil = math.ceil(num_chunks_in_partition / num_chunks_in_recall_range)
print(f"Ceil division: {num_recall_ranges_ceil}")
# Test what happens with chunk offsets
print("\nChunk offset analysis:")
for offset in range(num_chunks_in_partition):
recall_index = offset // num_chunks_in_recall_range
print(f" offset {offset}: recall_index={recall_index}, valid_with_floor={recall_index < num_recall_ranges_floor}, valid_with_ceil={recall_index < num_recall_ranges_ceil}")
EOFRepository: Irys-xyz/irys
Length of output: 143
🏁 Script executed:
cat << 'EOF'
# Scenario: num_chunks_in_partition = 10, num_chunks_in_recall_range = 3
# Floor division: 10 / 3 = 3
# Ceil division: (10 + 3 - 1) / 3 = 12 / 3 = 4
#
# With floor division (3 ranges):
# - Range 0: chunks 0-2
# - Range 1: chunks 3-5
# - Range 2: chunks 6-8
# - Chunk 9 would have index 3, but 3 is not < 3, so it fails bounds check
#
# With ceil division (4 ranges):
# - Range 0: chunks 0-2
# - Range 1: chunks 3-5
# - Range 2: chunks 6-8
# - Range 3: chunks 9
# - Chunk 9 would have index 3, and 3 < 4, so it passes
echo "Math check: 10 / 3 = 3 (floor), ceil(10/3) = 4"
echo "Last chunk (9) maps to recall_index = 9/3 = 3"
echo "With floor (3): 3 < 3 is FALSE (rejected)"
echo "With ceil (4): 3 < 4 is TRUE (accepted)"
EOF
Length of output: 795
🏁 Script executed:
rg -n "get_recall_range" crates/ --type rust | grep -E "fn|pub" | head -20
Length of output: 418
🏁 Script executed:
cat -n crates/actors/src/block_validation.rs | sed -n '1035,1100p'
Length of output: 3012
🏁 Script executed:
cat -n crates/efficient-sampling/src/lib.rs | sed -n '1,100p'
Length of output: 5056
Fix recall-range count to use div_ceil
The recall-window setup has a critical bounds-checking bug:
let num_chunks_in_partition = ...;
let num_chunks_in_recall_range = ...;
let num_recall_ranges_in_partition =
    (num_chunks_in_partition / num_chunks_in_recall_range) as usize;
// ...
let recall_index = (u64::from(partition_chunk_offset) / num_chunks_in_recall_range) as usize;
ensure!(
    recall_index < num_recall_ranges_in_partition,
    ...
);

When num_chunks_in_partition is not an exact multiple of num_chunks_in_recall_range, floor division undercounts the number of ranges. For example, with 10 chunks and 3 chunks per range, floor division gives 3 ranges (indices 0–2); chunks 0–8 map to indices 0–2, but chunk 9 maps to index 3, which then wrongly fails the bounds check as out of bounds.
Fix by using div_ceil:
let num_recall_ranges_in_partition =
    num_chunks_in_partition.div_ceil(num_chunks_in_recall_range) as usize;

Additionally, this helper assumes the VDF is already running and that get_steps() will succeed. Unlike solution_context_with_poa_chunk, which explicitly calls node_ctx.start_vdf(), this function relies on callers to have initialized the VDF separately. Either start the VDF inside the helper for consistency, or document the requirement clearly so callers cannot hit an unexpected get_steps() failure.
🤖 Prompt for AI Agents
crates/chain/tests/utils.rs lines ~196-238: the calculation of
num_recall_ranges_in_partition uses floor division which undercounts ranges when
partition size is not an exact multiple of the recall-range size; change the
calculation to use div_ceil on num_chunks_in_partition with
num_chunks_in_recall_range (and cast to usize) so the bounds check is correct;
additionally, either start the VDF unconditionally in this helper (call the
existing node_ctx.start_vdf() before get_steps()) or document in the function
comment that the VDF must be started by callers, so get_steps() cannot fail
unexpectedly.
async fn deterministic_boundary_poa_crafted_snapshot() -> eyre::Result<()> {
    const CHUNK_SIZE: u64 = 32;
    // Synthetic timestamp for test blocks (Nov 2023, in milliseconds)
    const TEST_TIMESTAMP_MS: u128 = 1_700_000_000_000;

    let mut node_config = NodeConfig::testing();
    node_config.consensus.get_mut().block_migration_depth = 1;
    node_config.consensus.get_mut().chunk_size = CHUNK_SIZE;
    node_config.consensus.get_mut().num_chunks_in_partition = 8;
    node_config.consensus.get_mut().num_partitions_per_slot = 1;
    node_config.consensus.get_mut().epoch.num_blocks_in_epoch = 3;

    let (
        tmp_dir,
        block_index,
        block_index_tx,
        parent_epoch_snapshot,
        miner_address,
        consensus,
        genesis_hash,
        _block_index_handle,
    ) = init_min_context(&node_config).await;

    let partition_hash = parent_epoch_snapshot
        .ledgers
        .get_slots(DataLedger::Submit)
        .first()
        .and_then(|slot| slot.partitions.first().copied())
        .expect("submit ledger slot 0 should exist with a partition");

    let signer = IrysSigner::random_signer(&consensus);
    let data = vec![0xAB; CHUNK_SIZE as usize];
    let tx = {
        let tx_unsigned = signer
            .create_transaction(data.clone(), H256::zero())
            .expect("create tx");
        signer.sign_transaction(tx_unsigned).expect("sign tx")
    };
    let tx_header: DataTransactionHeader = tx.header.clone();
    assert_eq!(
        tx.proofs.len(),
        1,
        "1-chunk tx should have exactly one data proof"
    );

    let tx_headers = vec![tx_header.clone()];
    let (tx_root, tx_paths) = DataTransactionLedger::merklize_tx_root(&tx_headers);
    let tx_path = tx_paths[0].proof.clone();
    let data_path = if let Some(p) = tx.proofs.first() {
        p.proof.clone()
    } else {
        tx_path.clone()
    };

    let mut partition_chunk_offset: u32 = 0;

    let parent_slot_index = parent_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .and_then(|pa| pa.slot_index)
        .unwrap_or(0) as u64;

    let ledger_chunk_offset =
        parent_slot_index * consensus.num_partitions_per_slot * consensus.num_chunks_in_partition
            + partition_chunk_offset as u64;

    let prelude_size_bytes = (ledger_chunk_offset as usize) * (CHUNK_SIZE as usize);
    let prelude_tx_header = if prelude_size_bytes > 0 {
        let prelude_data = vec![0xCD; prelude_size_bytes];
        let pre_tx = signer
            .create_transaction(prelude_data, H256::zero())
            .expect("create prelude tx");
        let pre_tx = signer.sign_transaction(pre_tx).expect("sign prelude tx");
        Some(pre_tx.header)
    } else {
        None
    };

    let prelude_headers: Vec<DataTransactionHeader> =
        prelude_tx_header.clone().into_iter().collect();
    let (prelude_tx_root, _prelude_paths) =
        DataTransactionLedger::merklize_tx_root(&prelude_headers);

    let prelude_publish_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Publish.into(),
        tx_root: H256::zero(),
        tx_ids: H256List(Vec::new()),
        total_chunks: 0,
        expires: None,
        proofs: None,
        required_proof_count: Some(1),
    };
    let prelude_submit_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Submit.into(),
        tx_root: prelude_tx_root,
        tx_ids: H256List(prelude_tx_header.into_iter().map(|h| h.id).collect()),
        total_chunks: ledger_chunk_offset,
        expires: Some(41_000_000),
        proofs: None,
        required_proof_count: None,
    };
    let prelude_block_hash = H256::from_slice(&[1_u8; 32]);
    let mut prelude_block_v = IrysBlockHeader::default();
    {
        let inner = &mut *prelude_block_v;
        inner.height = 1;
        inner.reward_address = miner_address;
        inner.poa = irys_types::PoaData {
            tx_path: None,
            data_path: None,
            chunk: None,
            ledger_id: None,
            partition_chunk_offset: 0,
            partition_hash,
        };
        inner.block_hash = prelude_block_hash;
        inner.previous_block_hash = genesis_hash;
        inner.previous_cumulative_diff = U256::from(4000);
        inner.miner_address = miner_address;
        inner.signature = Signature::test_signature().into();
        inner.timestamp = TEST_TIMESTAMP_MS.saturating_sub(1).into();
        inner.data_ledgers = vec![prelude_publish_ledger, prelude_submit_ledger];
    }
    let prelude_block = Arc::new(prelude_block_v);

    {
        let block_height_pre_migration = {
            let block_index_guard = BlockIndexReadGuard::new(block_index.clone());
            let bi = block_index_guard.read();
            bi.num_blocks().saturating_sub(1)
        };
        let prelude_all_txs = Arc::new(prelude_headers.clone());
        let (tx_prelude, rx_prelude) = tokio::sync::oneshot::channel();
        block_index_tx
            .send(BlockIndexServiceMessage::MigrateBlock {
                block_header: prelude_block.clone(),
                all_txs: prelude_all_txs,
                response: tx_prelude,
            })
            .expect("send migrate prelude block");
        rx_prelude
            .await
            .expect("receive migration result (prelude)")
            .expect("migrate prelude block");

        let start = std::time::Instant::now();
        let timeout = std::time::Duration::from_secs(10);
        loop {
            let current_height = {
                let block_index_guard = BlockIndexReadGuard::new(block_index.clone());
                let bi = block_index_guard.read();
                bi.num_blocks().saturating_sub(1)
            };
            if current_height != block_height_pre_migration {
                break;
            }
            if start.elapsed() > timeout {
                panic!(
                    "Timed out waiting for prelude block migration after {:?}",
                    timeout
                );
            }
            tokio::time::sleep(tokio::time::Duration::from_millis(200)).await;
        }
    }

    let prev_total = {
        let prelude_block_index_guard = BlockIndexReadGuard::new(block_index.clone());
        let prelude_index = prelude_block_index_guard.read();
        let latest_height = prelude_index.latest_height();
        let latest_item = prelude_index
            .get_item(latest_height)
            .expect("latest block index item should exist");
        latest_item.ledgers[DataLedger::Submit as usize].total_chunks
    };

    let slot_start =
        parent_slot_index * consensus.num_partitions_per_slot * consensus.num_chunks_in_partition;
    assert!(
        prev_total == slot_start,
        "prelude did not add expected chunks: prev_total={} slot_start={}",
        prev_total,
        slot_start
    );
    partition_chunk_offset = 0;

    let mut poa_chunk: Vec<u8> = data.clone();
    {
        let mut entropy_chunk = Vec::<u8>::with_capacity(CHUNK_SIZE as usize);
        compute_entropy_chunk(
            miner_address,
            partition_chunk_offset as u64,
            partition_hash.into(),
            consensus.entropy_packing_iterations,
            CHUNK_SIZE as usize,
            &mut entropy_chunk,
            consensus.chain_id,
        );
        xor_vec_u8_arrays_in_place(&mut poa_chunk, &entropy_chunk);
    }

    let submit_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Submit.into(),
        tx_root,
        tx_ids: H256List(vec![tx_header.id]),
        total_chunks: slot_start + 1,
        expires: Some(42_000_000),
        proofs: None,
        required_proof_count: None,
    };
    let publish_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Publish.into(),
        tx_root: H256::zero(),
        tx_ids: H256List(Vec::new()),
        total_chunks: 0,
        expires: None,
        proofs: None,
        required_proof_count: Some(1),
    };
    let mut synthetic_block_v = IrysBlockHeader::default();
    {
        let inner = &mut *synthetic_block_v;
        inner.height = 2;
        inner.reward_address = miner_address;
        inner.poa = irys_types::PoaData {
            tx_path: Some(Base64(tx_path)),
            data_path: Some(Base64(data_path)),
            chunk: Some(Base64(poa_chunk.clone())),
            ledger_id: Some(DataLedger::Submit.into()),
            partition_chunk_offset,
            partition_hash,
        };
        inner.block_hash = H256::from_slice(&[2_u8; 32]);
        inner.previous_block_hash = prelude_block_hash;
        inner.previous_cumulative_diff = U256::from(4000);
        inner.miner_address = miner_address;
        inner.signature = Signature::test_signature().into();
        inner.timestamp = TEST_TIMESTAMP_MS.into();
        inner.data_ledgers = vec![publish_ledger, submit_ledger];
    }
    let synthetic_block = Arc::new(synthetic_block_v);

    let txs = Arc::new(vec![tx_header.clone()]);
    let (tx_migrate, rx_migrate) = tokio::sync::oneshot::channel();
    block_index_tx
        .send(BlockIndexServiceMessage::MigrateBlock {
            block_header: synthetic_block.clone(),
            all_txs: txs.clone(),
            response: tx_migrate,
        })
        .expect("send migrate block");
    rx_migrate
        .await
        .expect("receive migration result")
        .expect("migrate synthetic block");

    let block_index_guard = BlockIndexReadGuard::new(block_index.clone());

    poa_is_valid(
        &synthetic_block.poa,
        &block_index_guard,
        &parent_epoch_snapshot,
        &consensus,
        &miner_address,
    )
    .expect("Parent epoch snapshot PoA should be valid for crafted block");

    // Build child-like snapshot with different slot index to simulate epoch-boundary slot movement
    let mut child_epoch_snapshot = parent_epoch_snapshot.clone();
    let mut pa = child_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .expect("partition assignment should exist for partition_hash");
    pa.slot_index = Some(pa.slot_index.unwrap_or(0).wrapping_add(1));
    child_epoch_snapshot
        .partition_assignments
        .data_partitions
        .insert(partition_hash, pa);

    // Child-like snapshot PoA validation MUST fail
    match poa_is_valid(
        &synthetic_block.poa,
        &block_index_guard,
        &child_epoch_snapshot,
        &consensus,
        &miner_address,
    ) {
        Err(
            PreValidationError::MerkleProofInvalid(_)
            | PreValidationError::PoAChunkOffsetOutOfBlockBounds
            | PreValidationError::BlockBoundsLookupError(_),
        ) => {
            info!("Child-like snapshot rejected PoA as expected due to slot change/out-of-bounds");
        }
        Ok(()) => panic!("Child-like snapshot unexpectedly validated crafted PoA"),
        Err(e) => panic!("Unexpected child-like PoA error: {:?}", e),
    }

    drop(block_index);
    drop(tmp_dir);
    Ok(())
}
🧹 Nitpick | 🔵 Trivial
Deterministic crafted-snapshot PoA test is well-structured and matches the intended invariant
The test:

- Constructs a minimal BlockIndex + EpochSnapshot context.
- Ensures the prelude block advances total_chunks to exactly slot_start, then places a single-chunk Submit tx at slot_start in the synthetic block.
- Builds PoaData consistent with that layout and proves it valid under the parent snapshot.
- Mutates only the partition's slot_index in a cloned snapshot and correctly accepts MerkleProofInvalid | PoAChunkOffsetOutOfBlockBounds | BlockBoundsLookupError as the expected child-invalid outcomes.

This is a solid, deterministic guard against regressions in snapshot selection. To future-proof it further, you could assert that the selected partition's slot_index is initially non-None and document that this test assumes at least one Submit partition exists in the genesis snapshot.
🤖 Prompt for AI Agents
In crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs
around lines 30 to 331, add an explicit assertion that the selected partition
assignment has a non-None slot_index and that the parent_epoch_snapshot contains
at least one Submit ledger slot/partition before using it; if either
precondition fails, fail the test with a clear message. Modify the setup so you
retrieve the partition assignment, assert pa.slot_index.is_some() and that
parent_epoch_snapshot.ledgers.get_slots(DataLedger::Submit).first() exists (and
contains a partition), then proceed — this documents and enforces the test’s
assumption and prevents silent panics if fixtures change.
// Derive slot_index and partition_chunk_offset from the parent snapshot and block
let parent_assignment = parent_epoch_snapshot
    .partition_assignments
    .get_assignment(partition_hash)
    .expect("partition assignment must exist in parent snapshot");
let slot_index = parent_assignment
    .slot_index
    .expect("slot_index must exist for data partition") as u64;
let cpp = node.node_ctx.config.consensus.num_chunks_in_partition;
let npps = node.node_ctx.config.consensus.num_partitions_per_slot;
let slot_start = slot_index * npps * cpp;
let prev_total_chunks = parent_epoch_block.data_ledgers[DataLedger::Submit].total_chunks;
assert!(
    prev_total_chunks >= slot_start,
    "prev_total_chunks {} before slot_start {}",
    prev_total_chunks,
    slot_start
);
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;
Guard partition_chunk_offset against exceeding the partition’s slot range
Here you only assert prev_total_chunks >= slot_start and then derive:
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;

If prev_total_chunks ever reaches or exceeds slot_start + num_chunks_in_partition, partition_chunk_offset will point past the partition's slot window, making the PoA invalid even under the parent snapshot (or silently truncating via as u32 if this code changes).
Suggest tightening the invariant to keep the crafted PoA within the intended slot:
let slot_end = slot_start + cpp;
assert!(
prev_total_chunks >= slot_start && prev_total_chunks < slot_end,
"prelude did not add expected chunks: prev_total={} in [{}, {})",
prev_total_chunks,
slot_start,
slot_end,
);
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;This keeps the test robust to future tweaks in how total_chunks is advanced.
🤖 Prompt for AI Agents
In crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs around
lines 96 to 116, tighten the invariant that computes partition_chunk_offset so
it cannot point past the partition’s slot: compute slot_end = slot_start + cpp
and replace the existing assert with one that checks prev_total_chunks >=
slot_start && prev_total_chunks < slot_end (including a clear message showing
prev_total_chunks, slot_start, slot_end); then derive partition_chunk_offset_u64
= prev_total_chunks - slot_start and cast to u32 as before. Ensure the assert
message matches the new bounds and keeps the test robust to future changes.
For whoever picks this up: the e2e test is not working and I have pushed all my debugging. The other two tests LGTM.
Describe the changes
Three new tests added:

- data_poa_boundary_acceptance: an incomplete end-to-end boundary snapshot smoke test.
- deterministic_boundary_poa_crafted_snapshot
- heavy_poa_parent_snapshot_integration

Related Issue(s)
Original fix: #815
Checklist
Summary by CodeRabbit