test: poa should be validated against parent blocks epoch snapshot, not current #822
craigmayhew wants to merge 20 commits into master from
Conversation
TRY 2 FAIL [ 6.738s] irys-chain::mod term_ledger_expiry::data_poa_boundary_acceptance::data_poa_boundary_acceptance

Note these flaky tests on the last run:
note more flakes: FLAKY (2/2) [ 48.225s] irys-chain::mod::block_production::basic_contract::heavy_test_erc20
note a fail and some flake: |
2baf880 to baf62e4
Note: current state of e2e test:
📝 Walkthrough

This PR introduces comprehensive test coverage for Proof of Availability (PoA) validation at epoch boundaries. It adds new test files covering deterministic and integration scenarios, provides utility functions for PoA testing workflows, and enables the irys-efficient-sampling workspace dependency for development.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
f8f2870 to be658ab
…ht 16 after 21 retries". Just prior to the noted panic.
14b259a to fef3c63
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/chain/tests/utils.rs (1)
3168-3242: solution_context_with_poa_chunk correctly builds deterministic PoA solutions; consider emphasizing its constraints

This helper cleanly:
- Starts/stops VDF internally and waits for a (step-1, step) pair with at least two steps.
- Computes checkpoints for step-1.
- Uses a fixed partition_hash = H256::zero and partition_chunk_offset = 0, and builds solution_hash = sha256(poa_chunk || offset_le || vdf_output).

Two suggestions:
- Document that recall-range alignment is intentionally ignored. The comment mentions "without recall-range dependency", but it's helpful to be explicit that this must only be used in tests where PoA verification is disabled, otherwise PoA will (correctly) fail.
- Guard against steps.len() < 2 defensively. You already check steps.len() >= 2 in the loop, but an additional assert before indexing keeps the function future-proof if get_steps semantics ever change.

Otherwise the construction is consistent with how solution_hash is used elsewhere.
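The defensive guard can be sketched in isolation. Here, checkpoint_pair is a hypothetical stand-in for the helper's (step-1, step) indexing, which in the real code reads VDF steps from the node:

```rust
// Hypothetical stand-in for the helper's (step-1, step) indexing; the real
// function reads VDF steps from the node. Returns an error instead of
// panicking when the slice is too short.
fn checkpoint_pair(steps: &[u64]) -> Result<(u64, u64), String> {
    if steps.len() < 2 {
        return Err(format!("need at least 2 VDF steps, got {}", steps.len()));
    }
    // Safe to index: length checked above.
    Ok((steps[steps.len() - 2], steps[steps.len() - 1]))
}

fn main() {
    assert!(checkpoint_pair(&[7]).is_err());
    assert_eq!(checkpoint_pair(&[6, 7]), Ok((6, 7)));
    println!("guard ok");
}
```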
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
crates/chain/Cargo.toml
crates/chain/tests/block_production/data_poa_boundary_acceptance.rs
crates/chain/tests/block_production/mod.rs
crates/chain/tests/utils.rs
crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs
crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs
crates/chain/tests/validation/mod.rs
🧰 Additional context used
🧬 Code graph analysis (4)
crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs (3)
- crates/actors/src/block_validation.rs (1): poa_is_valid (1063-1196)
- crates/types/src/block.rs (1): merklize_tx_root (619-635)
- crates/types/src/config/node.rs (2): consensus_config (658-674), miner_address (689-691)

crates/chain/tests/block_production/mod.rs (1)
- crates/chain/tests/block_production/data_poa_boundary_acceptance.rs (1): data_poa_boundary_acceptance (22-237)

crates/chain/tests/block_production/data_poa_boundary_acceptance.rs (1)
- crates/chain/tests/utils.rs (5): craft_data_poa_solution_from_tx (88-285), get_epoch_snapshot (288-299), read_block_from_state (3288-3343), submit_solution_to_block_producer (302-318), new_genesis (549-552)

crates/chain/tests/validation/mod.rs (2)
- crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs (1): deterministic_boundary_poa_crafted_snapshot (30-331)
- crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs (1): heavy_poa_parent_snapshot_integration (35-274)
🔇 Additional comments (4)
crates/chain/Cargo.toml (1)
72-88: Dev-dependency wiring for irys-efficient-sampling looks correct

The crate is correctly added as a workspace dev-dependency and matches the irys_efficient_sampling usage in tests; no further changes needed.

crates/chain/tests/block_production/mod.rs (1)
7-7: Module wiring for data_poa_boundary_acceptance is appropriate

Private module inclusion cleanly hooks the new e2e test into the block_production test suite.

crates/chain/tests/validation/mod.rs (1)
3-4: New PoA validation test modules correctly registered

The added mod deterministic_boundary_poa_crafted_snapshot; and mod heavy_poa_parent_snapshot_integration; correctly integrate the new PoA boundary tests into the validation suite without affecting public APIs.

crates/chain/tests/utils.rs (1)
1323-1327: Timeout message improvement is clearer and more actionable

The new message "Timed out waiting for {} ingress proof(s) within {} seconds" is more explicit than the previous wording and should make debugging test timeouts easier. No further changes needed here.
// Post TWO txs so data extends into slot 1 (which should have partitions).
let _tx1 = node
    .post_data_tx(anchor, vec![7_u8; DATA_SIZE], &user_signer)
    .await;
let _ = node.mine_block().await?;
let tx = node
    .post_data_tx(anchor, vec![9_u8; DATA_SIZE], &user_signer)
    .await;
let _ = node.mine_block().await?;
// tx2 starts at slot 1
let tx_ledger_offset: u64 = slot_size;
tx_ledger_offset = slot_size is a brittle assumption about ledger layout
Here you assume the second tx’s first chunk starts exactly at slot_size:
// tx2 starts at slot 1
let tx_ledger_offset: u64 = slot_size;

That's only guaranteed if:
- There is no prior Submit data occupying those chunks, and
- Each of the two txs contributes exactly one full partition’s worth of chunks in order, with no interleaving.
Given the amount of prior epoch activity (stake/pledge, capacity assignments, backfill, etc.), this coupling to implementation details of packing is fragile. If the packing or ledger layout changes (or other txs are present), the actual ledger offset for tx can differ, and the crafted PoA will be misaligned even when the parent snapshot selection is correct.
To make the test robust, consider:
- Deriving the actual ledger chunk offset for tx from BlockIndex once its inclusion block is known (e.g., via get_block_bounds or a small helper), then feeding that concrete offset into craft_data_poa_solution_from_tx, or
- Asserting (via introspection) that the computed offset really matches slot_size before crafting the PoA, and failing fast with a clear message if not.
This will decouple the regression check from low-level packing assumptions and should reduce false negatives.
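The layout assumption behind this concern can be checked in isolation. A minimal sketch with toy values, where slot_size stands for num_partitions_per_slot * num_chunks_in_partition:

```rust
// Toy illustration of the layout assumption under review: an offset maps to
// slot 1 only while it stays inside [slot_size, 2 * slot_size). Any extra
// chunks packed ahead of the tx shift its real offset and can silently
// invalidate the crafted PoA.
fn slot_for_ledger_offset(ledger_offset: u64, slot_size: u64) -> u64 {
    ledger_offset / slot_size
}

fn main() {
    let slot_size = 10u64; // assumed toy value
    let assumed_offset = slot_size; // the brittle `tx_ledger_offset = slot_size`
    assert_eq!(slot_for_ledger_offset(assumed_offset, slot_size), 1);
    // Last offset still inside slot 1:
    assert_eq!(slot_for_ledger_offset(2 * slot_size - 1, slot_size), 1);
    // One extra partition's worth of earlier data pushes the tx into slot 2:
    assert_eq!(slot_for_ledger_offset(assumed_offset + slot_size, slot_size), 2);
    println!("slot math ok");
}
```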
let miner_addr = node.node_ctx.config.node_config.reward_address;

// Verify partition slot alignment
{
    let pa = parent_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .expect("partition assignment must exist");
    let slot_index = pa
        .slot_index
        .expect("slot_index must exist for data partition") as u64;
    assert_eq!(
        slot_index as usize, target_slot_idx,
        "partition slot_index should match the tx's slot"
    );
}
Verify miner address used for PoA entropy matches the block’s miner address
The PoA chunk entropy is keyed off the miner address, and craft_data_poa_solution_from_tx uses the miner_addr you pass to derive the entropy:
let miner_addr = node.node_ctx.config.node_config.reward_address;
...
let sol_result = craft_data_poa_solution_from_tx(
&node,
&tx,
partition_hash,
miner_addr,
Some(tx_ledger_offset),
)Elsewhere (e.g., solution_context/capacity_chunk_solution) the miner address is taken from node_ctx.config.node_config.miner_address(). If reward_address and miner_address() ever diverge, the entropy used when crafting the PoA will not match the address passed into poa_is_valid during validation, and you’ll get PoAChunkHashMismatch even when everything else is aligned.
Please double-check that:
- SolutionContext.mining_address (filled by the helper) becomes the same address poa_is_valid uses as miner_address, and
- That this address is the one you intend to use here (likely miner_address() rather than reward_address, unless they're guaranteed equal in tests).
If they differ, switch to node.node_ctx.config.node_config.miner_address() for entropy and solution crafting to avoid subtle mismatches.
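The failure mode is easy to demonstrate in miniature. This sketch uses DefaultHasher as a stand-in for the real address-keyed packing entropy, which is an assumption for illustration only:

```rust
// Illustrative only: entropy keyed off an address diverges as soon as the
// address does. DefaultHasher stands in for the real packing-entropy
// function derived from the miner address.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn entropy_for(addr: &[u8; 20], chunk_offset: u32) -> u64 {
    let mut h = DefaultHasher::new();
    addr.hash(&mut h);
    chunk_offset.hash(&mut h);
    h.finish()
}

fn main() {
    let miner_address = [1u8; 20];
    let reward_address = [2u8; 20];
    // Crafting with one address and validating with the other mismatches:
    assert_ne!(entropy_for(&miner_address, 0), entropy_for(&reward_address, 0));
    // Matching addresses agree, which is what validation requires:
    assert_eq!(entropy_for(&miner_address, 0), entropy_for(&miner_address, 0));
    println!("entropy address sensitivity demonstrated");
}
```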
// Enable mining for VDF advancement while retrying solutions
node.node_ctx.start_mining()?;

let solution = {
    let max_attempts = 500_usize;
    let mut attempt = 0_usize;
    let delay_ms = 50_u64;
    let parent_block = node
        .get_block_by_height(node.get_canonical_chain_height().await)
        .await?;
    let parent_diff = parent_block.diff;

    loop {
        let sol_result = craft_data_poa_solution_from_tx(
            &node,
            &tx,
            partition_hash,
            miner_addr,
            Some(tx_ledger_offset),
        )
        .await;

        match sol_result {
            Ok(sol) => {
                let solution_diff = irys_types::u256_from_le_bytes(&sol.solution_hash.0);
                if solution_diff >= parent_diff {
                    tracing::debug!(attempt, "Found valid solution after attempts");
                    break sol;
                }
            }
            Err(e) => {
                let err_msg = e.to_string();
                if !err_msg.contains("recall range mismatch") {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        if attempt >= max_attempts {
            eyre::bail!(
                "Failed to craft solution after {} attempts (last error: recall range or difficulty mismatch)",
                attempt
            );
        }

        // Short constant delay to allow VDF advancement between attempts
        tokio::time::sleep(std::time::Duration::from_millis(delay_ms)).await;
    }
};
Solution loop lacks VDF progression verification despite existing mitigations
The test already calls start_mining() (which starts VDF) and sets difficulty to 10,000. However, the core flakiness risk remains unaddressed: there is no assertion or explicit monitoring that global_step actually advances between the 500 solution attempts.
Each iteration of craft_data_poa_solution_from_tx() reads the current VDF step independently, but if VDF advancement stalls, the same steps[1] seed will be used repeatedly, generating identical solution hashes and triggering "recall range mismatch" errors. With 50ms delays between attempts (500 × 50ms = 25 seconds), the loop can timeout before finding a valid solution.
To stabilize this test, add an explicit check that global_step increases across attempts, or fail fast if VDF is not progressing. Alternatively, increase the delay between attempts or further reduce difficulty to reduce reliance on VDF progression.
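The fail-fast idea can be sketched as a small stall tracker. The step values are hypothetical inputs here; in the real test they would be read from the node's VDF state each attempt:

```rust
// Sketch of a stall detector for the solution loop: reset the counter on
// progress, error out after too many attempts at the same VDF step.
struct StallDetector {
    last_step: u64,
    stalled_attempts: u32,
    max_stalled: u32,
}

impl StallDetector {
    fn new(max_stalled: u32) -> Self {
        Self { last_step: 0, stalled_attempts: 0, max_stalled }
    }

    fn observe(&mut self, step: u64) -> Result<(), String> {
        if step > self.last_step {
            self.last_step = step;
            self.stalled_attempts = 0;
            return Ok(());
        }
        self.stalled_attempts += 1;
        if self.stalled_attempts >= self.max_stalled {
            Err(format!(
                "VDF stuck at step {} for {} attempts",
                step, self.stalled_attempts
            ))
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut d = StallDetector::new(3);
    assert!(d.observe(1).is_ok()); // progress
    assert!(d.observe(1).is_ok()); // 1st stall
    assert!(d.observe(1).is_ok()); // 2nd stall
    assert!(d.observe(1).is_err()); // 3rd stall: fail fast
    let mut d2 = StallDetector::new(3);
    assert!(d2.observe(1).is_ok());
    assert!(d2.observe(2).is_ok()); // progress resets the counter
    println!("stall detector ok");
}
```

Calling observe once per loop iteration turns a silent 25-second spin into an immediate, diagnosable error.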
// reusable debug asserts helper
async fn debug_asserts_first_slots(
    node: &IrysNodeTest<IrysNodeCtx>,
    expected_submit_slots: usize,
    expected_submit_partitions: usize,
    expected_publish_slots: usize,
    expected_publish_partitions: usize,
) -> eyre::Result<()> {
    let snapshot = get_epoch_snapshot(node).await?;
    let submit_slots = snapshot.ledgers.get_slots(DataLedger::Submit);
    let publish_slots = snapshot.ledgers.get_slots(DataLedger::Publish);

    assert_eq!(
        expected_submit_slots,
        submit_slots.len(),
        "We expected {} Submit slot(s)",
        expected_submit_slots
    );
    assert_eq!(
        expected_submit_partitions,
        submit_slots
            .first()
            .map(|s| s.partitions.len())
            .unwrap_or_default(),
        "We expected {} Submit partition(s)",
        expected_submit_partitions
    );
    assert_eq!(
        expected_publish_slots,
        publish_slots.len(),
        "We expected {} publish slot(s)",
        expected_publish_slots
    );
    assert_eq!(
        expected_publish_partitions,
        publish_slots
            .first()
            .map(|s| s.partitions.len())
            .unwrap_or_default(),
        "We expected {} publish partition(s)",
        expected_publish_partitions
    );
    Ok(())
}
Snapshot-shape debug assertions are helpful but tightly coupled to config
debug_asserts_first_slots enforces exact numbers of slots and partitions:
assert_eq!(expected_submit_slots, submit_slots.len(), ...);
assert_eq!(expected_submit_partitions, submit_slots.first().map(|s| s.partitions.len()).unwrap_or_default(), ...);
...

With expected_* = 1 throughout the test, this effectively asserts that the first Submit/Publish slot always has a single partition. That's reasonable for the specific config you set up here, but quite sensitive to any changes in epoch/backfill or capacity partition logic.
Given this is primarily a regression test for snapshot selection, consider:
- Either relaxing these checks (e.g., assert “>= 1 partition” where that’s all you strictly need), or
- Adding a brief comment near the call sites noting that these asserts encode invariants of the current test config and will need adjustment if the consensus layout changes.
Not strictly wrong, but it’s an easy source of future breakage unrelated to the PoA regression itself.
🤖 Prompt for AI Agents
In crates/chain/tests/block_production/data_poa_boundary_acceptance.rs around
lines 281 to 324, the debug_asserts_first_slots helper rigidly asserts exact
slot/partition counts which couples the test to current epoch/partition layout;
change the equality assertions to more robust checks (e.g.,
assert!(submit_slots.len() >= expected_submit_slots) and
assert!(submit_slots.first().map(|s| s.partitions.len()).unwrap_or_default() >=
expected_submit_partitions) and similarly for publish slots/partitions) or, if
you prefer to keep exact counts, add a short comment at the call sites
explaining these asserts encode the current test configuration and must be
updated if epoch/backfill/partitioning logic changes; update the assertion
messages to reflect the new comparison (">=") so failure output remains clear.
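The relaxed form suggested above can be sketched on toy data; Slot here is an illustrative stand-in for the snapshot's slot type:

```rust
// Lower-bound variant of the shape assertions: tolerant of extra slots or
// partitions added by future epoch/backfill changes. `Slot` is a toy
// stand-in for the epoch snapshot's slot type.
struct Slot {
    partitions: Vec<u64>,
}

fn assert_at_least(slots: &[Slot], min_slots: usize, min_partitions: usize) {
    assert!(
        slots.len() >= min_slots,
        "expected >= {} slot(s), got {}",
        min_slots,
        slots.len()
    );
    let first = slots.first().map(|s| s.partitions.len()).unwrap_or_default();
    assert!(
        first >= min_partitions,
        "expected >= {} partition(s) in first slot, got {}",
        min_partitions,
        first
    );
}

fn main() {
    // Passes with the current single-partition layout...
    assert_at_least(&[Slot { partitions: vec![0] }], 1, 1);
    // ...and keeps passing if the layout grows.
    assert_at_least(
        &[Slot { partitions: vec![0, 1] }, Slot { partitions: vec![] }],
        1,
        1,
    );
    println!("relaxed asserts ok");
}
```

The ">=" messages keep failure output clear while decoupling the regression check from exact layout counts.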
pub async fn craft_data_poa_solution_from_tx(
    node: &IrysNodeTest<IrysNodeCtx>,
    tx: &DataTransaction,
    partition_hash: H256,
    miner_addr: IrysAddress,
    tx_ledger_offset: Option<u64>,
) -> eyre::Result<SolutionContext> {
    // Compute slot_index from parent epoch snapshot (prev block)
    let head_height = node.get_canonical_chain_height().await;
    let block = node.get_block_by_height(head_height).await?;
    let parent_epoch_snapshot = node
        .node_ctx
        .block_tree_guard
        .read()
        .get_epoch_snapshot(&block.block_hash)
        .ok_or_else(|| eyre!("parent epoch snapshot not found"))?;
    let pa = parent_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .ok_or_else(|| eyre!("partition assignment missing for {}", partition_hash))?;
    let slot_index =
        pa.slot_index
            .ok_or_else(|| eyre!("slot index missing for {}", partition_hash))? as u64;

    let slot_start = slot_index
        * node.node_ctx.config.consensus.num_partitions_per_slot
        * node.node_ctx.config.consensus.num_chunks_in_partition;

    let chunks_per_partition = node.node_ctx.config.consensus.num_chunks_in_partition;

    // Determine the ledger offset for the tx's first chunk:
    let chunk_ledger_offset =
        tx_ledger_offset.unwrap_or_else(|| block.data_ledgers[DataLedger::Submit].total_chunks);

    // Compute partition_chunk_offset from ledger_chunk_offset:
    // ledger_chunk_offset = slot_start + partition_chunk_offset
    // partition_chunk_offset = ledger_chunk_offset - slot_start
    //
    // IMPORTANT: The tx's position must be within this partition's slot range.
    let slot_end = slot_start + chunks_per_partition;
    eyre::ensure!(
        chunk_ledger_offset >= slot_start && chunk_ledger_offset < slot_end,
        "chunk ledger offset {} is outside partition slot range [{}, {}) for partition {} (slot_index={})",
        chunk_ledger_offset,
        slot_start,
        slot_end,
        partition_hash,
        slot_index
    );
    let partition_chunk_offset_u64 = chunk_ledger_offset - slot_start;
    let partition_chunk_offset: u32 = u32::try_from(partition_chunk_offset_u64).map_err(|_| {
        eyre!(
            "partition_chunk_offset {} doesn't fit u32",
            partition_chunk_offset_u64
        )
    })?;

    // tx_path for a single-tx block
Clarify/strengthen assumptions around tx_ledger_offset and slot window
craft_data_poa_solution_from_tx assumes that tx_ledger_offset (or block.data_ledgers[Submit].total_chunks when None) is exactly the ledger chunk offset of the tx’s first chunk within the chosen partition’s slot:
let slot_start = slot_index * num_partitions_per_slot * num_chunks_in_partition;
let chunk_ledger_offset =
tx_ledger_offset.unwrap_or_else(|| block.data_ledgers[DataLedger::Submit].total_chunks);
let slot_end = slot_start + chunks_per_partition;
ensure!(
chunk_ledger_offset >= slot_start && chunk_ledger_offset < slot_end,
"chunk ledger offset {} is outside partition slot range [...]",
...
);

Two points to tighten:
- Callers must pass the actual ledger offset. For mempool txs you're assuming they will be placed exactly at prev_total_chunks; if the packing/ordering logic changes (or other txs are inserted ahead), this invariant breaks and the helper silently fails. It's worth:
  - Documenting this requirement explicitly in the function docs, and/or
  - Adding a helper that derives the ledger offset from BlockIndex or the inclusion block post-factum, then calling this helper with a precise offset.
- Explicitly enforce partition_chunk_offset < num_chunks_in_partition. You currently rely on the inequality above plus subtraction, but a direct assert/ensure on the derived partition_chunk_offset_u64 being < chunks_per_partition makes the intent clear and protects against misconfigured tx_ledger_offset.
This will make failures easier to diagnose and decouple the helper from fragile assumptions about block-construction order.
🤖 Prompt for AI Agents
In crates/chain/tests/utils.rs around lines 88 to 145, the function assumes
tx_ledger_offset (or block.data_ledgers[Submit].total_chunks when None) exactly
equals the ledger chunk offset of the tx’s first chunk in the partition slot and
only implicitly checks the slot range; update the code and docs to (1) document
in the function comment that callers must supply the precise ledger chunk offset
for the tx (or call a new helper to derive it from a BlockIndex/inclusion block)
so callers don’t rely on the fragile "prev_total_chunks" assumption, and (2) add
an explicit eyre::ensure! check after computing partition_chunk_offset_u64 that
partition_chunk_offset_u64 < chunks_per_partition (with a clear error message)
before converting to u32 to make the intent and invariant explicit and fail fast
on misconfiguration.
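The offset derivation with both invariants made explicit reduces to a pure function that is easy to reason about. Names mirror the helper, but this is an illustrative sketch, not the repository's code:

```rust
// Pure-function sketch of the offset derivation: the ledger offset must
// land inside the slot, and the derived in-partition offset must be
// < chunks_per_partition before it is narrowed to u32.
fn partition_chunk_offset(
    ledger_offset: u64,
    slot_start: u64,
    chunks_per_partition: u64,
) -> Result<u32, String> {
    let slot_end = slot_start + chunks_per_partition;
    if ledger_offset < slot_start || ledger_offset >= slot_end {
        return Err(format!(
            "ledger offset {} outside slot range [{}, {})",
            ledger_offset, slot_start, slot_end
        ));
    }
    let off = ledger_offset - slot_start;
    // Redundant with the range check above, but states the invariant directly:
    if off >= chunks_per_partition {
        return Err(format!(
            "partition_chunk_offset {} >= chunks_per_partition {}",
            off, chunks_per_partition
        ));
    }
    u32::try_from(off).map_err(|_| format!("offset {} doesn't fit u32", off))
}

fn main() {
    assert_eq!(partition_chunk_offset(12, 10, 10), Ok(2));
    assert!(partition_chunk_offset(20, 10, 10).is_err()); // next slot
    assert!(partition_chunk_offset(9, 10, 10).is_err()); // previous slot
    println!("offset invariants ok");
}
```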
// Ensure recall range alignment for the current VDF step and partition.
let num_chunks_in_partition = node.node_ctx.config.consensus.num_chunks_in_partition;
let num_chunks_in_recall_range = node.node_ctx.config.consensus.num_chunks_in_recall_range;
let num_recall_ranges_in_partition =
    (num_chunks_in_partition / num_chunks_in_recall_range) as usize;
// Compute VDF reset window start and collect recall steps for reconstruction
let num_rr_u64 = num_chunks_in_partition.div_ceil(num_chunks_in_recall_range);
let recall_reset_step = if current_step == 0 {
    0
} else {
    ((current_step - 1) / num_rr_u64) * num_rr_u64 + 1
};
let recall_steps = vdf_steps_guard
    .read()
    .get_steps(ii(recall_reset_step, current_step))?;
let recall_index = (u64::from(partition_chunk_offset) / num_chunks_in_recall_range) as usize;

eyre::ensure!(
    recall_index < num_recall_ranges_in_partition,
    "recall_index {} out of bounds [0, {}) (partition_chunk_offset={}, num_chunks_in_recall_range={}, num_chunks_in_partition={})",
    recall_index,
    num_recall_ranges_in_partition,
    partition_chunk_offset,
    num_chunks_in_recall_range,
    num_chunks_in_partition
);

// Use the efficient-sampling library to compute the expected recall range
let expected_recall_index = irys_efficient_sampling::get_recall_range(
    num_recall_ranges_in_partition,
    &recall_steps,
    &partition_hash,
)?;

// Verify our chunk is in the expected recall range (can't snap - that would change which chunk we're proving)
eyre::ensure!(
    recall_index == expected_recall_index,
    "recall range mismatch: chunk at recall_index {} but VDF expects {} (partition_chunk_offset={}, vdf_step={})",
    recall_index,
    expected_recall_index,
    partition_chunk_offset,
    current_step
);
🧩 Analysis chain
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '180,250p'Repository: Irys-xyz/irys
Length of output: 3430
🏁 Script executed:
rg -n "solution_context_with_poa_chunk|solution_context|capacity_chunk_solution" crates/chain/tests/utils.rs -A 5Repository: Irys-xyz/irys
Length of output: 1472
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '3175,3220p'Repository: Irys-xyz/irys
Length of output: 2085
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '320,380p'Repository: Irys-xyz/irys
Length of output: 2633
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '378,450p'Repository: Irys-xyz/irys
Length of output: 3655
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '160,200p'Repository: Irys-xyz/irys
Length of output: 1917
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '120,165p'Repository: Irys-xyz/irys
Length of output: 2189
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '70,125p'Repository: Irys-xyz/irys
Length of output: 2749
🏁 Script executed:
cat -n crates/chain/tests/utils.rs | sed -n '3220,3280p'Repository: Irys-xyz/irys
Length of output: 2590
🏁 Script executed:
rg -n "craft_data_poa_solution_from_tx" crates/chain/tests/ -B 3 -A 3Repository: Irys-xyz/irys
Length of output: 1940
🏁 Script executed:
cat -n crates/chain/tests/block_production/data_poa_boundary_acceptance.rs | sed -n '140,180p'Repository: Irys-xyz/irys
Length of output: 1745
🏁 Script executed:
python3 << 'EOF'
# Simulate the scenario where num_chunks_in_partition is not an exact multiple
num_chunks_in_partition = 10
num_chunks_in_recall_range = 3
# Current code (floor division)
num_recall_ranges_floor = (num_chunks_in_partition // num_chunks_in_recall_range)
print(f"Floor division: {num_recall_ranges_floor}")
# Correct code (div_ceil)
import math
num_recall_ranges_ceil = math.ceil(num_chunks_in_partition / num_chunks_in_recall_range)
print(f"Ceil division: {num_recall_ranges_ceil}")
# Test what happens with chunk offsets
print("\nChunk offset analysis:")
for offset in range(num_chunks_in_partition):
recall_index = offset // num_chunks_in_recall_range
print(f" offset {offset}: recall_index={recall_index}, valid_with_floor={recall_index < num_recall_ranges_floor}, valid_with_ceil={recall_index < num_recall_ranges_ceil}")
EOFRepository: Irys-xyz/irys
Length of output: 143
🏁 Script executed:
cat << 'EOF'
# Scenario: num_chunks_in_partition = 10, num_chunks_in_recall_range = 3
# Floor division: 10 / 3 = 3
# Ceil division: (10 + 3 - 1) / 3 = 12 / 3 = 4
#
# With floor division (3 ranges):
# - Range 0: chunks 0-2
# - Range 1: chunks 3-5
# - Range 2: chunks 6-8
# - Chunk 9 would have index 3, but 3 is not < 3, so it fails bounds check
#
# With ceil division (4 ranges):
# - Range 0: chunks 0-2
# - Range 1: chunks 3-5
# - Range 2: chunks 6-8
# - Range 3: chunks 9
# - Chunk 9 would have index 3, and 3 < 4, so it passes
echo "Math check: 10 / 3 = 3 (floor), ceil(10/3) = 4"
echo "Last chunk (9) maps to recall_index = 9/3 = 3"
echo "With floor (3): 3 < 3 is FALSE (rejected)"
echo "With ceil (4): 3 < 4 is TRUE (accepted)"
EOF
Length of output: 795
🏁 Script executed:
rg -n "get_recall_range" crates/ --type rust | grep -E "fn|pub" | head -20
Length of output: 418
🏁 Script executed:
cat -n crates/actors/src/block_validation.rs | sed -n '1035,1100p'
Length of output: 3012
🏁 Script executed:
cat -n crates/efficient-sampling/src/lib.rs | sed -n '1,100p'
Length of output: 5056
Fix recall-range count to use div_ceil
The recall-window setup has a critical bounds-checking bug:
let num_chunks_in_partition = ...;
let num_chunks_in_recall_range = ...;
let num_recall_ranges_in_partition =
    (num_chunks_in_partition / num_chunks_in_recall_range) as usize;
// ...
let recall_index = (u64::from(partition_chunk_offset) / num_chunks_in_recall_range) as usize;
ensure!(
    recall_index < num_recall_ranges_in_partition,
    ...
);

When num_chunks_in_partition is not an exact multiple of num_chunks_in_recall_range, floor division undercounts the number of ranges. For example, with 10 chunks and 3 chunks per range, floor division gives 3 ranges (indices 0–2); chunks 0–8 map to indices 0–2, but chunk 9 maps to index 3, which then wrongly fails the bounds check as out of bounds.
Fix by using div_ceil:
let num_recall_ranges_in_partition =
    num_chunks_in_partition.div_ceil(num_chunks_in_recall_range) as usize;

Additionally, this helper assumes the VDF is already running and that get_steps() will succeed. Unlike solution_context_with_poa_chunk, which explicitly calls node_ctx.start_vdf(), this function relies on callers to have initialized the VDF separately. Either start the VDF inside the helper for consistency, or document the requirement clearly so callers cannot hit an unexpected get_steps() failure.
🤖 Prompt for AI Agents
crates/chain/tests/utils.rs lines ~196-238: the calculation of
num_recall_ranges_in_partition uses floor division which undercounts ranges when
partition size is not an exact multiple of the recall-range size; change the
calculation to use div_ceil on num_chunks_in_partition with
num_chunks_in_recall_range (and cast to usize) so the bounds check is correct;
additionally, either start the VDF unconditionally in this helper (call the
existing node_ctx.start_vdf() before get_steps()) or document in the function
comment that the VDF must be started by callers, so get_steps() cannot fail
unexpectedly.
async fn deterministic_boundary_poa_crafted_snapshot() -> eyre::Result<()> {
    const CHUNK_SIZE: u64 = 32;
    // Synthetic timestamp for test blocks (Nov 2023, in milliseconds)
    const TEST_TIMESTAMP_MS: u128 = 1_700_000_000_000;

    let mut node_config = NodeConfig::testing();
    node_config.consensus.get_mut().block_migration_depth = 1;
    node_config.consensus.get_mut().chunk_size = CHUNK_SIZE;
    node_config.consensus.get_mut().num_chunks_in_partition = 8;
    node_config.consensus.get_mut().num_partitions_per_slot = 1;
    node_config.consensus.get_mut().epoch.num_blocks_in_epoch = 3;

    let (
        tmp_dir,
        block_index,
        block_index_tx,
        parent_epoch_snapshot,
        miner_address,
        consensus,
        genesis_hash,
        _block_index_handle,
    ) = init_min_context(&node_config).await;

    let partition_hash = parent_epoch_snapshot
        .ledgers
        .get_slots(DataLedger::Submit)
        .first()
        .and_then(|slot| slot.partitions.first().copied())
        .expect("submit ledger slot 0 should exist with a partition");

    let signer = IrysSigner::random_signer(&consensus);
    let data = vec![0xAB; CHUNK_SIZE as usize];
    let tx = {
        let tx_unsigned = signer
            .create_transaction(data.clone(), H256::zero())
            .expect("create tx");
        signer.sign_transaction(tx_unsigned).expect("sign tx")
    };
    let tx_header: DataTransactionHeader = tx.header.clone();
    assert_eq!(
        tx.proofs.len(),
        1,
        "1-chunk tx should have exactly one data proof"
    );

    let tx_headers = vec![tx_header.clone()];
    let (tx_root, tx_paths) = DataTransactionLedger::merklize_tx_root(&tx_headers);
    let tx_path = tx_paths[0].proof.clone();
    let data_path = if let Some(p) = tx.proofs.first() {
        p.proof.clone()
    } else {
        tx_path.clone()
    };

    let mut partition_chunk_offset: u32 = 0;

    let parent_slot_index = parent_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .and_then(|pa| pa.slot_index)
        .unwrap_or(0) as u64;

    let ledger_chunk_offset =
        parent_slot_index * consensus.num_partitions_per_slot * consensus.num_chunks_in_partition
            + partition_chunk_offset as u64;

    let prelude_size_bytes = (ledger_chunk_offset as usize) * (CHUNK_SIZE as usize);
    let prelude_tx_header = if prelude_size_bytes > 0 {
        let prelude_data = vec![0xCD; prelude_size_bytes];
        let pre_tx = signer
            .create_transaction(prelude_data, H256::zero())
            .expect("create prelude tx");
        let pre_tx = signer.sign_transaction(pre_tx).expect("sign prelude tx");
        Some(pre_tx.header)
    } else {
        None
    };

    let prelude_headers: Vec<DataTransactionHeader> =
        prelude_tx_header.clone().into_iter().collect();
    let (prelude_tx_root, _prelude_paths) =
        DataTransactionLedger::merklize_tx_root(&prelude_headers);

    let prelude_publish_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Publish.into(),
        tx_root: H256::zero(),
        tx_ids: H256List(Vec::new()),
        total_chunks: 0,
        expires: None,
        proofs: None,
        required_proof_count: Some(1),
    };
    let prelude_submit_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Submit.into(),
        tx_root: prelude_tx_root,
        tx_ids: H256List(prelude_tx_header.into_iter().map(|h| h.id).collect()),
        total_chunks: ledger_chunk_offset,
        expires: Some(41_000_000),
        proofs: None,
        required_proof_count: None,
    };
    let prelude_block_hash = H256::from_slice(&[1_u8; 32]);
    let mut prelude_block_v = IrysBlockHeader::default();
    {
        let inner = &mut *prelude_block_v;
        inner.height = 1;
        inner.reward_address = miner_address;
        inner.poa = irys_types::PoaData {
            tx_path: None,
            data_path: None,
            chunk: None,
            ledger_id: None,
            partition_chunk_offset: 0,
            partition_hash,
        };
        inner.block_hash = prelude_block_hash;
        inner.previous_block_hash = genesis_hash;
        inner.previous_cumulative_diff = U256::from(4000);
        inner.miner_address = miner_address;
        inner.signature = Signature::test_signature().into();
        inner.timestamp = TEST_TIMESTAMP_MS.saturating_sub(1).into();
        inner.data_ledgers = vec![prelude_publish_ledger, prelude_submit_ledger];
    }
    let prelude_block = Arc::new(prelude_block_v);

    {
        let block_height_pre_migration = {
            let block_index_guard = BlockIndexReadGuard::new(block_index.clone());
            let bi = block_index_guard.read();
            bi.num_blocks().saturating_sub(1)
        };
        let prelude_all_txs = Arc::new(prelude_headers.clone());
        let (tx_prelude, rx_prelude) = tokio::sync::oneshot::channel();
        block_index_tx
            .send(BlockIndexServiceMessage::MigrateBlock {
                block_header: prelude_block.clone(),
                all_txs: prelude_all_txs,
                response: tx_prelude,
            })
            .expect("send migrate prelude block");
        rx_prelude
            .await
            .expect("receive migration result (prelude)")
            .expect("migrate prelude block");

        let start = std::time::Instant::now();
        let timeout = std::time::Duration::from_secs(10);
        loop {
            let current_height = {
                let block_index_guard = BlockIndexReadGuard::new(block_index.clone());
                let bi = block_index_guard.read();
                bi.num_blocks().saturating_sub(1)
            };
            if current_height != block_height_pre_migration {
                break;
            }
            if start.elapsed() > timeout {
                panic!(
                    "Timed out waiting for prelude block migration after {:?}",
                    timeout
                );
            }
            tokio::time::sleep(tokio::time::Duration::from_millis(200)).await;
        }
    }

    let prev_total = {
        let prelude_block_index_guard = BlockIndexReadGuard::new(block_index.clone());
        let prelude_index = prelude_block_index_guard.read();
        let latest_height = prelude_index.latest_height();
        let latest_item = prelude_index
            .get_item(latest_height)
            .expect("latest block index item should exist");
        latest_item.ledgers[DataLedger::Submit as usize].total_chunks
    };

    let slot_start =
        parent_slot_index * consensus.num_partitions_per_slot * consensus.num_chunks_in_partition;
    assert!(
        prev_total == slot_start,
        "prelude did not add expected chunks: prev_total={} slot_start={}",
        prev_total,
        slot_start
    );
    partition_chunk_offset = 0;

    let mut poa_chunk: Vec<u8> = data.clone();
    {
        let mut entropy_chunk = Vec::<u8>::with_capacity(CHUNK_SIZE as usize);
        compute_entropy_chunk(
            miner_address,
            partition_chunk_offset as u64,
            partition_hash.into(),
            consensus.entropy_packing_iterations,
            CHUNK_SIZE as usize,
            &mut entropy_chunk,
            consensus.chain_id,
        );
        xor_vec_u8_arrays_in_place(&mut poa_chunk, &entropy_chunk);
    }

    let submit_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Submit.into(),
        tx_root,
        tx_ids: H256List(vec![tx_header.id]),
        total_chunks: slot_start + 1,
        expires: Some(42_000_000),
        proofs: None,
        required_proof_count: None,
    };
    let publish_ledger = DataTransactionLedger {
        ledger_id: DataLedger::Publish.into(),
        tx_root: H256::zero(),
        tx_ids: H256List(Vec::new()),
        total_chunks: 0,
        expires: None,
        proofs: None,
        required_proof_count: Some(1),
    };
    let mut synthetic_block_v = IrysBlockHeader::default();
    {
        let inner = &mut *synthetic_block_v;
        inner.height = 2;
        inner.reward_address = miner_address;
        inner.poa = irys_types::PoaData {
            tx_path: Some(Base64(tx_path)),
            data_path: Some(Base64(data_path)),
            chunk: Some(Base64(poa_chunk.clone())),
            ledger_id: Some(DataLedger::Submit.into()),
            partition_chunk_offset,
            partition_hash,
        };
        inner.block_hash = H256::from_slice(&[2_u8; 32]);
        inner.previous_block_hash = prelude_block_hash;
        inner.previous_cumulative_diff = U256::from(4000);
        inner.miner_address = miner_address;
        inner.signature = Signature::test_signature().into();
        inner.timestamp = TEST_TIMESTAMP_MS.into();
        inner.data_ledgers = vec![publish_ledger, submit_ledger];
    }
    let synthetic_block = Arc::new(synthetic_block_v);

    let txs = Arc::new(vec![tx_header.clone()]);
    let (tx_migrate, rx_migrate) = tokio::sync::oneshot::channel();
    block_index_tx
        .send(BlockIndexServiceMessage::MigrateBlock {
            block_header: synthetic_block.clone(),
            all_txs: txs.clone(),
            response: tx_migrate,
        })
        .expect("send migrate block");
    rx_migrate
        .await
        .expect("receive migration result")
        .expect("migrate synthetic block");

    let block_index_guard = BlockIndexReadGuard::new(block_index.clone());

    poa_is_valid(
        &synthetic_block.poa,
        &block_index_guard,
        &parent_epoch_snapshot,
        &consensus,
        &miner_address,
    )
    .expect("Parent epoch snapshot PoA should be valid for crafted block");

    // Build child-like snapshot with different slot index to simulate epoch-boundary slot movement
    let mut child_epoch_snapshot = parent_epoch_snapshot.clone();
    let mut pa = child_epoch_snapshot
        .partition_assignments
        .get_assignment(partition_hash)
        .expect("partition assignment should exist for partition_hash");
    pa.slot_index = Some(pa.slot_index.unwrap_or(0).wrapping_add(1));
    child_epoch_snapshot
        .partition_assignments
        .data_partitions
        .insert(partition_hash, pa);

    // Child-like snapshot PoA validation MUST fail
    match poa_is_valid(
        &synthetic_block.poa,
        &block_index_guard,
        &child_epoch_snapshot,
        &consensus,
        &miner_address,
    ) {
        Err(
            PreValidationError::MerkleProofInvalid(_)
            | PreValidationError::PoAChunkOffsetOutOfBlockBounds
            | PreValidationError::BlockBoundsLookupError(_),
        ) => {
            info!("Child-like snapshot rejected PoA as expected due to slot change/out-of-bounds");
        }
        Ok(()) => panic!("Child-like snapshot unexpectedly validated crafted PoA"),
        Err(e) => panic!("Unexpected child-like PoA error: {:?}", e),
    }

    drop(block_index);
    drop(tmp_dir);
    Ok(())
}
🧹 Nitpick | 🔵 Trivial
Deterministic crafted-snapshot PoA test is well-structured and matches the intended invariant
The test:

- Constructs a minimal BlockIndex + EpochSnapshot context.
- Ensures the prelude block advances total_chunks to exactly slot_start, then places a single-chunk Submit tx at slot_start in the synthetic block.
- Builds PoaData consistent with that layout and proves it valid under the parent snapshot.
- Mutates only the partition's slot_index in a cloned snapshot and correctly accepts MerkleProofInvalid | PoAChunkOffsetOutOfBlockBounds | BlockBoundsLookupError as the expected child-invalid outcomes.

This is a solid, deterministic guard against regressions in snapshot selection. To future-proof it further, you could assert that the selected partition's slot_index is initially non-None and document that this test assumes at least one Submit partition exists in the genesis snapshot.
🤖 Prompt for AI Agents
In crates/chain/tests/validation/deterministic_boundary_poa_crafted_snapshot.rs
around lines 30 to 331, add an explicit assertion that the selected partition
assignment has a non-None slot_index and that the parent_epoch_snapshot contains
at least one Submit ledger slot/partition before using it; if either
precondition fails, fail the test with a clear message. Modify the setup so you
retrieve the partition assignment, assert pa.slot_index.is_some() and that
parent_epoch_snapshot.ledgers.get_slots(DataLedger::Submit).first() exists (and
contains a partition), then proceed — this documents and enforces the test’s
assumption and prevents silent panics if fixtures change.
// Derive slot_index and partition_chunk_offset from the parent snapshot and block
let parent_assignment = parent_epoch_snapshot
    .partition_assignments
    .get_assignment(partition_hash)
    .expect("partition assignment must exist in parent snapshot");
let slot_index = parent_assignment
    .slot_index
    .expect("slot_index must exist for data partition") as u64;
let cpp = node.node_ctx.config.consensus.num_chunks_in_partition;
let npps = node.node_ctx.config.consensus.num_partitions_per_slot;
let slot_start = slot_index * npps * cpp;
let prev_total_chunks = parent_epoch_block.data_ledgers[DataLedger::Submit].total_chunks;
assert!(
    prev_total_chunks >= slot_start,
    "prev_total_chunks {} before slot_start {}",
    prev_total_chunks,
    slot_start
);
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;
Guard partition_chunk_offset against exceeding the partition’s slot range
Here you only assert prev_total_chunks >= slot_start and then derive:
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;

If prev_total_chunks ever reaches or exceeds slot_start + num_chunks_in_partition, partition_chunk_offset will point past the partition's slot window, making the PoA invalid even under the parent snapshot (or silently truncating via as u32 if this code changes).
Suggest tightening the invariant to keep the crafted PoA within the intended slot:
let slot_end = slot_start + cpp;
assert!(
prev_total_chunks >= slot_start && prev_total_chunks < slot_end,
"prelude did not add expected chunks: prev_total={} in [{}, {})",
prev_total_chunks,
slot_start,
slot_end,
);
let partition_chunk_offset_u64 = prev_total_chunks - slot_start;
let partition_chunk_offset: u32 = partition_chunk_offset_u64 as u32;This keeps the test robust to future tweaks in how total_chunks is advanced.
🤖 Prompt for AI Agents
In crates/chain/tests/validation/heavy_poa_parent_snapshot_integration.rs around
lines 96 to 116, tighten the invariant that computes partition_chunk_offset so
it cannot point past the partition’s slot: compute slot_end = slot_start + cpp
and replace the existing assert with one that checks prev_total_chunks >=
slot_start && prev_total_chunks < slot_end (including a clear message showing
prev_total_chunks, slot_start, slot_end); then derive partition_chunk_offset_u64
= prev_total_chunks - slot_start and cast to u32 as before. Ensure the assert
message matches the new bounds and keeps the test robust to future changes.
For whoever picks this up: the e2e test is not working and I have pushed all my debugging. The other two tests LGTM.
Describe the changes
Three new tests added:

- data_poa_boundary_acceptance: an incomplete end-to-end boundary snapshot smoke test.
- deterministic_boundary_poa_crafted_snapshot
- heavy_poa_parent_snapshot_integration

Related Issue(s)
Original fix: #815
Checklist
Summary by CodeRabbit