
feat(collateral): harden dual-state collateral contract and validator integration #372

Draft
itzlambda wants to merge 45 commits into main from collateral-integration

Conversation

@itzlambda
Collaborator

@itzlambda itzlambda commented Feb 16, 2026

Summary

This PR hardens and completes Basilica’s collateral integration across:

  1. the upgradeable collateral contract,
  2. validator event ingestion and state persistence, and
  3. collateral tooling used in localnet workflows.

Primary outcomes are stronger reclaim/slash invariants, correct dual-asset state handling (TAO + alpha), and end-to-end SHA-256 evidence consistency.

What Changed

1) Contract hardening and lifecycle correctness (crates/collateral-contract)

  • Strengthened core invariants in CollateralUpgradeable for deposit, reclaim, deny, and slash flows.
  • Fixed edge cases around pending reclaims and slashes so collateral cannot be stranded by lifecycle races.
  • Removed mutable coldkey setter behavior and moved to derived/immutable contract coldkey initialization.
  • Added taoDepositsEnabled and alphaDepositsEnabled toggles.
  • Added minAlphaCollateralIncrease.
  • Standardized evidence fields/checksums to SHA-256 (bytes32) across contract events and calls.
  • Aligned naming for clarity (collaterals -> taoCollaterals, CONTRACT_HOTKEY -> validatorHotkey).
  • Removed stale/legacy contract artifacts and shifted upgrade testing to a dedicated upgrade mock flow.

2) Validator integration and state consistency (crates/basilica-validator)

  • Expanded collateral event handling to cover full lifecycle events (Deposit, ReclaimProcessStarted, Denied, Reclaimed, Slashed).
  • Updated persistence to track TAO and alpha separately, including pending reclaim state and reclaim request linkage.
  • Made per-block collateral event application transactional to avoid partial state application.
  • Updated slash execution path to use full SHA-256 checksum payloads and aligned slash fraction behavior with percentage semantics.
  • Added optional collateral rpc_url override support for validator configuration.

3) CLI/bindings and API surface alignment (collateral-cli + Rust library)

  • Updated Rust bindings and CLI command flow to match the new contract ABI and event model.
  • Reclaim/deny/slash checksum inputs now use SHA-256 naming and types.
  • Query/API naming is aligned to explicit TAO vs alpha collateral reads.
  • Improved network/contract resolution through env-aware CLI defaults.

4) Localnet collateral workflow improvements (scripts/*)

  • Added collateral setup/deploy/integration/e2e scripts for repeatable local testing.
  • Wired collateral deployment into localnet startup flow and validator contract-address patching.
  • Updated localnet bootstrap behavior to better support collateral lifecycle testing.

Behavioral/API Changes to Validate

  • Evidence checksum schema changed to SHA-256 (bytes32) end-to-end.
  • Reclaim/slash behavior now enforces stricter pending-state consistency.
  • ABI/event payloads changed; downstream clients should use updated bindings.
  • Validator collateral config supports rpc_url override.

@coderabbitai
Contributor

coderabbitai bot commented Feb 16, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@itzlambda itzlambda force-pushed the collateral-integration branch 2 times, most recently from bb23d5d to 5a6753e Compare February 16, 2026 13:44
Fix finalizeReclaim to decrement alphaCollateralUnderPendingReclaims,
cap transfers to available balance instead of reverting after slash,
and enforce proper CEI ordering. Fix denyReclaimRequest to recognize
alpha-only reclaims. Add regression tests for partial/full slash
scenarios and pending counter lifecycle.
…reclaims

Check collateralUnderPendingReclaims and alphaCollateralUnderPendingReclaims
before clearing nodeToMiner in both finalizeReclaim and slashCollateral, closing
a theft vector where a new depositor's funds could be drained by stale reclaim
finalizations. Added regression tests for the attack path.
…taking details

Add coldkey/hotkey model, StakingV2 function reference, alpha token
economics, dual-mode collateral state, and delegatecall vs call semantics.
Add uint256[50] storage gap for upgrade safety and guard deposit()
against msg.value == 0 && alphaAmount == 0 to prevent ownership
griefing without collateral.
…s are zero

After a full slash with a pending reclaim, denyReclaimRequest zeroed
pending trackers but never cleared nodeToMiner, permanently locking
the node to the old owner.
@itzlambda itzlambda force-pushed the collateral-integration branch from 56bbbbb to 3931232 Compare February 18, 2026 09:12
The variable holds the validator's hotkey where all alpha collateral
is consolidated, not a generic contract key.
Clarify that withdrawAlpha uses transferStake (changes coldkey ownership
only), not removeStake. Alpha remains staked under VALIDATOR_HOTKEY.
Also fix TAO slash description: sent to address(0), not trustee.
Mainnet whitelist is already disabled (open to all). Localnet
now matches by auto-disabling via sudo during initialization.
- set Foundry solc to 0.8.24 and evm_version to cancun
- align contract/script/test pragmas to ^0.8.24
- fix deploy/query helper script shebang lines
Use OpenZeppelin ReentrancyGuardUpgradeable on collateral lifecycle entrypoints.

Also block constructor-based first-claim bypasses and verify alpha slash transfers to trustee coldkey with regression tests.
@itzlambda itzlambda changed the title from "fix(collateral): alpha reclaim accounting fixes and validator integration" to "feat(collateral): harden dual-state collateral contract and validator integration" Feb 20, 2026
Enforce required env inputs in the Foundry deploy script for trustee/admin/hotkey and numeric params, removing unsafe defaults. Update collateral docs and audit report to reflect trustee-only authority semantics and remediation status.
Refresh generated collateral ABI and embedded bytecode from the current Solidity build, and align miner tests with lowerCamel getter bindings.
Switch trustee-gated paths to TRUSTEE_ROLE, replace custom math with OpenZeppelin Math/SafeCast, and replace string requires with custom errors while keeping current operational flow.
Automate localnet contract bring-up by creating/funding a deployer wallet, deploying proxy+implementation, and generating scripts/collateral/.env.local.

Also replace forge script deployment with deploy.sh/forge create and update collateral/localnet docs and configs.
Overwrite directly instead of creating timestamped .bak copies.
Admin-controlled flags gate TAO and alpha deposits independently,
replacing the old alpha-required-for-ownership constraint.
…osit toggles

Restore _gap to [49] since the contract is undeployed, pass
taoDepositsEnabled/alphaDepositsEnabled through initialize in deploy
scripts, and add test coverage for toggle combinations and
post-disable reclaim/slash flows.
Hardcoded localhost:9944 is unreachable from inside Docker containers.
Migrate from deprecated dot-notation flags and underscore commands to
current kebab-case CLI interface, pin Python 3.12, and improve wallet
balance/list error handling.
Add deploy_collateral step that deploys via setup-localnet-env.sh and
patches validator.toml with the contract address. Gracefully skips when
Foundry is missing. Also fix stop.sh teardown to use all profiles in a
single down call and update subtensor image to latest.
…units

Add `btcli subnet start` as step 6/6 in init-subnet.sh to activate
the emission schedule and enable subtokens/alpha on localnet. Without
this, addStake calls fail with SubtokenDisabled.

Document that all StakingV2 precompile amount parameters use RAO
(1e9 per TAO), not EVM wei (1e18). Passing wei-scale values causes
NotEnoughBalanceToStake reverts. Expand the unit conversion section
to clarify the precompile boundary as the conversion point.
Slashed TAO now goes to the trustee's EVM address instead of
address(0), keeping funds recoverable. Also fixes missing boolean
args in IStakingIntegration test setUp.
Adds end-to-end test script for deposit, reclaim, and slash flows
with a wei comparison utility. Refactors init-subnet.sh to dissolve
genesis subnet 1 and re-register with 10,000 TAO liquidity for
predictable swap rates during testing.
Replace dynamically computed alpha deposit/slash amounts with fixed
constants (4 alpha, 1 alpha) for deterministic test behavior.
Add end-to-end test script that exercises the full collateral pipeline
(on-chain tx -> EVM event -> validator scan -> SQLite persist). Harden
localnet startup by verifying contract code on-chain before reusing a
saved address, and restart the validator when the contract is redeployed.
…pgradeable

Fix factually incorrect comments, add missing @param docs, fix typos,
and remove stale notes so NatSpec accurately describes contract behavior.
Replace unused Ed25519Verify with AddressMapping precompile, use actual
variable names instead of misleading ALL_CAPS constants, remove
non-existent TRUSTEE_COLDKEY reference, and simplify StakingV2 purpose.
Use _deriveOwnerColdkey(trustee) instead of msg.sender so slashed alpha
goes to the same recipient as slashed TAO. Also add
minAlphaCollateralIncrease state variable and clarify alpha denomination
in CLAUDE.md.
Scale slash_fraction by 100 instead of 10,000 to match the expected
percentage-based (not basis-point) representation.
Adds a separate minimum deposit threshold for alpha collateral and
corrects alpha unit conversions from wei (1e18) to RAO (1e9) across
the miner, validator, and contract stack.
Add StakingV2 same-subnet transfer guarantees and no-runtime-slashing
notes to CLAUDE.md. Update deploy defaults to realistic minimums
(0.1 TAO, 5 alpha RAO) and fix initialize calldata in deployment guide.
@itzlambda itzlambda force-pushed the collateral-integration branch from b9cbd4e to 4a47f3f Compare February 26, 2026 09:38
Member

@epappas epappas left a comment

PR #372 Review: feat(collateral): harden dual-state collateral contract and validator integration

Summary

This PR is a substantial change (+5904/-4305 across 51 files) that hardens Basilica's collateral integration across three layers: (1) the upgradeable Solidity collateral contract with dual-asset TAO + alpha support, (2) the Rust-side validator event ingestion and state persistence, and (3) localnet tooling and deployment scripts.

Three specialized review agents analyzed the PR in parallel: blockchain-developer (smart contract security and correctness), rust-engineer (Rust code quality and architecture), and security-engineer (cross-cutting security analysis and threat modeling).


Issues Found

CRITICAL Issues (Must Fix Before Merge)

C-1: Both taoEnabled and alphaEnabled can be set simultaneously, causing double-spending of msg.value

Classification: Smart Contract Fund Safety Vulnerability
File: crates/collateral-contract/src/CollateralUpgradeable.sol, lines 218-248 (setters) and 280-340 (deposit)

setTaoEnabled and setAlphaEnabled are independent admin toggles with no mutual exclusion. If an admin enables both:

  1. deposit() at line 297 records d.taoAmount += msg.value (ETH stays in contract)
  2. deposit() at line 318 calls stakingPrecompile.addStake{value: msg.value}(hotkey, netuid) (sends the same ETH to staking)

The same msg.value is counted as both TAO collateral AND alpha collateral, effectively draining the contract's ETH pool.

Impact: Contract insolvency. If Miner A deposits 1 ETH (TAO-only), then admin enables both modes, then Miner B deposits 1 ETH (both modes active) -- Miner B's addStake sends the ETH to staking, and the contract now has 0 ETH but tracks 2 ETH in TAO deposits.

Proposed Fix: Add mutual exclusion to the setter functions:

function setTaoEnabled(bool _enabled) external onlyRole(DEFAULT_ADMIN_ROLE) {
    if (_enabled) require(!alphaEnabled, "Cannot enable TAO while alpha is enabled");
    taoEnabled = _enabled;
    emit TaoEnabledUpdated(_enabled);
}

C-2: pendingReclaimCount not decremented on full slash when reclaim was pending

Classification: State Inconsistency Bug
File: crates/collateral-contract/src/CollateralUpgradeable.sol, lines 740-755

When a node with a pending reclaim is fully slashed, the _slash function clears d.reclaimRequest = ReclaimStatus.NONE on line 749 but does NOT decrement pendingReclaimCount[miner]. Compare with denyReclaim (line 599) and _finalizeReclaim (line 523) which both correctly decrement.

Impact: pendingReclaimCount becomes permanently inflated. The getPendingReclaimCount view function returns incorrect data to off-chain systems (validator, CLI).

Proposed Fix:

// In _slash, before clearing reclaimRequest:
if (d.reclaimRequest == ReclaimStatus.PENDING) {
    pendingReclaimCount[miner] -= 1;
}
d.reclaimRequest = ReclaimStatus.NONE;

C-3: nodeToMiner mapping not cleared on full slash, preventing node re-use

Classification: State Inconsistency / Denial of Service
File: crates/collateral-contract/src/CollateralUpgradeable.sol, lines 740-755

After a 100% slash, taoAmount and alphaAmount are zero, but nodeToMiner[node] retains the old miner address. The deposit function (line 269) checks require(nodeToMiner[node] == address(0) || nodeToMiner[node] == msg.sender), so no other miner can deposit for that node. The only recovery is a manual clearNodeMinerMapping call by the trustee.

Impact: Slashed nodes become permanently unusable by other miners without manual intervention.

Proposed Fix:

// In _slash, after zeroing amounts:
if (d.taoAmount == 0 && d.alphaAmount == 0) {
    delete nodeToMiner[node];
}

C-4: Collateral event scan loop has no graceful shutdown mechanism

Classification: Operational Reliability
File: crates/basilica-validator/src/collateral/collateral_scan.rs, lines 28-41

The start() method runs an infinite loop inside tokio::select! but has no CancellationToken. This means the event scan loop can never be gracefully stopped. The project's established pattern (used in rental monitoring and billing telemetry) uses CancellationToken for shutdown.

Impact: Process termination could interrupt a partially-committed block scan. The loop never exits cleanly.

Proposed Fix: Accept a CancellationToken and add a cancellation branch:

pub async fn start(&mut self, cancel: CancellationToken) -> Result<()> {
    loop {
        tokio::select! {
            _ = cancel.cancelled() => return Ok(()),
            _ = interval.tick() => { /* scan */ }
        }
    }
}
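
As a rough, synchronous stand-in for the cancellation branch above, the same loop shape can be sketched with only the standard library. This is not tokio and not the validator's code: a std::sync::mpsc channel plays the role of the CancellationToken, recv_timeout doubles as the interval tick, and run_scan_loop with its parameters is a hypothetical name used only for illustration.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical std-only analog of the proposed loop (not the tokio version).
/// Returns the number of scans performed before shutdown was requested.
pub fn run_scan_loop(stop: Receiver<()>, tick: Duration, mut scan: impl FnMut()) -> u64 {
    let mut scans = 0u64;
    loop {
        match stop.recv_timeout(tick) {
            // A stop message, or every sender being dropped, ends the loop
            // between scans, so a scan is never interrupted halfway.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return scans,
            // No stop signal arrived within one tick: run the next scan.
            Err(RecvTimeoutError::Timeout) => {
                scan();
                scans += 1;
            }
        }
    }
}
```

The point of the shape is that shutdown is only observed between scans, which is the property the tokio::select! version is meant to provide.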

HIGH Severity Issues (Strongly Advised to Fix Before Merge)

H-1: Trustee key compromise enables complete fund theft (no on-chain rate limiting)

Classification: Security Architecture
File: crates/collateral-contract/src/CollateralUpgradeable.sol, lines 508-556

TRUSTEE_ROLE has unrestricted ability to slash 100% of every miner's collateral (TAO + alpha) and direct it to the trustee address. The only safeguard is the off-chain SlashRateLimiter in the validator Rust code, which does not protect against direct contract interaction.

Impact: Total loss of all deposited collateral if trustee key is compromised.

Recommendation: Consider on-chain rate limiting (max slash per block/timeframe), multi-sig, or timelock for slashes above a threshold. At minimum, document key custody requirements.


H-2: requestReclaim clears nodeToMiner before finalization, enabling slash-evasion

Classification: Smart Contract Logic Vulnerability
File: crates/collateral-contract/src/CollateralUpgradeable.sol, line 435

requestReclaim deletes nodeToMiner[node] immediately upon request (before finalization). Another miner can immediately deposit for the same node. The slash function looks up the miner via nodeToMiner, so it would target the new miner instead of the misbehaving one. A malicious miner could request reclaim + have an accomplice deposit to shield from slashing.

Impact: Wrong miner gets slashed; enables slash evasion.

Proposed Fix: Keep nodeToMiner until reclaim is finalized, or add a node-level lock during pending reclaim.


H-3: receive() function allows arbitrary ETH deposits without tracking

Classification: Fund Stranding
File: crates/collateral-contract/src/CollateralUpgradeable.sol, line 947

The receive() external payable {} function accepts any ETH without accounting. While necessary for removeStake precompile returns, anyone can send ETH to the contract with no recovery mechanism.

Proposed Fix: Add an admin sweep function for excess ETH, or restrict receive() to only accept from the staking precompile.


H-4: No event emission for setNodeMinerMapping and clearNodeMinerMapping

Classification: Auditability Gap
File: crates/collateral-contract/src/CollateralUpgradeable.sol, lines 780-812

These functions modify the critical nodeToMiner mapping but do not emit events. All other state-mutating functions emit events. The validator's off-chain event scanner cannot track these changes.


H-5: MAX_BLOCKS_PER_SCAN defined but never enforced in scanning

Classification: Operational Reliability
File: crates/collateral-contract/src/config.rs:21 and crates/basilica-validator/src/collateral/collateral_scan.rs:57-58

MAX_BLOCKS_PER_SCAN = 1000 is defined but scan_events() queries from from_block to current_block in a single RPC call with no chunking. On first run (from block 0) or after extended downtime, this will timeout or be rejected by the RPC node.

Proposed Fix: Implement block range chunking in the scan loop.
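
One possible shape for the chunking, as a sketch only: split the inclusive range into windows of at most MAX_BLOCKS_PER_SCAN blocks and issue one RPC query per window. The helper name chunk_block_range is hypothetical, and the sketch assumes the scanner works with inclusive u64 block ranges.

```rust
/// Value quoted in the review for config.rs.
pub const MAX_BLOCKS_PER_SCAN: u64 = 1000;

/// Hypothetical helper: split the inclusive range [from_block, to_block]
/// into consecutive inclusive windows of at most `max_blocks` blocks.
pub fn chunk_block_range(from_block: u64, to_block: u64, max_blocks: u64) -> Vec<(u64, u64)> {
    let mut chunks = Vec::new();
    if max_blocks == 0 || from_block > to_block {
        return chunks;
    }
    let mut start = from_block;
    loop {
        // saturating_add keeps the window end from overflowing near u64::MAX.
        let end = start.saturating_add(max_blocks - 1).min(to_block);
        chunks.push((start, end));
        if end == to_block {
            break;
        }
        // end < to_block here, so end + 1 cannot overflow.
        start = end + 1;
    }
    chunks
}
```

Under these assumptions, a catch-up scan from block 0 to block 2500 becomes three queries: (0, 999), (1000, 1999), and (2000, 2500).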


H-6: compute_slash_amount has precision edge case -- small fractions cause 100% slash

Classification: Financial Calculation Bug
File: crates/basilica-validator/src/collateral/slash_executor.rs, lines 281-298

The slash amount computation rounds to integer percentage. A slash_fraction of 0.004 (0.4%) would compute numerator = 0, and the function returns full collateral (100% slash). For slash_fraction = 0.10 with collateral = 1 RAO: 1 * 10 / 100 = 0, so it returns 1 (100% slash instead of 10%).

Proposed Fix: Use basis points (10000) instead of percentage (100) for better precision.
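
A minimal sketch of the basis-point arithmetic suggested above, assuming collateral is held as an integer RAO amount. The name slash_amount_bps, the choice to floor, and the choice to return None on invalid input are illustrative assumptions, not the behavior of compute_slash_amount.

```rust
/// 100% expressed in basis points.
pub const BPS_DENOMINATOR: u128 = 10_000;

/// Hypothetical helper: slash amount for a fraction given in basis points.
/// Floors the result and never escalates a zero result to a full slash.
/// Returns None if the fraction exceeds 100% or the multiplication overflows.
pub fn slash_amount_bps(collateral: u128, fraction_bps: u32) -> Option<u128> {
    if u128::from(fraction_bps) > BPS_DENOMINATOR {
        return None;
    }
    collateral
        .checked_mul(u128::from(fraction_bps))
        .map(|numerator| numerator / BPS_DENOMINATOR)
}
```

With these assumptions, a 0.4% slash (40 bps) of 1 TAO (1_000_000_000 RAO) yields 4_000_000 RAO. A 10% slash of 1 RAO still floors to 0, so dust amounts need an explicit policy whichever scale is chosen.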


MEDIUM Severity Issues

M-1: Deposit after reclaim request inflates returned amount

File: CollateralUpgradeable.sol, lines 254-380

A miner can deposit more after requesting reclaim; _finalizeReclaim returns the FULL balance including post-request deposits (d.taoAmount). The trustee cannot predict the exact payout.

Proposed Fix: Block deposits while reclaim is pending, or snapshot amounts at request time.


M-2: Network string parsing duplicated with inconsistent fallback behavior

Files: collateral_scan.rs:46-51 (falls through to Mainnet silently) vs slash_executor.rs:596-610 (returns error)

A config typo could cause the scanner to connect to mainnet while the slash executor errors out. Extract a shared parse_network() function.
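
A sketch of what a shared parser could look like, so that both call sites fail the same way on a typo. The Network variants and the accepted strings below are assumptions for illustration; the real enum and spelling rules live in the codebase and may differ.

```rust
/// Hypothetical network enum; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Network {
    Mainnet,
    Testnet,
    Local,
}

/// Hypothetical shared parser: unknown values are an error, never a
/// silent fallback to mainnet.
pub fn parse_network(raw: &str) -> Result<Network, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "mainnet" => Ok(Network::Mainnet),
        "testnet" => Ok(Network::Testnet),
        "local" => Ok(Network::Local),
        other => Err(format!("unknown collateral network: {:?}", other)),
    }
}
```

With a single function like this, a misspelled value such as "mainet" is rejected by the scanner and the slash executor alike.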


M-3: get_collateral_amount() returns TAO despite API name suggesting generic collateral

File: collateral_persistence.rs:178-187

Confusing API could lead callers to use TAO for eligibility checks when alpha should be used. The wrapper get_tao_collateral_amount() delegates to this same method, making one redundant.


M-4: default_shadow_mode() returns false -- live slashing by default

File: crates/basilica-validator/src/config/collateral.rs:206-208

A validator that enables collateral config without explicitly setting shadow_mode = true will immediately execute real on-chain slashes. Consider defaulting to true for safety.


M-5: ABI file naming inconsistency (CollateralUpgradable vs CollateralUpgradeable)

File: crates/collateral-contract/src/CollateralUpgradableABI.json

Missing 'e' in "Upgradeable". Should be CollateralUpgradeableABI.json.


M-6: RAO conversion precision loss can strand dust in staking

File: CollateralUpgradeable.sol:855-858

_convertToRao truncates via integer division by 1e9. Sub-gwei deposits lose dust permanently. Enforce msg.value % RAO_PER_TAO == 0 for alpha deposits.
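
The divisibility rule can be illustrated off-chain with a small Rust sketch. It relies on the unit relationship stated in the commit notes (1e9 RAO per TAO, 1e18 wei per TAO, hence 1e9 wei per RAO). The names WEI_PER_RAO and wei_to_rao_exact are hypothetical, and this is not the contract's _convertToRao.

```rust
/// 1 TAO = 1e18 wei = 1e9 RAO, so one RAO corresponds to 1e9 wei.
pub const WEI_PER_RAO: u128 = 1_000_000_000;

/// Hypothetical helper: convert wei to RAO only when no dust would be lost.
/// Returns None for amounts that are not a whole number of RAO.
pub fn wei_to_rao_exact(wei: u128) -> Option<u128> {
    if wei % WEI_PER_RAO == 0 {
        Some(wei / WEI_PER_RAO)
    } else {
        None
    }
}
```

Rejecting the non-divisible case up front mirrors the suggested on-chain check: the caller sees an error instead of silently losing the remainder.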


M-7: CLI event pretty-print omits TAO amounts for Deposit/Reclaimed/Slashed events

Files: crates/collateral-contract/src/main.rs:579-589, 624-647, 662-715

TAO amounts are missing from both pretty-print and JSON output. Operators cannot audit TAO flows through the CLI.


LOW Severity Issues

| # | Issue | File |
|---|---|---|
| L-1 | from_block + 1 potential overflow (use checked_add) | collateral_scan.rs:44-45 |
| L-2 | std::env::set_var in #[tokio::test] is unsound in multi-threaded context | collateral_e2e.rs:148, 209, 257 |
| L-3 | refresh_price_cache is a public no-op | manager.rs:120-122 |
| L-4 | Unnecessary node_id.clone() in CLI query commands | main.rs:448, 460-461 |
| L-5 | Multiple TODO comments -- should be tracked as issues | Multiple files |
| L-6 | Miner uses f64 for RAO-to-alpha while validator uses Decimal | basilica-miner/src/main.rs:385-390 |
| L-7 | flow.sh and query.sh have hardcoded contract addresses | flow.sh:3, query.sh:3 |
| L-8 | deploy.sh doesn't validate env vars before use | deploy.sh |
| L-9 | .env.local written without restrictive file permissions | setup-localnet-env.sh:360-380 |
| L-10 | CONTRACT_DEPLOYED_BLOCK_NUMBER is 0 -- will scan from genesis on mainnet | config.rs:24 |
| L-11 | Private key accepted as CLI argument (visible in /proc/*/cmdline) | main.rs:52 |
| L-12 | Integration test scripts lack set -e for error handling | integration-test.sh |
| L-13 | StakingV2PrecompileMock.getStake returns meaningless values based on gas | CollateralUpgradeable.t.sol:21-23 |

Suggestions for Improvements

  1. Add on-chain rate limiting for slashes: A daily/weekly cap on total slashable amounts per trustee would limit blast radius of key compromise. This is the single most impactful security enhancement for the financial system.

  2. Implement block range chunking in event scanner: Critical for first-run and catch-up scenarios. Without it, new validators cannot sync collateral state.

  3. Add idempotency checks to deposit event handler: The reclaim handler checks for duplicates (collateral_persistence.rs:446-456), but deposit handler does not. If an RPC returns duplicate events, deposits could be double-counted.

  4. Consider reorg safety margin: Re-scan from last_block - N with idempotent event handling to handle chain reorganizations.

  5. Share network parsing utility: Extract to config/collateral.rs and fail consistently on unknown networks in both scanner and slash executor.

  6. Set CONTRACT_DEPLOYED_BLOCK_NUMBER to actual deployment block before mainnet: Currently 0, which would cause full chain scan.
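
Suggestions 3 and 4 fit together: re-scanning a few blocks behind the last processed one is only safe if replayed events are no-ops. The sketch below models that in memory, assuming each event can be keyed by its transaction hash and log index (standard for EVM logs). DepositLedger, apply_deposit, and rescan_start are hypothetical names; in the validator the dedup key would more likely be a unique constraint written in the same SQLite transaction as the balance update.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the persisted deposit state.
#[derive(Default)]
pub struct DepositLedger {
    seen: HashSet<(String, u64)>, // (tx_hash, log_index) of applied events
    balances: HashMap<String, u128>,
}

impl DepositLedger {
    /// Applies a deposit once. Returns false if the event was already seen.
    pub fn apply_deposit(&mut self, tx_hash: &str, log_index: u64, node_id: &str, amount: u128) -> bool {
        if !self.seen.insert((tx_hash.to_string(), log_index)) {
            return false; // duplicate delivery: leave the balance untouched
        }
        let balance = self.balances.entry(node_id.to_string()).or_insert(0);
        *balance = balance.saturating_add(amount);
        true
    }

    pub fn balance(&self, node_id: &str) -> u128 {
        self.balances.get(node_id).copied().unwrap_or(0)
    }
}

/// First block to re-scan when keeping a safety margin of `margin` blocks.
pub fn rescan_start(last_block: u64, margin: u64) -> u64 {
    last_block.saturating_sub(margin)
}
```

Deduplication alone does not undo events from blocks that a reorg removed; that still needs a rollback or reconciliation step.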


Positive Observations

  1. CEI pattern correctly applied throughout: All external-call-making functions properly update state before external calls. Local variables capture values before zeroing storage.
  2. ReentrancyGuard consistently on all mutative functions: Correct use of OpenZeppelin's ReentrancyGuardUpgradeable.
  3. UUPS upgrade pattern properly implemented: _disableInitializers() in constructor, initializer modifier, reinitializer(2) on V2, correct _authorizeUpgrade with role check.
  4. Atomic per-block event processing: apply_collateral_events_for_block wraps all events + block number update in a single SQLite transaction. If any event fails, the entire block rolls back.
  5. Comprehensive rate limiting on slash execution: Three-layer protection (per-miner cooldown, global rate, circuit breaker) with proper entry pruning.
  6. SHA-256 evidence integrity chain: Slash evidence stored, hashed, checksum submitted on-chain -- creating an immutable audit trail.
  7. SQL injection prevention: All queries use parameterized sqlx bindings. Column selection in get_collateral_amount_internal uses match instead of string interpolation.
  8. Saturating arithmetic throughout: Both Solidity (OpenZeppelin Math) and Rust (U256 saturating_add/saturating_sub) prevent over/underflow.
  9. Comprehensive test suite: 3600+ lines of Solidity tests covering deposits, reclaims, slashes, denials, access control, edge cases, upgrades. Rust-side tests cover dual-asset state machine transitions and atomic rollback.
  10. Clean separation of concerns: Evaluator, manager, executor, scanner, evidence, and grace tracker each have single responsibilities following SOLID principles.
  11. Config validation: CollateralConfig::validate() performs comprehensive parameter validation including R2 evidence config requirements when shadow mode is disabled.
  12. EOA enforcement for first deposit: msg.sender.code.length == 0 && tx.origin == msg.sender prevents contract-based ownership claims. Constructor bypass also blocked.

Recommendation

Do NOT merge until C-1, C-2, C-3, C-4 are fixed. C-1 through C-3 are state-correctness bugs in the smart contract that could lead to fund insolvency (C-1), permanent state corruption (C-2), and denial of service (C-3); C-4 is an operational reliability issue in the validator's Rust scan loop.

Suggested merge path:

  1. Fix C-1 through C-4 (critical contract and Rust bugs)
  2. Address H-1 through H-6 (high severity -- strongly recommended before production)
  3. Create tracking issues for M-1 through M-7 (can be addressed in follow-up PRs)
  4. Low issues are non-blocking

The overall architecture is sound and demonstrates strong engineering fundamentals. The dual-asset collateral model is well-designed with proper separation of TAO and alpha state. The primary concerns are around edge cases in the state machine (pending reclaim + slash interactions) and operational safety (scan loop, precision, defaults).

Ok(())
}

async fn apply_collateral_event(
Member

this is a weird place to have your channel handlers, in the persistence layer. i'd expect a domain module for collateral of some sort. This is a bad design, as you're mixing handling logic with the DB layer


ALTER TABLE collateral_status RENAME TO collateral_status_legacy;

CREATE TABLE IF NOT EXISTS collateral_status (
Member

you don't have a table design that marks what makes this entry unique; is it the combination of (hotkey, node_id, miner)?

UNIQUE(hotkey, node_id)
);

CREATE TABLE IF NOT EXISTS collateral_reclaims (
Member

same as above.

.await
}

pub async fn get_tao_collateral_amount(
Member

maybe the get_collateral_amount needs to simply be renamed to get_tao_collateral_amount otherwise it's confusing.

Member

if i understand right, it's a bit scary: you have the status of the collateral in the DB, whereas the source of truth should have been ONLY the chain. now you have two sources of truth, which is risky to manage.

Member

this was very much needed, well done!

@epappas
Member

epappas commented Feb 26, 2026

Architectural Concern: DB as Source of Truth for Collateral Eligibility (No Chain Reconciliation)

Beyond the issues raised in the formal review, there is a structural concern with the current design that warrants discussion before merge.

The Problem

The system uses two independent trust paths for collateral data, and they diverge in a way that creates a consistency gap:

Path 1 -- Eligibility decisions (DB-only, no chain verification):

Chain events -> collateral_scan.rs -> SQLite DB -> manager.rs::get_collateral_alpha() -> evaluator.rs::evaluate()

The `get_collateral_alpha` method in `manager.rs:160-162` reads only from SQLite:

let amount = self.persistence
    .get_alpha_collateral_amount(&hotkey_hex, &node_hex)
    .await?;

This feeds into `evaluator.rs:70-89` which makes the `Sufficient/Warning/Undercollateralized/Excluded` determination. There is zero on-chain verification at this step. If the DB is stale or wrong, eligibility decisions are wrong.

Path 2 -- Slash execution (reads from chain, ignores DB):

slash_executor.rs::resolve_slash_amount() -> resolve_onchain_alpha_amount() -> chain_client.alpha_collaterals()

The slash executor at `slash_executor.rs:633-652` does read from chain:

async fn resolve_onchain_alpha_amount(...) -> Result<U256> {
    let amount = self.chain_client
        .alpha_collaterals(*hotkey_bytes, *node_bytes, network_config)
        .await?;
    ...
}

For slashing, the chain IS the source of truth. Good. But the two paths have no reconciliation mechanism between them.

Concrete Drift Scenarios

  1. No idempotency on deposits: `handle_deposit_with_tx` (`collateral_persistence.rs:396-436`) uses `saturating_add`. If the same Deposit event is processed twice (duplicate RPC response, restart bug), the DB balance doubles while the chain stays the same. Note: the reclaim handler does have duplicate detection (`collateral_persistence.rs:446-456`), but the deposit handler does not.

  2. No reorg handling: If the chain reorganizes, already-committed SQLite events become phantom state. There is no re-scan or rollback mechanism.

  3. Scan gaps can permanently desync state: If the scanner (`collateral_scan.rs`) fails for an extended period (it has no graceful shutdown -- C-4 in the review), or if the block range exceeds what the RPC will return (H-5), events can be permanently missed.

  4. External state changes invisible to the DB: If a miner unstakes through a different path (directly calling the precompile, or through another tool), the DB would still show the old collateral balance. The contract's `alphaCollaterals` would be correct on-chain, but the validator's DB would be stale.

Impact

The `get_preference()` method (`manager.rs:96-118`) determines whether a miner is `Preferred`, `Fallback`, or `Excluded` -- this directly affects whether they receive rental bids. A desync where:

  • DB shows collateral but chain has zero: An undercollateralized miner continues receiving work (they shouldn't).
  • DB shows zero but chain has collateral: A properly-collateralized miner gets excluded (unfair penalty).

The Saving Grace

The slash path reads from chain, so the worst case from slashing is bounded by reality (you can't slash more than what's actually on-chain). The risk is more about unfairly allowing or excluding miners based on stale DB state.

What's Missing

A production-grade system handling financial state should have:

  1. Periodic on-chain reconciliation: A background loop that samples DB state vs `alpha_collaterals()` on-chain for active miners and flags divergence. Doesn't need to run every cycle -- even a 5-minute health check would catch drift.

  2. Chain as authoritative fallback: When the evaluator checks eligibility, it should have an option to verify against the chain when the DB indicates a borderline state (near the minimum threshold).

  3. Drift alerting: At minimum, a Prometheus metric like `collateral_db_chain_drift_count` that fires when the two sources disagree beyond a tolerance, so operators know the materialized view is stale.

  4. Deposit idempotency: Add duplicate detection to `handle_deposit_with_tx`, matching what already exists for reclaim events.
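The reconciliation and drift-alerting items above reduce to comparing two balance snapshots. The following is a minimal sketch, assuming both snapshots have already been loaded into maps keyed by a miner identifier; `Drift` and `find_drift` are hypothetical names, and fetching the on-chain values (for example via `alpha_collaterals()`) is out of scope here.

```rust
use std::collections::HashMap;

/// One disagreement between the DB view and the chain (hypothetical type).
#[derive(Debug, PartialEq)]
struct Drift {
    miner: String,
    db: u128,
    chain: u128,
}

/// Returns every miner whose DB and on-chain balances differ by more than
/// `tolerance`. A miner missing from one side is treated as balance zero.
/// Results are sorted by miner key so the output is deterministic.
fn find_drift(
    db: &HashMap<String, u128>,
    on_chain: &HashMap<String, u128>,
    tolerance: u128,
) -> Vec<Drift> {
    // Union of keys from both snapshots.
    let mut keys: Vec<&String> = db.keys().chain(on_chain.keys()).collect();
    keys.sort();
    keys.dedup();

    keys.into_iter()
        .filter_map(|k| {
            let d = db.get(k).copied().unwrap_or(0);
            let c = on_chain.get(k).copied().unwrap_or(0);
            (d.abs_diff(c) > tolerance).then(|| Drift {
                miner: k.clone(),
                db: d,
                chain: c,
            })
        })
        .collect()
}
```

The length of the returned vector is the kind of value a metric such as `collateral_db_chain_drift_count` could export, and each entry identifies a miner whose eligibility may be worth re-checking against the chain.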

Recommendation

This is not necessarily a merge-blocker if the team is comfortable with the risk during initial rollout (especially with shadow mode). But before production with real collateral at stake, a reconciliation mechanism between the DB materialized view and the actual on-chain `alphaCollaterals` state should be added. This would be a targeted, surgical addition -- not a rewrite -- and would close the consistency gap.

@epappas
Member

epappas commented Feb 27, 2026

Correction: Retraction of Incorrect Findings and Revised Assessment

After a thorough line-by-line re-verification of the source code against every claim in the original review, I am issuing corrections. Several findings attributed to the automated review agents were factually incorrect -- they referenced code constructs, variables, and line numbers that do not exist in the actual source. The following is the corrected, evidence-backed assessment.


RETRACTED -- C-1: "Both taoEnabled and alphaEnabled cause double-spending of msg.value"

Status: INCORRECT. This issue does not exist.

The original claim stated that enabling both deposit toggles would cause msg.value to be spent twice -- once recorded as TAO and once sent via addStake{value: msg.value}. This is factually wrong.

Evidence -- CollateralUpgradeable.sol, deposit() function, lines 308-317:

uint256 actualAlphaAmount = alphaAmount;
if (alphaAmount > 0) {
    // ...
    actualAlphaAmount = transferAlpha(alphaHotkey, alphaAmount);  // line 313
    alphaCollaterals[hotkey][nodeId] += actualAlphaAmount;         // line 314
}
taoCollaterals[hotkey][nodeId] += msg.value;                       // line 317

transferAlpha() (lines 644-678) uses delegatecall to IStaking.transferStake, which moves alpha stakes between coldkeys on the Substrate staking layer. It does not consume ETH or use msg.value. There is no addStake{value: msg.value} call anywhere in the deposit flow. TAO (ETH held by the contract) and alpha (staking precompile transfer) are independent operations that can coexist without double-spending.


RETRACTED -- C-2: "pendingReclaimCount not decremented on full slash"

Status: INCORRECT. The referenced variable does not exist.

The original claim referenced a pendingReclaimCount mapping at "lines 740-755". The contract file is 703 lines long, and the variable pendingReclaimCount does not exist anywhere in it. All state variables are declared at lines 52-75. Pending reclaims are tracked via the amount-based mappings taoCollateralUnderPendingReclaims and alphaCollateralUnderPendingReclaims (lines 66-67), not a per-miner request counter.


RETRACTED -- C-3: "nodeToMiner not cleared on full slash"

Status: INCORRECT. The code already handles this case.

Evidence -- slashCollateral(), lines 535-542:

if (
    amount == slashAmount && alphaAmount == slashAlphaAmount
        && taoCollateralUnderPendingReclaims[hotkey][nodeId] == 0
        && alphaCollateralUnderPendingReclaims[hotkey][nodeId] == 0
) {
    nodeToMiner[hotkey][nodeId] = address(0);
    ownerColdkeys[hotkey][nodeId] = bytes32(0);
}

The contract explicitly clears nodeToMiner when: (1) all TAO is slashed, (2) all alpha is slashed, (3) no pending TAO reclaims, (4) no pending alpha reclaims. The only case where it is retained is when there are pending reclaims still in flight, which is correct behavior -- those reclaims must be resolved first.


RETRACTED -- H-2: "requestReclaim clears nodeToMiner before finalization"

Status: INCORRECT. reclaimCollateral does not modify nodeToMiner.

Evidence -- reclaimCollateral() (lines 332-389) performs five operations: (1) ownership check (line 336), (2) available amount calculation (lines 340-344), (3) reclaim record creation (lines 362-370), (4) pending counter increment (lines 372-373), (5) event emission (lines 375-386). There is no delete nodeToMiner or any modification to the nodeToMiner mapping in this function. nodeToMiner is cleared only in finalizeReclaim (line 433) and denyReclaimRequest (line 491), and only when all four balance/pending counters reach zero.


RETRACTED -- H-3: "receive() allows arbitrary ETH deposits"

Status: INCORRECT. The opposite is true.

Evidence -- lines 254-262:

receive() external payable {
    revert InvalidDepositMethod();
}

fallback() external payable {
    revert InvalidDepositMethod();
}

Both receive() and fallback() revert with InvalidDepositMethod(). The contract explicitly rejects all ETH sent outside the deposit() function. The original claim stated receive() external payable {} (empty, accepting body), which is the opposite of the actual code.


RETRACTED -- H-4: "No events for setNodeMinerMapping/clearNodeMinerMapping"

Status: INCORRECT. These functions do not exist.

The original claim referenced functions setNodeMinerMapping (line 780) and clearNodeMinerMapping (line 800). The contract is 703 lines. These functions do not exist anywhere in the contract source. nodeToMiner is modified only internally within deposit, finalizeReclaim, denyReclaimRequest, and slashCollateral.


Confirmed Valid Findings

The following findings have been re-verified against the source code and remain valid:

1. compute_slash_amount precision edge case (Originally H-6)

Severity: Medium
File: crates/basilica-validator/src/collateral/slash_executor.rs, lines 281-298

Evidence:

fn compute_slash_amount(&self, collateral: U256) -> U256 {
    if collateral.is_zero() || self.config.slash_fraction >= Decimal::ONE {
        return collateral;
    }
    let numerator = (self.config.slash_fraction * Decimal::from(100u64))
        .round()
        .to_u64()
        .unwrap_or(0);
    if numerator == 0 || numerator >= 100 {
        return collateral;  // <-- BUG: returns FULL collateral when numerator rounds to 0
    }
    // ...
}

Reproduction: For slash_fraction = 0.004 (0.4%):

  • 0.004 * 100 = 0.4
  • .round() = 0
  • numerator == 0 matches the guard at line 289
  • Returns collateral (100% slash instead of 0.4%)

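The config validation at collateral.rs:122 allows any value in (0.0, 1.0]. The default is Decimal::ONE (lines 240-241), which takes the early return at line 282. But any configured fraction that rounds to 0% triggers a full slash. Assuming Decimal is rust_decimal::Decimal, whose round() uses banker's rounding, that covers every fraction at or below 0.5%. Fractions above 0.5% but below 1% round to a numerator of 1, so they miss this guard but still lose their sub-percent precision.

Impact: If a validator operator configures a small slash fraction (e.g., 0.5% or 0.4%), the system would slash 100% of collateral instead of the intended fraction.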

Proposed fix: When numerator == 0, return a minimum slash amount (e.g., U256::from(1u64)) instead of the full collateral. Alternatively, express the fraction in basis points (denominator 10000) to preserve sub-percent precision.
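The basis-point option can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `slash_amount_bps` and `BPS_DENOMINATOR` are hypothetical names, `u128` stands in for `U256`, and the fraction is passed directly as an integer number of basis points rather than as a `Decimal`.

```rust
/// Basis points in a whole: 10_000 bps == 100%.
const BPS_DENOMINATOR: u128 = 10_000;

/// Computes floor(collateral * slash_bps / 10_000).
///
/// Returns `None` for an invalid configuration (0 bps or more than 100%),
/// so a misconfigured fraction can never silently become a full slash.
fn slash_amount_bps(collateral: u128, slash_bps: u32) -> Option<u128> {
    let bps = u128::from(slash_bps);
    if bps == 0 || bps > BPS_DENOMINATOR {
        return None;
    }
    // Split the collateral into quotient and remainder so that
    // `collateral * bps` is never formed and cannot overflow.
    let whole = (collateral / BPS_DENOMINATOR) * bps;
    let rest = (collateral % BPS_DENOMINATOR) * bps / BPS_DENOMINATOR;
    Some(whole + rest)
}
```

With this shape, a 0.4% fraction is 40 bps, and `slash_amount_bps(1_000_000, 40)` yields `Some(4_000)` instead of the full balance. Very small balances can still floor to zero, so a caller may want an explicit policy for that case.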


2. Scan loop has no graceful shutdown (Originally C-4)

Severity: Low-Medium
File: crates/basilica-validator/src/collateral/collateral_scan.rs, lines 28-41

Evidence:

pub async fn start(&mut self) -> Result<()> {
    info!("Starting collateral event scan loop");
    let mut interval = tokio::time::interval(self.config.collateral_event_scan_interval);
    loop {
        tokio::select! {
            _ = interval.tick() => {
                if let Err(e) = self.scan_handle_collateral_events().await {
                    error!("Collateral event scan failed: {}", e);
                }
            }
        }
    }
}

The tokio::select! has a single branch with no cancellation token. The loop never exits. The project's established pattern (rental monitoring, billing telemetry) uses CancellationToken for graceful shutdown. The severity depends on how this is spawned -- if the parent task is aborted externally, the functional impact is limited.
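The shape of a loop that can be stopped is shown below as a synchronous, standard-library analogue, not as the async fix itself. In the actual tokio code, the equivalent change would presumably be a second `tokio::select!` branch awaiting `CancellationToken::cancelled()` from `tokio_util`. Here `run_scan_loop` is a hypothetical name and an `mpsc` channel plays the role of the cancellation token.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Calls `scan` once per `interval` until a shutdown message arrives or the
/// sending side is dropped. Returns how many scans ran before shutdown.
fn run_scan_loop<F: FnMut()>(
    shutdown: mpsc::Receiver<()>,
    interval: Duration,
    mut scan: F,
) -> u64 {
    let mut scans = 0;
    loop {
        // Wait for either the next tick (timeout) or a shutdown signal,
        // whichever comes first. This mirrors a two-branch select.
        match shutdown.recv_timeout(interval) {
            Err(RecvTimeoutError::Timeout) => {
                scan();
                scans += 1;
            }
            // Explicit shutdown, or the owner of the sender went away.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return scans,
        }
    }
}
```

The point of the sketch is that the loop has a defined exit path and returns control to its caller, which is what allows an in-flight scan to finish before the process stops.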


3. MAX_BLOCKS_PER_SCAN is defined but never used (Originally H-5)

Severity: Low-Medium
File: crates/collateral-contract/src/config.rs:21 (definition), crates/collateral-contract/src/lib.rs:114-130 (scan function)

Evidence:

config.rs:21 defines:

pub const MAX_BLOCKS_PER_SCAN: u64 = 1000;

lib.rs:114-129 -- scan_events() calls scan_events_with_scope(from_block, current_block, ...) with no reference to MAX_BLOCKS_PER_SCAN. A repository-wide search confirms the constant is never imported or referenced outside its declaration.

scan_events_with_scope() at lib.rs:132-146 issues a single provider.get_logs(&filter) call spanning the full from_block..to_block range with no chunking.

Impact: On first run or after extended downtime, the scanner would issue an eth_getLogs call spanning potentially millions of blocks. Most RPC providers reject or time out requests beyond ~10k blocks. The scan loop would fail repeatedly at the configured interval without making progress.
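One way the unused constant could be put to work is to split the requested range before querying. The sketch below assumes inclusive block bounds; `chunk_block_range` is a hypothetical helper, and the per-chunk `get_logs` call and progress persistence are left out.

```rust
/// Mirrors the constant defined in config.rs.
const MAX_BLOCKS_PER_SCAN: u64 = 1000;

/// Splits the inclusive range `[from_block, to_block]` into consecutive
/// inclusive sub-ranges that each cover at most `max_blocks` blocks.
/// Returns an empty vector for an empty range or a zero chunk size.
fn chunk_block_range(from_block: u64, to_block: u64, max_blocks: u64) -> Vec<(u64, u64)> {
    let mut chunks = Vec::new();
    if max_blocks == 0 || from_block > to_block {
        return chunks;
    }
    let mut start = from_block;
    loop {
        // Saturating add guards against overflow near u64::MAX.
        let end = start.saturating_add(max_blocks - 1).min(to_block);
        chunks.push((start, end));
        if end == to_block {
            break;
        }
        start = end + 1;
    }
    chunks
}
```

A scanner built on this would issue one log query per chunk and could record the last fully processed block after each one, so that a failure partway through resumes from that point rather than from the original `from_block`.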


4. Trustee key compromise (Originally H-1) -- Architectural Observation

Severity: Informational (design tradeoff, not a code bug)

The TRUSTEE_ROLE has unrestricted on-chain slashing authority. The off-chain SlashRateLimiter in slash_executor.rs provides defense-in-depth but does not protect against direct contract interaction. This is a design choice with documented tradeoffs, not a defect.


Summary of Corrections

| Original ID | Original Claim | Verdict | Reason |
|---|---|---|---|
| C-1 | Double-spend msg.value | Retracted | transferAlpha uses transferStake, not addStake{value} |
| C-2 | pendingReclaimCount bug | Retracted | Variable does not exist in contract |
| C-3 | nodeToMiner not cleared | Retracted | Lines 535-542 handle this correctly |
| C-4 | No graceful shutdown | Valid (Low-Medium) | Confirmed: no CancellationToken in scan loop |
| H-1 | Trustee key risk | Valid (Informational) | Architectural observation, not a bug |
| H-2 | requestReclaim clears nodeToMiner | Retracted | reclaimCollateral does not touch nodeToMiner |
| H-3 | receive() accepts ETH | Retracted | receive() reverts with InvalidDepositMethod |
| H-4 | Missing events for nonexistent functions | Retracted | Functions don't exist in contract |
| H-5 | MAX_BLOCKS_PER_SCAN unused | Valid (Low-Medium) | Constant defined but never referenced |
| H-6 | Slash precision bug | Valid (Medium) | Fractions that round to 0% cause 100% slash |

I apologize for the inaccuracies in the original review. Six of the ten critical/high findings were incorrect -- the review agents fabricated code constructs, referenced nonexistent variables and line numbers, and in one case described the exact opposite of the actual code behavior. Of the four findings that stand, three are code issues: the slash fraction precision bug (medium), the unused scan chunking constant (low-medium), and the missing cancellation token (low-medium). The fourth, the trustee key risk (H-1), remains as an informational design observation.
