fix(relayer): improve merkle tree sync error logging#401

Closed
danwt wants to merge 502 commits into main from danwt/claude/improve-merkle-tree-logging

danwt commented Dec 10, 2025

Summary

  • Upgrade log level from info to error when merkle tree is empty/not synced, making RPC connectivity issues immediately obvious
  • Add detailed error messages explaining common causes (RPC unreachable, validators not signing, insufficient storage announcements)
  • Include origin/destination domain context in error messages for faster debugging
  • Add debug logging when tree count is zero in highest_known_leaf_index()

Context

When the relayer cannot reach the origin chain RPC, the local merkle tree sync fails silently, resulting in cryptic "Unable to reach quorum" errors. This PR makes it immediately clear that the root cause is merkle tree sync failure.
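
As a rough illustration of the behaviour described above, the sketch below maps a tree status to a log level and a message that carries the domain context. It is not the relayer's code: `TreeSyncStatus`, `Level`, and `describe_tree_status` are hypothetical names, and the real change presumably goes through the `tracing` macros rather than returning a tuple.

```rust
/// Hypothetical view of the local merkle tree state (not a relayer type).
#[derive(Debug, PartialEq)]
enum TreeSyncStatus {
    Synced { leaf_count: u32 },
    Empty,
}

/// Illustrative stand-in for a log level.
#[derive(Debug, PartialEq)]
enum Level {
    Debug,
    Error,
}

/// Pick a level and build a message that names both domains, so an
/// empty tree is reported loudly together with its likely causes.
fn describe_tree_status(
    status: &TreeSyncStatus,
    origin: &str,
    destination: &str,
) -> (Level, String) {
    match status {
        TreeSyncStatus::Synced { leaf_count } => (
            Level::Debug,
            format!(
                "merkle tree synced: origin={origin} destination={destination} leaf_count={leaf_count}"
            ),
        ),
        TreeSyncStatus::Empty => (
            Level::Error,
            format!(
                "merkle tree is empty or not synced: origin={origin} destination={destination}; \
                 possible causes: origin RPC unreachable, validators not signing, \
                 insufficient storage announcements"
            ),
        ),
    }
}
```

With this shape, an operator who sees "Unable to reach quorum" would also find an error-level line just before it that names the origin domain and the likely causes.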

Test plan

  • Run relayer with unreachable origin RPC and verify error messages are clear
  • Verify cargo check passes
  • Verify cargo fmt passes

🤖 Generated with Claude Code

srene and others added 30 commits September 10, 2025 11:42
…yz#7018)

Co-authored-by: Danil Nemirovsky <4614623+ameten@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: pbio <10051819+paulbalaji@users.noreply.github.com>
…os (hyperlane-xyz#7047)

Co-authored-by: Danil Nemirovsky <4614623+ameten@users.noreply.github.com>
…xyz#7055)

Co-authored-by: Danil Nemirovsky <4614623+ameten@users.noreply.github.com>
danwt and others added 28 commits December 2, 2025 15:36
Fix linter issues in fork-specific Rust code:
- Use std::io::Error::other() for simpler error construction
- Use unwrap_or_default() instead of unwrap_or(Type::zero())
- Replace unwrap() with expect() with descriptive messages
- Add missing documentation for public modules and fields
- Remove redundant closures in error mapping
- Use From::from for infallible conversions
- Add tokio "tracing" feature to Cargo.toml
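
For readers unfamiliar with these lints, here is a small sketch of four of the idioms listed above. The function names are made up for illustration and do not come from the fork; `std::io::Error::other` requires Rust 1.74 or newer.

```rust
use std::collections::HashMap;
use std::io;

/// `io::Error::other(..)` replaces `io::Error::new(ErrorKind::Other, ..)`.
fn rpc_error(msg: &str) -> io::Error {
    io::Error::other(msg.to_string())
}

/// `unwrap_or_default()` replaces `unwrap_or(0)` (or `unwrap_or(Type::zero())`).
fn balance_of(balances: &HashMap<String, u64>, account: &str) -> u64 {
    balances.get(account).copied().unwrap_or_default()
}

/// `expect()` with a descriptive message replaces a bare `unwrap()`.
fn parse_port(raw: &str) -> u16 {
    raw.parse().expect("port must be a valid u16")
}

/// `From::from` replaces an `as` cast when the conversion is infallible.
fn widen(n: u32) -> u64 {
    u64::from(n)
}
```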

Note: There is a remaining compilation issue with tokio::task::Builder
API that prevents full compilation. This appears to be a pre-existing
issue that needs separate investigation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…372)

* claude: feat(kaspa): verify validator signatures before counting toward threshold

Add signature verification for deposit signatures to prevent misconfigured
validators from causing bridge transaction failures.

Changes:
- Add validator_ism_addresses field to RelayerStuff config
- Refactor collect_with_threshold to accept optional validation function
- Add signature verification in get_deposit_sigs that checks recovered
  signer against expected ISM address from config
- Only count verified signatures toward threshold

This ensures that even if a validator returns a signature signed with
the wrong key, the relayer will reject it and continue waiting for
valid signatures.
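
The idea of counting only verified signatures can be sketched as below. This is a simplified, synchronous model under stated assumptions: `DepositSig` is a hypothetical stand-in that already carries its recovered signer, and this `collect_with_threshold` only borrows the name from the commit message; the real function's signature is not shown in this PR.

```rust
/// Hypothetical signature record; `signer` stands in for the address
/// that would be recovered from the signature bytes.
#[derive(Debug, Clone, PartialEq)]
struct DepositSig {
    signer: &'static str,
}

/// Collect items until `threshold` of them have been accepted.
/// With `validate == None` every item counts; otherwise only items the
/// validator approves do. Returns `None` if the input runs out first.
fn collect_with_threshold<T, I, F>(
    items: I,
    threshold: usize,
    validate: Option<F>,
) -> Option<Vec<T>>
where
    I: IntoIterator<Item = T>,
    F: Fn(&T) -> bool,
{
    let mut accepted = Vec::new();
    if threshold == 0 {
        return Some(accepted);
    }
    for item in items {
        // A missing validator means "accept everything".
        let ok = validate.as_ref().map_or(true, |f| f(&item));
        if ok {
            accepted.push(item);
            if accepted.len() >= threshold {
                return Some(accepted);
            }
        }
    }
    None
}
```

A caller would pass something like `Some(|s: &DepositSig| expected.contains(&s.signer))` to enforce the expected ISM addresses, or `None::<fn(&DepositSig) -> bool>` to skip verification.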

Fixes #129

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* claude: refactor: remove verbose debug log for successful signature verification

* claude: feat(kaspa): parse validatorIsmAddresses from config

* format and remove debug

* claude: feat(relayer): make validator signature verification optional

Make signature verification conditional on validatorIsmAddresses being
populated. If the list is empty, skip verification entirely. This allows
gradual rollout and testing before enforcement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
#374)

Downgrade repetitive per-validator logs from info to debug level to
prevent log spam in production:
- base_builder.rs:228: validator storage locations (logs per validator)
- multisig.rs:68: successful validator index returns (logs per validator)

Also removes verbose checkpoint_syncers field from success log since
the count provides sufficient information.

These logs fire for every validator on every message, causing
excessive verbosity at info level. Debug level is more appropriate
for this detailed diagnostic information.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
… log size (#375)

The Transaction struct's derived Debug implementation printed all
tx_hashes, which can contain hundreds of 64-char hashes. This caused
single log lines to exceed 40KB. Replace derived Debug with a custom
implementation that shows only essential fields (uuid, tx_hashes_count,
status, submission_attempts).
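
A minimal sketch of such a hand-written `Debug` implementation follows. The field names are taken from the commit message, but their types are assumptions, and the real `Transaction` struct likely has more fields.

```rust
use std::fmt;

/// Simplified stand-in for the struct; field types are assumed.
struct Transaction {
    uuid: String,
    tx_hashes: Vec<String>,
    status: String,
    submission_attempts: u32,
}

impl fmt::Debug for Transaction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the number of hashes instead of the hashes themselves,
        // so the log line stays short however many there are.
        f.debug_struct("Transaction")
            .field("uuid", &self.uuid)
            .field("tx_hashes_count", &self.tx_hashes.len())
            .field("status", &self.status)
            .field("submission_attempts", &self.submission_attempts)
            .finish()
    }
}
```

Formatting a transaction with `{:?}` then yields one compact line regardless of how many hashes it holds.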

Closes dymensionxyz/hyperlane-deployments#134

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Retries are expected behavior in RPC clients and do not warrant a
warning. Change the log level from warn to debug to reduce log noise in
production while maintaining debuggability via RUST_LOG configuration.

Closes dymensionxyz/hyperlane-deployments#134

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…debug (#377)

Rate limiting is expected when using public RPC providers. Change log
level from info to debug for both JsonRpcError and SerdeJson rate limit
detection paths to reduce log noise while maintaining debuggability.

Closes dymensionxyz/hyperlane-deployments#134

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
#378)

* progress checkpoint: code might be broken

* claude: feat(relayer): improve logging for message submission and confirmation

Add structured logging to help diagnose Base and BSC transaction issues:

- debug log before submitting process transaction (message_id, gas_limit, destination)
- info log after successful tx submission (message_id, tx_id, gas_used)
- error log with full context on submission failure
- debug log when checking delivery status
- warn log with has_tx_outcome field when reverted/reorged to identify missing broadcasts

These logs help diagnose:
- Whether transactions are being broadcast at all
- What gas limits are being used
- Whether the issue is at submission or confirmation stage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Cap the maximum backoff time for message retries to 15 minutes.
This ensures stuck messages are retried promptly rather than waiting
hours between attempts.
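
One way to picture the cap is shown below. The exponential schedule and the `retry_backoff` helper are assumptions made for illustration; only the 15-minute ceiling comes from this commit.

```rust
use std::time::Duration;

/// Ceiling from the commit message: 15 minutes.
const MAX_BACKOFF: Duration = Duration::from_secs(15 * 60);

/// Hypothetical exponential backoff: `base * 2^attempts`, clamped to
/// `MAX_BACKOFF`. Saturating arithmetic avoids overflow for large counts.
fn retry_backoff(attempts: u32, base: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempts);
    base.saturating_mul(factor).min(MAX_BACKOFF)
}
```

With a 5-second base the delay grows as 5s, 10s, 20s and so on, then stays at 15 minutes once the uncapped value would exceed it.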

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…nd_threshold (#383)

The CosmosNativeIsm's validators_and_threshold() function handled only
MerkleRootMultisigIsm and not MessageIdMultisigIsm, even though module_type()
correctly recognizes both ISM types. This caused the relayer to fail with an
"ISM not a multi sig ism" error when processing messages on Cosmos chains
configured with MessageIdMultisigIsm.

Add handling for MessageIdMultisigIsm in validators_and_threshold() to
match the existing MerkleRootMultisigIsm case.
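
The shape of the fix can be sketched with a toy enum. The `Ism` type below is hypothetical (the real code works with the Cosmos ISM types, which are not shown here); the point is only that both multisig variants share one match arm.

```rust
/// Hypothetical, simplified ISM model; not the real Cosmos types.
#[derive(Debug, Clone, PartialEq)]
enum Ism {
    MerkleRootMultisig { validators: Vec<String>, threshold: u8 },
    MessageIdMultisig { validators: Vec<String>, threshold: u8 },
    Routing,
}

/// Both multisig variants expose validators and a threshold, so they
/// are handled by the same arm; anything else is rejected.
fn validators_and_threshold(ism: &Ism) -> Result<(Vec<String>, u8), String> {
    match ism {
        Ism::MerkleRootMultisig { validators, threshold }
        | Ism::MessageIdMultisig { validators, threshold } => {
            Ok((validators.clone(), *threshold))
        }
        Ism::Routing => Err("ISM not a multi sig ism".to_string()),
    }
}
```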

Fixes: https://github.com/dymensionxyz/hyperlane-deployments/issues/141

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…381)

The hardcode crate has hyperlane-core listed as a dependency but doesn't
actually use it anywhere. This change removes the dependency to reduce
coupling between libs/kaspa and rust/main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Move Hyperlane domain ID constants from libs/kaspa/lib/hardcode/src/hl.rs
to a new dedicated crate rust/main/chains/dymension-kaspa-hl-constants.

This refactoring:
- Makes the hardcode crate purely Kaspa-focused
- Places HL integration constants in the rust/main workspace
- Breaks circular dependencies by creating a minimal constants crate
- Updates all imports across both workspaces

The new dymension-kaspa-hl-constants crate has no dependencies and is
imported by both dymension-kaspa and the libs/kaspa crates (core, validator).

Related to issue #140 in hyperlane-deployments repo.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Deduplicates kaspa-* dependencies by moving them from dymension-kaspa crate
to rust/main workspace dependencies. This eliminates duplication and reduces
maintenance burden while keeping the same git rev (9ff5d0f).

Note: The libs/kaspa workspace keeps its own definitions as Cargo doesn't
support cross-workspace dependency inheritance.

Related to dymensionxyz/hyperlane-deployments#140

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…sion-kaspa (#386)

Re-export dym_kas_core::message as dymension_kaspa::hl_message to create
semantic separation between pure Kaspa libs and Hyperlane integration layer.

- Add pub use dym_kas_core::message as hl_message in dymension-kaspa lib.rs
- Update imports in rust/main to use dymension_kaspa::hl_message instead of dym_kas_core::message
- Files in dymension/libs/kaspa continue using corelib::message (no circular deps)
- Both workspaces build successfully

This provides clear naming that indicates Hyperlane-specific functionality
while avoiding circular dependencies between workspaces.
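
The re-export pattern can be shown in miniature with inline modules standing in for the two crates. The `parse_message_id` function is invented for the example; in the real workspace these are separate crates and the alias is declared in dymension-kaspa's lib.rs.

```rust
/// Stand-in for the pure Kaspa crate.
mod dym_kas_core {
    pub mod message {
        /// Hypothetical helper, present only so there is something to call.
        pub fn parse_message_id(raw: &str) -> Option<&str> {
            raw.strip_prefix("0x")
        }
    }
}

/// Stand-in for the Hyperlane integration crate.
mod dymension_kaspa {
    // Re-export under a name that signals Hyperlane-specific use.
    pub use super::dym_kas_core::message as hl_message;
}
```

Callers in rust/main would then write `dymension_kaspa::hl_message::parse_message_id(..)` and never name `dym_kas_core` directly.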

Related to dymensionxyz/hyperlane-deployments#140

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
… bridge crate (#389)

This PR splits the core module into two parts:
- Pure Kaspa functionality stays in libs/kaspa/lib/core
- Bridge/HL-dependent logic moves to new libs/kaspa/lib/bridge crate

The new bridge crate contains:
- message.rs - HL message parsing
- deposit.rs - DepositFXG struct
- withdraw.rs - WithdrawFXG struct
- payload.rs - MessageIDs encoding
- confirmation.rs - ConfirmationFXG struct
- util.rs - Address<->H256 conversion
- user/ - User deposit/payload utilities

After this change:
- libs/kaspa/lib/core has ZERO hyperlane dependencies
- libs/kaspa/lib/bridge depends on both core (pure Kaspa) and HL
- relayer/validator/tooling updated to use bridge crate
- dymension-kaspa re-exports bridge as kas_bridge

This completes Phase 2 of dependency inversion.

Refs: #140

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
PR #389 moved confirmation and deposit modules from core to the new
bridge crate. This fix updates hyperlane-base to import from the
correct location and adds the bridge dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…main (#387)

- Move relayer source files to rust/main/chains/dymension-kaspa/src/kas_relayer/
- Update imports to use dym_kas_core, dym_kas_bridge, dym_kas_hardcode
- Remove relayer from libs/kaspa workspace
- Update hyperlane-base to use dymension_kaspa::kas_relayer

This reduces the HL dependencies in libs/kaspa, working towards the goal
of keeping only pure Kaspa code there.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…t/main (#388)

- Move validator module from libs/kaspa/lib/validator to rust/main/chains/dymension-kaspa/src/kas_validator
- Update imports: crate::error -> crate::kas_validator::error, corelib -> dym_kas_core, bridge -> dym_kas_bridge, api_rs -> dym_kas_api, secp256k1 -> kaspa_bip32::secp256k1
- Update dymension-kaspa lib.rs to expose kas_validator module
- Update validator_server.rs to use local kas_validator imports
- Remove dym-kas-validator dep from dymension-kaspa and hyperlane-base Cargo.toml
- Update libs/kaspa kms crate: remove validator dep, define KaspaSecpKeypair locally
- Update libs/kaspa tooling: depend on dymension-kaspa for signer module, add ethers feature to hyperlane-core
- Remove validator from libs/kaspa workspace members

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
…ain/dymension-kaspa (#391)

Move the bridge crate from libs/kaspa/lib/bridge to
rust/main/chains/dymension-kaspa/src/kas_bridge.

This continues the refactoring effort to separate pure Kaspa code
(libs/kaspa) from Hyperlane integration code (rust/main). The bridge
module contains HL-dependent types for deposit/withdraw/confirmation
payloads, message parsing, and user-facing deposit functionality.

Changes:
- Move bridge module files to dymension-kaspa as kas_bridge
- Update imports from prost to hyperlane_cosmos_rs::prost
- Update dym_kas_bridge:: references to crate::kas_bridge::
- Update secp256k1 imports to kaspa_bip32::secp256k1
- Add workflow-core to workspace dependencies
- Remove bridge from libs/kaspa workspace members
- Update tooling to use dymension_kaspa::kas_bridge::

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
* update docs

* restore `"accountAddressType": "Bitcoin"`
* claude: fix(dymension-kaspa): add ethers feature for signature recovery

Enable the ethers feature on hyperlane-core dependency to make the
`recover()` method available on SignedType<T>. Also clean up unused
imports and variables in validators.rs, and fix import ordering in
libs/kaspa/tooling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: refactor(kaspa): extract PSKT utilities to dym_kas_core

Phase 2A of Kaspa refactoring - move pure Kaspa PSKT utilities from
rust/main to libs/kaspa/lib/core:

- input_sighash_type() and is_valid_sighash_type() for standard sighash
- PopulatedInput type alias and PopulatedInputBuilder struct
- utxo_reference_from_populated_input() for mass calculation
- estimate_mass() for transaction mass estimation

These are pure Kaspa utilities with no Hyperlane dependencies, making
libs/kaspa/lib/core more self-contained for PSKT operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: chore(kaspa): remove dead code and unnecessary re-exports

Clean up after PSKT utilities extraction:
- Remove dead populated_input.rs module (only contained re-exports)
- Remove unnecessary re-exports in sweep.rs and hub_to_kaspa.rs
- Update imports to use dym_kas_core::pskt directly
- Rename test_foo to test_recipient_address_roundtrip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* claude: refactor(kaspa): move tooling to rust/main/utils/kaspa-tools

Move the kaspa-tools crate from libs/kaspa/tooling to rust/main/utils/kaspa-tools
since it has Hyperlane dependencies and doesn't belong in the pure Kaspa library.

- Add kaspa-tools to rust/main workspace
- Update import paths (corelib -> dym_kas_core)
- Add secp256k1 to workspace dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: refactor(kaspa): remove tooling and HL deps from libs/kaspa

Remove tooling crate from libs/kaspa workspace and remove all Hyperlane
dependencies to keep libs/kaspa as a pure Kaspa library.

- Remove tooling from workspace members
- Remove hyperlane-cosmos-rs dependency
- Remove cometbft workspace dependencies
- Clean up comment artifacts in client.rs
- Regenerate Cargo.lock without HL packages

After this change, libs/kaspa has only one hyperlane mention remaining:
a cross-reference URL in CONTRIBUTING.md (acceptable).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Update documentation references to use the new kaspa-tools location:
- libs/kaspa/tooling -> rust/main (cargo run -p kaspa-tools --)

Affected files:
- dymension/validators/bridge/README_bridge_kaspa.md
- dymension/tests/kaspa_hub_test_kas/commands.sh

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…395)

* claude: refactor(kaspa): collapse hl-constants into dymension-kaspa

Remove the separate dymension-kaspa-hl-constants crate and move all
constants into dymension-kaspa/src/consts.rs.

- Merge domain ID constants into consts.rs
- Update all internal imports to use crate::consts
- Maintain backward compatibility via `pub use consts as hl_domains`
- Remove dymension-kaspa-hl-constants from workspace

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: refactor(kaspa): move validator_server into kas_validator module

Move validator_server.rs to kas_validator/server.rs for better organization.

- Rename validator_server.rs -> kas_validator/server.rs
- Update imports to use super:: for sibling modules
- Re-export from kas_validator module

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: refactor(kaspa): rename kas_bridge to bridge

Rename kas_bridge module to simply 'bridge' - clearer semantic naming
that describes the module's purpose (bridge operation data types:
DepositFXG, WithdrawFXG, ConfirmationFXG, message parsing).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* claude: refactor(kaspa): rename bridge to ops

The name 'bridge' was too generic since the entire project is about
a bridge. The module contains operation types (DepositFXG, WithdrawFXG,
ConfirmationFXG) - the wire formats exchanged between validators and
relayers. 'ops' (operations) better describes its purpose.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Delete prometheus/ module - was copy-pasted from hyperlane-cosmos
  but never used (dead code)
- Remove unused Cargo.toml dependencies: ripemd, once_cell, itertools,
  protobuf, pin-project
- Remove dangling mod withdraw_test reference in kas_validator/mod.rs
- Remove unused KEY_MESSAGE_IDS constant from libs/kaspa/core

The prometheus module contained MetricsChannel/MetricsChannelFuture
for instrumenting gRPC clients, but these were never wired up. The
identical code already exists in hyperlane-cosmos if needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…#399)

- Remove unused imports from libs/kaspa/lib/core/src/balance.rs
  (kaspa_wallet_core::prelude::*, std::sync::Arc)
- Prefix unused parameter with underscore in client.rs (_domain_kas)
- Remove #![allow(unused)] from ops/user/deposit.rs and clean up
  all unused imports in that file

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Improve logging clarity when relayer fails to build metadata due to
merkle tree sync issues. This makes it much more obvious when the
relayer cannot relay messages because the origin chain merkle tree
is not synced (typically due to RPC connectivity issues).

Changes:
- Upgrade log level from info to error when merkle tree is empty
- Add detailed error messages explaining common causes
- Include origin/destination domain context in error messages
- Add debug logging when tree count is zero

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

danwt commented Dec 10, 2025

Recreating with correct base branch (main-dym)

danwt closed this Dec 10, 2025