@phutchins (Contributor) commented Oct 17, 2025

Note

Introduces P2P binding fixes and comprehensive logging/ops tooling, plus a small IPC config API enhancement and docs updates.

  • (cli) Add listen_ip to p2p config (default 0.0.0.0) and use it for resolver listen_addr; advertise external_addresses based on external_ip to fix cloud VM bindings; update node init (CLI/UI) to include listen_ip and add tests
  • (node/settings) Add BottomUpSettings with enabled flag and IpcSettings::bottomup_enabled() (defaults to enabled)
  • (docs) Expand node-init p2p networking section with external-ip vs listen-ip guidance and cloud/local examples; update CHANGELOG
  • (infra) New infra/elk-logging stack (Elasticsearch, Logstash, Kibana, Grafana) with Docker Compose, Filebeat templates, provisioning, and scripts (setup-central-server.sh, deploy-filebeat.sh, check-log-flow.sh, elk-manager.sh)
  • (faucet) Add .dockerignore, .env.example, and scripts/check-pending-txs.js
  • (scripts) Add operational guides and utilities (fix-parent-finality*, clear-mempool.sh, monitoring docs); broaden .gitignore

Written by Cursor Bugbot for commit 0e6d322.

This commit addresses a critical bug in `ipc-cli node init` that prevented libp2p from binding to network interfaces on cloud VMs (GCP, AWS, Azure). The fix ensures that `listen_addr` is set to `0.0.0.0` for proper binding, while `external_addresses` correctly advertises the public IP. This change restores functionality for parent finality voting and top-down message execution.

Changes include:
- Updated `ConnectionOverrideConfig` to include `external_addresses`.
- Modified port configuration logic to use `0.0.0.0` for `listen_addr`.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect these changes.
- Added tests to verify the correct configuration behavior.

Existing deployments may need to reinitialize or manually update their configurations to apply this fix.
This commit introduces a new `listen-ip` field in the P2P configuration, allowing advanced users to specify a custom IP address for binding services, while maintaining the default of `0.0.0.0` for maximum compatibility. This enhancement addresses previous limitations in binding on cloud VMs and improves flexibility for complex network setups.

Changes include:
- Updated `P2pConfig` structure to include the `listen-ip` field.
- Adjusted port configuration logic to utilize the `listen-ip` for binding.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect the new configuration options and usage examples.
- Added tests to ensure correct behavior of the new `listen-ip` functionality.

This update is fully backward compatible and does not require changes to existing configurations.
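The split between the bind address and the advertised address can be shown with a small shell sketch. It assumes libp2p-style `/ip4/<addr>/tcp/<port>` multiaddrs. The helper name `p2p_addrs` and the port in the usage line are hypothetical and are not taken from `ipc-cli`.

```bash
# Hypothetical helper: print the multiaddr to bind (listen) and the one to
# advertise (external). listen_ip defaults to 0.0.0.0 so the socket binds on
# every local interface; external_ip is the public address peers should dial.
p2p_addrs() {
  local external_ip="$1"
  local port="$2"
  local listen_ip="${3:-0.0.0.0}"

  echo "listen_addr=/ip4/${listen_ip}/tcp/${port}"
  echo "external_address=/ip4/${external_ip}/tcp/${port}"
}
```

For example, `p2p_addrs 203.0.113.10 26655` binds on all interfaces and advertises the public IP. On cloud VMs the public IP is usually NAT-mapped and not assigned to any local interface. That is why binding to it fails while advertising it works.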
…ality issue

This commit updates the subnet configuration by changing the validator power from 1 to 3 and modifying the subnet ID to ensure compatibility with the latest deployment requirements. Additionally, a new markdown file is introduced to document the 16-hour lookback issue affecting parent finality on the Glif Calibration testnet, outlining the problem, root cause, and proposed solutions.

Changes include:
- Updated `ipc-subnet-config.yml` with new subnet ID and validator power.
- Added `PARENT-FINALITY-16H-LOOKBACK-ISSUE.md` to provide detailed insights into the parent finality issue and potential workarounds.

These updates aim to enhance the reliability and documentation of the IPC subnet management process.
…inality progress

This commit introduces a new `watch-finality` command to the IPC subnet manager, enabling users to monitor parent finality progress in real time. The command supports continuous monitoring, target epoch tracking, and customizable refresh intervals.

Changes include:
- Added `cmd_watch_finality()` function in `ipc-subnet-manager.sh`.
- Updated usage documentation to include examples for the new command.
- Implemented `watch_parent_finality()` function in `lib/health.sh` for monitoring logic.
- Created `WATCH-FINALITY-FEATURE.md` to document usage, output, and potential use cases.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of parent finality and subnet health.
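Below is a minimal sketch of the polling pattern behind a command like `watch-finality`. It assumes the current finality value can be obtained from some fetch command. `watch_until_target` is a hypothetical name, and the fetch command stands in for whatever query the real `watch_parent_finality()` performs.

```bash
# Hypothetical polling loop: call FETCH_CMD every INTERVAL seconds, print the
# value with a timestamp, and return once it reaches TARGET. Returns 1 if the
# fetch command itself fails.
watch_until_target() {
  local target="$1" interval="$2" fetch_cmd="$3"
  local value
  while true; do
    value="$("$fetch_cmd")" || return 1
    echo "$(date +%H:%M:%S) finality=${value} target=${target}"
    if [ "$value" -ge "$target" ]; then
      return 0
    fi
    sleep "$interval"
  done
}
```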
…onitoring

This commit adds a new `watch-blocks` command to the IPC subnet manager, enabling users to monitor block production in real time. The command supports continuous monitoring, target height tracking, and customizable refresh intervals.

Changes include:
- Implemented `cmd_watch_blocks()` function in `ipc-subnet-manager.sh`.
- Added `watch_block_production()` function in `lib/health.sh` for monitoring logic.
- Updated usage documentation with examples for the new command.
- Created `WATCH-BLOCKS-FEATURE.md` to document usage, output, and potential use cases.
- Adjusted `ipc-subnet-config.yml` to optimize block production settings.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of block production and overall subnet health.
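Block production monitoring reduces to comparing two height samples over elapsed time. The sketch below assumes the heights and interval are already known. `block_rate` is a hypothetical helper, not the actual `watch_block_production()` implementation.

```bash
# Hypothetical helper: given two chain heights sampled ELAPSED seconds apart,
# print blocks per second and average seconds per block (two decimals).
block_rate() {
  local prev_height="$1" cur_height="$2" elapsed="$3"
  awk -v p="$prev_height" -v c="$cur_height" -v e="$elapsed" 'BEGIN {
    blocks = c - p
    if (e <= 0 || blocks <= 0) { print "no progress"; exit }
    printf "%.2f blocks/s, %.2f s/block\n", blocks / e, e / blocks
  }'
}
```

For example, `block_rate 100 130 10` reports 30 blocks in 10 seconds.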
This commit introduces an extensive "Advanced Performance Tuning Guide" to optimize IPC subnet performance, detailing configuration changes and expected impacts on consensus timeouts, block production, and network performance. Additionally, a new script, `apply-advanced-tuning.sh`, is added to automate the application of these optimizations to existing nodes without reinitialization.

Changes include:
- Created `ADVANCED-TUNING-GUIDE.md` with detailed tuning parameters and expected performance improvements.
- Added `apply-advanced-tuning.sh` script for seamless configuration updates across validators.
- Updated `ipc-subnet-config.yml` with optimized settings for faster block production and parent finality.
- Introduced `OPTIMIZATION-SUMMARY.md` and `PERFORMANCE-OPTIMIZATION-RESULTS.md` to document performance improvements and configurations.
- Enhanced `TUNING-QUICK-REF.md` for quick access to tuning actions and parameters.

These enhancements significantly improve the performance and reliability of the IPC subnet, making it competitive with leading blockchain networks.
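Applying tuning to existing nodes without reinitialization comes down to editing configuration keys in place. This sketch assumes CometBFT's `config.toml` layout, where consensus timeouts such as `timeout_commit` live under `[consensus]`. `set_toml_key` is a hypothetical helper, and the values `apply-advanced-tuning.sh` actually writes are not listed in this PR.

```bash
# Hypothetical helper: set KEY = VALUE inside [SECTION] of a TOML file, leaving
# same-named keys in other sections untouched. Assumes "key = value" with
# spaces around "=", as CometBFT writes it. Rewrites through a temp file and
# copies back with cat so the original file's permissions are kept.
set_toml_key() {
  local file="$1" section="$2" key="$3" value="$4"
  local tmp
  tmp="$(mktemp)" || return 1
  awk -v section="[$section]" -v key="$key" -v value="$value" '
    /^\[/ { in_section = ($0 == section) }
    in_section && $1 == key && $2 == "=" { print key " = " value; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_toml_key config.toml consensus timeout_commit '"500ms"'` changes only the consensus timeout. The value shown is illustrative. The node must be restarted for CometBFT to pick up the change.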
This commit introduces a comprehensive solution to address the broadcasting error encountered by validators due to incorrect address configuration. The changes include:

- Added `BOTTOMUP-CHECKPOINT-FIX.md` to document the problem, root cause, and the necessary fix for validator configurations.
- Created `fix-bottomup-checkpoint.sh` script to automate the process of disabling bottom-up checkpointing for federated subnets and updating validator configurations.
- Updated `lib/config.sh` to set the default validator key kind to "ethereum" for EVM-based subnets, preventing future issues.

These enhancements ensure that bottom-up checkpointing is operational and that validators are correctly configured for EVM compatibility, improving overall subnet reliability.
This commit adds a comprehensive live monitoring dashboard to the IPC subnet manager, enabling real-time tracking of various metrics and error categorization. Key changes include:

- Created `lib/dashboard.sh` for core dashboard functionality, including metrics collection and UI rendering.
- Added `cmd_dashboard()` function to `ipc-subnet-manager.sh` for command integration.
- Developed multiple documentation files detailing dashboard features, implementation, and quick reference guides.
- Enhanced error handling and formatting in the dashboard display for improved user experience.

These enhancements significantly improve the monitoring capabilities of the IPC subnet manager, providing users with a unified view of subnet health and activity.
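Error categorization of the kind described for `lib/dashboard.sh` can be sketched as a single pattern-to-bucket pass over log lines. The category names and patterns are illustrative assumptions, since the PR does not list the real ones.

```bash
# Hypothetical categorizer: read log lines on stdin, assign each matching line
# to the first category whose pattern it matches, and print "category count"
# pairs sorted by name. Lines matching no pattern are ignored.
categorize_errors() {
  awk '
    /[Cc]onnection refused|[Tt]imeout/ { counts["network"]++; next }
    /[Mm]empool/                        { counts["mempool"]++; next }
    /[Ff]inality/                       { counts["finality"]++; next }
    /ERROR|[Ee]rror/                    { counts["other"]++; next }
    END { for (c in counts) printf "%s %d\n", c, counts[c] }
  ' | sort
}
```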
This commit introduces a new `BottomUpSettings` struct to manage bottom-up checkpointing configurations, including an option to enable or disable the feature. Key changes include:

- Added `BottomUpSettings` struct with a default enabled state.
- Updated `IpcSettings` to include a configuration for bottom-up checkpointing.
- Enhanced `BottomUpManager` to accept a flag indicating whether bottom-up checkpointing is enabled.
- Implemented logic to conditionally execute bottom-up checkpointing based on the new settings.

These enhancements provide greater flexibility in managing checkpointing behavior within the IPC subnet, improving overall system reliability.
…t management

This commit introduces a comprehensive "Consensus Recovery Guide" and a "Diagnostic Tools Summary" to assist users in diagnosing and recovering from consensus issues within IPC subnets. Key changes include:

- Added `CONSENSUS-RECOVERY-GUIDE.md` detailing steps for diagnosing and resolving consensus problems, including commands for checking consensus and voting status.
- Introduced `DIAGNOSTIC-TOOLS-SUMMARY.md` outlining new commands like `consensus-status` and `voting-status`, enhancing the ability to monitor validator health and participation.
- Updated `ipc-subnet-manager.sh` to integrate new diagnostic commands.
- Enhanced `lib/health.sh` with functions to display consensus and voting statuses, improving operational visibility.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, enabling targeted recovery actions without data loss and fostering better understanding of consensus dynamics.
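The internals of `consensus-status` and `voting-status` are not shown here. Any voting view, however, ultimately checks CometBFT's commit rule: a block commits only when validators holding strictly more than two thirds of the total voting power precommit. `has_quorum` is a hypothetical helper illustrating that rule.

```bash
# Quorum check with integer arithmetic to avoid rounding:
# voted/total > 2/3  is equivalent to  3*voted > 2*total.
has_quorum() {
  local voted_power="$1" total_power="$2"
  if [ $((3 * voted_power)) -gt $((2 * total_power)) ]; then
    echo "quorum"
  else
    echo "no quorum"
  fi
}
```

With three equal-power validators, two online validators hold exactly two thirds, which is not enough. A single stopped validator therefore halts block production, which is a useful first thing to rule out during recovery.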
…sting

This commit introduces several new scripts to enhance the IPC subnet manager's functionality. Key changes include:

- Added `enable-gateway-ports.sh` to enable GatewayPorts on remote VMs for SSH reverse tunneling.
- Introduced `setup-anvil-tunnels.sh` to establish reverse SSH tunnels from the local Anvil instance to remote validator nodes, so that validators can reach an Anvil node that only listens on the local machine's localhost.
- Created `test-anvil-connection.sh` to verify Anvil connectivity from remote VMs through the established SSH tunnels.
- Updated `ipc-subnet-config.yml` with new configuration settings for improved local and remote RPC endpoints.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, facilitating better connectivity and management of validator nodes.
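A sketch of the reverse tunnel these scripts set up, assuming Anvil on its default port 8545. `anvil_tunnel_cmd` is a hypothetical helper that only builds the command string. The actual flags used by `setup-anvil-tunnels.sh` are not shown in this PR.

```bash
# Hypothetical helper: build the ssh command that exposes the local Anvil RPC
# on a remote host through a reverse tunnel (-R). -N runs no remote command.
# By default sshd binds the remote end to loopback only; allowing other hosts
# or containers to connect to it requires GatewayPorts in the remote
# sshd_config.
anvil_tunnel_cmd() {
  local remote="$1" remote_port="${2:-8545}" local_port="${3:-8545}"
  echo "ssh -N -R ${remote_port}:localhost:${local_port} ${remote}"
}
```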
This commit introduces a new script, `debug-relayer-error.sh`, designed to assist in diagnosing issues related to checkpoint submission failures in the IPC subnet manager. Key features include:

- A series of connectivity checks to ensure the Anvil RPC is accessible.
- Validation of the existence of the Gateway and Subnet Actor contracts.
- Checks for the last bottom-up checkpoint height and subnet activity status.
- Recommendations for common issues encountered during relayer operations.

Additionally, new documentation files, including `FIXES-SUMMARY.md`, `IPC-CONFIG-ORDER-FIX.md`, and `RELAYER-UPDATE-SUMMARY.md`, have been added to summarize recent fixes and updates related to relayer connectivity and configuration management.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, providing users with tools to effectively troubleshoot and resolve relayer-related issues.
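The contract-existence check can be done with the standard `eth_getCode` JSON-RPC method, which returns `"0x"` when no bytecode is deployed at an address. `rpc_payload` and `has_code` are hypothetical helpers, not functions from `debug-relayer-error.sh`.

```bash
# Build the JSON-RPC request body for eth_getCode at the latest block.
rpc_payload() {
  local address="$1"
  printf '{"jsonrpc":"2.0","id":1,"method":"eth_getCode","params":["%s","latest"]}' "$address"
}

# Interpret the "result" field: succeed only for non-empty 0x-prefixed code.
has_code() {
  case "$1" in
    ""|0x|0x0) return 1 ;;  # empty response or no deployed bytecode
    0x*)       return 0 ;;
    *)         return 1 ;;  # not a hex string: treat as failure
  esac
}

# Example against a running RPC endpoint (address is a placeholder):
#   curl -s -H 'Content-Type: application/json' \
#     --data "$(rpc_payload 0xYourGatewayAddress)" http://localhost:8545
```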
This commit introduces a new documentation file, `INSTALL-SYSTEMD-FIX.md`, detailing fixes for common issues encountered during the installation of systemd services in the IPC subnet manager. Key changes include:

- Resolved installation issues where services were only installed on the first validator due to arithmetic expansion errors.
- Ensured the relayer service is installed correctly when requested.
- Added initialization for the `SCRIPT_DIR` variable in service generation functions to prevent template file access issues.
- Included steps to unmask services on affected validators before installation.

Additionally, improvements were made to the `ipc-subnet-manager.sh` and `lib/health.sh` scripts to enhance error handling and logging during the installation process.

These enhancements significantly improve the reliability and usability of the IPC subnet manager's systemd service installation process.
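The symptom described above, services installed only on the first validator because of arithmetic expansion, matches a common Bash pitfall. Under `set -e`, `((count++))` evaluates to the old value, so when `count` is 0 it returns status 1 and the script exits after the first iteration. Whether this is the exact bug fixed in `ipc-subnet-manager.sh` is an assumption. The function names below are hypothetical.

```bash
# Buggy pattern: subshell body keeps `set -e` local to the function.
count_buggy() (
  set -e
  count=0
  for v in "$@"; do
    ((count++))            # value 0 on first pass -> status 1 -> exit
  done
  echo "$count"
)

# Safe pattern: an assignment with $(( )) always has exit status 0.
count_fixed() (
  set -e
  count=0
  for v in "$@"; do
    count=$((count + 1))
  done
  echo "$count"
)
```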
This commit updates the `ipc-subnet-config.yml` with new subnet IDs and contract addresses for improved configuration accuracy. Additionally, it introduces a `--debug` option in the `ipc-subnet-manager.sh` script to enable verbose logging during initialization and error handling, enhancing the debugging process. A new `RELAYER-AND-RESOLVER-FIX.md` documentation file is added, detailing fixes for relayer configuration issues and invalid resolver paths, ensuring better operational reliability.
… configuration improvements

This commit introduces a new command, `update-binaries`, to the `ipc-subnet-manager.sh` script, allowing users to pull the latest code, build, and install binaries on all validators. The command supports specifying a git branch for updates. Additionally, the `ipc-subnet-config.yml` file has been updated with new paths for the IPC repository, and several contract addresses have been modified for improved configuration accuracy. These enhancements streamline the process of maintaining validator binaries and ensure better operational reliability.
This commit adds functionality to convert the validator key to an Ethereum address using fendermint within the `show_subnet_info` function of `lib/health.sh`. It logs the converted address if successful, or warns if the conversion fails. This enhancement improves the visibility of validator information and aids in debugging by providing relevant Ethereum addresses alongside public keys.
This commit introduces a new script, `estimate-gas.sh`, designed to estimate gas usage for transactions between Ethereum addresses. The script utilizes JSON RPC to fetch gas estimates and provides a breakdown of costs at various gas prices. It also includes a recommendation for gas with a 20% buffer, enhancing the operational capabilities of the IPC subnet manager by aiding users in transaction cost planning.
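The arithmetic described for `estimate-gas.sh` can be sketched as follows. The sketch assumes the estimate arrives as the hex quantity returned by `eth_estimateGas` and that gas prices are given in gwei. `gas_report` is a hypothetical helper, and its round-up of the 20% buffer is this sketch's choice.

```bash
# Hypothetical helper: convert a hex gas estimate to decimal, add a 20%
# buffer (rounded up), and print the cost at each gas price given in gwei.
gas_report() {
  local gas_hex="$1"; shift
  local gas=$((16#${gas_hex#0x}))               # strip 0x, parse base 16
  local buffered=$(( (gas * 120 + 99) / 100 ))  # +20%, rounded up
  echo "estimate: ${gas}"
  echo "recommended (20% buffer): ${buffered}"
  local gwei
  for gwei in "$@"; do
    echo "at ${gwei} gwei: $((buffered * gwei)) gwei"
  done
}
```

For example, `gas_report 0x5208 1 2` covers a plain transfer (0x5208 = 21000 gas) at 1 and 2 gwei.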
This commit adds a newline at the end of the `estimate-gas.sh` script to ensure consistency with coding standards and improve readability. This minor adjustment helps maintain a clean file structure in the project.
This commit introduces a complete ELK (Elasticsearch, Logstash, Kibana) stack for aggregating logs from IPC validator nodes. Key components include:

- Docker Compose configuration for orchestrating the ELK stack.
- Elasticsearch for log storage and search capabilities.
- Logstash for processing and parsing logs from validators.
- Kibana for visualizing logs and creating dashboards.
- Grafana for alternative visualization options.

Additionally, comprehensive documentation is provided, including setup guides, troubleshooting tips, and monitoring instructions, ensuring a robust logging infrastructure for IPC validators.
@phutchins marked this pull request as ready for review on November 13, 2025 13:16
@phutchins requested a review from a team as a code owner on November 13, 2025 13:16
ssh_user: "philip"
ipc_user: "ipc"
role: "secondary"
private_key: "0xc1099a062e296366a2ac3b26ac80a409833e6a74edbf677a0bd14580d2c68ea2"

Bug: Private Keys: Repository Exposure Risk

The configuration file contains three plaintext private keys committed to the repository. These appear to be actual validator private keys rather than example placeholders, given the presence of real IP addresses, subnet IDs, and personal usernames throughout the file. Committing private keys exposes validator control and any associated funds to compromise.
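One way to keep keys out of the committed file is to read them from the environment at runtime and refuse to continue if they are missing. The variable naming scheme `VALIDATOR_<N>_PRIVATE_KEY` and the helper `validator_key` are hypothetical.

```bash
# Hypothetical helper: print the private key for validator INDEX from the
# environment variable VALIDATOR_<INDEX>_PRIVATE_KEY, failing loudly if unset.
validator_key() {
  local index="$1"
  local var="VALIDATOR_${index}_PRIVATE_KEY"
  local key="${!var:-}"                # indirect expansion: value of $var
  if [ -z "$key" ]; then
    echo "error: ${var} is not set" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```

Removing the keys from the file does not remove them from git history. Keys that were already committed should be treated as exposed.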

This commit introduces a new local deployment mode for the IPC subnet manager, allowing multiple validators to run on a single machine. Key features include:

- A new configuration file, `ipc-subnet-config-local.yml`, for local mode settings.
- Automatic management of Anvil, including starting and stopping it as needed.
- Systematic port allocation for validators to avoid conflicts.
- CLI enhancements to support local mode operations, including a `--mode` flag.
- Comprehensive documentation detailing the local mode implementation and usage instructions.

These changes enhance the flexibility and usability of the IPC subnet manager for local development and testing environments.
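Systematic port allocation for several validators on one machine can be expressed as a per-service base port plus a fixed stride per validator index. The CometBFT bases below are CometBFT's defaults (26656 P2P, 26657 RPC). The Ethereum API base, the stride, and the `port_for` name are illustrative assumptions and are not taken from `ipc-subnet-config-local.yml`.

```bash
# Hypothetical allocator: port = base(service) + index * stride.
port_for() {
  local service="$1" index="$2" stride="${3:-100}"
  local base
  case "$service" in
    cometbft_p2p) base=26656 ;;
    cometbft_rpc) base=26657 ;;
    eth_api)      base=8645  ;;   # kept clear of Anvil's default 8545
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  echo $((base + index * stride))
}
```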
This commit introduces a new feature in the IPC subnet manager that automates the deployment of subnets before initializing validator nodes. Key changes include:

- A new `deploy_subnet()` function in `lib/health.sh` that handles the creation of subnets and deployment of gateway contracts.
- Updates to the `ipc-subnet-manager.sh` script to incorporate subnet deployment as a prerequisite for node initialization.
- Modifications to the `ipc-subnet-config-local.yml` to include a `deploy_subnet` flag for enabling automatic deployment.
- Enhanced error handling and logging to ensure successful subnet creation and configuration updates.

These improvements streamline the setup process for local development environments, reducing the likelihood of initialization errors related to missing subnets.
This commit updates the `ipc-subnet-config-local.yml` to change the subnet ID and adjust the Ethereum API port to avoid conflicts with Anvil. It also modifies the `ipc-subnet-manager.sh` script to streamline the genesis creation process, ensuring it works for both activated and non-activated subnets. Additionally, the `create_bootstrap_genesis` function in `lib/health.sh` is enhanced to utilize the `ipc-cli subnet create-genesis` command, improving error handling and logging for better visibility during subnet initialization. These changes enhance the reliability and usability of the IPC subnet manager for local development environments.
This commit refactors the `fetch_metrics` function in `dashboard.sh` to improve the process of gathering metrics from validator nodes. Key changes include:

- Replaced SSH commands with a new `exec_on_host` function for executing remote commands, enhancing consistency and reducing timeout complexity.
- Updated the method for fetching block height, network info, mempool status, and error logs to utilize local node paths for better compatibility with local deployments.
- Improved the extraction of parent height from logs to ensure accurate reporting.
- Added a note in the dashboard output to indicate when F3 is disabled for local development.

These enhancements improve the reliability and clarity of metrics reporting in the IPC subnet manager.
This commit refactors the `get_chain_id` function in `lib/health.sh` to replace SSH commands with the `exec_on_host` function for executing remote commands. This change enhances consistency and simplifies the process of querying the Ethereum chain ID via JSON-RPC, improving the overall reliability of the health check functionality in the IPC subnet manager.
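The PR names `exec_on_host` but does not show its signature. The sketch below is a hypothetical version. It assumes the first argument is the target host and that `localhost` or `127.0.0.1` means the command should run locally instead of over SSH.

```bash
# Hypothetical abstraction: run a command string locally for local mode,
# otherwise over SSH with a connect timeout and no interactive prompts.
exec_on_host() {
  local host="$1"; shift
  case "$host" in
    localhost|127.0.0.1|"")
      bash -c "$*"
      ;;
    *)
      ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" "$*"
      ;;
  esac
}
```

Callers such as `get_chain_id` or `fetch_metrics` can then use one code path for remote validators and for local mode.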
@karlem (Contributor) commented Nov 19, 2025

The changes to the current files look good!

…onality

This commit modifies the `ipc-subnet-config-local.yml` to update the subnet ID and parent contract addresses for better alignment with local deployment requirements. Additionally, it refactors the `check_validator_health` function in `lib/health.sh` to enhance the process of checking validator health by replacing SSH commands with the `exec_on_host` function, improving consistency and reliability in health checks. These changes streamline the configuration and monitoring of validators in the IPC subnet manager.
This commit updates the Logstash configuration in `ipc-logs.conf` to extract the hostname before cleanup, allowing for the use of a new field `validator_hostname` in the index name. This change improves the organization of logs by ensuring that the index is named consistently based on the validator's hostname, enhancing log management and retrieval.
This commit updates the `draw_dashboard` function in `dashboard.sh` to calculate the expected number of peers based on the count of validators, excluding the self-validator. This change enhances the accuracy of the network health status displayed in the dashboard, improving overall monitoring capabilities.
This commit updates the `fetch_metrics` function in `dashboard.sh` to include the fetching of the mempool maximum size from the CometBFT configuration. The maximum size is now dynamically set if not already defined, improving the accuracy of mempool metrics displayed in the dashboard. Additionally, the default value for `mempool_max` is adjusted to align with this change, enhancing overall monitoring capabilities.
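Reading the mempool capacity can be sketched against the standard CometBFT `config.toml` layout, where `[mempool]` holds `size` (5000 in CometBFT's default config). `mempool_max` is a hypothetical helper name. The dashboard's actual parsing is not shown in this PR.

```bash
# Hypothetical helper: print `size` from the [mempool] section of a CometBFT
# config.toml, or DEFAULT when the file or key is missing.
mempool_max() {
  local config="$1" default="${2:-5000}"
  local value=""
  if [ -r "$config" ]; then
    value="$(awk '
      /^\[/ { in_mempool = ($0 == "[mempool]") }
      in_mempool && $1 == "size" && $2 == "=" { print $3; exit }
    ' "$config")"
  fi
  echo "${value:-$default}"
}
```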
This commit updates the `monitor-parent-finality-simple.sh` script to enhance the method of extracting finality information from logs. The previous use of `grep -P` has been replaced with a combination of `grep` and `sed` for better portability. This change ensures more reliable parsing of log entries, improving the accuracy of finality reporting in the monitoring process.
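The portable replacement for `grep -P` follows a two-step pattern: plain `grep` selects lines, then `sed` with basic regular expressions captures the value. BSD and macOS `grep` do not support `-P`. The log format and the function name below are hypothetical.

```bash
# Hypothetical extractor: print the height from the last "parent finality"
# line, reading the given files or stdin.
latest_finality_height() {
  grep 'parent finality' "$@" \
    | sed -n 's/.*height=\([0-9][0-9]*\).*/\1/p' \
    | tail -n 1
}
```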
…er function

This commit updates the `set_federated_power` function in `lib/health.sh` to dynamically determine the `--from` address for transactions based on the primary validator's private key. If the address is not specified in the configuration, it derives the address from known Anvil accounts, improving flexibility and reducing configuration errors. Additionally, it logs the address being used for transactions, enhancing visibility during execution.
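Deriving a `--from` address from a known Anvil key can be sketched as a lookup table. Only Anvil's first default account is listed here. It is a public, well-known development key and must never hold real funds. `anvil_address_for_key` is a hypothetical name, and the table in `lib/health.sh` may differ.

```bash
# Hypothetical lookup: map a known Anvil dev private key to its address;
# unknown keys fail instead of guessing.
anvil_address_for_key() {
  case "$1" in
    0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80)
      echo 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266 ;;
    *)
      echo "unknown key; set the from address explicitly in the config" >&2
      return 1 ;;
  esac
}
```

For arbitrary keys, Foundry's `cast wallet address` can derive the address instead of relying on a table.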
This commit modifies the `bottomup_enabled` method in `lib.rs` to return true by default when the bottom-up configuration is not specified. This change aligns with the intended default behavior of enabling bottom-up checkpointing, enhancing the clarity and consistency of the settings implementation.
This commit adds a new example environment file `.env.example` for the IPC faucet, providing a template for users to configure their environment variables. It also updates the `.gitignore` to exclude `.env` files containing sensitive credentials and removes the existing `.env` file to enhance security. Additionally, a README.md file is introduced to guide users on setting up and running the faucet application.
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
- GF_INSTALL_PLUGINS=grafana-elasticsearch-datasource
- GF_SERVER_ROOT_URL=http://localhost:3000
- GF_USERS_ALLOW_SIGN_UP=false

Grafana missing ELASTIC_PASSWORD environment variable for datasource

Medium Severity

The Grafana Elasticsearch datasource provisioning file references ${ELASTIC_PASSWORD} for basicAuthPassword, but the Grafana container's environment section in docker-compose.yml does not include ELASTIC_PASSWORD. Grafana only expands environment variables that are available to its process. Since this variable is missing, the datasource authentication will fail, and Grafana won't be able to connect to Elasticsearch.

…amic prompts

This commit updates the `clear-mempool.sh` script to accept command-line parameters for the validator IP and SSH user, defaulting to prompts if not provided. It improves user experience by ensuring required inputs are validated and dynamically retrieves the script directory for better usability when referencing the subnet manager. These changes streamline the process of diagnosing and clearing stuck transactions in the IPC subnet mempool.
console.log(` To: ${tx.to}`)
console.log(` Value: ${ethers.formatEther(tx.value || 0)} tFIL`)
console.log(` Nonce: ${parseInt(tx.nonce)}`)
console.log(` Status: ${receipt.status === 1 ? '✅ Success' : '❌ Failed'}`)

Potential null reference when accessing transaction receipt status

Low Severity

The getTransactionReceipt call can return null in ethers.js v6, but the code accesses receipt.status directly without a null check. If the receipt is unavailable (due to timing issues or RPC inconsistencies), this will throw a TypeError: Cannot read properties of null (reading 'status'). The error is caught by the outer try-catch, but results in a misleading "Could not fetch recent transactions" message instead of properly handling the null receipt case.

This commit updates the `elk-manager.sh` script to introduce a new command for deleting entire Elasticsearch indices older than a specified number of days, alongside improvements to the existing delete-old-logs command. The script now provides clearer warnings about the destructive nature of the new command and enhances user guidance with examples. Additionally, it refines log messages for better clarity during operations, improving overall usability and safety in managing ELK stack logs.
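Selecting indices older than N days can be sketched under the assumption that daily indices are named `<prefix>-YYYY.MM.DD`. Zero-padded dates in that format sort correctly as strings. `stale_indices` is a hypothetical helper, and the naming scheme is not confirmed by this PR.

```bash
# Hypothetical filter: read index names on stdin and print those whose date
# suffix sorts before CUTOFF (YYYY.MM.DD). Names without a date suffix are
# skipped so they are never selected for deletion.
stale_indices() {
  local cutoff="$1"
  local name suffix
  while read -r name; do
    suffix="${name##*-}"
    case "$suffix" in
      [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) ;;
      *) continue ;;
    esac
    if [[ "$suffix" < "$cutoff" ]]; then
      echo "$name"
    fi
  done
}
```

The cutoff is passed in to sidestep `date` differences: GNU uses `date -d '30 days ago' +%Y.%m.%d`, while BSD/macOS uses `date -v-30d +%Y.%m.%d`. Each selected index can then be removed with Elasticsearch's `DELETE /<index>` API.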
…t manager path

This commit modifies the `elk-manager.sh` script to allow the IPC subnet manager configuration path to be set via an environment variable, enhancing flexibility. It updates the filebeat status check to use this variable, providing clearer error messages and guidance for users. Additionally, it improves logging to indicate the configuration file being used, streamlining the management of IPC subnet configurations.
@phutchins changed the title from "Subnet management script and node init config fixes" to "feat: subnet management script and node init config fixes" on Jan 14, 2026
if [ -f "$ELK_DIR/.env" ]; then
log_warn ".env file already exists. Skipping creation."
return 0
fi

Setup script fails on re-runs due to missing password

Medium Severity

When running setup-central-server.sh a second time, setup_env_file() returns early at line 76 without setting the ELASTIC_PASSWORD shell variable. Later, wait_for_services() at line 181 uses ${ELASTIC_PASSWORD:-changeme}, which falls back to the wrong password "changeme" instead of the actual password stored in .env. This causes the Elasticsearch health checks to fail with authentication errors, making the script unusable for re-runs even though it's designed to handle them.
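A re-run-safe version of the early-exit branch would load the existing password instead of returning with `ELASTIC_PASSWORD` unset. `load_elastic_password` is a hypothetical helper. It reads only the `ELASTIC_PASSWORD=` line rather than sourcing the whole `.env` file.

```bash
# Hypothetical helper: set and export ELASTIC_PASSWORD from an existing .env
# file (last matching line wins, optional surrounding double quotes removed).
# Returns 1 when the file has no ELASTIC_PASSWORD entry.
load_elastic_password() {
  local env_file="$1"
  local line value
  line="$(grep -E '^ELASTIC_PASSWORD=' "$env_file" 2>/dev/null | tail -n 1)"
  [ -n "$line" ] || return 1
  value="${line#ELASTIC_PASSWORD=}"
  value="${value%\"}"
  value="${value#\"}"
  ELASTIC_PASSWORD="$value"
  export ELASTIC_PASSWORD
}
```

Calling it just before `return 0` in `setup_env_file()` would keep `${ELASTIC_PASSWORD:-changeme}` in `wait_for_services()` from falling back to the wrong value.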

Adjust rustfmt alignment of inline comments for listen_ip configuration
to match project formatting standards.
Replace map_or with is_none_or as suggested by clippy lint.
This is more idiomatic and clearer in intent.
This commit addresses multiple SSH-related problems in the IPC manager when running in local mode. It replaces direct SSH calls with an abstraction layer function, ensuring commands execute locally without attempting SSH connections to localhost. Key functions affected include node management and subnet deployment, enhancing the overall functionality and reliability of the IPC manager in local environments. Additionally, new documentation files have been created to detail the fixes and verification steps.
This commit introduces a complete summary of fixes for SSH-related issues in the IPC manager, enabling full functionality in local mode on macOS. Key changes include the replacement of direct SSH calls with an abstraction layer, restoration of the `deploy_subnet` function, and updates to port checking logic for macOS compatibility. Additionally, new documentation files have been created to detail the fixes, verification steps, and technical changes, ensuring a smoother developer experience and improved command reliability in local environments.
echo " - This will clear ALL pending transactions"
echo " - You'll need to resubmit any valid transactions"
echo " - Command:"
echo " ssh $SSH_USER@$VALIDATOR_IP 'sudo systemctl stop cometbft && rm -rf ~/.cometbft/data/mempool.wal && sudo systemctl start cometbft'"

Mempool script uses incorrect paths and service name

Medium Severity

The script uses ~/.cometbft/data/mempool.wal and service name cometbft, but other files in this PR consistently show the IPC node home is ~/.ipc-node/ with CometBFT data at ~/.ipc-node/cometbft/data/, and the systemd service is ipc-node.service. This causes the script to silently fail to clear the mempool (commands have || true), potentially misleading users into thinking the operation succeeded.
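The corrected command can be built from the paths the review cites (`~/.ipc-node/cometbft/data` and `ipc-node.service`), without `|| true`, so a wrong path or service name fails visibly. `clear_mempool_cmd` is a hypothetical helper. Whether `mempool.wal` exists depends on the node's CometBFT mempool WAL setting.

```bash
# Hypothetical helper: print the remote command string. The default home is
# kept as a literal "~" so it expands on the remote host, not locally.
clear_mempool_cmd() {
  local node_home="$1"
  [ -n "$node_home" ] || node_home='~/.ipc-node'
  printf '%s' "sudo systemctl stop ipc-node && rm -rf ${node_home}/cometbft/data/mempool.wal && sudo systemctl start ipc-node"
}
# Usage: ssh "$SSH_USER@$VALIDATOR_IP" "$(clear_mempool_cmd)"
```

If the node runs as a separate user (the config quoted earlier shows `ipc_user: "ipc"`), pass that user's home explicitly. Otherwise `~` resolves to the SSH user's home.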

…ager

This commit introduces three new documentation files that explain the difference between Chain ID and Subnet ID, describe address display issues in the IPC manager, and provide guidance on configuration and verification. Key updates include improved logging for chain ID queries, a clear distinction between parent and subnet chain IDs, and recommendations for production deployments. These enhancements aim to streamline the developer experience and prevent confusion in local and production environments.
This commit introduces a dedicated configuration option for subnet chain IDs in the IPC manager, addressing issues with chain ID collisions between parent and subnet networks. Key changes include the addition of a `chain_id` field in `ipc-subnet-config-local.yml`, updates to the `deploy_subnet()` function to utilize this configuration, and the creation of a Python utility for calculating chain IDs. These enhancements improve clarity, security, and usability for developers working with IPC subnets.
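A chain ID collision check has to compare values in one representation. `eth_chainId` returns a hex quantity, so both values are converted to decimal first. The helper names are hypothetical, and this is not the Python utility mentioned above.

```bash
# Convert a 0x-prefixed hex quantity to decimal (bash printf accepts 0x input).
hex_to_dec() {
  printf '%d\n' "$1"
}

# Print both IDs in decimal, or fail with status 1 when they collide
# (status 2 when an input is not a valid number).
check_chain_ids() {
  local parent subnet
  parent="$(hex_to_dec "$1")" || return 2
  subnet="$(hex_to_dec "$2")" || return 2
  if [ "$parent" = "$subnet" ]; then
    echo "collision: parent and subnet both use chain ID ${parent}" >&2
    return 1
  fi
  echo "parent=${parent} subnet=${subnet}"
}
```

Anvil's default chain ID is 31337 (`0x7a69`). A local subnet left on the same value would collide with its parent.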
3 participants