
Add automated alert deployment for Cosmos chain monitoring #497

Merged
dasanchez merged 1 commit into main from alerting on Oct 30, 2025

@fastfadingviolets
Contributor

Implement automated Alertmanager configuration deployment using a fragment-based approach for multi-environment support. This enables idempotent management of alert rules and notifications across testnet and mainnet deployments.

Changes:

  • Add alert rules for chain health, relayer health, and system monitoring
  • Implement fragment-based alertmanager.yml assembly for extensibility
  • Add handlers to reload or restart Prometheus and Alertmanager
  • Configure Matrix webhook integration with vault-encrypted secrets
  • Support environment-specific alert routing (testnet/mainnet)
  • Add tags for selective alert deployment
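
For readers unfamiliar with the pattern, fragment-based assembly plus a reload handler could look roughly like the sketch below. It is not taken from this PR: the paths, fragment file names, the `alerting` tag, and the handler name are all assumptions for illustration. It relies on `ansible.builtin.assemble` concatenating files in sorted name order, and on Alertmanager's `POST /-/reload` endpoint (default port 9093).

```yaml
# tasks/main.yml (hypothetical; paths, file names, and tags are assumptions)
---
- name: Render Alertmanager config fragments
  ansible.builtin.template:
    src: "{{ item }}.j2"
    dest: "/etc/alertmanager/fragments/{{ item }}"
    mode: "0640"
  loop:
    - 00-global.yml
    - 10-route.yml
    - 20-receivers-header.yml
    - 21-receiver-matrix.yml
  tags: [alerting]

- name: Assemble alertmanager.yml from fragments (sorted by file name)
  ansible.builtin.assemble:
    src: /etc/alertmanager/fragments
    dest: /etc/alertmanager/alertmanager.yml
    mode: "0640"
    # Refuse to install a config that amtool cannot parse
    validate: amtool check-config %s
  notify: Reload alertmanager
  tags: [alerting]

# handlers/main.yml (hypothetical)
---
- name: Reload alertmanager
  ansible.builtin.uri:
    url: http://localhost:9093/-/reload
    method: POST
```

Because the handler fires only when the assembled file actually changes, repeated runs with unchanged fragments leave the running service untouched, which is what makes the deployment idempotent.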

Alert rules include:

  • ValidatorProcessDown, ChainNotProducingBlocks, LowPeerCount
  • RelayerProcessDown, RelayerWalletLowBalance
  • System alerts: CPU, memory, disk, service crashes
  • Blackbox probes and SSL certificate monitoring
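
As an illustration of what two of these rules might contain, here is a minimal Prometheus rule file sketch. The expressions, thresholds, durations, and severity labels are assumptions, not the values merged in this PR. The metric names assume a CometBFT node exporting under the `cometbft` namespace; older Tendermint-based nodes expose `tendermint_consensus_height` and `tendermint_p2p_peers` instead.

```yaml
groups:
  - name: chain_health  # hypothetical group name
    rules:
      - alert: ChainNotProducingBlocks
        # Fires when the reported block height has not changed for 5 minutes
        expr: changes(cometbft_consensus_height[5m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No new blocks seen on {{ $labels.instance }}"

      - alert: LowPeerCount
        # Threshold of 3 peers is an illustrative assumption
        expr: cometbft_p2p_peers < 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} has fewer than 3 peers"
```

If a file like this is deployed through a Jinja2 template, the `{{ $labels.instance }}` placeholders need escaping (for example with `{% raw %}`) so that Ansible does not try to render them.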

The fragment approach allows future PagerDuty integration without disrupting existing Matrix notifications.
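
To make that concrete, the sketch below shows how an assembled `receivers:` section might look if each receiver lives in its own fragment. The fragment file names, receiver names, and vault variable names are hypothetical. `webhook_configs` and `pagerduty_configs` (with `routing_key` for the PagerDuty Events API v2) are standard Alertmanager receiver settings.

```yaml
# --- 20-receivers-header.yml ---
receivers:
# --- 21-receiver-matrix.yml (existing) ---
  - name: matrix
    webhook_configs:
      - url: "{{ vault_matrix_webhook_url }}"  # hypothetical vault variable
        send_resolved: true
# --- 22-receiver-pagerduty.yml (possible future fragment) ---
  - name: pagerduty
    pagerduty_configs:
      - routing_key: "{{ vault_pagerduty_routing_key }}"  # hypothetical vault variable
```

Under this layout, adding PagerDuty would mean dropping in one new fragment (plus a matching route entry) while the Matrix fragment stays byte-for-byte unchanged.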

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@dasanchez merged commit 8150671 into main on Oct 30, 2025
1 of 4 checks passed