A comprehensive, production-ready health monitoring and diagnostics system for ROS 2 robotics applications. This system provides real-time monitoring of ROS graph health, topic statistics, system metrics, and centralized logging with alerting capabilities.
The ROS 2 Health Monitoring System adds observability to robotics applications. It targets ROS 2 Humble and provides:
- Real-time ROS Graph Monitoring - Detect missing nodes, orphaned topics, QoS mismatches
- Topic Statistics - Monitor message rates, latency, and deadline compliance
- System Metrics - CPU, memory, disk, network via Telegraf
- Centralized Logging - Aggregate ROS logs with Fluent Bit and Loki
- Visualization - Pre-configured Grafana dashboards
- System Control - Remote shutdown/reboot via ROS 2 services
- Alerting - Slack/Teams notifications for critical issues
┌─────────────────────────────────────────────────────────────────────────────┐
│                                ROS 2 NETWORK                                │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐                 │
│   │   Your   │   │   Your   │   │   Your   │   │   Your   │                 │
│   │  Node 1  │   │  Node 2  │   │  Node 3  │   │  Node N  │                 │
│   └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘                 │
│        │              │              │              │                       │
│        └──────────────┴──────┬───────┴──────────────┘                       │
│                              │                                              │
│   ┌──────────────────────────┴──────────────────────────┐                   │
│   │               HEALTH MONITORING STACK               │                   │
│   │  ┌──────────────────┐  ┌─────────────────────────┐  │                   │
│   │  │ rosgraph_monitor │  │ diagnostic_bridge       │  │                   │
│   │  │ - Node health    │  │ - /diagnostics → DB     │  │                   │
│   │  │ - Topic health   │  └────────────┬────────────┘  │                   │
│   │  │ - QoS checks     │               │               │                   │
│   │  └────────┬─────────┘               │               │                   │
│   │           │                         │               │                   │
│   │  ┌────────┴─────────┐  ┌────────────┴────────────┐  │                   │
│   │  │ telegraf_bridge  │  │ system_control_node     │  │                   │
│   │  │ - Topic stats    │  │ - Shutdown/Reboot API   │  │                   │
│   │  └────────┬─────────┘  └─────────────────────────┘  │                   │
│   └───────────┼─────────────────────────────────────────┘                   │
└───────────────┼─────────────────────────────────────────────────────────────┘
                │
┌───────────────┼─────────────────────────────────────────────────────────────┐
│               │                OBSERVABILITY STACK (Docker)                 │
│   ┌───────────▼───────────┐                                                 │
│   │       Telegraf        ├───────────┐                                     │
│   │ - System metrics      │           │                                     │
│   │ - ROS topic stats     │           │                                     │
│   └───────────────────────┘           │                                     │
│                                       ▼                                     │
│   ┌───────────────────────┐   ┌───────────────┐   ┌───────────────────────┐ │
│   │      Fluent Bit       │   │   InfluxDB    │   │        Grafana        │ │
│   │ - ROS log collection  │   │ - Metrics DB  │◄──┤ - Dashboards          │ │
│   └───────────┬───────────┘   └───────────────┘   │ - Alerts              │ │
│               │                                   └───────────────────────┘ │
│               ▼                                               ▲             │
│   ┌───────────────────────┐                                   │             │
│   │         Loki          ├───────────────────────────────────┘             │
│   │ - Log aggregation     │                                                 │
│   └───────────────────────┘                                                 │
└─────────────────────────────────────────────────────────────────────────────┘
| Feature | Description |
|---|---|
| Graph Monitoring | Real-time detection of missing nodes, leaf topics, dead sinks |
| Topic Statistics | Message rates, latency percentiles, deadline compliance |
| System Metrics | CPU, memory, disk, network, GPU (Jetson) |
| Log Aggregation | Centralized ROS logs with search and filtering |
| Dashboards | Pre-built Grafana dashboards for all metrics |
| Alerting | Slack/Teams notifications for anomalies |
| System Control | Remote shutdown/reboot via ROS 2 service |
| Docker Ready | Single-command deployment with Docker Compose |
| Jetson Support | Optimized for NVIDIA Jetson platforms |
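
The missing-node detection listed above can be illustrated from the command line. The sketch below is not how `rosgraph_monitor` is implemented; it only compares a hand-written list of expected node names against the output of `ros2 node list`. The function name `missing_nodes` and the node names in the usage comment are hypothetical.

```bash
# Print each expected node name that is absent from the observed list.
#   $1: newline-separated expected node names (hypothetical, user-supplied)
#   $2: newline-separated observed node names, e.g. from `ros2 node list`
missing_nodes() {
    printf '%s\n' "$1" | while IFS= read -r node; do
        # Skip blank lines in the expected list.
        [ -n "$node" ] || continue
        # -F: fixed string, -x: whole-line match, -q: quiet.
        printf '%s\n' "$2" | grep -Fxq -- "$node" || printf '%s\n' "$node"
    done
}

# Example usage (node names are assumptions, adjust them to your graph):
#   missing_nodes "$(printf '/rosgraph_monitor\n/telegraf_bridge')" "$(ros2 node list)"
```

An empty result means every expected node was found. Whole-line matching avoids treating `/camera` as present just because `/camera_driver` is running.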
- ROS 2 Humble
- Docker & Docker Compose
- Python 3.10+
# Clone the repository
git clone https://github.com/jj7258/ros_diag_ws.git
cd ros_diag_ws
# Build the workspace
colcon build --symlink-install
# Source the workspace
source install/setup.bash
# Set RMW environment (required)
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export RMW_IMPLEMENTATION_WRAPPER=rmw_stats_shim
# Deploy the monitoring stack
cd docker
./deploy.sh
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| InfluxDB | http://localhost:8086 | admin / admin |
| Loki | http://localhost:3100 | - |
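
Before launching the monitoring nodes, it can help to confirm that the RMW variables are set and that the three services respond. The following is a minimal preflight sketch, not a script shipped with the repository. The function names are hypothetical. The health paths (`/api/health` for Grafana, `/health` for InfluxDB, `/ready` for Loki) are the upstream endpoints, and it is an assumption that the versions deployed by this stack expose them unchanged.

```bash
# Print the name of each required RMW variable that is unset or empty.
# Returns non-zero if any is missing.
check_rmw_env() {
    missing=0
    for name in RMW_IMPLEMENTATION RMW_IMPLEMENTATION_WRAPPER; do
        # Indirect lookup of the variable named by $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf 'missing: %s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Map a service name to its health URL on the default ports listed above.
# The paths are assumed to match the deployed versions.
health_url() {
    case "$1" in
        grafana)  echo "http://localhost:3000/api/health" ;;
        influxdb) echo "http://localhost:8086/health" ;;
        loki)     echo "http://localhost:3100/ready" ;;
        *)        return 1 ;;
    esac
}

# Probe one service; succeeds only if curl gets a non-error HTTP response.
probe() {
    url=$(health_url "$1") || return 1
    curl --silent --fail --max-time 5 --output /dev/null "$url"
}

# Example usage:
#   check_rmw_env || exit 1
#   for svc in grafana influxdb loki; do
#       probe "$svc" || echo "$svc is not ready"
#   done
```

Keeping the URL table in one function means a changed port only needs to be edited in one place.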
# Launch all monitoring nodes
ros2 launch ros_health_app health_monitoring.launch.py
# With rosbag recording enabled
ros2 launch ros_health_app health_monitoring.launch.py enable_rosbag:=true
Detailed documentation is available in the docs/ directory:
| Document | Description |
|---|---|
| Architecture | System design and component details |
| Installation | Detailed setup instructions |
| Configuration | Configuration options and tuning |
| API Reference | ROS 2 interfaces and services |
| Deployment | Docker and production deployment |
| Troubleshooting | Common issues and solutions |
ros_diag_ws/
├── config/                      # Shared ROS configuration files
│   ├── rosgraph_config.yaml     # Graph monitor configuration
│   └── heartbeat_config.yaml    # Heartbeat settings
├── docker/                      # Docker infrastructure
│   ├── compose/                 # Docker Compose files
│   ├── ros/                     # ROS container files
│   ├── telegraf/                # Telegraf configuration
│   ├── grafana/                 # Grafana provisioning
│   ├── fluentbit/               # Fluent Bit configuration
│   └── deploy.sh                # Deployment script
├── docs/                        # Documentation
├── src/
│   ├── interfaces/              # ROS 2 message/service definitions
│   │   ├── system_interfaces/   # System control interfaces
│   │   ├── rosgraph_monitor_msgs/
│   │   └── ...
│   ├── core/                    # Core monitoring components
│   │   ├── rosgraph_monitor/    # ROS graph health monitor
│   │   ├── telegraf_bridge/     # Topic stats to Telegraf
│   │   ├── diagnostic_bridge/   # Diagnostics to InfluxDB
│   │   └── rmw_stats_shim/      # RMW statistics wrapper
│   ├── apps/                    # Application packages
│   │   └── ros_health_app/      # Main launch files
│   └── rmw_implementation/      # Custom RMW wrapper
└── README.md
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Need help? Open an issue or check the Troubleshooting Guide.