ChiratheRobotics/ROS-2-Logging-and-Monitoring-System

ROS 2 Health Monitoring System

A comprehensive, production-ready health monitoring and diagnostics system for ROS 2 robotics applications. This system provides real-time monitoring of ROS graph health, topic statistics, system metrics, and centralized logging with alerting capabilities.

Overview

The ROS 2 Health Monitoring System adds observability to robotics applications running on ROS 2 Humble. It provides:

  • Real-time ROS Graph Monitoring - Detect missing nodes, orphaned topics, QoS mismatches
  • Topic Statistics - Monitor message rates, latency, and deadline compliance
  • System Metrics - CPU, memory, disk, network via Telegraf
  • Centralized Logging - Aggregate ROS logs with Fluent Bit and Loki
  • Visualization - Pre-configured Grafana dashboards
  • System Control - Remote shutdown/reboot via ROS 2 services
  • Alerting - Slack/Teams notifications for critical issues
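
To make the graph checks in the list above more concrete, here is a minimal Python sketch of how missing nodes and orphaned topics could be derived from a snapshot of the ROS graph. It is not the `rosgraph_monitor` implementation: `GraphSnapshot`, `check_graph`, and the node and topic names are hypothetical. The sketch also assumes that "orphaned" covers two cases, a topic with publishers but no subscribers (a leaf topic) and a topic with subscribers but no publishers (a dead sink). In a live system the snapshot would be filled from the ROS graph instead of being hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class GraphSnapshot:
    """Hypothetical, simplified view of the ROS graph at one instant."""
    nodes: set[str]
    # Topic name -> number of publishers / subscribers on that topic.
    publishers: dict[str, int] = field(default_factory=dict)
    subscribers: dict[str, int] = field(default_factory=dict)


def check_graph(snapshot: GraphSnapshot, expected_nodes: set[str]) -> dict[str, list[str]]:
    """Return missing nodes, leaf topics, and dead sinks, each sorted by name."""
    topics = set(snapshot.publishers) | set(snapshot.subscribers)
    missing = sorted(expected_nodes - snapshot.nodes)
    # Assumption: a leaf topic has at least one publisher and no subscriber.
    leaf = sorted(
        t for t in topics
        if snapshot.publishers.get(t, 0) > 0 and snapshot.subscribers.get(t, 0) == 0
    )
    # Assumption: a dead sink has at least one subscriber and no publisher.
    dead = sorted(
        t for t in topics
        if snapshot.subscribers.get(t, 0) > 0 and snapshot.publishers.get(t, 0) == 0
    )
    return {"missing_nodes": missing, "leaf_topics": leaf, "dead_sinks": dead}


snapshot = GraphSnapshot(
    nodes={"/camera_driver", "/planner"},
    publishers={"/image_raw": 1, "/debug_markers": 1},
    subscribers={"/image_raw": 1, "/cmd_vel": 1},
)
report = check_graph(snapshot, expected_nodes={"/camera_driver", "/planner", "/controller"})
print(report)
# {'missing_nodes': ['/controller'], 'leaf_topics': ['/debug_markers'], 'dead_sinks': ['/cmd_vel']}
```

QoS mismatch detection is left out because it needs the per-endpoint QoS profiles, which this simplified snapshot does not carry.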

Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                           ROS 2 NETWORK                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                      │
│  │  Your    │  │  Your    │  │  Your    │  │  Your    │                      │
│  │  Node 1  │  │  Node 2  │  │  Node 3  │  │  Node N  │                      │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘                      │
│       │             │             │             │                            │
│       └─────────────┴──────┬──────┴─────────────┘                            │
│                            │                                                 │
│  ┌─────────────────────────┴─────────────────────────┐                       │
│  │              HEALTH MONITORING STACK              │                       │
│  │  ┌─────────────────┐  ┌─────────────────────────┐ │                       │
│  │  │ rosgraph_monitor│  │    diagnostic_bridge    │ │                       │
│  │  │  - Node health  │  │  - /diagnostics → DB    │ │                       │
│  │  │  - Topic health │  └───────────┬─────────────┘ │                       │
│  │  │  - QoS checks   │              │               │                       │
│  │  └────────┬────────┘              │               │                       │
│  │           │                       │               │                       │
│  │  ┌────────┴────────┐  ┌───────────┴─────────────┐ │                       │
│  │  │ telegraf_bridge │  │  system_control_node    │ │                       │
│  │  │ - Topic stats   │  │  - Shutdown/Reboot API  │ │                       │
│  │  └────────┬────────┘  └─────────────────────────┘ │                       │
│  └───────────┼───────────────────────────────────────┘                       │
└──────────────┼───────────────────────────────────────────────────────────────┘
               │
┌──────────────┼───────────────────────────────────────────────────────────────┐
│              │              OBSERVABILITY STACK (Docker)                     │
│  ┌───────────▼───────────┐                                                   │
│  │       Telegraf        │──────────┐                                        │
│  │  - System metrics     │          │                                        │
│  │  - ROS topic stats    │          │                                        │
│  └───────────────────────┘          │                                        │
│                                     ▼                                        │
│  ┌───────────────────────┐  ┌───────────────┐  ┌───────────────────────┐     │
│  │      Fluent Bit       │  │   InfluxDB    │  │       Grafana         │     │
│  │  - ROS log collection │  │  - Metrics DB │◄──  - Dashboards         │     │
│  └───────────┬───────────┘  └───────────────┘  │  - Alerts             │     │
│              │                                 └───────────────────────┘     │
│              ▼                                           ▲                   │
│  ┌───────────────────────┐                               │                   │
│  │         Loki          │───────────────────────────────┘                   │
│  │  - Log aggregation    │                                                   │
│  └───────────────────────┘                                                   │
└──────────────────────────────────────────────────────────────────────────────┘
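
The diagram shows `telegraf_bridge` handing topic statistics to Telegraf, which then writes them to InfluxDB. This README does not state the wire format between the two, so the following is only a sketch under one plausible assumption: that samples are serialized as InfluxDB line protocol, a text format Telegraf can parse. The function names, the measurement name `ros_topic_stats`, and the tag and field names are all hypothetical.

```python
def _escape(text: str, special: str) -> str:
    """Backslash-escape each character of `special` that occurs in `text`."""
    for char in special:
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value) -> str:
    """Render a field value; only bool, int, and float are handled in this sketch."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    # Tags are emitted sorted by key, which the line protocol docs recommend.
    for key in sorted(tags):
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "ros_topic_stats",                                    # hypothetical measurement name
    {"topic": "/image_raw", "node": "/camera_driver"},    # hypothetical tags
    {"rate_hz": 29.97, "dropped": 3},                     # hypothetical fields
    1_700_000_000_000_000_000,
)
print(line)
# ros_topic_stats,node=/camera_driver,topic=/image_raw rate_hz=29.97,dropped=3i 1700000000000000000
```

Keeping topic and node names as tags and the numeric values as fields would let Grafana group and filter by topic, since tags are indexed in InfluxDB and fields are not.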

Features

| Feature | Description |
|---|---|
| 🔍 Graph Monitoring | Real-time detection of missing nodes, leaf topics, dead sinks |
| 📊 Topic Statistics | Message rates, latency percentiles, deadline compliance |
| 🖥️ System Metrics | CPU, memory, disk, network, GPU (Jetson) |
| 📝 Log Aggregation | Centralized ROS logs with search and filtering |
| 📈 Dashboards | Pre-built Grafana dashboards for all metrics |
| 🔔 Alerting | Slack/Teams notifications for anomalies |
| 🔄 System Control | Remote shutdown/reboot via ROS 2 service |
| 🐳 Docker Ready | Single-command deployment with Docker Compose |
| 🤖 Jetson Support | Optimized for NVIDIA Jetson platforms |
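
As an illustration of the Topic Statistics row, the sketch below computes a message rate, latency percentiles, and deadline compliance from recorded arrival times. It is a simplified stand-in, not the project's code: `topic_stats` and `percentile` are hypothetical names, the percentile uses the nearest-rank method, and "deadline compliance" is assumed to mean the fraction of inter-message gaps that stay within the deadline.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def topic_stats(arrivals: list[float], latencies: list[float], deadline_s: float) -> dict[str, float]:
    """Summarize one topic.

    arrivals:   message arrival times in seconds, in ascending order
    latencies:  per-message latency in seconds
    deadline_s: maximum allowed gap between consecutive messages
    """
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    window = arrivals[-1] - arrivals[0] if arrivals else 0.0
    if not gaps or not latencies or window <= 0:
        raise ValueError("need at least two distinct arrival times and one latency sample")
    on_time = sum(1 for gap in gaps if gap <= deadline_s)
    return {
        "rate_hz": len(gaps) / window,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p95_s": percentile(latencies, 95),
        "deadline_compliance": on_time / len(gaps),
    }


stats = topic_stats(
    arrivals=[0.0, 0.1, 0.2, 0.5, 0.6],            # one 0.3 s gap exceeds the deadline
    latencies=[0.004, 0.005, 0.006, 0.030, 0.005],
    deadline_s=0.15,
)
print(stats)
```

With this input, three of the four gaps meet the 0.15 s deadline, so the compliance value is 0.75, and the single 30 ms outlier shows up in the 95th percentile but not in the median.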

Quick Start

Prerequisites

  • ROS 2 Humble
  • Docker & Docker Compose
  • Python 3.10+

Installation

# Clone the repository
git clone https://github.com/jj7258/ros_diag_ws.git
cd ros_diag_ws

# Build the workspace
colcon build --symlink-install

# Source the workspace
source install/setup.bash

# Set RMW environment (required)
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export RMW_IMPLEMENTATION_WRAPPER=rmw_stats_shim

# Deploy the monitoring stack
cd docker
./deploy.sh

Access Services

| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| InfluxDB | http://localhost:8086 | admin / admin |
| Loki | http://localhost:3100 | - |
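
After deployment it can be useful to confirm that all three services respond before opening the dashboards. The sketch below is one way to do that from Python using only the standard library. The base URLs are the ones listed above. The paths (`/api/health` for Grafana, `/health` for InfluxDB, `/ready` for Loki) are the health endpoints those projects document, but whether this stack exposes them unchanged is an assumption, and `is_up` and `check_stack` are hypothetical helper names.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Base URLs from the service table; health paths are assumed to be left at their defaults.
HEALTH_ENDPOINTS = {
    "Grafana": "http://localhost:3000/api/health",
    "InfluxDB": "http://localhost:8086/health",
    "Loki": "http://localhost:3100/ready",
}


def is_up(url: str, timeout: float = 2.0, opener=urlopen) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses (HTTPError).
        return False


def check_stack(endpoints: dict[str, str] = HEALTH_ENDPOINTS, opener=urlopen) -> dict[str, bool]:
    """Probe every endpoint and return a name -> reachable mapping."""
    return {name: is_up(url, opener=opener) for name, url in endpoints.items()}
```

Calling `check_stack()` on the host that runs the containers returns a dictionary such as `{"Grafana": True, ...}`. The `opener` parameter exists only so the probe can be replaced with a stub in tests.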

Launch Health Monitoring

# Launch all monitoring nodes
ros2 launch ros_health_app health_monitoring.launch.py

# With rosbag recording enabled
ros2 launch ros_health_app health_monitoring.launch.py enable_rosbag:=true

Documentation

Detailed documentation is available in the docs/ directory:

| Document | Description |
|---|---|
| Architecture | System design and component details |
| Installation | Detailed setup instructions |
| Configuration | Configuration options and tuning |
| API Reference | ROS 2 interfaces and services |
| Deployment | Docker and production deployment |
| Troubleshooting | Common issues and solutions |

Project Structure

ros_diag_ws/
├── config/                     # Shared ROS configuration files
│   ├── rosgraph_config.yaml    # Graph monitor configuration
│   └── heartbeat_config.yaml   # Heartbeat settings
├── docker/                     # Docker infrastructure
│   ├── compose/                # Docker Compose files
│   ├── ros/                    # ROS container files
│   ├── telegraf/               # Telegraf configuration
│   ├── grafana/                # Grafana provisioning
│   ├── fluentbit/              # Fluent Bit configuration
│   └── deploy.sh               # Deployment script
├── docs/                       # Documentation
├── src/
│   ├── interfaces/             # ROS 2 message/service definitions
│   │   ├── system_interfaces/  # System control interfaces
│   │   ├── rosgraph_monitor_msgs/
│   │   └── ...
│   ├── core/                   # Core monitoring components
│   │   ├── rosgraph_monitor/   # ROS graph health monitor
│   │   ├── telegraf_bridge/    # Topic stats to Telegraf
│   │   ├── diagnostic_bridge/  # Diagnostics to InfluxDB
│   │   └── rmw_stats_shim/     # RMW statistics wrapper
│   ├── apps/                   # Application packages
│   │   └── ros_health_app/     # Main launch files
│   └── rmw_implementation/     # Custom RMW wrapper
└── README.md

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Need help? Open an issue or check the Troubleshooting Guide.
