| 1 | +# Monitor Java-tron nodes using VictoriaMetrics |
| 2 | +This document describes how to monitor java-tron node metrics with VictoriaMetrics. |
| 3 | + |
| 4 | +## Background |
| 5 | +Currently, Tron's full node and system monitoring infrastructure leverages a Grafana+Prometheus monitoring stack. In this setup, each full node exposes dedicated metrics ports that Prometheus uses to actively pull and store monitoring data. The metrics are then made available in Grafana by configuring Prometheus as a data source through the Grafana UI, using the appropriate IP address and port. |
| 6 | + |
| 7 | +However, exposing dedicated ports on full nodes so that Prometheus can scrape them poses a potential security risk. To address this, we evaluated VictoriaMetrics, which supports a push-based model: each node sends its metrics to the collector, so the node no longer needs to expose a metrics port to external collectors. VictoriaMetrics also stores and queries the collected data efficiently. |
| 8 | + |
| 9 | +## Evolution from Pull to Push Architecture |
| 10 | +<img src="../images/metric_pull_to_push.png" alt="Metrics collection architecture: from pull to push" width="720" > |
| 11 | + |
| 12 | +## VictoriaMetrics |
| 13 | +VictoriaMetrics is a high-performance, cost-efficient time series database and monitoring solution that excels in scalability and resource optimization. |
| 14 | + |
| 15 | +Key features include: |
| 16 | + |
| 17 | +- **Prometheus Integration** |
| 18 | + - Serves as a long-term storage solution for Prometheus |
| 19 | + - Seamlessly integrates with Grafana as a Prometheus alternative through API compatibility |
| 20 | + |
| 21 | +- **Streamlined Operations** |
| 22 | + - Standalone executable with zero external dependencies |
| 23 | + - Simple configuration via command-line parameters with sensible defaults |
| 24 | + - Centralized data storage in a user-specified directory |
| 25 | + - Built-in backup/restore functionality via vmbackup/vmrestore tools |
| 26 | + |
| 27 | +- **Advanced Querying and Performance** |
| 28 | + - Enhanced PromQL compatibility through MetricsQL |
| 29 | + - Unified querying across multiple data sources via a single interface |
| 30 | + - Superior scalability with up to 20x better performance than InfluxDB/TimescaleDB |
| 31 | + - Exceptional memory efficiency: uses 90% less memory than InfluxDB and 85% less than Prometheus/Thanos/Cortex |
| 32 | + |
| 33 | +- **Storage Optimization** |
| 34 | + - Optimized for high churn rate time series |
| 35 | + - Industry-leading compression: 70x more data points per storage unit vs TimescaleDB |
| 36 | + - 7x reduced storage footprint compared to Prometheus/Thanos/Cortex |
| 37 | + - Optimized for high-latency and low IOPS storage: fully compatible with HDDs and cloud storage solutions (AWS, Google Cloud, Azure, etc.) as validated through comprehensive disk I/O testing |
| 38 | + |
| 39 | +- **Enterprise-Grade Features** |
| 40 | + - A single-node deployment can effectively replace medium-sized clusters built with Thanos, M3DB, Cortex, InfluxDB, or TimescaleDB (see the vertical scaling tests, the comparison with Thanos, and the PromCon 2019 talk) |
| 41 | + - Built-in protection against data corruption from system failures |
| 42 | + - Comprehensive protocol support: |
| 43 | + - Prometheus (exporter metrics, remote write API, exposure format) |
| 44 | + - InfluxDB (HTTP/TCP/UDP) |
| 45 | + - Graphite, OpenTSDB, JSON, CSV, Native Binary |
| 46 | + - DataDog, NewRelic, OpenTelemetry |
| 47 | + - Available in both single-node and cluster editions |
| 48 | + |
| 49 | +The following performance comparison between VictoriaMetrics and Prometheus is based on benchmark testing using `node_exporter` metrics. All measurements reflect single-server deployments for both solutions. |
| 50 | + |
| 51 | +| Feature | Prometheus | VictoriaMetrics | |
| 52 | +|:------------------------:|:---------------------------------:|:-------------------------------------------:| |
| 53 | +| **Data Collection** | Pull-based | Supports both pull-based and push-based | |
| 54 | +| **Data Ingestion** | Up to 240,000 samples per second | Up to 360,000 samples per second | |
| 55 | +| **Query Performance** | Up to 80,000 queries per second | Up to 100,000 queries per second | |
| 56 | +| **Memory Usage** | Up to 14GB RAM | Up to 4.3GB RAM | |
| 57 | +| **Data Compression** | Gorilla-style delta and XOR encoding | Gorilla-derived encoding plus ZSTD compression |
| 58 | +| **Disk Write Frequency** | More frequent data writes to disk | Less frequent data writes to disk | |
| 59 | +| **Disk Space Usage** | Requires more disk space | Requires less disk space | |
| 60 | +| **Query Language** | PromQL | MetricsQL (backward-compatible with PromQL) | |
| 61 | + |
| 62 | +## Deploy VictoriaMetrics |
| 63 | +Follow these steps to deploy a highly available VictoriaMetrics cluster using Docker Compose: |
| 64 | + |
| 65 | +1. **Launch the VictoriaMetrics Cluster** |
| 66 | + - Download the [docker-compose.yml](./victoria-metrics/docker-compose/docker-compose.yml) configuration file. This setup provides high availability with two storage nodes (vmstorage1 and vmstorage2). |
| 67 | + - Start the VictoriaMetrics cluster using the following command: |
| 68 | + ```shell |
| 69 | + # Run this from the directory that contains docker-compose.yml |
| 70 | + docker-compose up -d |
| 71 | + ``` |
| 72 | + |
| 73 | +2. **Configure Metric Collection** |
| 74 | + - Deploy the java-tron service with metrics enabled. For detailed instructions on enabling java-tron metrics collection, please refer to the [Quick Start Guide](README.md#quick-start). |
| 75 | + - Download the [clusterPush.sh](./victoria-metrics/shell/clusterPush.sh) script and deploy it on the same server as your java-tron service. This script collects java-tron metrics locally and pushes them to remote VictoriaMetrics clusters. Execute it using the following command: |
| 76 | + ```shell |
| 77 | + # Run these commands from the directory that contains clusterPush.sh |
| 78 | + # Make the script executable |
| 79 | + chmod +x ./clusterPush.sh |
| 80 | + # Run the script |
| 81 | + ./clusterPush.sh |
| 82 | + ``` |
| 83 | + |
| 84 | +3. **Verify Deployment** |
| 85 | + Test the setup by querying metrics through the vmselect API: |
| 86 | + ```shell |
| 87 | + curl 'http://localhost:8481/select/0/prometheus/api/v1/query?query={metrics}' |
| 88 | + ``` |
| 89 | + Note: The `0` in the URL is the tenant ID, and `{metrics}` is a placeholder for the metric name or query expression you want to look up. |
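
The push step can be sketched as follows. This is a minimal illustration and not the contents of `clusterPush.sh`: it assumes java-tron serves Prometheus-format metrics at `http://localhost:9527/metrics` (adjust to the metrics port configured on your node) and that vminsert is reachable on port 8480. The environment variables and function names are hypothetical. The sketch reads the local metrics and posts them to the cluster's Prometheus text import endpoint for tenant `0`.

```shell
# Hypothetical sketch of a push loop; NOT the project's clusterPush.sh.
# Assumed defaults: local java-tron metrics endpoint and a local vminsert.
METRICS_URL="${METRICS_URL:-http://localhost:9527/metrics}"
VM_INSERT_ADDR="${VM_INSERT_ADDR:-localhost:8480}"
TENANT_ID="${TENANT_ID:-0}"
PUSH_INTERVAL="${PUSH_INTERVAL:-15}"   # seconds between pushes (illustrative)

# Build the vminsert import URL for Prometheus text exposition format.
import_url() {
  printf 'http://%s/insert/%s/prometheus/api/v1/import/prometheus' "$1" "$2"
}

# Scrape the local endpoint once and push the result to vminsert.
push_once() {
  local payload
  payload="$(curl -fsS --max-time 10 "$METRICS_URL")" || return 1
  printf '%s\n' "$payload" |
    curl -fsS --max-time 10 --data-binary @- \
      "$(import_url "$VM_INSERT_ADDR" "$TENANT_ID")"
}

# The loop only starts when explicitly requested: RUN_PUSH_LOOP=1
if [ "${RUN_PUSH_LOOP:-0}" = "1" ]; then
  while true; do
    push_once || echo "push failed, retrying in ${PUSH_INTERVAL}s" >&2
    sleep "$PUSH_INTERVAL"
  done
fi
```

Because the node initiates every connection, only outbound access from the node to vminsert is required.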
| 90 | + |
| 91 | +### Key Storage Node Configurations |
| 92 | + |
| 93 | +The following docker-compose configuration shows the essential settings for VictoriaMetrics storage nodes: |
| 94 | +```yaml |
| 95 | +services: |
| 96 | + vmstorage1: |
| 97 | + image: victoriametrics/vmstorage:latest |
| 98 | + command: |
| 99 | + ... |
| 100 | + ports: |
| 101 | + - "8482:8482" |
| 102 | + - "8400:8400" |
| 103 | + - "8401:8401" |
| 104 | + volumes: |
| 105 | + - ./storage1-data:/vmstorage-data # Data directory: Specify the storage path through |
| 106 | + ... |
| 107 | + |
| 108 | + vmstorage2: |
| 109 | + image: victoriametrics/vmstorage:latest |
| 110 | + command: |
| 111 | + ... |
| 112 | + ports: |
| 113 | + - "8483:8482" |
| 114 | + - "8402:8400" |
| 115 | + - "8403:8401" |
| 116 | + volumes: |
| 117 | + - ./storage2-data:/vmstorage-data # Data directory: Specify the storage path through |
| 118 | + ... |
| 119 | +``` |
| 120 | +- **Volume Configuration**: |
| 121 | + - Storage Path: Configure the data directory using `-storageDataPath` |
| 122 | + - Best Practice: Use persistent volumes for reliable data storage |
| 123 | + |
| 124 | +- **Port Configuration**: |
| 125 | + - 8480: Write port of the cluster version, served by vminsert (accepts Prometheus data) |
| 126 | + - 8481: Query port of the cluster version, served by vmselect |
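
A quick way to confirm that the ports listed above are reachable is to call each component's `/health` endpoint. The sketch below assumes the cluster runs on `localhost` with the host ports shown in this document (8480, 8481, and 8482/8483 for the two storage nodes). `VM_HOST`, `RUN_CHECKS`, and the function names are hypothetical.

```shell
# Hypothetical reachability check for the cluster components.
VM_HOST="${VM_HOST:-localhost}"

# Build the health-check URL for a given host and port.
health_url() {
  printf 'http://%s:%s/health' "$1" "$2"
}

# Report whether a component answers on its HTTP port.
check_port() {
  local name="$1" port="$2"
  if curl -fsS --max-time 5 "$(health_url "$VM_HOST" "$port")" >/dev/null; then
    echo "$name ($port): ok"
  else
    echo "$name ($port): unreachable"
    return 1
  fi
}

# Checks only run when explicitly requested: RUN_CHECKS=1
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  check_port vminsert 8480
  check_port vmselect 8481
  check_port vmstorage1 8482
  check_port vmstorage2 8483
fi
```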
| 127 | + |
| 128 | +## Integrate with Grafana |
| 129 | +After successfully pushing java-tron metrics to the VictoriaMetrics cluster, follow these steps to configure Grafana for accessing the metrics data: |
| 130 | + |
| 131 | +1. **Verify Network Connectivity** |
| 132 | + - Ensure your Grafana instance can reach the VictoriaMetrics service |
| 133 | + - Test connectivity by accessing the VictoriaMetrics query API endpoint |
| 134 | + |
| 135 | +2. **Add VictoriaMetrics Data Source** |
| 136 | + - Navigate to Configuration > Data Sources in Grafana |
| 137 | + - Click "Add data source" |
| 138 | + |
| 139 | +<img src="../images/grafana_add_datasource.png" alt="Grafana: add a data source" width="720" > |
| 140 | + |
| 141 | +3. **Select Data Source Type** |
| 142 | + - Choose "Prometheus" as the data source type (VictoriaMetrics is Prometheus-compatible) |
| 143 | + |
| 144 | +<img src="../images/grafana_select_datasource.png" alt="Grafana: select the Prometheus data source type" width="720" > |
| 145 | + |
| 146 | +4. **Configure Data Source Settings** |
| 147 | + - Enter the VictoriaMetrics query URL (vmselect, default port: 8481); for the cluster version, include the tenant path |
| 148 | + - Example URL format: `http://<victoriametrics-host>:8481/select/0/prometheus` |
| 149 | + - Click "Save & Test" to verify the connection |
| 150 | + |
| 151 | +<img src="../images/grafana_cluster.png" alt="Grafana: VictoriaMetrics cluster data source settings" width="720" > |
| 152 | + |
| 153 | +5. **Verify Data Flow** |
| 154 | + - Open any Grafana dashboard panel |
| 155 | + - Click Edit > Query |
| 156 | + - Select the VictoriaMetrics data source |
| 157 | + - Check if metrics data appears in the graph |
| 158 | + |
| 159 | +<img src="../images/grafana_verify_local.png" alt="Grafana: verify metrics in a dashboard panel" width="720" > |
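
The same data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below is an illustration under stated assumptions: Grafana listens on `http://localhost:3000`, a token with permission to manage data sources is supplied in `GRAFANA_TOKEN`, and the URL uses the same `/select/0/prometheus` query path as the verification step above. The variable and function names are hypothetical.

```shell
# Hypothetical sketch: register VictoriaMetrics as a Prometheus data source.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"   # assumed Grafana address
GRAFANA_TOKEN="${GRAFANA_TOKEN:-}"                    # assumed API token
VM_SELECT_ADDR="${VM_SELECT_ADDR:-localhost:8481}"
TENANT_ID="${TENANT_ID:-0}"

# Build the JSON body: data source name, vmselect address, tenant ID.
datasource_payload() {
  printf '{"name":"%s","type":"prometheus","access":"proxy","url":"http://%s/select/%s/prometheus"}' \
    "$1" "$2" "$3"
}

# Send the request to Grafana.
create_datasource() {
  curl -fsS -X POST "$GRAFANA_URL/api/datasources" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H 'Content-Type: application/json' \
    -d "$(datasource_payload "$1" "$VM_SELECT_ADDR" "$TENANT_ID")"
}

# Only runs when explicitly requested: RUN_CREATE=1
if [ "${RUN_CREATE:-0}" = "1" ]; then
  create_datasource "VictoriaMetrics"
fi
```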