# Use Prometheus Remote Write with Thanos to Monitor java-tron Node
In this document, we will introduce how to use Prometheus remote-write to monitor a java-tron node more securely.
## Background
The previous [README](../README.md) explains how to monitor a java-tron node using Grafana and Prometheus. It can be illustrated by the image below:

Basically, the Prometheus service pulls metrics from the java-tron node through an exposed port. Subsequently, Grafana retrieves these metrics from Prometheus to provide visualized insights and alerts.
There are some limitations to this approach. From a security perspective, it is essential to separate java-tron services and monitoring services into different network zones. Specifically, we need to isolate java-tron nodes, especially SR nodes, from external exposure to reduce risks such as Denial of Service (DoS) attacks. However, monitoring metrics and similar indicators of TRON blockchain status can be made more accessible to a broader range of users.
To address these concerns, we need to change the pull mode from either the java-tron or Prometheus service to push mode. As the Prometheus official documentation explains in ["Why do you pull rather than push"](https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push) and ["When to use the Pushgateway"](https://prometheus.io/docs/practices/pushing/#when-to-use-the-pushgateway), the best practice for a long-lived observation target is to keep the Prometheus pull mode and put java-tron and the Prometheus service in the same failure domain.
### New Architecture
Given these considerations, we will implement a push mode for the data flow from Prometheus to Grafana. Prometheus offers a **remote-write** feature that supports push mode, facilitating this transition. We have selected [Thanos](https://github.com/thanos-io/thanos) as an intermediate component. Thanos not only supports remote write but also provides additional features such as long-term storage, high availability, and global querying, thereby improving the overall architecture and functionality of our monitoring system.
Below is the new architecture of the monitoring system. We will introduce how to set up the Prometheus remote-write feature and Thanos in the following sections.
Before we start, let's list the main components of the monitoring system:
- **TRON FullNode**: TRON FullNode service with metrics enabled
- **Prometheus**: Monitoring service that collects metrics from the java-tron node
- **Thanos Receive**: A component of Thanos that receives data from Prometheus’s remote-write write-ahead log, exposes it, and/or uploads it to cloud storage
- **Thanos Query**: A component of Thanos that implements Prometheus’s v1 API to aggregate data from the underlying components
- **Grafana**: Visualization service that retrieves metrics from **Thanos Query** to provide visualized insights and alerts
### Step 1: Set up Thanos Receive
As we can see from the above architecture, Thanos Receive is the intermediate component we need to set up first. The [Thanos Receive](https://thanos.io/tip/components/receive.md/#receiver) service implements the Prometheus Remote Write API. It builds on top of the existing Prometheus TSDB and retains its usefulness while extending its functionality with long-term storage, horizontal scalability, and downsampling.
Run the command below to start the Thanos Receive and [MinIO](https://github.com/minio/minio) services for long-term metric storage:
```sh
docker-compose up -d thanos-receive minio
```
The core configuration for Thanos Receive sits in the `docker-compose.yml` file.
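The compose excerpt itself is not reproduced here, so the following is only a rough sketch of what such a service definition can look like. The flags are the ones discussed below; the image tag, HTTP port, container-side paths, and label values are illustrative assumptions rather than the repository's actual values.

```yaml
# Sketch of a Thanos Receive service definition; not a copy of the repository's compose file.
# Image tag, HTTP port, container-side paths, and label values are illustrative assumptions.
thanos-receive:
  image: quay.io/thanos/thanos:v0.34.1        # example tag; pin to the version you actually run
  volumes:
    - ./receive-data:/receive/data            # local TSDB storage for received metrics
    - ./conf:/receive                         # makes bucket_storage_minio.yml visible to the container
  command:
    - receive
    - --tsdb.path=/receive/data
    - --tsdb.retention=15d                    # auto-purge data older than 15 days
    - --objstore.config-file=/receive/bucket_storage_minio.yml
    - --remote-write.address=0.0.0.0:10908    # target for Prometheus remote_write
    - --grpc-address=0.0.0.0:10907            # StoreAPI, queried by Thanos Query
    - --http-address=0.0.0.0:10909
    - --label=receive_cluster="java-tron"
    - --label=receive_replica="0"
  ports:
    - "10907:10907"
    - "10908:10908"
```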
#### Key configuration elements:
##### 1. Storage configuration
- Local Storage:
`./receive-data:/receive/data` maps the host directory for metric TSDB storage.
- Retention Policy: `--tsdb.retention=15d` auto-purges data older than 15 days. As observed, it takes about 0.5 GB of disk space per month for one java-tron (v4.7.6) FullNode connected to Mainnet.
- External Storage:
`./conf:/receive` mounts configuration files. The `--objstore.config-file` flag enables long-term storage in MinIO/S3-compatible buckets. In this case, it is [bucket_storage_minio.yml](conf/bucket_storage_minio.yml); a sketch of the general file format follows this list.
- Thanos Receive uploads TSDB blocks to an object storage bucket every 2 hours by default.
- Fallback Behavior: Omitting this flag keeps data local-only.
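For reference, a MinIO-backed object storage configuration in Thanos's S3 format generally looks like the sketch below. The bucket name, endpoint, and credentials are placeholders, not the values shipped in the repository's file.

```yaml
# Sketch of a Thanos S3/MinIO object storage configuration; all values are placeholders.
type: S3
config:
  bucket: thanos                 # bucket that receives the uploaded TSDB blocks
  endpoint: minio:9000           # address of the MinIO service inside the compose network
  access_key: minio-access-key
  secret_key: minio-secret-key
  insecure: true                 # plain HTTP is acceptable on an internal network; use TLS otherwise
```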
##### 2. Network configuration
- Remote Write `--remote-write.address=0.0.0.0:10908`: Receives Prometheus metrics. Prometheus instances are configured to continuously write metrics to it.
- Thanos Receive exposes the StoreAPI so that Thanos Query can query received metrics in **real-time**.
- The `ports` mapping, combined with the `--grpc-address` and `--http-address` flags, exposes the ports for the Thanos Query service.
- Security Note: `0.0.0.0` means it accepts all incoming connections from any IP address. For production, consider restricting access to specific IP addresses.
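One simple way to act on the security note above (a sketch, not part of the repository configuration) is to publish the Receive ports on a specific internal host interface rather than on all interfaces; source-IP allowlisting would additionally be handled by a firewall. The host address below is illustrative.

```yaml
# Illustrative only: bind the published Receive ports to one internal host interface.
ports:
  - "192.168.10.5:10907:10907"   # StoreAPI for Thanos Query
  - "192.168.10.5:10908:10908"   # Prometheus remote-write endpoint
```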
##### 3. Operational parameters
- `--label=receive_replica=.` and `--label=receive_cluster=.`: Cluster labels ensure unique identification in the Thanos ecosystem. You can find these labels in Grafana dashboards, and you can add any key-value pairs as labels.
More flag explanations and default values can be found in the official [Thanos Receive](https://thanos.io/tip/components/receive.md/#flags) documentation.
### Step 2: Set up TRON and Prometheus services
Run the command below to start the java-tron and Prometheus services:
```sh
docker-compose up -d tron-node prometheus
```
Review the [docker-compose.yml](docker-compose.yml) file; the command explanation for the java-tron service can be found in [Run Single Node](../single_node/README.md#run-the-container).
Below are the core configurations for the Prometheus service:
```yaml
ports:
- "9090:9090" # Used for local Prometheus status check
...
- "--storage.tsdb.retention.time=30d"
- "--storage.tsdb.max-block-duration=30m" # The maximum duration for a block of time series data that can be stored in the time series database (TSDB)
- "--storage.tsdb.min-block-duration=30m"
- "--web.enable-lifecycle" # Makes Prometheus expose the /-/reload HTTP endpoint
- "--web.enable-admin-api"
```
#### Key configuration elements:
##### 1. Storage configurations
- The volumes entry `- ./prometheus_data:/prometheus` mounts a local directory used by Prometheus to store metrics data.
- Although we use Prometheus with remote-write in this case, it still stores metrics data locally. Through http://localhost:9090/, you can check the running status of the Prometheus service and observe the scrape targets.
- The `--storage.tsdb.retention.time=30d` flag specifies the retention period for the metrics data. Prometheus will automatically delete data older than 30 days. As observed, it takes about 1 GB of disk space per month for one java-tron (v4.7.6) FullNode connected to Mainnet. Notice this value is larger than the space needed by Thanos Receive for the same period because of compaction.
- Other storage flags can be found in the [official documentation](https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects). For a quick start, you could use the default values.
##### 2. Prometheus remote-write configuration
The Prometheus configuration file is set to [prometheus.yml](conf/prometheus.yml) through the volume mapping `./conf/prometheus.yml:...` and the `--config.file=...` flag.
It contains the configuration of `scrape_configs` and `remote_write`.
You need to fill the `url` with the IP address of the Thanos Receive service started in the first step.
Check the official [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) documentation for an explanation of all configuration options. In particular, `send_interval: 3s` in the sample configuration controls how frequently metric metadata is sent to the remote Receive.
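The repository's [prometheus.yml](conf/prometheus.yml) is not reproduced in full here; the sketch below shows the general shape of such a configuration. The job name, scrape target, and intervals are illustrative placeholders, while the `remote_write` URL and the `monitor` external label follow the conventions used in this guide.

```yaml
# Sketch of a Prometheus configuration with remote_write; targets and intervals are placeholders.
global:
  scrape_interval: 15s
  external_labels:
    monitor: java-tron-node1-remote-write       # must be unique per Prometheus instance

scrape_configs:
  - job_name: java-tron
    static_configs:
      - targets: ["[java-tron node IP]:[metrics port]"]

remote_write:
  - url: http://[Thanos Receive IP]:10908/api/v1/receive
    metadata_config:
      send: true
      send_interval: 3s                          # how frequently metric metadata is sent to Receive
```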
The `external_labels` defined in the Prometheus configuration file are propagated with all metric data to Thanos Receive. They uniquely identify the Prometheus instance, acting as critical tracing metadata that ultimately correlates metrics to their originating java-tron node. You can use them in Grafana dashboards using label-based filtering (e.g., `{monitor="java-tron-node1-remote-write"}`).
Notice: You can add multiple Prometheus services with remote-write to the same Receive service; just make sure the `external_labels` are unique.
### Step 3: Set up Thanos Query
So far, we have the Thanos Receive, Prometheus, and java-tron services running. java-tron, as the origin, produces metrics, which are pulled by the Prometheus service and then continuously pushed to Thanos Receive.
As Grafana cannot directly query Thanos Receive, we need Thanos Query, which implements Prometheus’s v1 API to aggregate data from the underlying components.

Run the command below to start the Thanos Query service:
```sh
docker-compose up -d querier
```
Below are the core configurations for the Thanos Query service:
```yaml
querier:
...
- query
- --endpoint.info-timeout=30s
- --http-address=0.0.0.0:9091
- --store=[Thanos Receive IP]:10907 # The grpc-address of the Thanos Receive service; if Receive runs remotely, replace the container name "thanos-receive" with the real IP
```
It will set up the Thanos Query service that listens on port 9091 and queries metrics from the Thanos Receive service configured by `--store=[Thanos Receive IP]:10907`.
Make sure the IP address is correct.
For more complex usage, please refer to the [official Query document](https://thanos.io/tip/components/query.md/).
### Step 4: Set up Grafana

To start the Grafana service on the host machine, run the following command:

```sh
docker-compose up -d grafana
```
Then log in to the Grafana web UI through http://localhost:3000/ or your host machine's IP address. The initial username and password are both `admin`.
Click **Connections** on the left side of the main page and select "Data Sources" to configure a Grafana data source. Enter the IP and port of the Query service in the URL field as `http://[Query service IP]:9091`.
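If you prefer to provision the data source from a file instead of clicking through the UI, Grafana's data source provisioning format can be used. The sketch below assumes a file mounted under `/etc/grafana/provisioning/datasources/` and reuses the Query address from above; the data source name is an arbitrary placeholder.

```yaml
# Sketch of a Grafana data source provisioning file; name and URL are illustrative.
apiVersion: 1
datasources:
  - name: Thanos-Query
    type: prometheus                  # Thanos Query speaks the Prometheus query API
    access: proxy
    url: http://[Query service IP]:9091
    isDefault: true
```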
The commands provided above have only been tested on Linux and macOS.
### Common Issues
1. **Container Config Error (Linux)**
   - If you encounter a `KeyError: 'ContainerConfig'`, check for conflicting container names and remove them:

     ```bash
     # List all containers
     docker ps -a

     # Remove conflicting containers
     docker rm [container-name]
     ```

2. **Network Connectivity**
   - Verify all services can communicate by checking logs:

     ```bash
     docker-compose logs [service-name]
     ```

   - Ensure all IP addresses are correctly configured in the compose file

3. **Storage Issues**
   - Check available disk space: `df -h`
   - Monitor storage usage in the Prometheus, Thanos Receive, and MinIO (if used) directories
### Getting Help
For additional support:
- Raise an issue on [GitHub](https://github.com/tronprotocol/tron-docker/issues)
- Consult the official Thanos documentation
- Review Docker logs for specific service issues
## Conclusion
This guide provides a secure and scalable solution for monitoring java-tron nodes. For custom configurations beyond this setup, refer to the [official Thanos documentation](https://thanos.io/tip/thanos/quick-tutorial.md/) or engage with the community through [GitHub](https://github.com/tronprotocol/tron-docker/issues) issues.