Commit 96f25e6

author
Sunny Jiao
committed
fix grama and format
1 parent c98c5b7 commit 96f25e6

1 file changed

+66
-44
lines changed

metric_monitor/push_mode/README.md

Lines changed: 66 additions & 44 deletions
@@ -1,44 +1,44 @@
11
# Use Prometheus Remote Write with Thanos to Monitor java-tron Node
22

3-
In this document, we will introduce how to use Prometheus remote-write to monitor java-tron node more securely.
3+
In this document, we will introduce how to use Prometheus remote-write to monitor a java-tron node more securely.
44

55
## Background
66
The previous [README](../README.md) explains how to monitor a java-tron node using Grafana and Prometheus. It can be illustrated by the image below:
77
![image](../../images/metric_pull_simple.png)
8-
Basically, the Prometheus service pulls metrics from java-tron node through an exposed port. Subsequently, Grafana retrieves these metrics from Prometheus to provide visualized insights and alerts.
8+
Basically, the Prometheus service pulls metrics from the java-tron node through an exposed port. Subsequently, Grafana retrieves these metrics from Prometheus to provide visualized insights and alerts.
99

1010
There are some limitations to this approach. From a security perspective, it is essential to separate java-tron services and monitoring services into different network zones. Specifically, we need to isolate java-tron nodes, especially SR nodes, from external exposure to reduce risks such as Denial of Service (DoS) attacks. However, monitoring metrics and similar indicators of TRON blockchain status can be made more accessible to a broader range of users.
11-
To address these concerns, we need to change the pull mode either from java-tron or Prometheus service to push mode. Refer to Prometheus official documentation of ["Why do you pull rather than push"](https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push) and ["When to use the Pushgateway"](https://prometheus.io/docs/practices/pushing/#when-to-use-the-pushgateway), the best practise for long-live observation target is to use Prometheus pull mode, and put java-tron and Prometheus service in the same failure domain.
11+
To address these concerns, we need to change from pull mode to push mode, either at the java-tron service or at the Prometheus service. According to the official Prometheus documentation on ["Why do you pull rather than push"](https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push) and ["When to use the Pushgateway"](https://prometheus.io/docs/practices/pushing/#when-to-use-the-pushgateway), the best practice for long-lived observation targets is to use Prometheus pull mode and to put java-tron and the Prometheus service in the same failure domain.
1212

1313
### New Architecture
1414
Given these considerations, we will implement a push mode for the data flow from Prometheus to Grafana. Prometheus offers a **remote-write** feature that supports push mode, facilitating this transition. We have selected [Thanos](https://github.com/thanos-io/thanos) as an intermediate component. Thanos not only supports remote write but also provides additional features such as long-term storage, high availability, and global querying, thereby improving the overall architecture and functionality of our monitoring system.
1515

1616
Below is the new architecture of the monitoring system. We will introduce how to set up the Prometheus remote-write feature and Thanos in the following sections.
1717
![image](../../images/metric_push_with_thanos.png)
1818

19-
## Use Prometheus remote write with Thanos guidance
19+
## Implementation Guide
2020
This section introduces the steps of setting up Prometheus remote write with Thanos.
2121

2222
### Prerequisites
23-
24-
- Docker and Docker Compose: Installation refers [prerequisites](../../README.md#prerequisites).
25-
- Clone the tron-docker repository, then navigate to the `push_mode` directory.
23+
Before starting, ensure you have:
24+
- Docker and Docker Compose installed (refer to [prerequisites](../../README.md#prerequisites))
25+
- The tron-docker repository cloned locally
2626
```sh
2727
git clone https://github.com/tronprotocol/tron-docker.git
2828
cd tron-docker/metric_monitor/push_mode
2929
```
3030
### Main components
31-
Before we start, let's list the main components of the monitoring system:
32-
- **TRON FullNode**: TRON FullNode service with metrics enabled.
33-
- **Prometheus**: Monitoring service that collects metrics from java-tron node.
34-
- **Thanos Receive**: A component of Thanos that receives data from Prometheus’s remote write write-ahead log, exposes it, and/or uploads it to cloud storage.
35-
- **Thanos Query**: A component of Thanos that implements Prometheus’s v1 API to aggregate data from the underlying components.
36-
- **Grafana**: Visualization service that retrieves metrics from **Thanos Query** to provide visualized insights and alerts.
31+
The monitoring system consists of:
32+
- **TRON FullNode**: TRON FullNode service with metrics enabled
33+
- **Prometheus**: Monitoring service that collects metrics from the java-tron node
34+
- **Thanos Receive**: A component of Thanos that receives data from Prometheus’s remote write write-ahead log, exposes it, and/or uploads it to cloud storage
35+
- **Thanos Query**: A component of Thanos that implements Prometheus’s v1 API to aggregate data from the underlying components
36+
- **Grafana**: Visualization service that retrieves metrics from **Thanos Query** to provide visualized insights and alerts
3737

3838
### Step 1: Set up Thanos Receive
39-
As we can see from the above architecture, Thanos Receive is the intermediate component we need to set up first. The [Thanos Receive](https://thanos.io/tip/components/receive.md/#receiver) service implements the Prometheus Remote Write API. It builds on top of existing Prometheus TSDB and retains its usefulness while extending its functionality with long-term-storage, horizontal scalability, and downsampling.
39+
As we can see from the above architecture, Thanos Receive is the intermediate component we need to set up first. The [Thanos Receive](https://thanos.io/tip/components/receive.md/#receiver) service implements the Prometheus Remote Write API. It builds on top of the existing Prometheus TSDB and retains its usefulness while extending its functionality with long-term storage, horizontal scalability, and downsampling.
4040

41-
Run below command to start the Thanos Receive and [Minio](https://github.com/minio/minio) service for long-term metric storage:
41+
Run the command below to start the Thanos Receive and [Minio](https://github.com/minio/minio) services for long-term metric storage:
4242
```sh
4343
docker-compose up -d thanos-receive minio
4444
```
@@ -69,37 +69,37 @@ Core configuration for Thanos Receive in [docker-compose.yml](tmp/docker-compose
6969
#### Key configuration elements:
7070
##### 1. Storage configuration
7171
- Local Storage:
72-
`./receive-data:/receive/data` maps host directory for metric TSDB storage.
72+
`./receive-data:/receive/data` maps the host directory for metric TSDB storage.
7373
- Retention Policy: `--tsdb.retention=15d` auto-purges data older than 15 days. As observed, one java-tron (v4.7.6) FullNode connected to Mainnet takes about 0.5GB of disk space per month.
7474

7575
- External Storage:
76-
`./conf:/receive` mounts configuration files. The `--objstore.config-file` flag enables long-term storage in MinIO/S3-compatible buckets. In this case it is [bucket_storage_minio.yml](conf/bucket_storage_minio.yml).
76+
`./conf:/receive` mounts configuration files. The `--objstore.config-file` flag enables long-term storage in MinIO/S3-compatible buckets. In this case, it is [bucket_storage_minio.yml](conf/bucket_storage_minio.yml).
7777
- Thanos Receive uploads TSDB blocks to an object storage bucket every 2 hours by default.
7878
- Fallback Behavior: Omitting this flag keeps data local-only.
7979

8080
##### 2. Network configuration
8181
- Remote Write `--remote-write.address=0.0.0.0:10908`: Receives Prometheus metrics. Prometheus instances are configured to continuously write metrics to it.
8282
- Thanos Receive exposes the StoreAPI so that Thanos Query can query received metrics in **real-time**.
83-
- The `ports` combined with flags `--grpc-address, --http-address` expose the ports for Thanos Query service.
83+
- The `ports` mapping combined with the flags `--grpc-address` and `--http-address` exposes the ports for the Thanos Query service.
8484
- Security Note: `0.0.0.0` means it accepts all incoming connections from any IP address. For production, consider restricting access to specific IP addresses.
8585

8686
##### 3. Operational parameters
8787

88-
- `--label=receive_replica=.` and `--label=receive_cluster=.`: Cluster labels ensure unique identification in Thanos ecosystem. You could find these labels in Grafana dashboards. You could add any key value pairs as labels.
88+
- `--label=receive_replica=.` and `--label=receive_cluster=.`: Cluster labels ensure unique identification in the Thanos ecosystem. You can find these labels in Grafana dashboards. You can add any key-value pairs as labels.
8989

9090
<img src="../../images/metric_pull_receive_label.png" alt="Alt Text" width="880" >
9191

92-
For more flags explanation and default value can be found in official [Thanos Receive](https://thanos.io/tip/components/receive.md/#flags) documentation.
92+
Explanations of more flags and their default values can be found in the official [Thanos Receive](https://thanos.io/tip/components/receive.md/#flags) documentation.
9393

9494
### Step 2: Set up TRON and Prometheus services
95-
Run below command to start java-tron and Prometheus services:
95+
Run the command below to start the java-tron and Prometheus services:
9696
```sh
9797
docker-compose up -d tron-node prometheus
9898
```
9999

100-
Review the [docker-compose.yml](docker-compose.yml) file, the command explanation of java-tron service can be found in [Run Single Node](../single_node/README.md#run-the-container).
100+
Review the [docker-compose.yml](docker-compose.yml) file. An explanation of the java-tron service command can be found in [Run Single Node](../single_node/README.md#run-the-container).
101101

102-
Below are the core configurations for Prometheus service:
102+
Below are the core configurations for the Prometheus service:
103103
```yaml
104104
ports:
105105
- "9090:9090" # Used for local Prometheus status check
@@ -112,21 +112,21 @@ Below are the core configurations for Prometheus service:
112112
- "--storage.tsdb.retention.time=30d"
113113
- "--storage.tsdb.max-block-duration=30m" # The maximum duration for a block of time series data that can be stored in the time series database (TSDB)
114114
- "--storage.tsdb.min-block-duration=30m"
115-
- "--web.enable-lifecycle" # Makes Prometheus to expose the /-/reload HTTP endpoints
115+
- "--web.enable-lifecycle" # Makes Prometheus expose the /-/reload HTTP endpoints
116116
- "--web.enable-admin-api"
117117
```
118118
#### Key configuration elements:
119119
##### 1. Storage configurations
120120
- The volumes command `- ./prometheus_data:/prometheus` mounts a local directory used by Prometheus to store metrics data.
121121
- Although we use Prometheus with remote-write in this case, it also stores metrics data locally. Through http://localhost:9090/, you can check the running status of the Prometheus service and observe targets.
122-
- The `--storage.tsdb.retention.time=30d` flag specifies the retention period for the metrics data. Prometheus will automatically delete data older than 30 days. As observed, it takes about 1GB of disk space per month for one java-tron(v4.7.6) FullNode connecting Mainnet. Notice this value is larger than the space need by Thanos Receive for the same period, as there exist compact operations.
122+
- The `--storage.tsdb.retention.time=30d` flag specifies the retention period for the metrics data. Prometheus will automatically delete data older than 30 days. As observed, one java-tron (v4.7.6) FullNode connected to Mainnet takes about 1GB of disk space per month. Note that this value is larger than the space needed by Thanos Receive for the same period, because of compaction operations.
123123
- Other storage flags can be found in the [official documentation](https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects). For a quick start, you could use the default values.
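
As a rough sanity check on disk sizing, the observed growth rates can be combined with the retention flags. The sketch below assumes linear growth and a 30-day month; `estimate_gb` is a hypothetical helper, not part of the repository, and the rates are the approximate per-month figures quoted in this guide.

```sh
#!/bin/sh
# estimate_gb RATE_GB_PER_MONTH RETENTION_DAYS
# Hypothetical helper: approximate steady-state local disk usage in GB,
# assuming linear growth and a 30-day month.
estimate_gb() {
  awk -v rate="$1" -v days="$2" 'BEGIN { printf "%.2f\n", rate * days / 30 }'
}

estimate_gb 1 30     # Prometheus: about 1GB/month with --storage.tsdb.retention.time=30d
estimate_gb 0.5 15   # Thanos Receive: about 0.5GB/month with --tsdb.retention=15d
```

These are estimates for local storage only; blocks uploaded to object storage are not covered by the local retention flags.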
124124

125125
##### 2. Prometheus remote-write configuration
126126

127-
Prometheus configuration file is set to use the [prometheus.yml](conf/prometheus.yml) by volume mapping `./conf/prometheus.yml:...` and flag `--config.file=...`.
128-
It contains configuration of `scrape_configs` and `remote_write`.
129-
You need to fill the `url` with the IP address of the Thanos Receive service started at the first step.
127+
Prometheus is configured to use [prometheus.yml](conf/prometheus.yml) through the volume mapping `./conf/prometheus.yml:...` and the flag `--config.file=...`.
128+
It contains the configuration of `scrape_configs` and `remote_write`.
129+
You need to fill in the `url` field with the IP address of the Thanos Receive service started in the first step.
130130
Check the official [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) documentation for an explanation
131131
of all configuration options.
132132

@@ -152,7 +152,7 @@ remote_write:
152152
send_interval: 3s # How frequently metric metadata is sent to remote Receive.
153153
max_samples_per_send: 500 # Batch size optimization
154154
```
155-
The `external_labels` defined in Prometheus configuration file are propagated with all metric data to Thanos Receive.
155+
The `external_labels` defined in the Prometheus configuration file are propagated with all metric data to Thanos Receive.
156156
They uniquely identify the Prometheus instance,
157157
acting as critical tracing metadata that ultimately correlates metrics to their originating java-tron node.
158158
You can use them in Grafana dashboards through label-based filtering (e.g., `{monitor="java-tron-node1-remote-write"}`).
@@ -162,15 +162,15 @@ You can use it in Grafana dashboards using label-based filtering (e.g., `{monito
162162
Note: You can add multiple Prometheus services with remote-write to the same Receive service; just make sure the `external_labels` are unique.
163163
164164
### Step 3: Set up Thanos Query
165-
So far, we have Thanos ReceivePrometheus and java-tron services running. java-tron as the origin produces metrics, then pulled by Prometheus service, which then keeping push metrics to Thanos Receive.
165+
So far, we have the Thanos Receive, Prometheus, and java-tron services running. java-tron, as the origin, produces metrics, which are pulled by the Prometheus service, which in turn keeps pushing them to Thanos Receive.
166166
As Grafana cannot directly query Thanos Receive, we need Thanos Query, which implements Prometheus’s v1 API to aggregate data from the underlying components.
167167
168-
Run below command to start Thanos Query service:
168+
Run the command below to start the Thanos Query service:
169169
```sh
170170
docker-compose up -d querier
171171
```
172172

173-
Below are the core configurations for Thanos Query service:
173+
Below are the core configurations for the Thanos Query service:
174174
``` yaml
175175
querier:
176176
...
@@ -181,10 +181,10 @@ Below are the core configurations for Thanos Query service:
181181
- query
182182
- --endpoint.info-timeout=30s
183183
- --http-address=0.0.0.0:9091
184-
- --store=[Thanos Receive IP]:10907 # --store: The grpc-address of the Thanos Receive service,if Receive run remotely replace container name "thanos-receive" with the real ip
184+
- --store=[Thanos Receive IP]:10907 # --store: the gRPC address of the Thanos Receive service; if Receive runs remotely, replace the container name "thanos-receive" with the real IP
185185
```
186-
It will set up Thanos Query service
187-
that listens to port 9091 and queries metrics from Thanos Receive service from `--store=[Thanos Receive IP]:10907`.
186+
It will set up the Thanos Query service
187+
that listens on port 9091 and queries metrics from the Thanos Receive service at `--store=[Thanos Receive IP]:10907`.
188188
Make sure the IP address is correct.
189189
For more complex usage, please refer to the [official Query document](https://thanos.io/tip/components/query.md/).
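
Because Thanos Query implements Prometheus’s v1 API, a quick way to confirm that it is reachable is to send it an instant query. The following is a minimal sketch, assuming Thanos Query is reachable on `localhost:9091` and that `curl` is installed; `query_url` is a hypothetical helper introduced here for illustration.

```sh
#!/bin/sh
# query_url HOST PORT EXPR
# Hypothetical helper: builds an instant-query URL for the Prometheus v1 HTTP API.
# EXPR must already be URL-safe (a bare metric name such as "up" is).
query_url() {
  printf 'http://%s:%s/api/v1/query?query=%s\n' "$1" "$2" "$3"
}

# Ask Thanos Query (assumed to be on localhost:9091) for the "up" series.
# --fail makes curl exit non-zero on HTTP errors.
if curl --silent --fail --max-time 5 "$(query_url localhost 9091 up)"; then
  echo
  echo "Thanos Query answered"
else
  echo "Thanos Query is not reachable on localhost:9091" >&2
fi
```

A JSON response with an empty result still shows that the API is reachable; it may only mean that no samples have arrived from Thanos Receive yet.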
190190

@@ -194,14 +194,14 @@ To start the Grafana service on the host machine, run the following command:
194194
docker-compose up -d grafana
195195
```
196196
Then log in to the Grafana web UI through http://localhost:3000/ or your host machine's IP address. The initial username and password are both `admin`.
197-
Click the **Connections** on the left side of the main page and select "Data Sources" to configure Grafana data sources. Enter the ip and port of the Query service in URL with `http://[Query service IP]:9091`.
197+
Click **Connections** on the left side of the main page and select "Data Sources" to configure Grafana data sources. Enter the IP and port of the Query service in the URL field as `http://[Query service IP]:9091`.
198198
<img src="../../images/metric_grafana_datasource_query.png" alt="Alt Text" width="680" >
199199

200200
Follow the same instructions as [Import Dashboard](../README.md#import-dashboard) to import the dashboard.
201201
Then you can experiment with different Thanos Receive, Thanos Query, and Prometheus configurations.
202202

203203
### Step 5: Clean up
204-
To stop and remove all or part of services, you could run below commands:
204+
To stop and remove all or some of the services, run the commands below:
205205
```sh
206206
docker-compose down # Stop and remove all services
207207
docker-compose down thanos-receive # Thanos Receive service only
@@ -210,13 +210,35 @@ docker-compose down prometheus, tron-node, querier, grafana # Multiple Services
210210

211211
## Troubleshooting
212212

213-
The commands provided above only tested on Linux and macOS. If you encounter a `KeyError: 'ContainerConfig'` while running on Linux, it may be due to an existing image with the same container name. To resolve this, remove the previously created image and try again. You can list all existing Docker containers by running:
214-
```bash
215-
docker ps -a
216-
```
213+
### Common Issues
214+
1. **Container Config Error (Linux)**
215+
- If you encounter a `KeyError: 'ContainerConfig'`, check for conflicting container names and remove them:
216+
```bash
217+
# List all containers
218+
docker ps -a
219+
220+
# Remove conflicting containers
221+
docker rm [container-name]
222+
```
223+
2. **Network Connectivity**
224+
225+
- Verify all services can communicate by checking logs:
226+
```bash
227+
docker-compose logs [service-name]
228+
```
229+
230+
- Ensure all IP addresses are correctly configured in the compose file
231+
232+
3. **Storage Issues**
217233

218-
For other challenges during implementation, please raise an issue on [GitHub](https://github.com/tronprotocol/tron-docker/issues) following the provided template guidance.
234+
- Check available disk space: `df -h`
235+
- Monitor storage usage in Prometheus, Thanos Receive, and Minio (if used) directories
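
The storage checks above can be sketched as a small script. It assumes it is run from the `push_mode` directory and uses the `./prometheus_data` and `./receive-data` paths from the volume mappings in this guide; `report_usage` is a hypothetical helper, and the MinIO data directory is left out because its path is not shown here.

```sh
#!/bin/sh
# report_usage DIR...
# Prints the size of each directory that exists and flags the ones that do not.
report_usage() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      du -sh "$dir"
    else
      echo "missing: $dir"
    fi
  done
}

# Both paths come from the volume mappings described in this guide.
report_usage ./prometheus_data ./receive-data
```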
219236

220-
## At the end
237+
### Getting Help
238+
For additional support:
239+
- Raise an issue on [GitHub](https://github.com/tronprotocol/tron-docker/issues)
240+
- Consult the official Thanos documentation
241+
- Review Docker logs for specific service issues
221242

222-
This guide provides tested solutions that meet specific security requirements. If these configurations do not address your customized monitoring needs, you may want to consult the [official Thanos documentation](https://thanos.io/tip/thanos/quick-tutorial.md/) for more detailed configuration options. Additionally, you can engage with the community on [GitHub](https://github.com/tronprotocol/tron-docker/issues) for further assistance.
243+
## Conclusion
244+
This guide provides a secure and scalable solution for monitoring java-tron nodes. For custom configurations beyond this setup, refer to the [official Thanos documentation](https://thanos.io/tip/thanos/quick-tutorial.md/) or engage with the community through [GitHub](https://github.com/tronprotocol/tron-docker/issues) issues.
