Commit e3e5a58

doc: add docs and instructions for updated telemetry
1 parent deb02e0 commit e3e5a58

1 file changed: +169 -0 lines changed

doc/telemetry_guide.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# Aptos Telemetry Enhancements

1. **Enhanced Consensus Metrics**:
   - Added metrics for committed blocks and transactions
   - Included consensus round and version information
   - Added metrics for consensus timing and performance
   - Added sync information metrics

2. **Enhanced Transaction Metrics**:
   - Added mempool transaction processing metrics
   - Added metrics for transaction broadcast performance
   - Included pending transaction counts

3. **Enhanced Storage Metrics**:
   - Added latency metrics for transaction retrieval
   - Added latency metrics for transaction commits
   - Added latency metrics for transaction saving

4. **Test Suite**:
   - Added unit tests to verify telemetry metrics collection
   - Added integration tests that exercise telemetry end to end

## Running Locally

To run and test the telemetry implementation locally (the commands below assume macOS with Homebrew):

### 1. Install Prometheus

```bash
brew install prometheus
```

### 2. Install Grafana

```bash
brew install grafana
```

## Configuration

### 1. Prometheus Setup

Create or modify `/opt/homebrew/etc/prometheus.yml` (this is the Homebrew prefix on Apple Silicon; on Intel Macs the prefix is `/usr/local`):

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'aptos'
    static_configs:
      - targets: ['127.0.0.1:9101']
    metrics_path: '/metrics'
    scheme: 'http'
```

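Before starting Prometheus, it can help to validate the file. The sketch below wraps `promtool check config`, the validator that ships with Prometheus. The function name `check_prometheus_config` is illustrative, and the sketch assumes `promtool` is on your `PATH`.

```bash
# Sketch: validate a Prometheus config file before (re)starting the service.
# Assumption: promtool was installed alongside Prometheus and is on PATH.
check_prometheus_config() {
  local config="$1"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 2
  fi
  # promtool exits non-zero and prints the offending key if the file is invalid.
  promtool check config "$config"
}

# Example (path taken from this guide):
#   check_prometheus_config /opt/homebrew/etc/prometheus.yml
```

A failed check usually points at an indentation problem, since YAML nesting decides which job a `static_configs` entry belongs to.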
### 2. Start Services

1. Start Prometheus:
   ```bash
   brew services start prometheus
   ```

2. Start Grafana:
   ```bash
   brew services start grafana
   ```

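Both services take a moment to come up after `brew services start` returns. A minimal sketch of a readiness check follows. It polls Prometheus's `/-/ready` endpoint and Grafana's `/api/health` endpoint on the default ports used in this guide. The helper name `wait_for_url` and its retry defaults are assumptions chosen for illustration.

```bash
# Sketch: poll a URL until it answers with a 2xx status, or give up.
# Usage: wait_for_url <url> [attempts] [delay_seconds]
wait_for_url() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example, using the default ports from this guide:
#   wait_for_url http://localhost:9090/-/ready
#   wait_for_url http://localhost:3000/api/health
```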
### 3. Grafana Dashboard Setup

1. Access Grafana UI:
   - Open `http://localhost:3000` in your browser
   - Default login:
     - Username: `admin`
     - Password: `admin`

2. Add Prometheus Data Source:
   - Go to Configuration (⚙️) > Data Sources
   - Click "Add data source"
   - Select "Prometheus"
   - Set URL to `http://localhost:9090`
   - Click "Save & Test"

3. Import Dashboard:
   - Click the "+" icon in the sidebar
   - Select "Import"
   - Upload the provided `aptos-dashboard.json`
   - Select your Prometheus data source
   - Click "Import"

## Verification

1. Check Prometheus targets:
   - Visit `http://localhost:9090/targets`
   - Verify the Aptos target is "UP"

2. Check Grafana metrics:
   - View the imported dashboard
   - Verify metrics are being displayed
   - Check for:
     - Consensus metrics
     - Storage metrics
     - Mempool metrics
     - State sync metrics

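The target check can also be done from a terminal through the Prometheus HTTP API (`/api/v1/targets`). The sketch below counts targets per health state. It assumes the API returns compact JSON, which it does by default, and uses plain `grep` rather than a JSON parser, so treat it as a quick check rather than a robust tool. The function name is illustrative.

```bash
# Sketch: summarize target health from the /api/v1/targets response.
# Reads the JSON on stdin and prints one "state=count" line per health state.
summarize_target_health() {
  grep -o '"health":"[a-z]*"' |   # pick out every health field
    cut -d'"' -f4 |               # keep only the state (up, down, unknown)
    sort | uniq -c |
    awk '{ print $2 "=" $1 }'
}

# Example:
#   curl -s http://localhost:9090/api/v1/targets | summarize_target_health
```

With only the `aptos` job configured, a healthy setup should report a single `up` entry.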
## Troubleshooting

1. If metrics aren't showing:
   - Verify the Aptos node is running
   - Check the metrics endpoint: `curl http://localhost:9101/metrics`
   - Verify the Prometheus target status
   - Check the Grafana data source connection

2. Service management:
   ```bash
   # Restart services
   brew services restart prometheus
   brew services restart grafana

   # Check service status
   brew services list

   # View Prometheus logs
   tail -f /opt/homebrew/var/log/prometheus.log

   # View Grafana logs
   tail -f /opt/homebrew/var/log/grafana.log
   ```

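When the endpoint responds but a dashboard panel stays empty, it helps to know which metric is absent. The following sketch reads the `/metrics` output on stdin and prints every requested name that does not appear. The helper name `missing_metrics` is hypothetical. The metric names come from the tables in this guide, so adjust them if your node exposes them under a different prefix.

```bash
# Sketch: report which metric names are absent from a Prometheus text exposition.
# Usage: curl -s http://localhost:9101/metrics | missing_metrics name1 name2 ...
missing_metrics() {
  local body name
  body="$(cat)"
  for name in "$@"; do
    # Match either the "# TYPE <name> ..." line or a sample line where the
    # name is followed by a space or a label set.
    if ! printf '%s\n' "$body" | grep -Eq "^(# TYPE )?${name}[ {]"; then
      echo "$name"
    fi
  done
}

# Example:
#   curl -s http://localhost:9101/metrics |
#     missing_metrics consensus_current_round mempool_pending_txns storage_commit_latency_s
```

An empty result means every listed metric was found.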
## Stopping Services

```bash
brew services stop prometheus
brew services stop grafana
```

## Telemetry Metrics

The following key metrics have been added:

### Consensus Metrics

| Metric Name | Description |
|-------------|-------------|
| `consensus_last_committed_version` | The last committed ledger version |
| `consensus_committed_blocks_count` | Number of blocks committed since node start |
| `consensus_committed_txns_count` | Number of transactions committed since node start |
| `consensus_current_round` | Current consensus round |
| `consensus_round_timeout_secs` | Average round timeout in seconds |
| `consensus_sync_info_msg_sent_count` | Number of sync info messages sent |
| `consensus_wait_duration_s` | Average wait duration in seconds |

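Counters such as `consensus_committed_txns_count` only grow, so a throughput figure comes from sampling twice and dividing the difference by the elapsed time. A rough sketch follows. The helper names `metric_value` and `per_second_rate` are made up for this example, and it assumes samples carry no trailing timestamp.

```bash
# Sketch: derive a per-second rate from two samples of a counter.

# Print the value of the first sample whose metric name is exactly $1.
# Reads Prometheus text exposition on stdin; a label set after the name is allowed.
metric_value() {
  awk -v name="$1" '
    $0 !~ /^#/ {
      n = $1
      sub(/[{].*/, "", n)                 # strip a label set, if any
      if (n == name) { print $NF; exit }  # assumes no trailing timestamp
    }'
}

# Usage: per_second_rate <earlier_value> <later_value> <elapsed_seconds>
per_second_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.2f\n", (b - a) / dt }'
}

# Example: committed transactions per second over a 15-second window.
#   before=$(curl -s http://localhost:9101/metrics | metric_value consensus_committed_txns_count)
#   sleep 15
#   after=$(curl -s http://localhost:9101/metrics | metric_value consensus_committed_txns_count)
#   per_second_rate "$before" "$after" 15
```

If the node restarts between the two samples, the counter resets and the result goes negative, so discard such readings.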
### Mempool Metrics

| Metric Name | Description |
|-------------|-------------|
| `mempool_txns_processed_success` | Number of successfully processed transactions |
| `mempool_txns_processed_total` | Total number of transactions received |
| `mempool_avg_txn_broadcast_size` | Average transaction broadcast size |
| `mempool_pending_txns` | Number of pending transactions in mempool |

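The two processing counters combine into a success ratio. As a sketch, assuming both metrics are exposed as unlabelled samples (the function name is illustrative):

```bash
# Sketch: percentage of received mempool transactions that were processed successfully.
# Reads Prometheus text exposition on stdin.
mempool_success_ratio() {
  awk '
    $1 == "mempool_txns_processed_success" { ok = $2 }
    $1 == "mempool_txns_processed_total"   { total = $2 }
    END {
      # Avoid dividing by zero when nothing has been received yet.
      if (total + 0 == 0) { print "n/a"; exit }
      printf "%.1f%%\n", 100 * ok / total
    }'
}

# Example:
#   curl -s http://localhost:9101/metrics | mempool_success_ratio
```

For instance, 950 successes out of 1000 received transactions prints `95.0%`.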
### Storage Metrics

| Metric Name | Description |
|-------------|-------------|
| `storage_get_transaction_latency_s` | Average latency for transaction retrieval, in seconds |
| `storage_commit_latency_s` | Average latency for transaction commits, in seconds |
| `storage_save_transactions_latency_s` | Average latency for saving transactions, in seconds |
