| 1 | +# Aptos Telemetry Enhancements |
| 2 | + |
| 3 | +1. **Enhanced Consensus Metrics**: |
| 4 | + - Added metrics for committed blocks and transactions |
| 5 | + - Included consensus round and version information |
| 6 | + - Added metrics for consensus timing and performance |
| 7 | + - Added sync information metrics |
| 8 | + |
| 9 | +2. **Enhanced Transaction Metrics**: |
| 10 | + - Added mempool transaction processing metrics |
| 11 | + - Added metrics for transaction broadcast performance |
| 12 | + - Included pending transaction counts |
| 13 | + |
| 14 | +3. **Enhanced Storage Metrics**: |
| 15 | + - Added latency metrics for transaction retrieval |
| 16 | + - Added latency metrics for transaction commits |
| 17 | + - Added latency metrics for transaction saving |
| 18 | + |
| 19 | +4. **Test Suite**: |
| 20 | + - Added unit tests to verify telemetry metrics collection |
| 21 | + - Added integration tests that exercise telemetry end to end |
| 22 | + |
| 23 | +## Running Locally |
| 24 | + |
| 25 | +To run and test the telemetry implementation locally (the commands below assume macOS with Homebrew): |
| 26 | + |
| 27 | +### 1. Install Prometheus |
| 28 | + |
| 29 | +```bash |
| 30 | +brew install prometheus |
| 31 | +``` |
| 32 | + |
| 33 | +### 2. Install Grafana |
| 34 | + |
| 35 | +```bash |
| 36 | +brew install grafana |
| 37 | +``` |
| 38 | + |
| 39 | +## Configuration |
| 40 | + |
| 41 | +### 1. Prometheus Setup |
| 42 | + |
| 43 | +Create or modify `/opt/homebrew/etc/prometheus.yml` (the Homebrew location on Apple Silicon; Intel Macs use the `/usr/local` prefix instead): |
| 44 | + |
| 45 | +```yaml |
| 46 | +global: |
| 47 | +  scrape_interval: 15s |
| 48 | +  evaluation_interval: 15s |
| 49 | + |
| 50 | +scrape_configs: |
| 51 | +  - job_name: 'aptos' |
| 52 | +    static_configs: |
| 53 | +      - targets: ['127.0.0.1:9101'] |
| 54 | +    metrics_path: '/metrics' |
| 55 | +    scheme: 'http' |
| 56 | +``` |
| 57 | + |
| 58 | +### 2. Start Services |
| 59 | + |
| 60 | +1. Start Prometheus: |
| 61 | +```bash |
| 62 | +brew services start prometheus |
| 63 | +``` |
| 64 | + |
| 65 | +2. Start Grafana: |
| 66 | +```bash |
| 67 | +brew services start grafana |
| 68 | +``` |
| 69 | + |
| 70 | +### 3. Grafana Dashboard Setup |
| 71 | + |
| 72 | +1. Access Grafana UI: |
| 73 | +   - Open `http://localhost:3000` in your browser |
| 74 | +   - Default login: |
| 75 | +     - Username: `admin` |
| 76 | +     - Password: `admin` |
| 77 | + |
| 78 | +2. Add Prometheus Data Source: |
| 79 | + - Go to Configuration (⚙️) > Data Sources |
| 80 | + - Click "Add data source" |
| 81 | + - Select "Prometheus" |
| 82 | + - Set URL to `http://localhost:9090` |
| 83 | + - Click "Save & Test" |
| 84 | + |
| 85 | +3. Import Dashboard: |
| 86 | + - Click the "+" icon in the sidebar |
| 87 | + - Select "Import" |
| 88 | + - Upload the provided `aptos-dashboard.json` |
| 89 | + - Select your Prometheus data source |
| 90 | + - Click "Import" |
| 91 | + |
| 92 | +## Verification |
| 93 | + |
| 94 | +1. Check Prometheus targets: |
| 95 | + - Visit `http://localhost:9090/targets` |
| 96 | + - Verify the Aptos target is "UP" |
| 97 | + |
| 98 | +2. Check Grafana metrics: |
| 99 | +   - View the imported dashboard |
| 100 | +   - Verify metrics are being displayed |
| 101 | +   - Check for: |
| 102 | +     - Consensus metrics |
| 103 | +     - Storage metrics |
| 104 | +     - Mempool metrics |
| 105 | +     - State sync metrics |
| 106 | + |
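
The target check above can also be done from a terminal through the Prometheus HTTP API. The following is a minimal sketch, not part of this change: the function name `aptos_target_is_up` is hypothetical, and it assumes Prometheus listens on `localhost:9090` and that the job is named `aptos` as in the configuration above.

```bash
#!/usr/bin/env bash
# Rough smoke test for the response of an instant query `up{job="aptos"}`.
# Reads the JSON body on stdin and succeeds if any sample has the value "1".
# NOTE: pattern matching on JSON is a shortcut; use a real JSON parser
# for anything beyond a quick local check.
aptos_target_is_up() {
  grep -Eq '"value":\[[0-9.]+,"1"\]'
}

# Usage (hypothetical local setup):
#   curl -sG 'http://localhost:9090/api/v1/query' \
#        --data-urlencode 'query=up{job="aptos"}' \
#     | aptos_target_is_up && echo "aptos target is UP" \
#                          || echo "aptos target is DOWN or missing"
```

An empty result set (for example, when the job name does not match) is reported the same way as a target that is down, which is usually what you want for a quick check.
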
| 107 | +## Troubleshooting |
| 108 | + |
| 109 | +1. If metrics aren't showing: |
| 110 | + - Verify Aptos node is running |
| 111 | + - Check metrics endpoint: `curl http://localhost:9101/metrics` |
| 112 | + - Verify Prometheus target status |
| 113 | + - Check Grafana data source connection |
| 114 | + |
| 115 | +2. Service management: |
| 116 | + ```bash |
| 117 | + # Restart services |
| 118 | + brew services restart prometheus |
| 119 | + brew services restart grafana |
| 120 | + |
| 121 | + # Check service status |
| 122 | + brew services list |
| 123 | + |
| 124 | + # View Prometheus logs |
| 125 | + tail -f /opt/homebrew/var/log/prometheus.log |
| 126 | + |
| 127 | + # View Grafana logs |
| 128 | + tail -f /opt/homebrew/var/log/grafana.log |
| 129 | + ``` |
| 130 | + |
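
When the endpoint responds but a dashboard panel stays empty, it can help to confirm that the specific metric names are actually exported. The sketch below is illustrative only: `missing_metrics` is a hypothetical helper, and it assumes the endpoint returns the Prometheus text exposition format.

```bash
#!/usr/bin/env bash
# Print every expected metric name that has no sample line in the
# Prometheus text exposition read from stdin. Prints nothing if all
# names are present. The body runs in a subshell so variables stay local.
missing_metrics() (
  body="$(cat)"
  for name in "$@"; do
    # A sample line starts with the metric name followed by '{' (labels)
    # or a space. Summary/histogram suffixes (_sum, _count, _bucket) are
    # accepted too, since the metric types are not known here.
    if ! printf '%s\n' "$body" | grep -Eq "^${name}(_sum|_count|_bucket)?(\{| )"; then
      echo "$name"
    fi
  done
)

# Usage (assumes the node's metrics endpoint from the step above):
#   curl -s http://localhost:9101/metrics \
#     | missing_metrics consensus_current_round mempool_pending_txns storage_commit_latency_s
```
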
| 131 | +## Stopping Services |
| 132 | + |
| 133 | +```bash |
| 134 | +brew services stop prometheus |
| 135 | +brew services stop grafana |
| 136 | +``` |
| 137 | + |
| 138 | +## Telemetry Metrics |
| 139 | + |
| 140 | +The following key metrics have been added: |
| 141 | + |
| 142 | +### Consensus Metrics |
| 143 | + |
| 144 | +| Metric Name | Description | |
| 145 | +|-------------|-------------| |
| 146 | +| `consensus_last_committed_version` | The last committed ledger version | |
| 147 | +| `consensus_committed_blocks_count` | Number of blocks committed since node start | |
| 148 | +| `consensus_committed_txns_count` | Number of transactions committed since node start | |
| 149 | +| `consensus_current_round` | Current consensus round | |
| 150 | +| `consensus_round_timeout_secs` | Average round timeout in seconds | |
| 151 | +| `consensus_sync_info_msg_sent_count` | Number of sync info messages sent | |
| 152 | +| `consensus_wait_duration_s` | Average wait duration in seconds | |
| 153 | + |
| 154 | +### Mempool Metrics |
| 155 | + |
| 156 | +| Metric Name | Description | |
| 157 | +|-------------|-------------| |
| 158 | +| `mempool_txns_processed_success` | Number of successfully processed transactions | |
| 159 | +| `mempool_txns_processed_total` | Total number of transactions received | |
| 160 | +| `mempool_avg_txn_broadcast_size` | Average transaction broadcast size | |
| 161 | +| `mempool_pending_txns` | Number of pending transactions in mempool | |
| 162 | + |
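
As a worked example of combining two of the mempool metrics above, the sketch below derives a success ratio from a single scrape. It is a rough illustration under stated assumptions: `mempool_success_ratio` is a hypothetical helper, both metrics are assumed to be cumulative counts, any labelled series are summed, and sample lines are assumed to carry no trailing timestamp.

```bash
#!/usr/bin/env bash
# Print mempool_txns_processed_success / mempool_txns_processed_total,
# computed from Prometheus text exposition read on stdin.
# Prints "n/a" when no total has been recorded yet.
mempool_success_ratio() {
  awk '
    # Strip any label set so "name{...} value" and "name value" match alike.
    { name = $1; sub(/[{].*/, "", name) }
    name == "mempool_txns_processed_success" { ok += $NF }
    name == "mempool_txns_processed_total"   { total += $NF }
    END {
      if (total > 0) printf "%.4f\n", ok / total
      else print "n/a"
    }
  '
}

# Usage (assumes the node metrics endpoint on port 9101):
#   curl -s http://localhost:9101/metrics | mempool_success_ratio
```

Because the inputs accumulate from node start, the result is a lifetime ratio rather than a recent rate; a windowed view would need a rate-based query in Prometheus instead.
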
| 163 | +### Storage Metrics |
| 164 | + |
| 165 | +| Metric Name | Description | |
| 166 | +|-------------|-------------| |
| 167 | +| `storage_get_transaction_latency_s` | Average latency for transaction retrieval, in seconds | |
| 168 | +| `storage_commit_latency_s` | Average latency for transaction commits, in seconds | |
| 169 | +| `storage_save_transactions_latency_s` | Average latency for saving transactions, in seconds | |