Commit c98b2af
feat(meta-service): implement structured metrics parsing with label support (#18555)
* perf(meta-service): add 200ms rate limiting to metrics subscription
Add rate limiting to subscribe_metrics() to prevent excessive CPU usage
when metrics updates arrive frequently. The loop now ensures execution
at most once every 200ms by tracking elapsed time and sleeping for the
remaining duration if processing completes faster.
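
The throttling described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `remaining_delay` and `run_rate_limited` are hypothetical names, and the sketch blocks with `std::thread::sleep`, whereas the real `subscribe_metrics()` may well use an async sleep.

```rust
use std::time::{Duration, Instant};

/// Minimum spacing between two loop iterations (200ms, per the commit message).
const MIN_INTERVAL: Duration = Duration::from_millis(200);

/// How long to sleep so that one iteration lasts at least `min_interval`.
/// Returns `None` when the work already took that long or longer.
fn remaining_delay(elapsed: Duration, min_interval: Duration) -> Option<Duration> {
    min_interval.checked_sub(elapsed).filter(|d| !d.is_zero())
}

/// Hypothetical stand-in for the subscription loop: runs `work` `iterations`
/// times and pads each iteration up to `MIN_INTERVAL`.
fn run_rate_limited(iterations: usize, mut work: impl FnMut()) {
    for _ in 0..iterations {
        let started = Instant::now();
        work();
        // Sleep only for the part of the interval the work did not use.
        if let Some(delay) = remaining_delay(started.elapsed(), MIN_INTERVAL) {
            std::thread::sleep(delay);
        }
    }
}
```

Keeping the delay computation in its own pure function makes the rule easy to test without waiting on a real clock.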
* chore(meta-service): move metrics report to separate function
* feat(meta-service): implement structured metrics parsing with label support
Add comprehensive Prometheus metrics parsing that converts raw metrics
strings into structured JSON format with proper categorization and
histogram percentile calculation.
Key features:
- Parse and categorize metrics into meta_network, raft_network,
raft_storage, and server categories
- Support for labeled metrics using sub-key structure
- Convert histogram buckets to percentile arrays (p50, p90, p99, p99.9)
- Strip quotes from label values for cleaner keys
- Use BTreeMap for deterministic ordering of output
- Handle malformed metrics gracefully with fallbacks
- Comprehensive test coverage for edge cases
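
A rough sketch of the parsing and categorization features listed above, under stated assumptions: the names `Sample`, `parse_line`, `categorize`, and `group` are hypothetical, the `metasrv_` name prefix is assumed, and label values containing commas or escaped quotes are not handled. The commit's actual implementation may differ.

```rust
use std::collections::BTreeMap;

/// One parsed sample: metric name, sub-key built from its labels, and value.
#[derive(Debug, PartialEq)]
struct Sample {
    name: String,
    label_key: String, // empty for unlabeled metrics
    value: f64,
}

/// Categories named in the commit message.
const CATEGORIES: [&str; 4] = ["meta_network", "raft_network", "raft_storage", "server"];

/// Parse one exposition line, e.g. `metasrv_server_version{component="metasrv"} 1`.
/// Returns `None` for comments, blank lines, and malformed input.
fn parse_line(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The value is taken as the last space-separated token (timestamps are not handled).
    let (head, value) = line.rsplit_once(' ')?;
    let value: f64 = value.parse().ok()?;
    let head = head.trim();
    let (name, label_key) = match head.split_once('{') {
        Some((name, rest)) => {
            let body = rest.strip_suffix('}')?;
            // BTreeMap sorts labels by name, so the sub-key is deterministic.
            let mut labels = BTreeMap::new();
            for pair in body.split(',').filter(|p| !p.trim().is_empty()) {
                let (k, v) = pair.split_once('=')?;
                // Strip the surrounding quotes from the label value.
                labels.insert(k.trim().to_string(), v.trim().trim_matches('"').to_string());
            }
            let key = labels
                .iter()
                .map(|(k, v)| format!("{k}={v}"))
                .collect::<Vec<_>>()
                .join(",");
            (name, key)
        }
        None => (head, String::new()),
    };
    Some(Sample { name: name.to_string(), label_key, value })
}

/// Split a metric name into (category, short name); the `metasrv_` prefix is an assumption.
fn categorize(name: &str) -> Option<(&'static str, &str)> {
    let name = name.strip_prefix("metasrv_").unwrap_or(name);
    CATEGORIES
        .iter()
        .find_map(|c| name.strip_prefix(*c)?.strip_prefix('_').map(|short| (*c, short)))
}

/// category -> metric -> label sub-key -> value
type Grouped = BTreeMap<String, BTreeMap<String, BTreeMap<String, f64>>>;

/// Group every parseable, categorizable line of `text`; other lines are skipped.
fn group(text: &str) -> Grouped {
    let mut out = Grouped::new();
    for sample in text.lines().filter_map(parse_line) {
        if let Some((category, short)) = categorize(&sample.name) {
            out.entry(category.to_string())
                .or_default()
                .entry(short.to_string())
                .or_default()
                .insert(sample.label_key, sample.value);
        }
    }
    out
}
```

Unlike the output shown in the commit message, this sketch always nests one extra level for the label sub-key, using an empty string for unlabeled metrics; flattening that level is left out to keep the example short.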
The parsing supports both simple metrics and complex labeled histograms,
organizing data into nested JSON structure for better readability and
programmatic access.
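
For the histogram conversion, one plausible approach is to report, for each percentile, the upper bound of the first cumulative bucket that covers it. The sketch below assumes that rule (no interpolation inside a bucket); the function names are hypothetical and the commit's actual calculation may differ.

```rust
/// Percentile labels and their quantiles, as listed in the commit message.
const QUANTILES: [(&str, f64); 4] = [("p50", 0.5), ("p90", 0.9), ("p99", 0.99), ("p99.9", 0.999)];

/// `buckets` holds `(upper_bound, cumulative_count)` pairs sorted by upper bound,
/// as in Prometheus `le` buckets. Returns the upper bound of the first bucket
/// whose cumulative count reaches `q * total`, or `None` for an empty histogram.
/// If the percentile falls in the `+Inf` bucket, the result is infinity.
fn percentile(buckets: &[(f64, f64)], q: f64) -> Option<f64> {
    let total = buckets.last()?.1;
    if total <= 0.0 {
        return None;
    }
    let target = q * total;
    buckets
        .iter()
        .find(|(_, count)| *count >= target)
        .map(|(le, _)| *le)
}

/// Build the `[["p50", v], ["p90", v], ...]` style array for one histogram.
fn percentiles(buckets: &[(f64, f64)]) -> Vec<(&'static str, f64)> {
    QUANTILES
        .iter()
        .filter_map(|&(label, q)| percentile(buckets, q).map(|v| (label, v)))
        .collect()
}
```

Because the result is always a bucket boundary, neighbouring percentiles often share a value, which is consistent with the repeated `50.0` and `100.0` entries in the sample output.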
Example output (emitted as a single line in the log; formatted here for readability):
```
{
  "metrics": {
    "meta_network": {
      "req_inflights": 0.0,
      "req_failed_total": 0.0,
      "rpc_delay_seconds_count": 1860.0,
      "req_success_total": 12086.0,
      "sent_bytes_total": 605472.0,
      "recv_bytes_total": 1225739.0,
      "stream_list_item_sent_total": 1081.0,
      "watch_change_total": 0.0,
      "rpc_delay_ms": [
        ["p50", 50.0],
        ["p90", 50.0],
        ["p99", 100.0],
        ["p99.9", 100.0]
      ],
      "watch_initialization_total": 0.0,
      "stream_mget_item_sent_total": 9135.0,
      "rpc_delay_ms_sum": 55465.0,
      "rpc_delay_seconds_sum": 56.39243658700001,
      "stream_get_item_sent_total": 0.0,
      "rpc_delay_ms_count": 1860.0
    },
    "raft_storage": {
      "snapshot_written_entries_total": 1706.0,
      "snapshot_building": 0.0
    },
    "server": {
      "is_leader": 1.0,
      "snapshot_key_count": 1706.0,
      "snapshot_index_size": 214.0,
      "last_seq": 4298.0,
      "raft_log_wal_closed_chunk_total_size": 0.0,
      "snapshot_read_block": 7148.0,
      "last_log_index": 1861.0,
      "snapshot_avg_keys_per_block": 1706.0,
      "proposals_applied": 1861.0,
      "raft_log_cache_used_size": 1221253.0,
      "applying_snapshot": 0.0,
      "version": {
        "component=metasrv,semver=v1.2.795-nightly,sha=477beea44a": 1.0
      },
      "snapshot_expire_index_count": 3.0,
      "snapshot_primary_index_count": 1703.0,
      "raft_log_wal_closed_chunk_count": 0.0,
      "current_leader_id": 0.0,
      "snapshot_data_size": 249673.0,
      "proposals_failed_total": 0.0,
      "snapshot_read_block_from_cache": 7147.0,
      "raft_log_cache_items": 1862.0,
      "raft_log_wal_open_chunk_size": 1099982.0,
      "leader_changes_total": 1.0,
      "watchers": 0.0,
      "node_is_health": 1.0,
      "proposals_pending": 0.0,
      "raft_log_size": 1099982.0,
      "raft_log_wal_offset": 1099982.0,
      "snapshot_avg_block_size": 249673.0,
      "current_term": 1.0,
      "snapshot_read_block_from_disk": 1.0,
      "snapshot_block_count": 1.0,
      "read_failed_total": 0.0
    }
  }
}
```
* feat: enhance metrics parsing with debug logging and percentile calculations
1 parent 3cbba37 commit c98b2af
File tree
2 files changed, +746 -71 lines changed
- src/meta/service/src
  - meta_service
  - metrics