Commit ddc1e75

chore(metrics)!: align names with Prometheus base units
Rename Firewood metrics to follow Prometheus naming best practices:
https://prometheus.io/docs/practices/naming/

Examples (old -> new):
- proposals -> proposals_total
- commit.latency_ms -> commit.latency_seconds_total
- io.read_ms -> io.read_seconds_total
- space.reused -> space.reused_bytes_total
- commit_ms_bucket -> commit_duration_seconds

BREAKING CHANGE: all metric names changed; update dashboards, alerts, and external consumers.
1 parent fea4e81 commit ddc1e75
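
For anyone migrating dashboards or alert rules, the old-to-new examples in the commit message can be captured as a small lookup table. The following is a minimal sketch, not code from this commit: `RENAMES` and `renamed` are hypothetical names, and the table holds only the five pairs quoted in the message, so a real migration would need to extend it from METRICS_UPDATE_SUMMARY.md.

```rust
/// Old-to-new metric name pairs copied from the commit message examples.
/// Deliberately incomplete: only the five quoted pairs are listed.
const RENAMES: &[(&str, &str)] = &[
    ("proposals", "proposals_total"),
    ("commit.latency_ms", "commit.latency_seconds_total"),
    ("io.read_ms", "io.read_seconds_total"),
    ("space.reused", "space.reused_bytes_total"),
    ("commit_ms_bucket", "commit_duration_seconds"),
];

/// Hypothetical helper: returns the new name for a known old name,
/// or `None` when the old name is not in the table.
pub fn renamed(old: &str) -> Option<&'static str> {
    RENAMES
        .iter()
        .find(|(from, _)| *from == old)
        .map(|(_, to)| *to)
}
```

Returning `None` for unknown names, rather than guessing a suffix, keeps the helper from silently inventing names that the commit never defined.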

26 files changed: +341 -208 lines changed

METRICS.md

Lines changed: 30 additions & 17 deletions
@@ -62,8 +62,8 @@ See the [FFI README](ffi/README.md) for more details on FFI metrics configuratio
   - Labels: `success=true|false`
   - Use: Monitor proposal creation success rate

-- **`proposal.create_ms`** (counter with `success` label)
-  - Description: Time spent creating proposals in milliseconds
+- **`proposal.create_s`** (counter with `success` label)
+  - Description: Time spent creating proposals in seconds (accumulated as nanoseconds)
   - Labels: `success=true|false`
   - Use: Track proposal creation latency

@@ -72,8 +72,8 @@ See the [FFI README](ffi/README.md) for more details on FFI metrics configuratio
   - Labels: `success=true|false`
   - Use: Monitor commit success rate

-- **`proposal.commit_ms`** (counter with `success` label)
-  - Description: Time spent committing proposals in milliseconds
+- **`proposal.commit_s`** (counter with `success` label)
+  - Description: Time spent committing proposals in seconds (accumulated as nanoseconds)
   - Labels: `success=true|false`
   - Use: Track commit latency and identify slow commits

@@ -142,14 +142,14 @@ See the [FFI README](ffi/README.md) for more details on FFI metrics configuratio
   - Description: Total number of I/O read operations
   - Use: Track I/O operation count

-- **`io.read_ms`** (counter)
-  - Description: Total time spent in I/O reads in milliseconds
+- **`io.read_s`** (counter)
+  - Description: Total time spent in I/O reads in seconds (accumulated as nanoseconds)
   - Use: Identify I/O bottlenecks and disk performance issues

 #### Node Persistence

 - **`flush_nodes`** (counter)
-  - Description: Cumulative time spent flushing nodes to disk in milliseconds (counter incremented by flush duration)
+  - Description: Cumulative time spent flushing nodes to disk in seconds (accumulated as nanoseconds)
   - Use: Monitor flush performance and identify slow disk writes; calculate average flush time using rate()

 ### Memory Management
@@ -202,30 +202,42 @@ These metrics are specific to the Foreign Function Interface (Go) layer:
   - Description: Count of batch operations completed
   - Use: Track FFI batch throughput

-- **`ffi.batch_ms`** (counter)
-  - Description: Time spent processing batches in milliseconds
+- **`ffi.batch_s`** (counter)
+  - Description: Time spent processing batches in seconds (accumulated as nanoseconds)
   - Use: Monitor FFI batch latency

+- **`ffi.batch_seconds_bucket`** (histogram)
+  - Description: Histogram of batch processing durations in seconds
+  - Use: Analyze batch latency distribution
+
 #### Proposal Operations

 - **`ffi.propose`** (counter)
   - Description: Count of proposal operations via FFI
   - Use: Track FFI proposal throughput

-- **`ffi.propose_ms`** (counter)
-  - Description: Time spent creating proposals via FFI in milliseconds
+- **`ffi.propose_s`** (counter)
+  - Description: Time spent creating proposals via FFI in seconds (accumulated as nanoseconds)
   - Use: Monitor FFI proposal latency

+- **`ffi.propose_seconds_bucket`** (histogram)
+  - Description: Histogram of proposal creation durations in seconds
+  - Use: Analyze proposal latency distribution
+
 #### Commit Operations

 - **`ffi.commit`** (counter)
   - Description: Count of commit operations via FFI
   - Use: Track FFI commit throughput

-- **`ffi.commit_ms`** (counter)
-  - Description: Time spent committing via FFI in milliseconds
+- **`ffi.commit_s`** (counter)
+  - Description: Time spent committing via FFI in seconds (accumulated as nanoseconds)
   - Use: Monitor FFI commit latency

+- **`ffi.commit_seconds_bucket`** (histogram)
+  - Description: Histogram of commit durations in seconds
+  - Use: Analyze commit latency distribution
+
 #### View Caching

 - **`ffi.cached_view.hit`** (counter)
@@ -240,12 +252,12 @@ These metrics are specific to the Foreign Function Interface (Go) layer:

 ### Performance Monitoring

-1. **Latency Tracking**: The `*_ms` metrics track operation durations. Monitor these for:
+1. **Latency Tracking**: The `*_s` metrics track operation durations in seconds (accumulated as nanoseconds for precision). Monitor these for:
    - Sudden increases indicating performance degradation
    - Baseline establishment for SLA monitoring
    - Correlation with system load

-2. **Throughput Monitoring**: Counter metrics without `_ms` suffix track operation counts:
+2. **Throughput Monitoring**: Counter metrics without `_s` suffix track operation counts:
    - Rate of change indicates throughput
    - Compare with expected load patterns
    - Identify anomalies in operation rates
@@ -288,8 +300,9 @@ These metrics are specific to the Foreign Function Interface (Go) layer:
 For Prometheus-based monitoring (note: metric names use underscores in queries):

 ```promql
-# Average commit latency over 5 minutes
-rate(firewood_proposal_commit_ms[5m]) / rate(firewood_proposal_commit[5m])
+# Average commit latency over 5 minutes (in seconds)
+# Note: counters store nanoseconds, so divide by 1e9 to get seconds
+rate(firewood_proposal_commit_s[5m]) / 1e9 / rate(firewood_proposal_commit[5m])

 # Cache hit rate
 sum(rate(firewood_cache_node{type="hit"}[5m])) /

METRICS_UPDATE_SUMMARY.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+# Metrics Naming Update Summary
+
+## Overview
+
+All Firewood metrics have been updated to follow Prometheus naming conventions:
+
+1. **Base units**: seconds (not milliseconds), bytes (not kilobytes)
+2. **Plural units**: `_seconds` not `_second`, `_bytes` not `_byte`
+3. **Accumulating counters**: Must have `_total` suffix
+4. **Histograms**: Named `*_duration_seconds` for timing observations
+
+## Changed Metric Names
+
+### Firewood Core
+
+| Old Name | New Name | Type | Unit |
+|----------|----------|------|------|
+| `proposals` | `proposals_total` | counter | count |
+| `proposals.created` | `proposals.created_total` | counter | count |
+| `proposals.discarded` | `proposals.discarded_total` | counter | count |
+| `insert` | `insert_total` | counter | count |
+| `remove` | `remove_total` | counter | count |
+| `change_proof.next` | `change_proof.next_total` | counter | count |
+| `commit_latency_s` | `commit_latency_seconds_total` | counter | nanoseconds |
+
+### Storage Layer
+
+| Old Name | New Name | Type | Unit |
+|----------|----------|------|------|
+| `space.reused` | `space.reused_bytes_total` | counter | bytes |
+| `space.from_end` | `space.from_end_bytes_total` | counter | bytes |
+| `space.freed` | `space.freed_bytes_total` | counter | bytes |
+| `delete_node` | `delete_node_total` | counter | count |
+| `flush_nodes` | `flush_nodes_seconds_total` | counter | nanoseconds |
+| `read_node` | `read_node_total` | counter | count |
+| `cache.node` | `cache.node_total` | counter | count |
+| `cache.freelist` | `cache.freelist_total` | counter | count |
+| `io.read_s` | `io.read_seconds_total` | counter | nanoseconds |
+| `io.read` | `io.read_total` | counter | count |
+| `proposals.reparented` | `proposals.reparented_total` | counter | count |
+| `ring.eagain_write_retry` | `ring.eagain_write_retry_total` | counter | count |
+| `ring.full` | `ring.full_total` | counter | count |
+| `ring.sq_wait` | `ring.sq_wait_total` | counter | count |
+| `ring.partial_write_retry` | `ring.partial_write_retry_total` | counter | count |
+
+### FFI Layer
+
+| Old Name | New Name | Type | Unit |
+|----------|----------|------|------|
+| `ffi.commit_s` | `ffi.commit_seconds_total` | counter | nanoseconds |
+| `ffi.commit` | `ffi.commit_total` | counter | count |
+| `ffi.commit_seconds_bucket` | `ffi.commit_duration_seconds` | histogram | seconds |
+| `ffi.propose_s` | `ffi.propose_seconds_total` | counter | nanoseconds |
+| `ffi.propose` | `ffi.propose_total` | counter | count |
+| `ffi.propose_seconds_bucket` | `ffi.propose_duration_seconds` | histogram | seconds |
+| `ffi.batch_s` | `ffi.batch_seconds_total` | counter | nanoseconds |
+| `ffi.batch` | `ffi.batch_total` | counter | count |
+| `ffi.batch_seconds_bucket` | `ffi.batch_duration_seconds` | histogram | seconds |
+| `ffi.cached_view.miss` | `ffi.cached_view.miss_total` | counter | count |
+| `ffi.cached_view.hit` | `ffi.cached_view.hit_total` | counter | count |
+| `firewood.ffi.merge` | `firewood.ffi.merge_total` | counter | count |
+
+### Replay Layer
+
+| Old Name | New Name | Type | Unit |
+|----------|----------|------|------|
+| `replay.propose_s` | `replay.propose_seconds_total` | counter | nanoseconds |
+| `replay.propose` | `replay.propose_total` | counter | count |
+| `replay.commit_s` | `replay.commit_seconds_total` | counter | nanoseconds |
+| `replay.commit` | `replay.commit_total` | counter | count |
+
+## Implementation Details
+
+### Time Metrics
+
+- **Counters** (e.g., `*_seconds_total`): Accumulate nanoseconds for precision, conceptually represent seconds
+  - Usage: `firewood_increment!(METRIC, elapsed.as_nanos() as u64)`
+  - Prometheus query: Divide by 1e9 to convert to seconds: `rate(metric_seconds_total[5m]) / 1e9`
+
+- **Histograms** (e.g., `*_duration_seconds`): Record observations in seconds
+  - Usage: `firewood_record!(METRIC, elapsed.as_f64(), expensive)`
+  - Prometheus automatically creates `_bucket`, `_sum`, `_count` suffixes
+
+### Size Metrics
+
+- All size metrics now explicitly include `_bytes_total` suffix
+- Values represent actual bytes (not KB/MB)
+
+## Breaking Changes
+
+⚠️ **This is a breaking change** for:
+- Prometheus dashboards and queries
+- Alerting rules
+- Any external monitoring systems
+
+### Migration Guide
+
+1. Update Prometheus queries:
+   ```promql
+   # Old
+   rate(firewood_ffi_commit_s[5m])
+
+   # New
+   rate(firewood_ffi_commit_seconds_total[5m]) / 1e9
+   ```
+
+2. Update Grafana dashboards:
+   - Replace all old metric names with new ones
+   - Add `/1e9` division for timing counters
+
+3. Update alerting rules with new metric names
+
+## Testing
+
+All metrics tested with:
+```bash
+cargo check --workspace --features ethhash,logger --all-targets
+```
+
+Compilation successful ✅
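
The time-metric convention described in METRICS_UPDATE_SUMMARY.md (counters accumulate nanoseconds, and queries divide by 1e9 to get seconds) can be illustrated with standard-library types alone. This is a sketch under stated assumptions, not the Firewood implementation: `TimeCounter` and `average_seconds` are hypothetical, and the code uses `std::time::Duration` rather than `coarsetime` or the `firewood_increment!` macro.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a `*_seconds_total` counter.
/// The stored value is nanoseconds, as the summary describes.
#[derive(Default)]
pub struct TimeCounter {
    nanos: u64,
}

impl TimeCounter {
    /// Adds one observed duration to the running total.
    pub fn add(&mut self, elapsed: Duration) {
        // `Duration::as_nanos` returns u128; clamp before narrowing so an
        // oversized value saturates instead of wrapping.
        let n = elapsed.as_nanos().min(u64::MAX as u128) as u64;
        self.nanos = self.nanos.saturating_add(n);
    }

    /// Converts the accumulated nanoseconds to seconds (the `/ 1e9` step).
    pub fn seconds(&self) -> f64 {
        self.nanos as f64 / 1e9
    }
}

/// Average duration per operation in seconds, or `None` when no
/// operations were counted (avoids dividing by zero).
pub fn average_seconds(counter: &TimeCounter, ops: u64) -> Option<f64> {
    if ops == 0 {
        None
    } else {
        Some(counter.seconds() / ops as f64)
    }
}
```

The `average_seconds` helper mirrors the shape of the documented PromQL query: time total divided by 1e9, then divided by the matching operation count.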

ffi/metrics_test.go

Lines changed: 12 additions & 12 deletions
@@ -22,20 +22,20 @@ import (
 var (
 	metricsPort = uint16(3000)
 	expectedMetrics = map[string]dto.MetricType{
-		"ffi_batch": dto.MetricType_COUNTER,
-		"proposal_commit": dto.MetricType_COUNTER,
-		"proposal_commit_ms": dto.MetricType_COUNTER,
-		"ffi_propose_ms": dto.MetricType_COUNTER,
-		"ffi_commit_ms": dto.MetricType_COUNTER,
-		"ffi_batch_ms": dto.MetricType_COUNTER,
-		"flush_nodes": dto.MetricType_COUNTER,
-		"insert": dto.MetricType_COUNTER,
-		"space_from_end": dto.MetricType_COUNTER,
+		"ffi_batch_total": dto.MetricType_COUNTER,
+		"proposal_commit_total": dto.MetricType_COUNTER,
+		"proposal_commit_seconds_total": dto.MetricType_COUNTER,
+		"ffi_propose_seconds_total": dto.MetricType_COUNTER,
+		"ffi_commit_seconds_total": dto.MetricType_COUNTER,
+		"ffi_batch_seconds_total": dto.MetricType_COUNTER,
+		"flush_nodes_seconds_total": dto.MetricType_COUNTER,
+		"insert_total": dto.MetricType_COUNTER,
+		"space_from_end_bytes_total": dto.MetricType_COUNTER,
 	}
 	expectedExpensiveMetrics = map[string]dto.MetricType{
-		"ffi_commit_ms_bucket": dto.MetricType_HISTOGRAM,
-		"ffi_propose_ms_bucket": dto.MetricType_HISTOGRAM,
-		"ffi_batch_ms_bucket": dto.MetricType_HISTOGRAM,
+		"ffi_commit_duration_seconds": dto.MetricType_HISTOGRAM,
+		"ffi_propose_duration_seconds": dto.MetricType_HISTOGRAM,
+		"ffi_batch_duration_seconds": dto.MetricType_HISTOGRAM,
 	}
 	initMetrics sync.Once
 	initLogs sync.Once
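
The expected names in this test follow two patterns visible elsewhere in the commit: dotted registry names appear with underscores once exported, and the suffix depends on the metric type. The sketch below checks those two rules. It is illustrative only: `Kind`, `exposition_name`, and `follows_convention` are hypothetical names, and replacing every `.` with `_` is an assumption based on the note in METRICS.md that metric names use underscores in queries.

```rust
/// Metric types that appear in the test's expectation maps.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Kind {
    Counter,
    Histogram,
}

/// Assumed mapping from a dotted registry name to the exported name:
/// every `.` becomes `_`.
pub fn exposition_name(registry_name: &str) -> String {
    registry_name.replace('.', "_")
}

/// Checks the suffix rules listed in METRICS_UPDATE_SUMMARY.md:
/// counters end in `_total`, timing histograms in `_duration_seconds`.
pub fn follows_convention(name: &str, kind: Kind) -> bool {
    match kind {
        Kind::Counter => name.ends_with("_total"),
        Kind::Histogram => name.ends_with("_duration_seconds"),
    }
}
```

A check like this could run over a list of registered names to flag any that still carry an old `_ms` or `_ms_bucket` suffix.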

ffi/src/handle.rs

Lines changed: 9 additions & 9 deletions
@@ -224,20 +224,20 @@ impl DatabaseHandle {
             self.create_proposal_handle(values.as_ref())?;

         let root_hash = handle.commit_proposal(|commit_time| {
-            firewood_increment!(crate::registry::COMMIT_MS, commit_time.as_millis());
+            firewood_increment!(crate::registry::COMMIT_SECONDS_TOTAL, commit_time.as_nanos() as u64);
             firewood_record!(
-                crate::registry::COMMIT_MS_BUCKET,
-                commit_time.as_f64() * 1000.0,
+                crate::registry::COMMIT_DURATION_SECONDS,
+                commit_time.as_f64(),
                 expensive
             );
         })?;

         let elapsed = start_time.elapsed();
-        firewood_increment!(crate::registry::BATCH_MS, elapsed.as_millis());
-        firewood_increment!(crate::registry::BATCH_COUNT, 1);
+        firewood_increment!(crate::registry::BATCH_SECONDS_TOTAL, elapsed.as_nanos() as u64);
+        firewood_increment!(crate::registry::BATCH_TOTAL, 1);
         firewood_record!(
-            crate::registry::BATCH_MS_BUCKET,
-            elapsed.as_f64() * 1000.0,
+            crate::registry::BATCH_DURATION_SECONDS,
+            elapsed.as_f64(),
             expensive
         );

@@ -267,9 +267,9 @@ impl DatabaseHandle {
         })?;

         if cache_miss {
-            firewood_increment!(crate::registry::CACHED_VIEW_MISS, 1);
+            firewood_increment!(crate::registry::CACHED_VIEW_MISS_TOTAL, 1);
         } else {
-            firewood_increment!(crate::registry::CACHED_VIEW_HIT, 1);
+            firewood_increment!(crate::registry::CACHED_VIEW_HIT_TOTAL, 1);
         }

         Ok(view)

ffi/src/lib.rs

Lines changed: 4 additions & 4 deletions
@@ -518,11 +518,11 @@ pub extern "C" fn fwd_commit_proposal(proposal: Option<Box<ProposalHandle<'_>>>)
     let result = invoke_with_handle(proposal, move |proposal| {
         proposal.commit_proposal(|commit_time| {
-            firewood_increment!(crate::registry::COMMIT_MS, commit_time.as_millis());
-            firewood_increment!(crate::registry::COMMIT_COUNT, 1);
+            firewood_increment!(crate::registry::COMMIT_SECONDS_TOTAL, commit_time.as_nanos() as u64);
+            firewood_increment!(crate::registry::COMMIT_TOTAL, 1);
             firewood_record!(
-                crate::registry::COMMIT_MS_BUCKET,
-                commit_time.as_f64() * 1000.0,
+                crate::registry::COMMIT_DURATION_SECONDS,
+                commit_time.as_f64(),
                 expensive
             );
         })

ffi/src/proofs/change.rs

Lines changed: 2 additions & 2 deletions
@@ -248,8 +248,8 @@ impl<'db> ProposedChangeProofContext<'db> {
         };

         let metrics_cb = |commit_time: coarsetime::Duration| {
-            firewood_increment!(crate::registry::COMMIT_MS, commit_time.as_millis(), "change" => "commit");
-            firewood_increment!(crate::registry::MERGE_COUNT, 1, "change" => "commit");
+            firewood_increment!(crate::registry::COMMIT_SECONDS_TOTAL, commit_time.as_nanos() as u64, "change" => "commit");
+            firewood_increment!(crate::registry::MERGE_TOTAL, 1, "change" => "commit");
         };

         let result = proposal_handle.commit_proposal(metrics_cb);

ffi/src/proofs/range.rs

Lines changed: 2 additions & 2 deletions
@@ -225,8 +225,8 @@ impl<'db> RangeProofContext<'db> {
         };

         let metrics_cb = |commit_time: coarsetime::Duration| {
-            firewood_increment!(crate::registry::COMMIT_MS, commit_time.as_millis());
-            firewood_increment!(crate::registry::MERGE_COUNT, 1);
+            firewood_increment!(crate::registry::COMMIT_SECONDS_TOTAL, commit_time.as_nanos() as u64);
+            firewood_increment!(crate::registry::MERGE_TOTAL, 1);
         };

         let result = proposal_handle.commit_proposal(metrics_cb);

ffi/src/proposal.rs

Lines changed: 4 additions & 4 deletions
@@ -130,11 +130,11 @@ impl<'db> CreateProposalResult<'db> {
         let start_time = coarsetime::Instant::now();
         let proposal = f()?;
         let propose_time = start_time.elapsed();
-        firewood_increment!(crate::registry::PROPOSE_MS, propose_time.as_millis());
-        firewood_increment!(crate::registry::PROPOSE_COUNT, 1);
+        firewood_increment!(crate::registry::PROPOSE_SECONDS_TOTAL, propose_time.as_nanos() as u64);
+        firewood_increment!(crate::registry::PROPOSE_TOTAL, 1);
         firewood_record!(
-            crate::registry::PROPOSE_MS_BUCKET,
-            propose_time.as_f64() * 1000.0,
+            crate::registry::PROPOSE_DURATION_SECONDS,
+            propose_time.as_f64(),
             expensive
         );
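
The hunk above times a fallible closure and then records three things: a nanosecond total, an operation count, and a histogram observation in seconds. A minimal standalone sketch of that pattern follows. It is not the Firewood code: `ProposeMetrics` and `timed` are hypothetical, plain struct fields stand in for the registry entries, and `std::time::Instant` replaces `coarsetime::Instant`.

```rust
use std::time::Instant;

/// Hypothetical in-memory stand-ins for the three registry entries.
#[derive(Default, Debug)]
pub struct ProposeMetrics {
    /// Plays the role of PROPOSE_SECONDS_TOTAL (value held in nanoseconds).
    pub seconds_total_nanos: u64,
    /// Plays the role of PROPOSE_TOTAL.
    pub total: u64,
    /// Plays the role of PROPOSE_DURATION_SECONDS observations.
    pub durations_seconds: Vec<f64>,
}

/// Runs `f`, and on success records its duration in all three metrics.
/// On error it returns early and records nothing, matching the `f()?`
/// that precedes the metric updates in the hunk.
pub fn timed<T, E>(
    metrics: &mut ProposeMetrics,
    f: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let start = Instant::now();
    let value = f()?;
    let elapsed = start.elapsed();

    // Clamp the u128 nanosecond count before narrowing to u64.
    let nanos = elapsed.as_nanos().min(u64::MAX as u128) as u64;
    metrics.seconds_total_nanos = metrics.seconds_total_nanos.saturating_add(nanos);
    metrics.total += 1;
    // The histogram observation is in seconds, with no `* 1000.0` scaling.
    metrics.durations_seconds.push(elapsed.as_secs_f64());
    Ok(value)
}
```

Keeping the counter in integer nanoseconds while the histogram takes floating-point seconds is the same split the commit makes between `*_seconds_total` and `*_duration_seconds`.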