Commit 6569643

release notes: document Ra Metrics -> Ra Counters transition
1 parent cea6dc9 commit 6569643

1 file changed: +90 -0 lines changed

release-notes/4.2.0.md

Lines changed: 90 additions & 0 deletions
@@ -28,6 +28,14 @@ In other words, if the responder publishes to only this queue name, then the mes
 `*.cacerts` (not to be confused with `cacertfile`) settings in `rabbitmq.conf` did not have the expected effect and were removed
 to eliminate confusion.
 
+### Quorum Queue Metric Changes
+
+Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
+have changed. Some metrics were removed, many were added, and some were renamed.
+Users relying on Prometheus metrics starting with `rabbitmq_raft` or `rabbitmq_detailed_raft`
+will need to update their dashboards and/or alerts. If you are using the
+[RabbitMQ-Quorum-Queues-Raft dashboard](https://grafana.com/grafana/dashboards/11340-rabbitmq-quorum-queues-raft/),
+please update it to the latest version for RabbitMQ 4.2 compatibility.
 
 ## Release Highlights

@@ -407,6 +415,88 @@ compared to other versions.
 * `cuttlefish` was upgraded to [`3.5.0`](https://github.com/kyorai/cuttlefish/releases)
 
 
+## Ra Metric Changes
+
+Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
+have changed. Some metrics were removed, many were added, and some were renamed.
+For most users this should not require any action. However, users relying on Prometheus
+metrics starting with `rabbitmq_raft` or `rabbitmq_detailed_raft` will need to update
+their dashboards and/or alerts. If you are using the
+[RabbitMQ-Quorum-Queues-Raft dashboard](https://grafana.com/grafana/dashboards/11340-rabbitmq-quorum-queues-raft/),
+please update it to the latest version for RabbitMQ 4.2 compatibility.
+
+#### More Accurate and Detailed Ra Metrics
+
+Ra is an internal component implementing the Raft protocol. It's the basis
+for quorum queues, as well as some internal components (currently Khepri
+and Stream Coordinator). For quite some time, Ra metrics were tracked in two places,
+but RabbitMQ relied on the old metric subsystem. In RabbitMQ 4.2, the old
+Ra metrics subsystem has been removed and RabbitMQ now reports Ra metrics
+from the new subsystem (implemented using the [Seshat](https://github.com/rabbitmq/seshat) library).
+This migration has the following benefits:
+
+* lower overhead, since only one subsystem is used
+* more up-to-date information: the old subsystem was only refreshed every 5 seconds, while
+the new subsystem always returns the latest values
+* additional metrics are exposed, making it easier to debug the system if necessary
+
+### Aggregated metrics (/metrics endpoint)
+
+* `rabbitmq_raft_num_segments` was added; it reports the number of segment files of the internal components
+
+* `rabbitmq_raft_max_num_segments` was added; it reports the highest number of segment
+files of any of the quorum queues; per-object metrics can be used to find which queue
+has a high number of segment files
+
+* `rabbitmq_raft_term_total` has been removed;
+this metric was emitted accidentally as a side effect of metric aggregation;
+the sum of Raft terms across all Raft clusters is a meaningless number
+
+* some metrics contained the `_log_` substring in their names, even though they are not related to the Raft log;
+hence, they were renamed to drop the misleading part:
+  * `rabbitmq_raft_log_snapshot_index` -> `rabbitmq_raft_snapshot_index`
+  * `rabbitmq_raft_log_last_applied_index` -> `rabbitmq_raft_last_applied`
+  * `rabbitmq_raft_log_commit_index` -> `rabbitmq_raft_commit_index`
+  * `rabbitmq_raft_log_last_written_index` -> `rabbitmq_raft_last_written_index`
+
+* `rabbitmq_raft_entry_commit_latency_seconds` has been removed; it was an average latency across all Ra clusters
+in all Ra systems (RabbitMQ currently uses two separate Ra systems: one for quorum queues and one for internal
+components, currently Khepri and Stream Coordinator); it was therefore not very useful, since different
+components can have very different latencies
+
+* `rabbitmq_raft_commit_latency_seconds` was added; in the aggregated metrics, it is only reported for
+internal components (currently Khepri and Stream Coordinator)
+
+* `rabbitmq_raft_max_commit_latency_seconds` has been added; it's the highest commit latency reported by any
+of the quorum queues. When it's high, per-object metrics can be used to find which specific queue reports high commit latency
+
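As a rough illustration of the renames and removals listed above, the following Python sketch rewrites metric names inside a PromQL expression and flags removed metrics for manual review. The helper `migrate_expr` and the `RENAMED` / `REMOVED` tables are hypothetical names introduced here, not part of RabbitMQ or any existing tooling. It covers only the aggregated names listed in this section and assumes that a plain identifier match is good enough for the expressions being migrated.

```python
import re

# Old -> new names, copied from the rename list in these release notes.
RENAMED = {
    "rabbitmq_raft_log_snapshot_index": "rabbitmq_raft_snapshot_index",
    "rabbitmq_raft_log_last_applied_index": "rabbitmq_raft_last_applied",
    "rabbitmq_raft_log_commit_index": "rabbitmq_raft_commit_index",
    "rabbitmq_raft_log_last_written_index": "rabbitmq_raft_last_written_index",
}

# Removed from the aggregated endpoint; these need a human decision,
# so they are reported rather than rewritten.
REMOVED = {
    "rabbitmq_raft_term_total",
    "rabbitmq_raft_entry_commit_latency_seconds",
}

# Matches whole identifiers, so a name is never rewritten as a substring
# of a longer metric name.
_IDENTIFIER = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def migrate_expr(expr):
    """Return (rewritten expression, list of removed metrics it still uses)."""
    found_removed = []

    def swap(match):
        name = match.group(0)
        if name in REMOVED and name not in found_removed:
            found_removed.append(name)
        return RENAMED.get(name, name)

    return _IDENTIFIER.sub(swap, expr), found_removed


if __name__ == "__main__":
    for expr in (
        "rate(rabbitmq_raft_log_commit_index[5m])",
        "sum(rabbitmq_raft_term_total)",
    ):
        print(migrate_expr(expr))
```

Reporting removed metrics instead of rewriting them is deliberate: `rabbitmq_raft_term_total` has no aggregated replacement, and the closest successor to `rabbitmq_raft_entry_commit_latency_seconds` depends on whether the alert targets quorum queues or internal components.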
+### Per-object metrics (/metrics/per-object endpoint)
+
+More metrics are reported for each queue than in older versions.
+
+Incorrect metric names were corrected as described above.
+
+Additionally:
+* `rabbitmq_raft_term_total` has been renamed to `rabbitmq_raft_term` (the "total" suffix
+was incorrect and misleading, since the metric is reported for each specific Ra cluster)
+
+* `rabbitmq_raft_num_segments` was added; it reports the number of segment files of the internal components
+and for each quorum queue
+
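The notes suggest using per-object metrics to find which queue has many segment files or a high commit latency. A minimal sketch of that lookup follows, assuming the scrape output is already available as Prometheus text exposition format. The function name `top_by_metric` is hypothetical, and the `queue` and `vhost` label names in the usage comment are assumptions for illustration; check the labels your deployment actually emits.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# key="value" pairs inside the braces, allowing escaped characters.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def top_by_metric(exposition_text, metric, n=3):
    """Return up to n (value, labels) pairs for `metric`, highest value first."""
    samples = []
    for line in exposition_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((float(match.group("value")), labels))
    samples.sort(key=lambda sample: sample[0], reverse=True)
    return samples[:n]


# Usage, with hypothetical label names:
#   text = open("per-object.txt").read()
#   for value, labels in top_by_metric(text, "rabbitmq_raft_num_segments"):
#       print(value, labels.get("vhost"), labels.get("queue"))
```

The same function works for `rabbitmq_raft_commit_latency_seconds` when `rabbitmq_raft_max_commit_latency_seconds` on the aggregated endpoint looks high.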
+### Detailed metrics (/metrics/detailed endpoint)
+
+When the detailed endpoint is scraped with the `family=ra_metrics` parameter,
+more metrics are reported for each queue than in older versions.
+
+Incorrect metric names were corrected as described above.
+
+Additionally:
+* `rabbitmq_raft_term_total` has been renamed to `rabbitmq_raft_term` (the "total" suffix
+was incorrect and misleading, since the metric is reported for each specific Ra cluster)
+
+* `rabbitmq_raft_num_segments` was added; it reports the number of segment files of the internal components
+and for each quorum queue
+
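To check which Ra metric names a node exposes on the detailed endpoint after upgrading, something like the sketch below could be used. It builds the `/metrics/detailed?family=ra_metrics` URL and lists the distinct metric names with the `rabbitmq_detailed_raft` prefix mentioned in these notes. The function names are hypothetical, and the plain-HTTP scheme and port 15692 are assumptions about the Prometheus plugin's listener; adjust both for your deployment.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def detailed_url(host, port=15692, families=("ra_metrics",)):
    """Build the detailed-endpoint URL; scheme and port are assumptions."""
    query = urlencode([("family", family) for family in families])
    return urlunsplit(("http", f"{host}:{port}", "/metrics/detailed", query, ""))


def raft_metric_names(exposition_text, prefix="rabbitmq_detailed_raft"):
    """Return the sorted, distinct metric names that start with `prefix`."""
    names = set()
    for line in exposition_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name is everything before the label braces or the value.
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch(url, timeout=5):
    """Download the scrape output as text (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage against a local node (requires a reachable RabbitMQ instance):
#   print(raft_metric_names(fetch(detailed_url("localhost"))))
```

Comparing the resulting list before and after the upgrade is a quick way to see which dashboard queries need attention.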
 ## Source Code Archives
 
 To obtain source code of the entire distribution, please download the archive named `rabbitmq-server-4.2.0.tar.xz`
