@@ -28,6 +28,14 @@ In other words, if the responder publishes to only this queue name, then the mes
2828` *.cacerts ` (not to be confused with ` cacertfile ` ) settings in ` rabbitmq.conf ` did not have the expected effect and were removed
2929to eliminate confusion.
3030
31+ ### Quorum Queue Metric Changes
32+
33+ Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
34+ have changed. Some metrics were removed, many were added, some changed their names.
35+ Users relying on Prometheus metrics starting with ` rabbitmq_raft ` or ` rabbitmq_detailed_raft `
36+ will need to update their dashboards and/or alerts. If you are using the
37+ [ RabbitMQ-Quorum-Queues-Raft dashboard] ( https://grafana.com/grafana/dashboards/11340-rabbitmq-quorum-queues-raft/ ) ,
38+ please update it to the latest version for RabbitMQ 4.2 compatibility.
3139
3240## Release Highlights
3341
@@ -407,6 +415,88 @@ compared to other versions.
407415 * ` cuttlefish ` was upgraded to [ ` 3.5.0 ` ] ( https://github.com/kyorai/cuttlefish/releases )
408416
409417
418+ ## Ra Metric Changes
419+
420+ Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
421+ have changed. Some metrics were removed, many were added, some changed their names.
422+ For most users this should not require any action. However, users relying on Prometheus
423+ metrics starting with ` rabbitmq_raft ` or ` rabbitmq_detailed_raft ` will need to update
424+ their dashboards and/or alerts. If you are using the
425+ [ RabbitMQ-Quorum-Queues-Raft dashboard] ( https://grafana.com/grafana/dashboards/11340-rabbitmq-quorum-queues-raft/ ) ,
426+ please update it to the latest version for RabbitMQ 4.2 compatibility.
427+
428+ ### More Accurate and Detailed Ra Metrics
429+
430+ Ra is an internal component implementing the Raft protocol. It's the basis
431+ for quorum queues, as well as some internal components (currently Khepri
432+ and Stream Coordinator). For quite some time, Ra metrics were tracked in two places
433+ but RabbitMQ relied on the old metric subsystem. In RabbitMQ 4.2, the old
434+ Ra metrics subsystem has been removed and RabbitMQ now reports Ra metrics
435+ from the new subsystem (implemented using the [ Seshat] ( https://github.com/rabbitmq/seshat ) library).
436+ This migration has the following benefits:
437+
438+ * lower overhead, since only one subsystem is used
439+ * more up-to-date information: the old subsystem was only refreshed every 5 seconds,
440+ while the new subsystem always returns the latest values
441+ * additional metrics are exposed, making it easier to debug the system if necessary
442+
443+ ### Aggregated metrics (/metrics endpoint)
444+
445+ * ` rabbitmq_raft_num_segments ` was added; it reports the number of segment files of the internal components
446+
447+ * ` rabbitmq_raft_max_num_segments ` was added; it reports the highest number of segment
448+ files of any of the quorum queues; per-object metrics can be used to find which queue
449+ has a high number of segment files
450+
451+ * ` rabbitmq_raft_term_total ` has been removed;
452+ this metric was emitted accidentally as a side effect of metric aggregation;
453+ the sum of Raft terms across all Raft clusters is a meaningless number
454+
455+ * some metrics contained the ` _log_ ` substring in their name, even though they are not related to the Raft log;
456+ hence, they were renamed to avoid the misleading part:
457+ * ` rabbitmq_raft_log_snapshot_index ` -> ` rabbitmq_raft_snapshot_index `
458+ * ` rabbitmq_raft_log_last_applied_index ` -> ` rabbitmq_raft_last_applied `
459+ * ` rabbitmq_raft_log_commit_index ` -> ` rabbitmq_raft_commit_index `
460+ * ` rabbitmq_raft_log_last_written_index ` -> ` rabbitmq_raft_last_written_index `
461+
462+ * ` rabbitmq_raft_entry_commit_latency_seconds ` has been removed; it was an average latency across all Ra clusters
463+ in all Ra systems (RabbitMQ currently uses two separate Ra systems: one for quorum queues and one for internal
464+ components, currently Khepri and Stream Coordinator); it was therefore not very useful, since different
465+ components can have very different latencies
466+
467+ * ` rabbitmq_raft_commit_latency_seconds ` was added; in case of aggregated metrics, it is only reported for
468+ internal components (currently Khepri and Stream Coordinator)
469+
470+ * ` rabbitmq_raft_max_commit_latency_seconds ` has been added; it's the highest commit latency reported by any
471+ of the quorum queues. When it's high, per-object metrics can be used to find which specific queue reports high commit latency
472+
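The renames and removals listed above can be applied to existing dashboard or alert queries mechanically. The following is a minimal sketch, not an official migration tool: the helper name `migrate_query` is hypothetical, it only handles names with the `rabbitmq_raft_` prefix (not `rabbitmq_detailed_raft_`), and it only flags removed metrics for manual review instead of rewriting them.

```python
import re

# Renames listed in the release notes (old name -> new name).
RENAMED = {
    "rabbitmq_raft_log_snapshot_index": "rabbitmq_raft_snapshot_index",
    "rabbitmq_raft_log_last_applied_index": "rabbitmq_raft_last_applied",
    "rabbitmq_raft_log_commit_index": "rabbitmq_raft_commit_index",
    "rabbitmq_raft_log_last_written_index": "rabbitmq_raft_last_written_index",
}

# Metrics removed from the aggregated endpoint; queries that use them
# cannot be rewritten automatically and need a manual decision.
REMOVED = {
    "rabbitmq_raft_term_total",
    "rabbitmq_raft_entry_commit_latency_seconds",
}

# Matches a whole metric identifier that starts with rabbitmq_raft_.
_METRIC = re.compile(r"\brabbitmq_raft_\w+")


def migrate_query(query):
    """Return (rewritten_query, removed_metrics_found) for one query string."""
    flagged = []

    def replace(match):
        name = match.group(0)
        if name in REMOVED and name not in flagged:
            flagged.append(name)
        # Unknown or removed names are left untouched.
        return RENAMED.get(name, name)

    return _METRIC.sub(replace, query), flagged
```

For example, `migrate_query("rate(rabbitmq_raft_log_commit_index[5m])")` returns the query with `rabbitmq_raft_commit_index` substituted and an empty list of flagged metrics, while a query that uses `rabbitmq_raft_term_total` is returned unchanged with that name flagged.
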
473+ ### Per-object metrics (/metrics/per-object endpoint)
474+
475+ More metrics are reported for each queue than in older versions.
476+
477+ Incorrect metric names were corrected as described above.
478+
479+ Additionally:
480+ * ` rabbitmq_raft_term_total ` has been renamed to ` rabbitmq_raft_term ` (the "total" suffix
481+ was incorrect and misleading, since the metric is reported for each specific Ra cluster)
482+
483+ * ` rabbitmq_raft_num_segments ` was added; it reports the number of segment files of the internal components
484+ and for each quorum queue
485+
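When an aggregated maximum such as `rabbitmq_raft_max_num_segments` is high, the per-object output can be searched for the responsible queue. The sketch below parses Prometheus text-format output that has already been fetched and returns the sample with the highest value. The function names are hypothetical, and the `queue` and `vhost` label names in the usage note are assumptions for illustration; check the labels your node actually emits.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def samples(text, metric):
    """Yield (labels, value) for every sample of `metric` in `text`."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE.match(line)
        if m and m.group("name") == metric:
            labels = dict(_LABEL.findall(m.group("labels") or ""))
            yield labels, float(m.group("value"))


def highest(text, metric="rabbitmq_raft_num_segments"):
    """Return the (labels, value) pair with the largest value, or None."""
    return max(samples(text, metric), key=lambda s: s[1], default=None)
```

Given scraped text containing, say, one `rabbitmq_raft_num_segments` sample per queue, `highest(text)` returns the labels and value of the queue with the most segment files. The same helper works for a per-queue commit latency metric by passing a different `metric` name.
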
486+ ### Detailed metrics (/metrics/detailed endpoint)
487+
488+ When the detailed endpoint is scraped with the ` family=ra_metrics ` parameter,
489+ more metrics are reported for each queue than in older versions.
490+
491+ Incorrect metric names were corrected as described above.
492+
493+ Additionally:
494+ * ` rabbitmq_raft_term_total ` has been renamed to ` rabbitmq_raft_term ` (the "total" suffix
495+ was incorrect and misleading, since the metric is reported for each specific Ra cluster)
496+
497+ * ` rabbitmq_raft_num_segments ` was added; it reports the number of segment files of the internal components
498+ and for each quorum queue
499+
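As a small illustration of scraping the detailed endpoint with the `family=ra_metrics` parameter, the sketch below only builds the scrape URL. The helper name is hypothetical, and the host, the plain `http` scheme, and the port (15692, the Prometheus plugin's default) are assumptions about the deployment and should be adjusted to match it.

```python
from urllib.parse import urlencode


def detailed_metrics_url(host, families=("ra_metrics",), vhosts=(), port=15692):
    """Build a /metrics/detailed URL with repeated family and vhost parameters."""
    params = [("family", f) for f in families] + [("vhost", v) for v in vhosts]
    return f"http://{host}:{port}/metrics/detailed?{urlencode(params)}"
```

Calling `detailed_metrics_url("localhost")` yields a URL ending in `/metrics/detailed?family=ra_metrics`. Virtual host names are percent-encoded by `urlencode`, so the default virtual host `/` is sent as `%2F`.
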
410500## Source Code Archives
411501
412502To obtain source code of the entire distribution, please download the archive named ` rabbitmq-server-4.2.0.tar.xz `