Commit 691a232

Address missing metrics and tracing mention
1 parent b965764 commit 691a232

2 files changed

+44
-11
lines changed

articles/iot-operations/manage-mqtt-broker/howto-broker-diagnostics.md

Lines changed: 5 additions & 7 deletions
@@ -16,35 +16,33 @@ ms.date: 11/14/2024
1616
> [!IMPORTANT]
1717
> This setting requires modifying the Broker resource and can only be configured at initial deployment time using the Azure CLI or Azure Portal. A new deployment is required if Broker configuration changes are needed. To learn more, see [Customize default Broker](./overview-broker.md#customize-default-broker).
1818
19-
Diagnostic settings allow you to configure metrics, tracing, logging, and self-check for the MQTT broker.
19+
Diagnostic settings allow you to configure metrics, logs, and self-check for the MQTT broker.
2020

2121
## Metrics
2222

2323
Metrics provide information about the current and past health and status of the MQTT broker. These metrics are emitted in OpenTelemetry format (OTLP). They can be converted to Prometheus format using an OpenTelemetry Collector and routed to Azure Managed Grafana Dashboards using Azure Monitor Managed Service for Prometheus. To learn more, see [Configure observability and monitoring](../configure-observability-monitoring/howto-configure-observability.md).
2424

2525
For the full list of metrics available, see [MQTT broker metrics](../reference/observability-metrics-mqtt-broker.md).
2626

27-
## Logs and traces
27+
## Logs
2828

2929
Logs provide information about the operations performed by MQTT broker. These logs are available in the Kubernetes cluster as container logs. They can be configured to be sent to Azure Monitor Logs with Container Insights.
3030

31-
Traces are for [distributed tracing](https://opentelemetry.io/docs/concepts/signals/traces/) and provide detailed information about the requests and responses handled by MQTT broker. These traces can be sent to Azure Monitor through OpenTelemetry Collector.
32-
3331
To learn more, see [Configure observability and monitoring](../configure-observability-monitoring/howto-configure-observability.md).
3432

3533
## Self-check
3634

37-
The MQTT Broker's self-check mechanism is enabled by default. It uses the Diagnostics Probe and OpenTelemetry (OTel) traces to monitor the broker. The probe sends test messages to check the system's behavior and timing.
35+
The MQTT broker's self-check mechanism is enabled by default. It uses a diagnostics probe and OpenTelemetry (OTel) traces to monitor the broker. The probe sends test messages to check the system's behavior and timing.
3836

3937
The validation process checks if the system works correctly by comparing the test results with expected outcomes. These outcomes include:
4038

4139
1. The paths messages take through the system.
4240
2. The system's timing behavior.
4341

44-
The Diagnostics Probe periodically executes MQTT operations (PING, CONNECT, PUBLISH, SUBSCRIBE, UNSUBSCRIBE) on the AIO Broker and monitors the corresponding ACKs and traces to check for latency, message loss, and correctness of the replication protocol.
42+
The diagnostics probe periodically executes MQTT operations (PING, CONNECT, PUBLISH, SUBSCRIBE, UNSUBSCRIBE) on the MQTT broker and monitors the corresponding ACKs and traces to check for latency, message loss, and correctness of the replication protocol.
4543

4644
> [!IMPORTANT]
47-
> The self-check Diagnostics Probe publishes messages to the `azedge/dmqtt/selftest` topic. Don't publish or subscribe to diagnostic probe topics that start with `azedge/dmqtt/selftest`. Publishing or subscribing to these topics might affect the probe or self-test checks resulting in invalid results. Invalid results might be listed in diagnostic probe logs, metrics, or dashboards. For example, you might see the issue *Path verification failed for probe event with operation type 'Publish'* in the diagnostics-probe logs. For more information, see [Known Issues](../troubleshoot/known-issues.md#mqtt-broker).
45+
> The self-check diagnostics probe publishes messages to the `azedge/dmqtt/selftest` topic. Don't publish or subscribe to diagnostic probe topics that start with `azedge/dmqtt/selftest`. Publishing or subscribing to these topics might affect the probe or self-test checks resulting in invalid results. Invalid results might be listed in diagnostic probe logs, metrics, or dashboards. For example, you might see the issue *Path verification failed for probe event with operation type 'Publish'* in the diagnostics-probe logs. For more information, see [Known Issues](../troubleshoot/known-issues.md#mqtt-broker).
4846
4947
## Change diagnostics settings
5048

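The `[!IMPORTANT]` note in this file warns against publishing or subscribing to topics that start with `azedge/dmqtt/selftest`. As a minimal sketch of how client code might guard against that, the helper below checks a topic string against that prefix. The function name `is_reserved_selftest_topic` is hypothetical and not part of any broker or client API.

```python
# Prefix taken from the note in the diff above; probe topics start with it.
RESERVED_PREFIX = "azedge/dmqtt/selftest"


def is_reserved_selftest_topic(topic: str) -> bool:
    """Return True if the topic starts with the diagnostics probe prefix.

    Hypothetical helper for illustration only: it performs a plain string
    prefix check and does not evaluate MQTT wildcard filters.
    """
    return topic.startswith(RESERVED_PREFIX)


if __name__ == "__main__":
    for candidate in ("azedge/dmqtt/selftest/probe", "factory/line1/temperature"):
        print(candidate, is_reserved_selftest_topic(candidate))
```

A prefix check like this only covers concrete topic names. A wildcard subscription such as `#` could still overlap the probe topics and would need separate handling.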
articles/iot-operations/reference/observability-metrics-mqtt-broker.md

Lines changed: 39 additions & 4 deletions
@@ -6,7 +6,7 @@ ms.author: kgremban
66
ms.topic: reference
77
ms.custom:
88
- ignite-2023
9-
ms.date: 11/14/2024
9+
ms.date: 11/15/2024
1010

1111
# CustomerIntent: As an IT admin or operator, I want to be able to monitor and visualize data
1212
# on the health of my industrial assets and edge environment.
@@ -26,7 +26,7 @@ For example, if the self-check probe connects with `metriccategory=broker_selfte
2626

2727
This feature helps dashboards show traffic sources without the high cardinality issues of tagging metrics with topics.
2828

29-
Sessions without a `metriccategory` are tagged as "category=uncategorized".
29+
Sessions without a `metriccategory` are tagged as `category=uncategorized`.
3030

3131
## Messaging metrics
3232

@@ -49,12 +49,47 @@ All metrics include the `hostname` tag to identify the pod that generated the me
4949
| aio_broker_store_retained_bytes | This metric counts how many bytes are stored via retained messages on the broker. | |
5050
| aio_broker_store_will_messages | This metric counts how many will messages are stored on the broker. | |
5151
| aio_broker_store_will_bytes | This metric counts how many bytes are stored via will messages on the broker. | |
52+
| aio_broker_number_of_routes | Counts the number of routes. | |
53+
| aio_broker_connect_route_replication_correctness | Describes if a connection request from a self test client is replicated correctly along a specific route. | |
54+
| aio_broker_connect_latency_route_ms | Describes the time interval between a self test client sending a CONNECT packet and receiving a CONNACK packet. This metric is generated per route. The metric is generated only if a CONNECT is successful. | |
55+
| aio_broker_connect_latency_last_value_ms | An estimated p99 of `connect_latency_route_ms`. | |
56+
| aio_broker_connect_latency_mu_ms | The mean value of `connect_latency_route_ms`. | |
57+
| aio_broker_connect_latency_sigma_ms | The standard deviation of `connect_latency_route_ms`. | |
58+
| aio_broker_subscribe_route_replication_correctness | Describes if a subscribe request from a self test client is replicated correctly along a specific route. | |
59+
| aio_broker_subscribe_latency_route_ms | Describes the time interval between a self test client sending a SUBSCRIBE packet and receiving a SUBACK packet. This metric is generated per route. The metric is generated only if a SUBSCRIBE is successful. | |
60+
| aio_broker_subscribe_latency_last_value_ms | An estimated p99 of `subscribe_latency_route_ms`. | |
61+
| aio_broker_subscribe_latency_mu_ms | The mean value of `subscribe_latency_route_ms`. | |
62+
| aio_broker_subscribe_latency_sigma_ms | The standard deviation of `subscribe_latency_route_ms`. | |
63+
| aio_broker_unsubscribe_route_replication_correctness | Describes if an unsubscribe request from a self test client is replicated correctly along a specific route. | |
64+
| aio_broker_unsubscribe_latency_route_ms | Describes the time interval between a self test client sending an UNSUBSCRIBE packet and receiving an UNSUBACK packet. This metric is generated per route. The metric is generated only if an UNSUBSCRIBE is successful. | |
65+
| aio_broker_unsubscribe_latency_last_value_ms | An estimated p99 of `unsubscribe_latency_route_ms`. | |
66+
| aio_broker_unsubscribe_latency_mu_ms | The mean value of `unsubscribe_latency_route_ms`. | |
67+
| aio_broker_unsubscribe_latency_sigma_ms | The standard deviation of `unsubscribe_latency_route_ms`. | |
68+
| aio_broker_publish_route_replication_correctness | Describes if a publish request from a self test client is replicated correctly along a specific route. | |
69+
| aio_broker_publish_latency_route_ms | Describes the time interval between a self test client sending a PUBLISH packet and receiving a PUBACK packet. This metric is generated per route. The metric is generated only if a PUBLISH is successful. | |
70+
| aio_broker_publish_latency_last_value_ms | An estimated p99 of `publish_latency_route_ms`. | |
71+
| aio_broker_publish_latency_mu_ms | The mean value of `publish_latency_route_ms`. | |
72+
| aio_broker_publish_latency_sigma_ms | The standard deviation of `publish_latency_route_ms`. | |
73+
| aio_broker_payload_check_latency_last_value_ms | An estimated p99 of the payload check latency. | |
74+
| aio_broker_payload_check_latency_mu_ms | The mean value of the payload check latency. | |
75+
| aio_broker_payload_check_latency_sigma_ms | The standard deviation of the payload check latency. | |
76+
| aio_broker_payload_check_total_messages_lost | The total number of messages lost during the payload check. | |
77+
| aio_broker_payload_check_total_messages_received | The total number of messages received during the payload check. | |
78+
| aio_broker_payload_check_total_messages_sent | The total number of messages sent during the payload check. | |
79+
| aio_broker_ping_correctness | Describes whether the ping from the self-test client works correctly. | |
80+
| aio_broker_ping_latency_last_value_ms | An estimated p99 of the ping latency. | |
81+
| aio_broker_ping_latency_mu_ms | The mean value of the ping latency. | |
82+
| aio_broker_ping_latency_route_ms | The ping latency in milliseconds for a specific route. | |
83+
| aio_broker_ping_latency_sigma_ms | The standard deviation of the ping latency. | |
84+
| aio_broker_publishes_processed_count | Counts the number of published messages processed. | |
85+
| aio_broker_publishes_received_per_second | Counts the number of published messages received per second. | |
86+
| aio_broker_publishes_sent_per_second | Counts the number of published messages sent per second. | |
5287

5388
## Broker operator health metrics
5489

5590
This set of metrics tracks the [cardinality state of the broker](../manage-mqtt-broker/howto-configure-availability-scale.md). Each desired metric is paired with a reported metric to show the current state. These metrics indicate the number of healthy pods from the broker's perspective, which might differ from Kubernetes' reports.
5691

57-
For example, if a backend node restarts but doesn't reconnect to its chain, there can be a discrepancy in health reports. Kubernetes might report the pod as healthy, while the broker reports it as down because it is not functioning properly.
92+
For example, if a backend node restarts but doesn't reconnect to its chain, there can be a discrepancy in health reports. Kubernetes might report the pod as healthy, while the broker reports it as down because it isn't functioning properly.
5893

5994
| Desired Metric | Reported Metric |
6095
|----------------|-----------------|
@@ -88,7 +123,7 @@ This set of metrics tracks the overall state of the [state store](../create-edge
88123
| aio_broker_state_store_insertions | This metric counts the number of new key insert requests received, including both successful insertions and errors. | |
89124
| aio_broker_state_store_keynotify_requests | This metric counts the number of requests to monitor key changes (KEYNOTIFY) received, including both successful modifications and errors. | |
90125
| aio_broker_state_store_modifications | This metric counts the number of modify key requests received, including both successful modifications and errors. | |
91-
| aio_broker_state_store_notifications_sent | This metric counts the number of notification messages the state store sends when a key value changes and a client is registered via KEYNOTIFY. | |
126+
| aio_broker_state_store_notifications_sent | This metric counts the number of notification messages the state store sends when a key value changes and a client is registered via KEYNOTIFY. | |
92127
| aio_broker_state_store_retrievals | This metric counts the number of key value retrieval requests received, including both successful retrievals and errors. | |
93128

94129
## Disk-backed message buffer metrics

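The `*_latency_mu_ms`, `*_latency_sigma_ms`, and `*_latency_last_value_ms` rows added to the messaging metrics table describe a mean, a standard deviation, and an estimated p99 of per-route latency samples. The sketch below shows one way such summaries could be computed from raw samples. It is an assumption for illustration: the diff does not say how the broker estimates these values, and the function name `summarize_latency` and the nearest-rank p99 method are not taken from the source.

```python
import math
import statistics


def summarize_latency(samples_ms):
    """Summarize latency samples (milliseconds) as mean, std dev, and p99.

    Illustrative only: uses the population standard deviation and a
    nearest-rank percentile, which may differ from the broker's estimator.
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Nearest-rank p99: smallest value with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {
        "mu_ms": statistics.fmean(ordered),
        "sigma_ms": statistics.pstdev(ordered),
        "p99_ms": ordered[rank - 1],
    }


if __name__ == "__main__":
    print(summarize_latency([10.0, 12.0, 14.0]))
```

With only a handful of samples, the nearest-rank p99 is simply the maximum, so a summary like this becomes meaningful only over a larger window of probe results.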