articles/azure-cache-for-redis/cache-how-to-monitor.md
Lines changed: 24 additions & 20 deletions
@@ -1,6 +1,6 @@
 ---
 title: Monitor Azure Cache for Redis
-description: Learn how to monitor the health and performance your Azure Cache for Redis instances
+description: Learn how to monitor the health and performance your Azure Cache for Redis instances.
 author: flang-msft
 ms.author: franlanglois
 ms.service: cache
@@ -18,7 +18,7 @@ Use Azure Monitor to:
 - pin metrics charts to the dashboard
 - customize the date and time range of monitoring charts
 - add and remove metrics from the charts
--and set alerts when certain conditions are met
+- set alerts when certain conditions are met
 
 Metrics for Azure Cache for Redis instances are collected using the Redis [`INFO`](https://redis.io/commands/info) command. Metrics are collected approximately two times per minute and automatically stored for 30 days so they can be displayed in the metrics charts and evaluated by alert rules.
 
@@ -58,7 +58,7 @@ For scenarios where you don't need the full flexibility of Azure Monitor for Azu
 
 ## Use Insights for predefined charts
 
-The **Monitoring** section in the Resource menu contains **Insights**. When you select **Insights**, you see groupings of three types of charts: **Overview**, **Performance** and **Operations**.
+The **Monitoring** section in the Resource menu contains **Insights**. When you select **Insights**, you see groupings of three types of charts: **Overview**, **Performance**, and **Operations**.
 
 :::image type="content" source="./media/cache-how-to-monitor/cache-monitoring-part.png" alt-text="Screenshot showing Monitoring Insights selected in the Resource menu.":::
 
@@ -83,6 +83,7 @@ Configure a storage account to use with to store your metrics. The storage accou
 1. Under the table heading **metric**, check box beside the line items you want to store, such as **AllMetrics**. Specify a **Retention (days)** policy. The maximum days retention you can specify is **365 days**. However, if you want to keep the metrics data forever, set **Retention (days)** to **0**.
@@ -110,8 +111,8 @@ In the Resource menu on the left, select **Metrics** under **Monitoring**. Here,
 When you're seeing the aggregation type:
 
 - **Count** show 2, it indicates the metric received 2 data points for your time granularity (1 minute).
-- **Max** shows the maximum value of a data point in the time granularity,
-- **Min** shows the minimum value of a data point in the time granularity,
+- **Max** shows the maximum value of a data point in the time granularity.
+- **Min** shows the minimum value of a data point in the time granularity.
 - **Average** shows the average value of all data points in the time granularity.
 - **Sum** shows the sum of all data points in the time granularity and might be misleading depending on the specific metric.
 
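The aggregation types in the hunk above summarize the data points that fall into one time-granularity bucket. The following minimal sketch uses a hypothetical `aggregate` helper and hypothetical sample values to compute each aggregation for a 1-minute bucket holding two data points, which matches the "approximately two times per minute" collection rate in the earlier context line. It is not Azure Monitor's implementation. It only illustrates why **Sum** can mislead for a value that is sampled rather than counted.

```python
from statistics import mean


def aggregate(samples: list[float]) -> dict[str, float]:
    """Summarize the data points in one time-granularity bucket (hypothetical helper)."""
    return {
        "Count": len(samples),    # number of data points received in the bucket
        "Max": max(samples),      # largest single data point
        "Min": min(samples),      # smallest single data point
        "Average": mean(samples), # mean of all data points
        "Sum": sum(samples),      # total of all data points
    }


# Hypothetical bucket: a point-in-time (gauge-style) value sampled twice in 1 minute.
bucket = [40.0, 42.0]
print(aggregate(bucket))
```

Summing two samples of about 41 reports 82, a value the cache never had at any moment. For sampled values like this, **Average** or **Max** is the more meaningful view.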
@@ -135,7 +136,7 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
 - Depicts the worst-case (99th percentile) latency of server-side commands. Measured by issuing `PING` commands from the load balancer to the Redis server and tracking the time to respond.
 - Useful for tracking the health of your Redis instance. Latency increases if the cache is under heavy load or if there are long running commands that delay the execution of the `PING` command.
 - This metric is only available in Standard and Premium tier caches.
-- This metric is not available for caches that are affected by Cloud Service retirement. See more information [here](cache-faq.yml#caches-with-a-dependency-on-cloud-services--classic)
+- This metric isn't available for caches that are affected by Cloud Service retirement. See more information [here](cache-faq.yml#caches-with-a-dependency-on-cloud-services--classic)
 - Cache Latency (preview)
 - The latency of the cache calculated using the internode latency of the cache. This metric is measured in microseconds, and has three dimensions: `Avg`, `Min`, and `Max`. The dimensions represent the average, minimum, and maximum latency of the cache during the specified reporting interval.
 - Cache Misses
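The context lines above describe a worst-case (99th percentile) latency measured from `PING` round trips. As an illustration only, the sketch below computes a 99th percentile with the nearest-rank method. The percentile method, the helper name `percentile_nearest_rank`, and the sample timings are all assumptions, not details of how the service computes the metric.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to keep the arithmetic exact for whole numbers.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical PING round-trip times in milliseconds: 98 fast responses and two slow
# ones, for example while a long-running command delays the PING.
pings = [1.0] * 98 + [15.0, 40.0]
print(percentile_nearest_rank(pings, 99))  # 15.0
```

Here the average stays near 1.5 ms while the 99th percentile reports 15 ms. A percentile-based metric therefore surfaces the delays that heavy load or long-running commands cause, even when most responses are fast.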
@@ -158,17 +159,20 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
 - The CPU utilization of the Azure Cache for Redis server as a percentage during the specified reporting interval. This value maps to the operating system `\Processor(_Total)\% Processor Time` performance counter. Note: This metric can be noisy due to low priority background security processes running on the node, so we recommend monitoring Server Load metric to track load on a Redis server.
 - Errors
 - Specific failures and performance issues that the cache could be experiencing during a specified reporting interval. This metric has eight dimensions representing different error types, but could add more in the future. The error types represented now are as follows:
-- **Failover** – when a cache fails over (subordinate promotes to primary)
-- **Dataloss** – when there's data loss on the cache
-- **UnresponsiveClients** – when the clients aren't reading data from the server fast enough, and specifically, when the number of bytes in the Redis server output buffer for a client goes over 1,000,000 bytes
-- **AOF** – when there's an issue related to AOF persistence
-- **RDB** – when there's an issue related to RDB persistence
-- **Import** – when there's an issue related to Import RDB
-- **Export** – when there's an issue related to Export RDB
-- **AADAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token
-- **AADTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires.
+- **Failover** – when a cache fails over (subordinate promotes to primary).
+- **Dataloss** – when there's data loss on the cache.
+- **UnresponsiveClients** – when the clients aren't reading data from the server fast enough, and specifically, when the number of bytes in the Redis server output buffer for a client goes over 1,000,000 bytes.
+- **AOF** – when there's an issue related to AOF persistence.
+- **RDB** – when there's an issue related to RDB persistence.
+- **Import** – when there's an issue related to Import RDB.
+- **Export** – when there's an issue related to Export RDB.
+- **AADAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token. Not recommended. Use **MicrosoftEntraAuthenticationFailure** instead.
+- **AADTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires. Not recommended. Use **MicrosoftEntraTokenExpired** instead.
+- **MicrosoftEntraAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token.
+- **MicrosoftEntraTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires.
+
 > [!NOTE]
-> Metrics for errors aren't available when using the Enterprise Tiers.
+> Metrics for errors aren't available when using the Enterprise tiers.
 
 - Evicted Keys
 - The number of items evicted from the cache during the specified reporting interval because of the `maxmemory` limit.
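The **UnresponsiveClients** error type above fires when a client's Redis server output buffer goes over 1,000,000 bytes. One way to look for such clients yourself is the Redis `CLIENT LIST` command, whose per-client `omem` field reports output buffer memory. The sketch below is an approximation built on several assumptions. It treats `omem` as a stand-in for the output-buffer byte count the service measures, its helper names are hypothetical, and it parses a hypothetical, abbreviated `CLIENT LIST` reply instead of querying a live cache.

```python
# Threshold taken from the UnresponsiveClients description above.
OUTPUT_BUFFER_THRESHOLD = 1_000_000


def parse_client_list(text: str) -> list[dict[str, str]]:
    """Parse CLIENT LIST output: one client per line, space-separated key=value fields."""
    clients = []
    for line in text.strip().splitlines():
        # partition("=") yields (key, "=", value); [::2] keeps (key, value).
        clients.append(dict(field.partition("=")[::2] for field in line.split()))
    return clients


def unresponsive_clients(text: str) -> list[str]:
    """Return the addr of each client whose omem exceeds the threshold (hypothetical helper)."""
    return [
        client["addr"]
        for client in parse_client_list(text)
        if int(client.get("omem", "0")) > OUTPUT_BUFFER_THRESHOLD
    ]


# Hypothetical, abbreviated CLIENT LIST reply; real replies carry more fields per client.
sample = (
    "id=5 addr=10.0.0.4:50001 name= age=30 idle=0 flags=N db=0 obl=0 oll=0 omem=0 cmd=get\n"
    "id=6 addr=10.0.0.5:50002 name= age=90 idle=0 flags=N db=0 obl=0 oll=2500 omem=1048576 cmd=get\n"
)
print(unresponsive_clients(sample))  # ['10.0.0.5:50002']
```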
@@ -215,9 +219,9 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
 - If the geo-replication link is unhealthy for over an hour, [file a support request](../azure-portal/supportability/how-to-create-azure-support-request.md).
 
 - Gets
-- Sum of the number of get commands run on the cache during the specified reporting interval. This is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _get_ family, including `GET`, `HGET` , `MGET`, and others. This value can differ from the total number of hits and misses because some individual commands access multiple keys. For example: `MGET key1 key2 key3` only increments the number of gets by one but increments the combined number of hits and misses by three.
+- Sum of the number of get commands run on the cache during the specified reporting interval. The sum is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _get_ family, including `GET`, `HGET` , `MGET`, and others. This value can differ from the total number of hits and misses because some individual commands access multiple keys. For example: `MGET key1 key2 key3` only increments the number of gets by one but increments the combined number of hits and misses by three.
 - Operations per Second
-- The total number of commands processed per second by the cache server during the specified reporting interval. This value maps to "instantaneous_ops_per_sec" from the Redis INFO command.
+- The total number of commands processed per second by the cache server during the specified reporting interval. This value maps to "instantaneous_ops_per_sec" from the Redis INFO command.
 - Server Load
 - The percentage of CPU cycles in which the Redis server is busy processing and _not waiting idle_ for messages. If this counter reaches 100, the Redis server has hit a performance ceiling, and the CPU can't process work any faster. You can expect a large latency effect. If you're seeing a high Redis Server Load, such as 100, because you're sending many expensive commands to the server, then you might see timeout exceptions in the client. In this case, you should consider scaling up, scaling out to a Premium cluster, or partitioning your data into multiple caches. When _Server Load_ is only moderately high, such as 50 to 80 percent, then average latency usually remains low, and timeout exceptions could have other causes than high server latency.
 - The _Server Load_ metric is sensitive to other processes on the machine using the existing CPU cycles that reduce the Redis server's idle time. For example, on the _C1_ tier, background tasks such as virus scanning cause _Server Load_ to spike higher for no obvious reason. We recommended that you pay attention to other metrics such as operations, latency, and CPU, in addition to _Server Load_.
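The **Gets** description above defines the metric as the combined increase in `cmdstat` counts for get-family commands over a reporting interval. The following minimal sketch shows that arithmetic. It assumes two hypothetical, abbreviated snapshots of `INFO commandstats` output (real lines can carry additional fields) and uses a hypothetical `GET_FAMILY` set as a stand-in for the full get family.

```python
# Hypothetical subset of the get family; the text lists GET, HGET, MGET "and others".
GET_FAMILY = {"get", "hget", "mget"}


def parse_cmdstats(info: str) -> dict[str, int]:
    """Map command name to its cumulative call count from INFO commandstats lines."""
    calls = {}
    for line in info.splitlines():
        if line.startswith("cmdstat_"):
            name, _, stats = line.partition(":")
            fields = dict(kv.split("=", 1) for kv in stats.split(","))
            calls[name.removeprefix("cmdstat_")] = int(fields["calls"])
    return calls


def gets_in_interval(before: str, after: str) -> int:
    """Sum the increase in calls for get-family commands between two snapshots."""
    b, a = parse_cmdstats(before), parse_cmdstats(after)
    return sum(a.get(cmd, 0) - b.get(cmd, 0) for cmd in GET_FAMILY)


# Hypothetical snapshots: four GET calls and one MGET (reading three keys) in between.
before = "cmdstat_get:calls=10,usec=50,usec_per_call=5.00\ncmdstat_mget:calls=2,usec=20,usec_per_call=10.00"
after = "cmdstat_get:calls=14,usec=70,usec_per_call=5.00\ncmdstat_mget:calls=3,usec=30,usec_per_call=10.00"
print(gets_in_interval(before, after))  # 5
```

The single `MGET` adds one to the result even though it read three keys. That is why **Gets** can differ from the combined hits and misses for the same interval.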
@@ -226,7 +230,7 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
 > The _Server Load_ metric can present incorrect data for Enterprise and Enterprise Flash tier caches. Sometimes _Server Load_ is represented as being over 100. We are investigating this issue. We recommend using the CPU metric instead in the meantime.
 
 - Sets
-- Sum of the number of set commands run on the cache during the specified reporting interval. This is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _set_ family, including `SET`, `HSET` , `MSET`, and others.
+- Sum of the number of set commands run on the cache during the specified reporting interval. This sum is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _set_ family, including `SET`, `HSET` , `MSET`, and others.
 - Total Keys
 - The maximum number of keys in the cache during the past reporting time period. This number maps to `keyspace` from the Redis INFO command. Because of a limitation in the underlying metrics system for caches with clustering enabled, Total Keys return the maximum number of keys of the shard that had the maximum number of keys during the reporting interval.
 - Total Operations
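The **Total Keys** context line above notes that, for caches with clustering enabled, the metric returns the key count of the shard with the most keys rather than a cluster-wide total. The sketch below models that behavior at a single instant. It uses hypothetical per-shard `INFO keyspace` output and hypothetical helper names, and it doesn't model the maximum taken over the reporting interval.

```python
def keys_in_shard(info_keyspace: str) -> int:
    """Sum the keys=N values across the dbN lines of one shard's INFO keyspace output."""
    total = 0
    for line in info_keyspace.splitlines():
        if line.startswith("db") and ":" in line:
            _, _, stats = line.partition(":")
            fields = dict(kv.split("=", 1) for kv in stats.split(","))
            total += int(fields["keys"])
    return total


def reported_total_keys(shards: list[str]) -> int:
    """Model the clustered-cache limitation: report the largest shard, not the sum."""
    return max(keys_in_shard(s) for s in shards)


# Hypothetical INFO keyspace output from three shards.
shards = [
    "# Keyspace\ndb0:keys=1200,expires=100,avg_ttl=0",
    "# Keyspace\ndb0:keys=900,expires=0,avg_ttl=0",
    "# Keyspace\ndb0:keys=1500,expires=20,avg_ttl=0",
]
print(reported_total_keys(shards))               # 1500
print(sum(keys_in_shard(s) for s in shards))     # 3600
```

In this example the metric would show 1,500 even though the three shards hold 3,600 keys together.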
@@ -266,7 +270,7 @@ The two workbooks provided are:
 - **Azure Cache For Redis Resource Overview** combines many of the most commonly used metrics so that the health and performance of the cache instance can be viewed at a glance.
 :::image type="content" source="media/cache-how-to-monitor/cache-monitoring-resource-overview.png" alt-text="Screenshot of graphs showing a resource overview for the cache.":::
 
-- **Geo-Replication Dashboard** pulls geo-replication health and status metrics from both the geo-primary and geo-secondary cache instances to give a complete picture of geo-replcation health. Using this dashboard is recommended, as some geo-replication metrics are only emitted from either the geo-primary or geo-secondary.
+- **Geo-Replication Dashboard** pulls geo-replication health and status metrics from both the geo-primary and geo-secondary cache instances to give a complete picture of geo-replication health. Using this dashboard is recommended, as some geo-replication metrics are only emitted from either the geo-primary or geo-secondary.
 :::image type="content" source="media/cache-how-to-monitor/cache-monitoring-geo-dashboard.png" alt-text="Screenshot showing the geo-replication dashboard with a geo-primary and geo-secondary cache set.":::