Commit 5a38c74

Merge pull request #267636 from shpathak-msft/patch-28
Update cache-how-to-monitor.md
2 parents 7211d34 + e4df43f commit 5a38c74

1 file changed (+24 -20 lines changed)
articles/azure-cache-for-redis/cache-how-to-monitor.md

Lines changed: 24 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
---
22
title: Monitor Azure Cache for Redis
3-
description: Learn how to monitor the health and performance your Azure Cache for Redis instances
3+
description: Learn how to monitor the health and performance your Azure Cache for Redis instances.
44
author: flang-msft
55
ms.author: franlanglois
66
ms.service: cache
@@ -18,7 +18,7 @@ Use Azure Monitor to:
1818
- pin metrics charts to the dashboard
1919
- customize the date and time range of monitoring charts
2020
- add and remove metrics from the charts
21-
- and set alerts when certain conditions are met
21+
- set alerts when certain conditions are met
2222

2323
Metrics for Azure Cache for Redis instances are collected using the Redis [`INFO`](https://redis.io/commands/info) command. Metrics are collected approximately two times per minute and automatically stored for 30 days so they can be displayed in the metrics charts and evaluated by alert rules.
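As an illustration of the context line above, the following is a minimal Python sketch of how `INFO`-style output could be parsed into fields. It assumes only the documented `INFO` text layout (`# Section` headers followed by `field:value` lines). The `parse_info` helper and the sample values are hypothetical, and this is not a description of how Azure's own collector works.

```python
def parse_info(text):
    """Parse Redis INFO-style text into {section: {field: value}}."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header such as "# Stats"
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
        elif ":" in line and current is not None:
            # Split on the first colon only; values may contain more.
            field, value = line.split(":", 1)
            sections[current][field] = value
    return sections


# Hypothetical sample; real INFO output has many more fields.
SAMPLE_INFO = """\
# Stats
instantaneous_ops_per_sec:120
keyspace_hits:900
keyspace_misses:100
evicted_keys:0

# Keyspace
db0:keys=42,expires=0,avg_ttl=0
"""

info = parse_info(SAMPLE_INFO)
ops = int(info["stats"]["instantaneous_ops_per_sec"])
print(ops)
```

Values are kept as strings because `INFO` fields are not uniformly numeric (the `db0` keyspace entry, for instance, is a comma-separated list), so the caller converts only the fields it needs.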
2424

@@ -58,7 +58,7 @@ For scenarios where you don't need the full flexibility of Azure Monitor for Azu
5858

5959
## Use Insights for predefined charts
6060

61-
The **Monitoring** section in the Resource menu contains **Insights**. When you select **Insights**, you see groupings of three types of charts: **Overview**, **Performance** and **Operations**.
61+
The **Monitoring** section in the Resource menu contains **Insights**. When you select **Insights**, you see groupings of three types of charts: **Overview**, **Performance**, and **Operations**.
6262

6363
:::image type="content" source="./media/cache-how-to-monitor/cache-monitoring-part.png" alt-text="Screenshot showing Monitoring Insights selected in the Resource menu.":::
6464

@@ -83,6 +83,7 @@ Configure a storage account to use with to store your metrics. The storage accou
8383
1. Under the table heading **metric**, check box beside the line items you want to store, such as **AllMetrics**. Specify a **Retention (days)** policy. The maximum days retention you can specify is **365 days**. However, if you want to keep the metrics data forever, set **Retention (days)** to **0**.
8484

8585
1. Select **Save**.
86+
8687
:::image type="content" source="./media/cache-how-to-monitor/cache-diagnostics.png" alt-text="Redis diagnostics":::
8788

8889
>[!NOTE]
@@ -110,8 +111,8 @@ In the Resource menu on the left, select **Metrics** under **Monitoring**. Here,
110111
When you're seeing the aggregation type:
111112

112113
- **Count** shows 2, it indicates the metric received 2 data points for your time granularity (1 minute).
113-
- **Max** shows the maximum value of a data point in the time granularity,
114-
- **Min** shows the minimum value of a data point in the time granularity,
114+
- **Max** shows the maximum value of a data point in the time granularity.
115+
- **Min** shows the minimum value of a data point in the time granularity.
115116
- **Average** shows the average value of all data points in the time granularity.
116117
- **Sum** shows the sum of all data points in the time granularity and might be misleading depending on the specific metric.
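To make the aggregation types in this hunk concrete, here is a minimal Python sketch that computes each one over the data points in a single one-minute bucket. The `aggregate` helper and the sample values are hypothetical and exist only for illustration.

```python
from statistics import mean


def aggregate(points):
    """Return the five aggregations for one time-granularity bucket."""
    return {
        "Count": len(points),
        "Max": max(points),
        "Min": min(points),
        "Average": mean(points),
        "Sum": sum(points),
    }


# Hypothetical: two Server Load samples (percent) collected in one minute.
server_load_samples = [40, 60]
result = aggregate(server_load_samples)
print(result)
```

With these samples, **Sum** is 100 even though neither data point is near 100 percent, which is the kind of misleading reading the article warns about for percentage-style metrics.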
117118

@@ -135,7 +136,7 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
135136
- Depicts the worst-case (99th percentile) latency of server-side commands. Measured by issuing `PING` commands from the load balancer to the Redis server and tracking the time to respond.
136137
- Useful for tracking the health of your Redis instance. Latency increases if the cache is under heavy load or if there are long running commands that delay the execution of the `PING` command.
137138
- This metric is only available in Standard and Premium tier caches.
138-
- This metric is not available for caches that are affected by Cloud Service retirement. See more information [here](cache-faq.yml#caches-with-a-dependency-on-cloud-services--classic)
139+
- This metric isn't available for caches that are affected by Cloud Service retirement. See more information [here](cache-faq.yml#caches-with-a-dependency-on-cloud-services--classic)
139140
- Cache Latency (preview)
140141
- The latency of the cache calculated using the internode latency of the cache. This metric is measured in microseconds, and has three dimensions: `Avg`, `Min`, and `Max`. The dimensions represent the average, minimum, and maximum latency of the cache during the specified reporting interval.
141142
- Cache Misses
@@ -158,17 +159,20 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
158159
- The CPU utilization of the Azure Cache for Redis server as a percentage during the specified reporting interval. This value maps to the operating system `\Processor(_Total)\% Processor Time` performance counter. Note: This metric can be noisy due to low priority background security processes running on the node, so we recommend monitoring Server Load metric to track load on a Redis server.
159160
- Errors
160161
- Specific failures and performance issues that the cache could be experiencing during a specified reporting interval. This metric has eight dimensions representing different error types, but could add more in the future. The error types represented now are as follows:
161-
- **Failover** – when a cache fails over (subordinate promotes to primary)
162-
- **Dataloss** – when there's data loss on the cache
163-
- **UnresponsiveClients** – when the clients aren't reading data from the server fast enough, and specifically, when the number of bytes in the Redis server output buffer for a client goes over 1,000,000 bytes
164-
- **AOF** – when there's an issue related to AOF persistence
165-
- **RDB** – when there's an issue related to RDB persistence
166-
- **Import** – when there's an issue related to Import RDB
167-
- **Export** – when there's an issue related to Export RDB
168-
- **AADAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token
169-
- **AADTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires.
162+
- **Failover** – when a cache fails over (subordinate promotes to primary).
163+
- **Dataloss** – when there's data loss on the cache.
164+
- **UnresponsiveClients** – when the clients aren't reading data from the server fast enough, and specifically, when the number of bytes in the Redis server output buffer for a client goes over 1,000,000 bytes.
165+
- **AOF** – when there's an issue related to AOF persistence.
166+
- **RDB** – when there's an issue related to RDB persistence.
167+
- **Import** – when there's an issue related to Import RDB.
168+
- **Export** – when there's an issue related to Export RDB.
169+
- **AADAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token. Not recommended. Use **MicrosoftEntraAuthenticationFailure** instead.
170+
- **AADTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires. Not recommended. Use **MicrosoftEntraTokenExpired** instead.
171+
- **MicrosoftEntraAuthenticationFailure** (preview) - when there's an authentication failure using Microsoft Entra access token.
172+
- **MicrosoftEntraTokenExpired** (preview) - when a Microsoft Entra access token used for authentication isn't renewed and it expires.
173+
170174
> [!NOTE]
171-
> Metrics for errors aren't available when using the Enterprise Tiers.
175+
> Metrics for errors aren't available when using the Enterprise tiers.
172176
173177
- Evicted Keys
174178
- The number of items evicted from the cache during the specified reporting interval because of the `maxmemory` limit.
@@ -215,9 +219,9 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
215219
- If the geo-replication link is unhealthy for over an hour, [file a support request](../azure-portal/supportability/how-to-create-azure-support-request.md).
216220

217221
- Gets
218-
- Sum of the number of get commands run on the cache during the specified reporting interval. This is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _get_ family, including `GET`, `HGET` , `MGET`, and others. This value can differ from the total number of hits and misses because some individual commands access multiple keys. For example: `MGET key1 key2 key3` only increments the number of gets by one but increments the combined number of hits and misses by three.
222+
- Sum of the number of get commands run on the cache during the specified reporting interval. The sum is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _get_ family, including `GET`, `HGET` , `MGET`, and others. This value can differ from the total number of hits and misses because some individual commands access multiple keys. For example: `MGET key1 key2 key3` only increments the number of gets by one but increments the combined number of hits and misses by three.
219223
- Operations per Second
220-
- The total number of commands processed per second by the cache server during the specified reporting interval. This value maps to "instantaneous_ops_per_sec" from the Redis INFO command.
224+
- The total number of commands processed per second by the cache server during the specified reporting interval. This value maps to "instantaneous_ops_per_sec" from the Redis INFO command.
221225
- Server Load
222226
- The percentage of CPU cycles in which the Redis server is busy processing and _not waiting idle_ for messages. If this counter reaches 100, the Redis server has hit a performance ceiling, and the CPU can't process work any faster. You can expect a large latency effect. If you're seeing a high Redis Server Load, such as 100, because you're sending many expensive commands to the server, then you might see timeout exceptions in the client. In this case, you should consider scaling up, scaling out to a Premium cluster, or partitioning your data into multiple caches. When _Server Load_ is only moderately high, such as 50 to 80 percent, then average latency usually remains low, and timeout exceptions could have other causes than high server latency.
223227
- The _Server Load_ metric is sensitive to other processes on the machine using the existing CPU cycles that reduce the Redis server's idle time. For example, on the _C1_ tier, background tasks such as virus scanning cause _Server Load_ to spike higher for no obvious reason. We recommend that you pay attention to other metrics such as operations, latency, and CPU, in addition to _Server Load_.
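The **Gets** description earlier in this hunk notes that `MGET key1 key2 key3` increments gets by one but the combined hits and misses by three. A toy Python model of that counting rule is sketched below. The in-memory `store`, the `counters` dict, and the `run_get_family` helper are hypothetical stand-ins, not Redis internals.

```python
def run_get_family(store, counters, keys):
    """Simulate one get-family command against an in-memory dict."""
    counters["gets"] += 1  # one command, regardless of how many keys it reads
    for key in keys:
        if key in store:
            counters["hits"] += 1
        else:
            counters["misses"] += 1


store = {"key1": "a", "key2": "b"}  # key3 is deliberately absent
counters = {"gets": 0, "hits": 0, "misses": 0}

# Equivalent of: MGET key1 key2 key3
run_get_family(store, counters, ["key1", "key2", "key3"])
print(counters)
```

After the single call, gets is 1 while hits plus misses is 3, so the two totals diverge whenever multi-key commands are in use.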
@@ -226,7 +230,7 @@ In contrast, for clustered caches, we recommend using the metrics with the suffi
226230
> The _Server Load_ metric can present incorrect data for Enterprise and Enterprise Flash tier caches. Sometimes _Server Load_ is represented as being over 100. We are investigating this issue. We recommend using the CPU metric instead in the meantime.
227231
228232
- Sets
229-
- Sum of the number of set commands run on the cache during the specified reporting interval. This is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _set_ family, including `SET`, `HSET` , `MSET`, and others.
233+
- Sum of the number of set commands run on the cache during the specified reporting interval. This sum is a combined total of the increases in the `cmdstat` counts reported by the Redis INFO all command for all commands in the _set_ family, including `SET`, `HSET` , `MSET`, and others.
230234
- Total Keys
231235
- The maximum number of keys in the cache during the past reporting time period. This number maps to `keyspace` from the Redis INFO command. Because of a limitation in the underlying metrics system for caches with clustering enabled, Total Keys return the maximum number of keys of the shard that had the maximum number of keys during the reporting interval.
232236
- Total Operations
@@ -266,7 +270,7 @@ The two workbooks provided are:
266270
- **Azure Cache For Redis Resource Overview** combines many of the most commonly used metrics so that the health and performance of the cache instance can be viewed at a glance.
267271
:::image type="content" source="media/cache-how-to-monitor/cache-monitoring-resource-overview.png" alt-text="Screenshot of graphs showing a resource overview for the cache.":::
268272

269-
- **Geo-Replication Dashboard** pulls geo-replication health and status metrics from both the geo-primary and geo-secondary cache instances to give a complete picture of geo-replcation health. Using this dashboard is recommended, as some geo-replication metrics are only emitted from either the geo-primary or geo-secondary.
273+
- **Geo-Replication Dashboard** pulls geo-replication health and status metrics from both the geo-primary and geo-secondary cache instances to give a complete picture of geo-replication health. Using this dashboard is recommended, as some geo-replication metrics are only emitted from either the geo-primary or geo-secondary.
270274
:::image type="content" source="media/cache-how-to-monitor/cache-monitoring-geo-dashboard.png" alt-text="Screenshot showing the geo-replication dashboard with a geo-primary and geo-secondary cache set.":::
271275

272276
## Related content
