
Commit 52b3cd2

Merge pull request #181282 from flang-msft/fxl---fix-up-troubleshooting-documents
Fxl---fix up troubleshooting documents v2
2 parents 15a7c15 + ea02039 commit 52b3cd2

10 files changed, 384 additions and 203 deletions

articles/azure-cache-for-redis/TOC.yml

Lines changed: 6 additions & 6 deletions
@@ -172,8 +172,6 @@
  href: cache-how-to-monitor.md#operations-and-alerts
  - name: Monitor with diagnostic logs
  href: cache-monitor-diagnostic-settings.md
- - name: Monitoring FAQs
- href: cache-monitor-troubleshoot-faq.yml
  - name: Scale
  items:
  - name: Update to a different size and tier
@@ -182,12 +180,14 @@
  href: cache-how-to-premium-clustering.md
  - name: Diagnose and troubleshoot
  items:
+ - name: Troubleshoot connectivity issues
+ href: cache-troubleshoot-connectivity.md
+ - name: Troubleshoot latency and timeouts
+ href: cache-troubleshoot-timeouts.md
+ - name: Troubleshoot client
+ href: cache-troubleshoot-client.md
  - name: Troubleshoot Redis server
  href: cache-troubleshoot-server.md
- - name: Troubleshoot Redis client
- href: cache-troubleshoot-client.md
- - name: Troubleshoot timeouts
- href: cache-troubleshoot-timeouts.md
  - name: Troubleshoot data loss
  href: cache-troubleshoot-data-loss.md
  - name: Troubleshooting FAQs
- name: Troubleshooting FAQs

articles/azure-cache-for-redis/cache-best-practices-development.md

Lines changed: 6 additions & 6 deletions
@@ -39,13 +39,13 @@ This request/response is a difficult one to measure. You could instrument your c
  Resolutions for large response sizes are varied but include:

  - Optimize your application for a large number of small values, rather than a few large values.
- - The preferred solution is to break up your data into related smaller values.
- - See the post [What is the ideal value size range for redis? Is 100 KB too large?](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details on why smaller values are recommended.
+ - The preferred solution is to break up your data into related smaller values.
+ - See the post [What is the ideal value size range for redis? Is 100 KB too large?](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details on why smaller values are recommended.
  - Increase the size of your VM to get higher bandwidth capabilities
- - More bandwidth on your client or server VM may reduce data transfer times for larger responses.
- - Compare your current network usage on both machines to the limits of your current VM size. More bandwidth on only the server or only on the client may not be enough.
+ - More bandwidth on your client or server VM may reduce data transfer times for larger responses.
+ - Compare your current network usage on both machines to the limits of your current VM size. More bandwidth on only the server or only on the client may not be enough.
  - Increase the number of connection objects your application uses.
- - Use a round-robin approach to make requests over different connection objects.
+ - Use a round-robin approach to make requests over different connection objects.

  ## Key distribution

@@ -57,7 +57,7 @@ Try to choose a Redis client that supports [Redis pipelining](https://redis.io/t

  ## Avoid expensive operations

- Some Redis operations, like the [KEYS](https://redis.io/commands/keys) command, are expensive and should be avoided. For some considerations around long running commands, see [long-running commands](cache-troubleshoot-server.md#long-running-commands)
+ Some Redis operations, like the [KEYS](https://redis.io/commands/keys) command, are expensive and should be avoided. For some considerations around long running commands, see [long-running commands](cache-troubleshoot-timeouts.md#long-running-commands).

  ## Choose an appropriate tier
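The best-practices hunk above recommends using more connection objects and spreading requests across them in round-robin order. A minimal Python sketch of that selection step follows. It assumes the connections are created elsewhere (plain strings stand in for them here), and `RoundRobinPool` is a hypothetical helper, not part of any Redis client library.

```python
import itertools
import threading
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


class RoundRobinPool(Generic[T]):
    """Hands out pre-created connection objects in rotating order.

    Hypothetical helper: it never opens or closes connections itself;
    it only decides which existing connection the next request should use.
    """

    def __init__(self, connections: Sequence[T]) -> None:
        if not connections:
            raise ValueError("at least one connection is required")
        self._cycle = itertools.cycle(list(connections))
        # The lock keeps the rotation consistent when several threads share the pool.
        self._lock = threading.Lock()

    def get(self) -> T:
        """Return the next connection in round-robin order."""
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    # Plain strings stand in for real client connection objects.
    pool = RoundRobinPool(["conn-0", "conn-1", "conn-2"])
    for _ in range(4):
        print(pool.get())
```

Running the script prints `conn-0`, `conn-1`, `conn-2`, and then `conn-0` again. Each request picks its connection with `pool.get()`, so a large response on one connection does not queue every other request behind it.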

articles/azure-cache-for-redis/cache-faq.yml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ summary: |
  * [Planning FAQs](cache-planning-faq.yml)
  * [Development FAQs](cache-development-faq.yml)
  * [Management FAQs](cache-management-faq.yml)
- * [Monitoring and troubleshooting FAQs](cache-monitor-troubleshoot-faq.yml)
+ * [Common monitoring and troubleshooting FAQs](cache-monitor-troubleshoot-faq.yml)

  sections:
  - name: Ignored

articles/azure-cache-for-redis/cache-monitor-troubleshoot-faq.yml

Lines changed: 13 additions & 3 deletions
@@ -1,12 +1,12 @@
  ### YamlMime:FAQ
  metadata:
- title: Azure Cache for Redis monitoring and troubleshooting FAQs
- description: Learn the answers to common questions that help you monitor and troubleshoot Azure Cache for Redis
+ title: Azure Cache for Redis common error FAQs
+ description: Learn the answers to common questions that help you monitor and troubleshoot Azure Cache for Redis.
  author: flang-msft
  ms.author: franlanglois
  ms.service: cache
  ms.topic: conceptual
- ms.date: 08/06/2020
+ ms.date: 12/01/2021

  title: Azure Cache for Redis monitoring and troubleshooting FAQs
  summary: This article provides answers to common questions about how to monitor and troubleshoot Azure Cache for Redis.
@@ -49,6 +49,16 @@ sections:
  * Server-side causes
  * On the standard cache offering, the Azure Cache for Redis service started a fail-over from the primary node to the replica node.
  * Azure was patching the instance where the cache was deployed during a Redis server update or general VM maintenance.
+
+ - question: |
+ Why am I seeing "Cache is busy processing a previous update request or is undergoing system maintenance. As such, it is currently unable to accept the update request. Please try again later."
+ answer: |
+ This message indicates that a management operation, like scaling or patching, is in progress on your cache. All other management operations are blocked until the ongoing operation is completed. During this time, you can expect your Azure Cache For Redis to be fully functional for client operations.
+ - question: |
+ Why is my cache in "Failed" state?
+ answer: |
+ Azure Cache For Redis can end up in a *Failed* state if a management operation fails. Despite this state, you can expect your Azure Cache For Redis to be fully functional for client operations.
+

  additionalContent: |
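The new FAQ entry explains that the "Cache is busy processing a previous update request" error means another management operation is still running, and the message itself asks you to try again later. The Python sketch below shows one way to retry with exponential backoff. `CacheBusyError`, the attempt count, and the delay values are hypothetical assumptions; map them to whatever error your management tooling actually reports.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CacheBusyError(Exception):
    """Hypothetical error raised while another management operation is in progress."""


def retry_when_busy(
    operation: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 30.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, backing off exponentially while the cache reports it is busy."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except CacheBusyError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay *= 2  # double the wait before the next attempt
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. Client operations are unaffected while you wait, as the FAQ answer notes, so only the management call needs this treatment.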
Lines changed: 14 additions & 72 deletions
@@ -1,11 +1,11 @@
  ---
- title: Troubleshoot Azure Cache for Redis client-side issues
- description: Learn how to resolve common client-side issues with Azure Cache for Redis such as Redis client memory pressure, traffic burst, high CPU, limited bandwidth, large requests or large response size.
+ title: Troubleshoot Azure Cache for Redis client issues
+ description: Learn how to resolve common client issues, such as client memory pressure, traffic burst, high CPU, limited bandwidth, large requests, or large response size, when using Azure Cache for Redis.
  author: flang-msft
  ms.author: franlanglois
  ms.service: cache
  ms.topic: troubleshooting
- ms.date: 10/18/2019
+ ms.date: 12/31/2021
  ---
  # Troubleshoot Azure Cache for Redis client-side issues

@@ -15,11 +15,10 @@ This section discusses troubleshooting issues that occur because of a condition
  - [Traffic burst](#traffic-burst)
  - [High client CPU usage](#high-client-cpu-usage)
  - [Client-side bandwidth limitation](#client-side-bandwidth-limitation)
- <!-- [Large request or response size](#large-request-or-response-size) -->

  ## Memory pressure on Redis client

- Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of responses from the cache. When memory pressure hits, the system may page data to disk. This _page faulting_ causes the system to slow down significantly.
+ Memory pressure on the client can lead to performance problems that can delay processing of responses from the cache. When memory pressure hits, the system might page data to disk. This _page faulting_ causes the system to slow down significantly.

  To detect memory pressure on the client:

@@ -33,85 +32,28 @@ High memory pressure on the client can be mitigated several ways:

  ## Traffic burst

- Bursts of traffic combined with poor `ThreadPool` settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
-
- Monitor how your `ThreadPool` statistics change over time using [an example `ThreadPoolLogger`](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can use `TimeoutException` messages from StackExchange.Redis like below to further investigate:
-
- ```output
- System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
- IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
- ```
-
- In the preceding exception, there are several issues that are interesting:
-
- - Notice that in the `IOCP` section and the `WORKER` section you have a `Busy` value that is greater than the `Min` value. This difference means your `ThreadPool` settings need adjusting.
- - You can also see `in: 64221`. This value indicates that 64,211 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it to you.
-
- You can [configure your `ThreadPool` Settings](cache-management-faq.yml#important-details-about-threadpool-growth) to make sure that your thread pool scales up quickly under burst scenarios.
+ This section was moved. For more information, see [Traffic burst and thread pool configuration](cache-troubleshoot-timeouts.md#traffic-burst-and-thread-pool-configuration).

  ## High client CPU usage

- High client CPU usage indicates the system can't keep up with the work it's been asked to do. Even though the cache sent the response quickly, the client may fail to process the response in a timely fashion.
-
- Monitor the client's system-wide CPU usage using metrics available in the Azure portal or through performance counters on the machine. Be careful not to monitor *process* CPU because a single process can have low CPU usage but the system-wide CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. High CPU may also cause high `in: XXX` values in `TimeoutException` error messages as described in the [Traffic burst](#traffic-burst) section.
-
- > [!NOTE]
- > StackExchange.Redis 1.1.603 and later includes the `local-cpu` metric in `TimeoutException` error messages. Ensure you using the latest version of the [StackExchange.Redis NuGet package](https://www.nuget.org/packages/StackExchange.Redis/). There are bugs constantly being fixed in the code to make it more robust to timeouts so having the latest version is important.
- >
-
- To mitigate a client's high CPU usage:
-
- - Investigate what is causing CPU spikes.
- - Upgrade your client to a larger VM size with more CPU capacity.
+ This section was moved. For more information, see [High CPU on client hosts](cache-troubleshoot-timeouts.md#high-cpu-on-client-hosts).

  ## Client-side bandwidth limitation

- Depending on the architecture of client machines, they may have limitations on how much network bandwidth they have available. If the client exceeds the available bandwidth by overloading network capacity, then data isn't processed on the client side as quickly as the server is sending it. This situation can lead to timeouts.
-
- Monitor how your Bandwidth usage change over time using [an example `BandwidthLogger`](https://github.com/JonCole/SampleCode/blob/master/BandWidthMonitor/BandwidthLogger.cs). This code may not run successfully in some environments with restricted permissions (like Azure web sites).
+ This section was moved. For more information, see [Network bandwidth limitation on client hosts](cache-troubleshoot-timeouts.md#network-bandwidth-limitation-on-client-hosts).

- To mitigate, reduce network bandwidth consumption or increase the client VM size to one with more network capacity.
-
- <!--
- ## Large request or response Size
-
- A large request/response can cause timeouts. As an example, suppose your timeout value configured on your client is 1 second. Your application requests two keys (for example, 'A' and 'B') at the same time (using the same physical network connection). Most clients support request "pipelining", where both requests 'A' and 'B' are sent one after the other without waiting for their responses. The server sends the responses back in the same order. If response 'A' is large, it can eat up most of the timeout for later requests.
-
- In the following example, request 'A' and 'B' are sent quickly to the server. The server starts sending responses 'A' and 'B' quickly. Because of data transfer times, response 'B' must wait behind response 'A' times out even though the server responded quickly.
-
- ```console
- |-------- 1 Second Timeout (A)----------|
- |-Request A-|
- |-------- 1 Second Timeout (B) ----------|
- |-Request B-|
- |- Read Response A --------|
- |- Read Response B-| (**TIMEOUT**)
- ```
-
- This request/response is a difficult one to measure. You could instrument your client code to track large requests and responses.
-
- Resolutions for large response sizes are varied but include:
-
- 1. Optimize your application for a large number of small values, rather than a few large values.
- - The preferred solution is to break up your data into related smaller values.
- - See the post [What is the ideal value size range for redis? Is 100 KB too large?](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details on why smaller values are recommended.
- 1. Increase the size of your VM to get higher bandwidth capabilities
- - More bandwidth on your client or server VM may reduce data transfer times for larger responses.
- - Compare your current network usage on both machines to the limits of your current VM size. More bandwidth on only the server or only on the client may not be enough.
- 1. Increase the number of connection objects your application uses.
- - Use a round-robin approach to make requests over different connection objects.
-
- -->
-
  ## High client connections

- Client connections reaching the maximum for the cache can cause failures in client requests for connections beyond the maximum, and can also cause high server CPU usage on the cache due to processing repeated reconnection attempts.
+ When client connections reach the maximum for the cache, you can have failures in client requests for connections beyond the maximum. High client connections can also cause high server load when processing repeated reconnection attempts.

- High client connections may indicate a connection leak in client code. Connections may not be getting re-used or closed properly. Review client code for connection use.
+ High client connections might indicate a connection leak in client code. Connections might not be getting reused or closed properly. Review client code for connection use.

- If the high connections are all legitimate and required client connections, upgrading your cache to a size with a higher connection limit may be required.
+ If the high connections are all legitimate and required client connections, upgrading your cache to a size with a higher connection limit might be required. Check if the `Max aggregate for Connected Clients` metric is close or higher than the maximum number of allowed connections for a particular cache size. For more information on sizing per client connections, see [Azure Cache for Redis performance](cache-planning-faq.yml#azure-cache-for-redis-performance).

  ## Additional information

- - [Troubleshoot Azure Cache for Redis server-side issues](cache-troubleshoot-server.md)
+ These articles provide more information on troubleshooting and performance testing:
+
+ - [Troubleshoot Azure Cache for Redis server issues](cache-troubleshoot-server.md)
+ - [Troubleshoot Azure Cache for Redis latency and timeouts](cache-troubleshoot-timeouts.md)
  - [How can I benchmark and test the performance of my cache?](cache-management-faq.yml#how-can-i-benchmark-and-test-the-performance-of-my-cache-)
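The removed Traffic burst text explains how to read a StackExchange.Redis `TimeoutException` message: a `Busy` count above `Min` in the `IOCP` or `WORKER` section means the `ThreadPool` settings need adjusting, and the `in:` value counts bytes received at the client's kernel socket layer but not yet read by the application. The following Python sketch automates that reading. It assumes the message layout matches the sample in the diff; the regular expressions, `PoolStats`, and the function names are illustrative and not part of StackExchange.Redis.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Pool sections such as "IOCP: (Busy=6,Free=999,Min=2,Max=1000)".
# Assumes the layout of the sample message; other client versions may differ.
POOL_PATTERN = re.compile(
    r"(IOCP|WORKER): \(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)"
)
# The "in: <bytes>" field: data received by the socket but not yet read.
UNREAD_PATTERN = re.compile(r"\bin: (\d+)")


@dataclass
class PoolStats:
    busy: int
    free: int
    min_threads: int
    max_threads: int


def parse_timeout_message(message: str) -> tuple[dict[str, PoolStats], int | None]:
    """Extract per-pool thread statistics and the unread byte count."""
    pools = {
        name: PoolStats(*(int(n) for n in numbers))
        for name, *numbers in POOL_PATTERN.findall(message)
    }
    match = UNREAD_PATTERN.search(message)
    unread = int(match.group(1)) if match else None
    return pools, unread


def diagnose(message: str) -> list[str]:
    """Flag the two symptoms described in the removed Traffic burst section."""
    pools, unread = parse_timeout_message(message)
    findings = []
    for name, stats in pools.items():
        if stats.busy > stats.min_threads:
            findings.append(
                f"{name}: Busy={stats.busy} exceeds Min={stats.min_threads}; "
                "review ThreadPool minimum settings"
            )
    if unread:
        findings.append(
            f"in={unread}: bytes received at the socket but not yet read by the application"
        )
    return findings


SAMPLE = (
    "System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, "
    "queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0, "
    "IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)"
)

if __name__ == "__main__":
    for finding in diagnose(SAMPLE):
        print(finding)
```

For the sample message, `diagnose` reports three findings: one each for the `IOCP` and `WORKER` pools, and one for the unread bytes. A message where `Busy` stays at or below `Min` and `in:` is 0 yields no findings.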
