Timeout errors and new connections spiking

Hello. We are seeing strange behavior with our Redis cache cluster: occasionally one node produces a burst of timeout errors, and at the same time we see a large spike in new connections on that node. We're not yet sure which of the two happens first. The cluster is hosted in AWS, and a "network bandwidth exceeded" metric usually fires on the affected node at the same time.

I'm reaching out because the errors have been pretty confounding to debug, so if this behavior makes sense to any contributors it might help steer me in the right direction.

We use a `socket_connect_timeout` of 1s and a `socket_timeout` of 100ms. We have `retry` set to an exponential backoff with a base of 10ms and a cap of 200ms, but currently only a single retry attempt.
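
To make those numbers concrete, here is a rough sketch of the per-command time budget they imply. It assumes the delay before retry `n` is `min(cap, base * 2**n)`, which is one common formulation of exponential backoff; the client's exact formula (and any jitter) may differ. The function names are hypothetical, and the figure ignores any time spent reconnecting.

```python
def backoff_delays_ms(base_ms: int, cap_ms: int, retries: int) -> list[int]:
    """Delay before each retry, assuming min(cap, base * 2**n) for n = 0..retries-1."""
    return [min(cap_ms, base_ms * 2 ** n) for n in range(retries)]


def worst_case_command_ms(socket_timeout_ms: int, base_ms: int,
                          cap_ms: int, retries: int) -> int:
    """Upper bound on one command if every attempt times out.

    Counts one socket timeout per attempt plus the backoff sleeps in between.
    Reconnect time (up to socket_connect_timeout per attempt) is NOT included.
    """
    attempts = retries + 1  # the initial try plus each retry
    return attempts * socket_timeout_ms + sum(
        backoff_delays_ms(base_ms, cap_ms, retries)
    )


# Settings from this issue: 100ms socket timeout, base 10ms, cap 200ms, 1 retry.
budget_ms = worst_case_command_ms(100, 10, 200, 1)
```

Under these assumptions a command gives up after roughly two 100ms timeouts and one short sleep, so a node that is slow for more than a fraction of a second would fail most in-flight commands.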

Some questions/thoughts I have:
* Does the retry policy apply to establishing a connection, or just to executing commands?
* Does the client try to re-establish connections once a certain number of errors have been encountered? I'm trying to work out whether the new connections we're seeing are a result of the timeouts or vice versa. If the errors are **not** causing reconnection, are there any other situations in which the client tries to reconnect?
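
One way I could try to pin down the ordering is to sample the server's `INFO stats` output at short intervals and compare the growth of `total_connections_received` against the client-side timestamps of the timeouts. The sketch below only parses two raw `INFO` text snapshots and returns the difference; how the snapshots are collected is left out, and the function names are hypothetical.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse raw INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def new_connections_between(before: str, after: str) -> int:
    """Connections accepted by the server between two INFO snapshots."""
    key = "total_connections_received"
    return int(parse_info(after)[key]) - int(parse_info(before)[key])
```

If the counter jumps before the first timeout is logged, that would suggest the connection spike comes first; if it jumps only afterwards, the timeouts are more likely the trigger.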

Timeout errors and new connections spiking #3800
