Lack of request retries #1136

We have a few Buck2 users who often face this class of issue:

```
Error: (status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: stream closed because of a broken pipe: stream closed because of a broken pipe)
...
Error: (Failed to make BatchReadBlobs request: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: connection reset: connection reset) 
```

The root cause is that the server or load balancer serving the requests went through a rotation, which disrupted the gRPC request and its underlying TCP connection. In this case, we would typically expect the remote cache client to retry several times so that a new remote cache server instance can serve the request.

It seems that the Buck2 remote cache client currently has no such retry logic. I assume this is because Meta operates a node/host-level cache proxy and has Buck2 daemons connect to the local proxy instead of connecting to the cache server directly?

It would be nice to have retry logic implemented for all remote cache / RBE requests so that the Buck2 daemon can operate reliably with other cache server implementations.
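To make the request concrete, here is a minimal sketch of the kind of retry loop meant here. It is not Buck2's actual code: `RequestError`, `RetryPolicy`, and `with_retries` are hypothetical names, the classification of errors into retryable and permanent is an assumption, and the sketch is synchronous for brevity, whereas a real client would be async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification; not Buck2's actual error type.
#[derive(Debug, Clone, PartialEq)]
enum RequestError {
    /// Connection-level failure such as "broken pipe" or "connection reset".
    /// Assumed safe to retry.
    Transport(String),
    /// Any other failure; assumed that retrying will not help.
    Permanent(String),
}

/// Hypothetical retry settings.
struct RetryPolicy {
    /// Total number of attempts, including the first one.
    max_attempts: u32,
    /// Delay before the second attempt.
    initial_backoff: Duration,
    /// Upper bound for the delay between attempts.
    max_backoff: Duration,
}

/// Calls `request` until it succeeds, fails with a permanent error,
/// or `max_attempts` is reached. The closure receives the 1-based
/// attempt number. The delay doubles after each transport failure,
/// capped at `max_backoff`.
fn with_retries<T, F>(policy: &RetryPolicy, mut request: F) -> Result<T, RequestError>
where
    F: FnMut(u32) -> Result<T, RequestError>,
{
    let mut backoff = policy.initial_backoff;
    let mut attempt = 1;
    loop {
        match request(attempt) {
            Ok(value) => return Ok(value),
            // Retry only transport errors, and only while attempts remain.
            Err(RequestError::Transport(_)) if attempt < policy.max_attempts => {
                sleep(backoff);
                backoff = (backoff * 2).min(policy.max_backoff);
                attempt += 1;
            }
            // Permanent error, or the attempts are exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

For this to help with a server or load balancer rotation, the request closure would presumably need to re-establish the connection on each attempt so that it reaches a new server instance. Adding jitter to the backoff would also be reasonable, so that many daemons do not reconnect at the same moment.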
