Lack of request retries #1136

@sluongng

We have a few Buck2 users who often face this class of issue:

Error: (status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: stream closed because of a broken pipe: stream closed because of a broken pipe)
...
Error: (Failed to make BatchReadBlobs request: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: connection reset: connection reset) 

The root cause is that the server or load balancer serving the requests went through a rotation, which disrupted the gRPC request and its underlying TCP connection. In this case, we would typically expect the remote cache client to retry several times so that a new remote cache server instance can serve the request.

It seems that the Buck2 remote cache client currently does not implement such retry logic. I assume this is because Meta operates a node/host-level cache proxy and has Buck2 daemons connect to the local proxy instead of connecting to the cache server directly?

It would be nice to have retry logic implemented for all remote cache / RBE requests so that the Buck2 daemon can operate reliably with other cache server implementations.
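
To illustrate the kind of behavior being requested, here is a minimal sketch of a retry loop with exponential backoff. It is not Buck2 code: `RpcError`, `RetryPolicy`, and `with_retries` are hypothetical names made up for this example, and the split between retryable transport errors and permanent errors is an assumption about how such failures might be classified.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification (not Buck2's actual error type).
#[derive(Debug, Clone, PartialEq)]
enum RpcError {
    /// Connection-level failure such as "broken pipe" or "connection reset".
    /// Assumed to be safe to retry.
    Transport(String),
    /// Any other failure, where retrying is assumed not to help.
    Permanent(String),
}

/// Hypothetical retry settings.
struct RetryPolicy {
    /// Total number of attempts, including the first one.
    max_attempts: u32,
    /// Delay before the second attempt.
    initial_backoff: Duration,
    /// Upper bound for the delay between attempts.
    max_backoff: Duration,
}

/// Runs `call` until it succeeds, fails with a non-transport error,
/// or `max_attempts` is reached. The closure receives the 1-based
/// attempt number.
fn with_retries<T, F>(policy: &RetryPolicy, mut call: F) -> Result<T, RpcError>
where
    F: FnMut(u32) -> Result<T, RpcError>,
{
    let mut backoff = policy.initial_backoff;
    let mut attempt = 1;
    loop {
        match call(attempt) {
            Ok(value) => return Ok(value),
            // Retry only transport errors, and only while attempts remain.
            Err(RpcError::Transport(_)) if attempt < policy.max_attempts => {
                sleep(backoff);
                // Double the delay each time, capped at max_backoff.
                backoff = (backoff * 2).min(policy.max_backoff);
                attempt += 1;
            }
            // Permanent errors, or transport errors on the last attempt.
            Err(err) => return Err(err),
        }
    }
}
```

In a real client, the closure would presumably re-issue the gRPC call (for example `BatchReadBlobs`) over a freshly established connection, so that the load balancer has a chance to route the retry to a different server instance.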
