Description
We have a few Buck2 users who often face this class of issue:
Error: (status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: stream closed because of a broken pipe: stream closed because of a broken pipe)
...
Error: (Failed to make BatchReadBlobs request: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: connection error: connection reset: connection reset)
The root cause is that the server or load balancer serving the requests went through a rotation, which disrupted the gRPC request and its underlying TCP connection. In this case, we would typically expect the remote cache client to retry several times so that a new remote cache server instance can serve the request.
It seems that the Buck2 remote cache client does not currently implement such retry logic. I assume this is because Meta operates a node/host-level cache proxy and has buck2 daemons connect to the local proxy instead of connecting to the cache server directly?
It would be nice to have retry logic implemented for all remote cache / RBE requests so that the buck2 daemon can operate reliably with other cache server implementations.
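To make the request concrete, here is a rough sketch of the kind of retry wrapper I have in mind. It is not based on Buck2's actual code: `RpcError`, `retry_with_backoff`, and the choice of exponential backoff are all hypothetical, and a real implementation would need to classify the gRPC/transport errors shown above and would likely be async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification; NOT Buck2's actual error type.
#[derive(Debug, Clone, PartialEq)]
enum RpcError {
    /// Connection-level failure (broken pipe, connection reset): worth retrying.
    Transport(String),
    /// Any other failure: retrying is assumed not to help.
    Fatal(String),
}

/// Calls `op` at least once and at most `max_attempts` times.
/// Only `RpcError::Transport` is retried; the delay doubles after each
/// failed attempt (base, 2*base, 4*base, ...).
fn retry_with_backoff<T, F>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: F,
) -> Result<T, RpcError>
where
    F: FnMut(u32) -> Result<T, RpcError>,
{
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(RpcError::Transport(msg)) if attempt < max_attempts => {
                // Saturating arithmetic avoids overflow for large attempt counts.
                let factor = 2u32.saturating_pow(attempt - 1);
                let delay = base_delay.saturating_mul(factor);
                eprintln!("attempt {attempt} failed ({msg}); retrying in {delay:?}");
                sleep(delay);
            }
            // Fatal errors, or transport errors after the last attempt.
            Err(err) => return Err(err),
        }
    }
}
```

The idea is that each retry would open (or be routed over) a fresh connection, so a rotated server or load balancer gets replaced by a healthy instance. The closure receives the attempt number so the caller can log it or re-establish the channel before re-issuing a request such as `BatchReadBlobs`.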