You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
pickfirstleaf: Avoid getting stuck in IDLE on connection breakage (#8615)
Related issue: b/415354418
## Problem
On connection breakage, the pickfirst leaf balancer enters idle and
returns an `Idle picker` that calls the balancer's `ExitIdle` method
only the first time `Pick` is called. The following sequence of events
will cause the balancer to get stuck in `Idle` state:
1. Existing connection breaks, SubConn [requests re-resolution and
reports
IDLE](https://github.com/grpc/grpc-go/blob/bb71072094cf533965450c44890f8f51c671c393/clientconn.go#L1388-L1393).
In turn PF updates the ClientConn state to IDLE with an `Idle picker`.
1. An RPC is made, triggering `balancer.ExitIdle` through the idle
picker. The balancer attempts to re-connect the failed SubConn.
1. The resolver produces a new endpoint list, removing the endpoint used
by the existing SubConn. PF removes the existing SubConn. Since the
balancer didn't update the ClientConn state to CONNECTING yet, pickfirst
thinks that it's still in IDLE and doesn't start connecting to the new
endpoints.
1. New RPC requests trigger the idle picker, but it's a no-op since it
only [triggers the balancer's ExitIdle method
once](https://github.com/grpc/grpc-go/blob/bb71072094cf533965450c44890f8f51c671c393/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go#L663https://github.com/grpc/grpc-go/blob/bb71072094cf533965450c44890f8f51c671c393/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go#L663).
## Fix
This change moves the ClientConn into Connecting immediately when the
`ExitIdle` method is called. This ensures that the balancer continues to
re-connect when a new endpoint list is produced by the resolver.
RELEASE NOTES:
* balancer/pickfirst: Fix bug that can cause balancer to get stuck in
`IDLE` state on connection failure.
0 commit comments