Description
Overview of the Issue
When a client connects to VTGate using @replica (global routing without specifying a keyspace) and executes a SET query, the query fails with no healthy tablet available if the alphabetically first serving keyspace has no REPLICA tablets.
Users can run into this error in different scenarios. It can happen on every SET query if the alphabetically first keyspace does not have tablets of the requested type. Or it can happen only under specific conditions such as a PRS / ERS / downtime, where the alphabetically first keyspace has tablets of the requested type but is unavailable for a short moment, causing the next keyspace in the list to be selected (which then might not have tablets of the requested type).
The issue is in how AnyKeyspace() selects a keyspace when no specific keyspace is targeted:
- GetServingKeyspaces() (go/vt/discovery/keyspace_events.go:763-775) returns all keyspaces that have at least one serving shard, regardless of tablet type:

      func (kew *KeyspaceEventWatcher) GetServingKeyspaces() []string {
          for ksName, state := range kew.keyspaces {
              if state.isServing() { // Only checks if ANY shard is serving
                  servingKeyspaces = append(servingKeyspaces, ksName)
              }
          }
          return servingKeyspaces
      }
- isServing() (go/vt/discovery/keyspace_events.go:606-616) returns true if any shard has serving: true, without considering tablet type:

      func (kss *keyspaceState) isServing() bool {
          for _, state := range kss.shards {
              if state.serving {
                  return true // Doesn't check tablet type!
              }
          }
          return false
      }
- AnyKeyspace() (go/vt/vtgate/executorcontext/vcursor_impl.go:600-623) selects the first keyspace from the sorted list:

      func (vc *VCursorImpl) AnyKeyspace() (*vindexes.Keyspace, error) {
          keyspaces := vc.getSortedServingKeyspaces() // Sorted alphabetically
          for _, ks := range keyspaces {
              if ks.Sharded {
                  return ks, nil // Returns first sharded keyspace
              }
          }
          return keyspaces[0], nil // Or first keyspace if none sharded
      }
- SET query execution uses the VCursor's tablet type (REPLICA in this case) when resolving destinations:

      // vcursor_impl.go:1003
      rss, values, err := vc.resolver.ResolveDestinations(ctx, keyspace, vc.tabletType, ids, destinations)
Reproduction Steps
N/A
Binary Version
N/A
Operating System and Environment details
N/A
Log Fragments
N/A