Bug Report: SET Query Fails on @replica When AnyKeyspace Selects Keyspace Without Replica Tablets #19243

@arthurschreiber

Overview of the Issue

When a client connects to VTGate using @replica (global routing without specifying a keyspace) and executes a SET query, the query fails with a "no healthy tablet available" error if the alphabetically first serving keyspace has no REPLICA tablets.

Users can run into this error in different scenarios. It can happen on every SET query if the alphabetically first keyspace does not have tablets of the requested type. Or it can happen only under specific conditions such as a PRS / ERS / downtime, where the alphabetically first keyspace has tablets of the requested type but is unavailable for a short moment, causing the next keyspace in the list to be selected (which might not have tablets of the requested type).


The issue is in how AnyKeyspace() selects a keyspace when no specific keyspace is targeted:

  1. GetServingKeyspaces() (go/vt/discovery/keyspace_events.go:763-775) returns all keyspaces that have at least one serving shard, regardless of tablet type:

    func (kew *KeyspaceEventWatcher) GetServingKeyspaces() []string {
        var servingKeyspaces []string  // declaration needed by the append below
        for ksName, state := range kew.keyspaces {
            if state.isServing() {  // Only checks if ANY shard is serving
                servingKeyspaces = append(servingKeyspaces, ksName)
            }
        }
        return servingKeyspaces
    }
  2. isServing() (go/vt/discovery/keyspace_events.go:606-616) returns true if any shard has serving: true, without considering tablet type:

    func (kss *keyspaceState) isServing() bool {
        for _, state := range kss.shards {
            if state.serving {
                return true  // Doesn't check tablet type!
            }
        }
        return false
    }
  3. AnyKeyspace() (go/vt/vtgate/executorcontext/vcursor_impl.go:600-623) selects the first keyspace from the sorted list:

    func (vc *VCursorImpl) AnyKeyspace() (*vindexes.Keyspace, error) {
        keyspaces := vc.getSortedServingKeyspaces()  // Sorted alphabetically
        for _, ks := range keyspaces {
            if ks.Sharded {
                return ks, nil  // Returns first sharded keyspace
            }
        }
        return keyspaces[0], nil  // Or first keyspace if none sharded
    }
  4. SET query execution uses the VCursor's tablet type (REPLICA in this case) when resolving destinations:

    // vcursor_impl.go:1003
    rss, values, err := vc.resolver.ResolveDestinations(ctx, keyspace, vc.tabletType, ids, destinations)

Reproduction Steps

N/A

Binary Version

N/A

Operating System and Environment details

N/A

Log Fragments

N/A
