Fix AnyKeyspace tablet type selection #19399

Open
Devanshusharma2005 wants to merge 5 commits into vitessio:main from Devanshusharma2005:fix/anykeyspace-replica-tablet-selection

Conversation

@Devanshusharma2005 Devanshusharma2005 commented Feb 16, 2026

Description

This PR fixes a VTGate routing bug where "replica" queries without an explicit keyspace could fail with "no healthy tablet available".
The root cause was that AnyKeyspace() picked the alphabetically first serving keyspace without checking whether it actually had tablets of the requested type. If that keyspace only had PRIMARY tablets, a "replica" query would fail immediately, even if another keyspace had healthy replicas.
The fix makes global routing tablet-type aware by reusing ResolveDestinations() through a small helper, canResolveKeyspace(). Serving keyspaces are filtered by whether they can actually resolve for vc.tabletType, and the existing selection logic then runs on the filtered list.
If filtering returns nothing, we fall back to the original behavior to preserve backwards compatibility.
The scope is intentionally small (2 files) with no interface changes, and tests cover the main edge cases.
The change is minimal and aligns global routing with explicit routing behavior.
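
The filter-then-fall-back selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `topology` map, the `tabletType` constants, and the `canResolve` and `anyKeyspace` functions are hypothetical stand-ins, not the Vitess types or the code in this PR, which goes through ResolveDestinations() instead of a map lookup.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// tabletType and topology are hypothetical stand-ins for illustration only;
// they are not Vitess types.
type tabletType string

const (
	primary tabletType = "PRIMARY"
	replica tabletType = "REPLICA"
)

// topology maps a keyspace name to the set of tablet types it can serve.
type topology map[string]map[tabletType]bool

var errNoKeyspace = errors.New("no serving keyspace")

// canResolve reports whether keyspace ks can serve tablet type tt.
func (t topology) canResolve(ks string, tt tabletType) bool {
	return t[ks][tt]
}

// anyKeyspace returns the alphabetically first keyspace that can serve tt.
// If no keyspace can, it falls back to the alphabetically first keyspace
// overall, mirroring the backwards-compatible fallback in the description.
func (t topology) anyKeyspace(tt tabletType) (string, error) {
	if len(t) == 0 {
		return "", errNoKeyspace
	}
	names := make([]string, 0, len(t))
	for ks := range t {
		names = append(names, ks)
	}
	sort.Strings(names)
	for _, ks := range names {
		if t.canResolve(ks, tt) {
			return ks, nil
		}
	}
	return names[0], nil
}

func main() {
	topo := topology{
		"commerce": {primary: true},
		"customer": {primary: true, replica: true},
	}
	ks, _ := topo.anyKeyspace(replica)
	fmt.Println(ks) // prints "customer"
}
```

Without the filter, the replica query in this sketch would land on "commerce", which has no replicas, which is the failure mode the linked issue reports.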

Related Issue(s)

Fixes: #19243

TEST RESULT

[Image: screenshot of test results]

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

No deployment changes required.
This is purely a VTGate routing logic fix. No changes to topology, VSchema, or tabletmanager.

AI Disclosure

I used Perplexity to help generate the tests. I wrote the code manually, using grep to locate the file containing the bug and make the changes. I fully understand all of the changes. @arthurschreiber gave enough hints in the issue.
(gofumpt command is the work of god btw ^^)

@github-actions github-actions bot added this to the v24.0.0 milestone Feb 16, 2026
@vitess-bot vitess-bot bot added the NeedsWebsiteDocsUpdate, NeedsDescriptionUpdate, NeedsIssue, and NeedsBackportReason labels Feb 16, 2026
@vitess-bot
Contributor

vitess-bot bot commented Feb 16, 2026

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@Devanshusharma2005
Author

@mhamza15 let's have a look.

Comment on lines 601 to 603
// anyShardDestination is reused across canResolveKeyspace calls to avoid
// allocating a new slice on every invocation.
anyShardDestination = []key.ShardDestination{key.DestinationAnyShard{}}
Collaborator

If it's used only within canResolveKeyspace, then it won't cause an allocation. We should colocate it there if it's only used there.

return true
}
_, _, err := vc.resolver.ResolveDestinations(
context.Background(), ksName, vc.tabletType,
Collaborator

We shouldn't use context.Background(). We'll need to funnel in the appropriate context, or at the very least use a short timeout.

Author

Can we do something like this here?

func (vc *VCursorImpl) canResolveKeyspace(ksName string) bool {
	if vc.resolver == nil {
		return true
	}
	anyShardDestination := []key.ShardDestination{key.DestinationAnyShard{}}
	ctx, cancel := context.WithTimeout(context.TODO(), 50*time.Millisecond)
	defer cancel()
	_, _, err := vc.resolver.ResolveDestinations(ctx, ksName, vc.tabletType, nil, anyShardDestination)
	return err == nil
}

I guess we can use context.TODO() when it's unclear which context to use and the surrounding code is not yet ready to take a ctx parameter?

Comment on lines 635 to 637
// canResolveKeyspace checks whether the given keyspace has a SrvKeyspace partition
// for vc.tabletType. Uses ResolveDestinations which reads cached SrvKeyspace data,
// following the same code path as explicit keyspace routing (Resolver.GetKeyspaceShards).
Collaborator

Suggested change
// canResolveKeyspace checks whether the given keyspace has a SrvKeyspace partition
// for vc.tabletType.

@mattlord mattlord added the Type: Bug and Component: Query Serving labels and removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Feb 26, 2026
Signed-off-by: Devanshu Sharma <devanshusharma658@gmail.com>
nt and context fix

@Devanshusharma2005 Devanshusharma2005 force-pushed the fix/anykeyspace-replica-tablet-selection branch from 6f726e0 to 5a4f14a Compare February 26, 2026 16:51

Development

Successfully merging this pull request may close these issues.

Bug Report: SET Query Fails on @replica When AnyKeyspace Selects Keyspace Without Replica Tablets

3 participants