vtorc: add timeout helpers for remaining recovery topo/tmc calls#19520
Draft
mhamza15 wants to merge 1 commit intovitessio:mainfrom
Draft
vtorc: add timeout helpers for remaining recovery topo/tmc calls#19520mhamza15 wants to merge 1 commit intovitessio:mainfrom
mhamza15 wants to merge 1 commit intovitessio:mainfrom
Conversation
This follows up on vitessio#19519. We still had a few recovery paths passing bare contexts to topo/tmc calls, which means a stuck RPC can block a recovery indefinitely and keep the recovery row active. This changes those call sites to use helper wrappers with `topo.RemoteOperationTimeout`: - `restartDirectReplicas` - `ts.GetTabletsByShard` -> `getShardTablets` - `tmc.StopReplication` -> `stopReplication` - `tmc.StartReplication` -> `startReplication` - `refreshTabletsInKeyspaceShard` - `ts.GetTabletsByShardCell` -> `getShardTabletsByCell` This also generates a `MockTabletManagerClient` and adds a `synctest` that uses it to simulate a replication RPC hanging forever and verifies we return with `context.DeadlineExceeded` instead of blocking. Fixes vitessio#19519 Signed-off-by: Mohamed Hamza <mhamza@fastmail.com>
Contributor
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19520 +/- ##
===========================================
- Coverage 69.67% 33.14% -36.53%
===========================================
Files 1614 13 -1601
Lines 216793 2453 -214340
===========================================
- Hits 151044 813 -150231
+ Misses 65749 1640 -64109
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
We still have a few recovery paths passing bare contexts to topo/tmc calls, which means a stuck RPC can block a recovery indefinitely and keep the recovery row active. This changes those call sites to use helper wrappers with
topo.RemoteOperationTimeout:restartDirectReplicasts.GetTabletsByShard->getShardTabletstmc.StopReplication->stopReplicationtmc.StartReplication->startReplicationrefreshTabletsInKeyspaceShardts.GetTabletsByShardCell->getShardTabletsByCellThis also generates a
MockTabletManagerClientand adds a test that uses it to simulate a replication RPC hanging forever and verifies we return withcontext.DeadlineExceededinstead of blocking.Don't be alarmed by the lines added number, that's the generated mock client 😅.
Related Issue(s)
Fixes #19519
Checklist
Deployment Notes
AI Disclosure
Help from Codex.