Fix Redis to reconnect in Sentinel (Chris Staite)#2190
MarcusSorealheis merged 11 commits into main
We clearly need to roll something back or fix some test that I broke. My apologies @chrisstaite
The switch from Fred to Redis-rs caused auto-reconnect in Sentinel to not occur when the primary fails over. Instead, we have to manually take care of detecting a ReadOnly error and performing the re-discovery of the primary and re-connecting. Re-work the Redis store to catch the ReadOnly error in update_data and reconnect. Take a UUID for each connection to ensure we don't spam reconnect attempts from multiple tasks at the same time. Switch the pubsub over to using the ConnectionManager and manage the psubscribe there.
@chrisstaite-menlo FYI, this starts from your work and then adds some extra testing I've built that demos the Sentinel failover case. My test calls into
@chrisstaite-menlo finally, this looks like it works. It's never worked fully until now. Next up, several hot path optimizations.
chrisstaite-menlo
left a comment
Awesome job with the tests, thanks.
@chrisstaite-menlo made 2 comments.
Reviewable status: 0 of 1 LGTMs obtained, and 0 of 16 files reviewed.
nativelink-store/src/redis_store.rs line 266 at r11 (raw file):
    };
    for subscription in subscriptions {
        connection_manager.psubscribe(&subscription).await?;
If one subscription fails, the `?` returns early and the connection can be left half reconnected, with only some of the patterns resubscribed. This should probably have better error handling.
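One possible shape for that error handling, as a rough sketch rather than what the PR does: try every pattern instead of returning on the first failure, and report exactly which ones failed so the caller can retry them or tear the connection down. The function name `resubscribe_all` and the `subscribe` closure are hypothetical; the closure stands in for the real `psubscribe` call, and the sketch is synchronous to stay self-contained.

```rust
/// Hypothetical helper: attempt every pattern and collect the failures
/// instead of bailing out on the first error with `?`.
///
/// `subscribe` is an assumed stand-in for the real `psubscribe` call.
fn resubscribe_all<F, E>(patterns: &[String], mut subscribe: F) -> Result<(), Vec<(String, E)>>
where
    F: FnMut(&str) -> Result<(), E>,
{
    let mut failed = Vec::new();
    for pattern in patterns {
        // Keep going after an error so one bad pattern does not leave the
        // remaining patterns unsubscribed.
        if let Err(err) = subscribe(pattern) {
            failed.push((pattern.clone(), err));
        }
    }
    if failed.is_empty() {
        Ok(())
    } else {
        // The caller decides: retry only the failed patterns, or drop the
        // connection and start the reconnect again.
        Err(failed)
    }
}
```

With this shape the caller always knows which subscriptions are live, so a partial failure is visible rather than silent.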
chrisstaite-menlo
left a comment
@chrisstaite-menlo reviewed 16 files and all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed.
Thank you for the review, brother @chrisstaite-menlo
I'm going to cut a new release
I'll try to fix this on the weekend because we have Chrome customers that need the PR, though that shouldn't break anyone.
Description
The switch from Fred to Redis-rs caused auto-reconnect in Sentinel to not occur when the primary fails over. Instead, we have to manually take care of detecting a ReadOnly error and performing the re-discovery of the primary and re-connecting.
Re-work the Redis store to catch the ReadOnly error in update_data and reconnect. Take a UUID for each connection to ensure we don't spam reconnect attempts from multiple tasks at the same time. Switch the pubsub over to using the ConnectionManager and manage the psubscribe there.
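The reconnect guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `redis_store.rs`: a `u64` counter stands in for the per-connection UUID, `ReconnectState` and `is_read_only_error` are hypothetical names, and a `std::sync::Mutex` replaces whatever async lock the store actually uses. The string check relies on Redis replying to writes on a replica with an error that begins with `READONLY`.

```rust
use std::sync::Mutex;

/// Hypothetical check: a replica rejects writes with an error message that
/// starts with "READONLY".
fn is_read_only_error(message: &str) -> bool {
    message.starts_with("READONLY")
}

/// Tracks the id of the live connection. A counter stands in for the UUID.
struct ReconnectState {
    current_id: Mutex<u64>,
}

impl ReconnectState {
    fn new() -> Self {
        Self { current_id: Mutex::new(0) }
    }

    /// Id a task should remember before it issues a command.
    fn current_id(&self) -> u64 {
        *self.current_id.lock().unwrap()
    }

    /// Reconnect only if `failed_id` is still the live connection.
    /// Returns true if this caller performed the reconnect, false if
    /// another task had already replaced the connection.
    fn reconnect_if_stale<F: FnOnce()>(&self, failed_id: u64, reconnect: F) -> bool {
        // The lock is held across the reconnect so concurrent callers wait
        // and then see the new id. Async code would need an async-aware lock.
        let mut current = self.current_id.lock().unwrap();
        if *current != failed_id {
            return false;
        }
        reconnect();
        *current += 1;
        true
    }
}
```

A task that hits a ReadOnly error passes in the id it observed before the failed command. Only the first such task triggers the re-discovery, and the others return immediately and retry on the new connection.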
Type of change
Please delete options that aren't relevant.
How Has This Been Tested?
bazel test //...
redis_store_tester with src/bin/docker-compose.store-tester.yaml and knocking out the primary node
Checklist
bazel test //... passes locally
git amend see some docs