Handle failures in GracefulMasterTakeover #44
Open
o-fedorov wants to merge 6 commits into percona:master from o-fedorov:feature/restore-master-status
Commits
747fa26 Handle failures in GracefulMasterTakeover (o-fedorov)
1e58191 Add tests (o-fedorov)
3dba639 Address comments: update logs and log SetReadOnly error (o-fedorov)
2955418 Reset error for ForceExecuteRecovery in gracefulMasterTakeover (o-fedorov)
2bc6879 Move PostUnsuccessfulGracefulTakeoverProcesses config to a dedicated … (o-fedorov)
736a169 Clear the error in a proper place. (o-fedorov)
2 changes: 2 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/01-record-log-position/run
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # Store the current number of Orchestrator log lines | ||
| wc -l </var/log/journal/orchestrator.service.log > /tmp/orchestrator.log.lines |
1 change: 1 addition & 0 deletions
...ystem/graceful-master-takeover-fail-replication-stopped/02-stop-replication/expect_output
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| 127.0.0.1:10112 |
1 change: 1 addition & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/02-stop-replication/run
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| orchestrator-client -c stop-replica -i 127.0.0.1:10112 |
1 change: 1 addition & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/03-takeover/expect_failure
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| WaitForExecBinlogCoordinatesToReach: reached maxWait 20s on 127.0.0.1:10112 |
2 changes: 2 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/03-takeover/run
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| sleep 3 | ||
| orchestrator-client -c graceful-master-takeover -i 127.0.0.1:10111 -d 127.0.0.1:10112 |
4 changes: 4 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/04-topology/expect_output
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| 127.0.0.1:10111 |ok |rw | ||
| - 127.0.0.1:10112 |nonreplicating|ro | ||
| + 127.0.0.1:10113|ok |ro | ||
| + 127.0.0.1:10114|ok |ro |
1 change: 1 addition & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/04-topology/run
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| orchestrator-client -c topology-tabulated -alias ci | cut -d'|' -f 1,3,5 |
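The `cut -d'|' -f 1,3,5` filter in this step is what reduces each topology row to the three columns seen in `expect_output` (address, status, `rw`/`ro`). A minimal sketch of that selection, run against a made-up sample line: the line's contents and column order are assumptions for illustration, not captured orchestrator output.

```shell
#!/bin/sh
# Hypothetical pipe-delimited row; the real columns printed by
# `topology-tabulated` are assumed here, not taken from the PR.
sample='127.0.0.1:10111   |0s|ok|8.0.30|rw|ROW|>>,GTID'

# Keep fields 1, 3 and 5; cut re-joins them with the same '|' delimiter.
summary=$(printf '%s\n' "$sample" | cut -d'|' -f 1,3,5)

printf '%s\n' "$summary"
```

Because `cut` preserves whatever padding sits inside each field, the expected-output file has to match the column spacing exactly.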
2 changes: 2 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/05-check-logs/expect_output
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ERROR GracefulMasterTakeover: promotion failed. Will unset read-only on 127.0.0.1:10111 | ||
| INFO topology_recovery: Running PostUnsuccessfulGracefulTakeoverProcesses hook 1 of 1: echo 'Planned takeover failed for 127.0.0.1:10111' |
3 changes: 3 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/05-check-logs/run
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Read the logs generated after the test started, and check for the expected messages | ||
| tail -n +$(cat /tmp/orchestrator.log.lines) /var/log/journal/orchestrator.service.log \ | ||
| | grep -oP '(ERROR GracefulMasterTakeover: promotion failed|INFO topology_recovery: Running PostUnsuccessfulGracefulTakeoverProcesses hook).*' |
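The record-then-tail pattern used by steps 01 and 05 can be tried in isolation. The sketch below is not part of the PR: it uses temporary files as stand-ins for `/var/log/journal/orchestrator.service.log` and `/tmp/orchestrator.log.lines`, and the log lines are invented. Note that `tail -n +N` starts output at line N itself, so the sketch adds 1 to the stored count to skip the last line that existed before the test; the script in the PR omits the `+ 1`, which is harmless as long as that line does not match the `grep` pattern.

```shell
#!/bin/sh
# Stand-in files (hypothetical); removed on exit.
log=$(mktemp)
pos=$(mktemp)
trap 'rm -f "$log" "$pos"' EXIT

# Pre-existing log content, then record how many lines it has.
printf 'INFO old line 1\nINFO old line 2\n' > "$log"
wc -l < "$log" > "$pos"

# Lines written while the "test" runs.
printf 'ERROR GracefulMasterTakeover: promotion failed\n' >> "$log"

# Read only what was appended after the recorded position.
new_lines=$(tail -n +"$(( $(cat "$pos") + 1 ))" "$log")

printf '%s\n' "$new_lines"
```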
19 changes: 19 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| # Test the error handling when the master fails to take over gracefully and replication is stopped | ||
| | ||
| This error occurred in production when two takeovers were executed in a row. | ||
| The first takeover was successful, but the second one failed. | ||
| | ||
| This error is reproducible by stopping replication on the replica that | ||
| is supposed to take over. Another way to reproduce this error is to | ||
| run two takeovers one immediately after the other: | ||
| | ||
| ```sh | ||
| orchestrator-client -c graceful-master-takeover -i 127.0.0.1:10111 -d 127.0.0.1:10112 | ||
| orchestrator-client -c graceful-master-takeover -i 127.0.0.1:10112 -d 127.0.0.1:10111 | ||
| ``` | ||
| | ||
| In the end the topology will be in a partially failed state, with | ||
| replication stopped for replica `127.0.0.1:10112`, and the other two | ||
| replicas placed behind it. However, `master` will still be writable, | ||
| and `PostUnsuccessfulGracefulTakeoverProcesses` hooks will be executed | ||
| to help the cluster recover. |
Empty file.
3 changes: 3 additions & 0 deletions
tests/system/graceful-master-takeover-fail-replication-stopped/teardown
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| orchestrator-client -c start-replica -i 127.0.0.1:10112 | ||
| orchestrator-client -c relocate -i 127.0.0.1:10113 -d 127.0.0.1:10111 | ||
| orchestrator-client -c relocate -i 127.0.0.1:10114 -d 127.0.0.1:10111 |
Hi @o-fedorov, I'm not sure about this error propagation and handling. In the original flow, if `ForceExecuteRecovery` returned err, but also returned `recoveryAttempted` and `topologyRecovery`, `GracefulMasterTakeover` continued. Now it makes master R/W (which is fine), but then returns.
Thank you, @kamil-holubicki, I forgot to clear the error at the end of the function. Updated now.
Note that in the original code, `err` is ignored below the refactored part, and is eventually overridden.