Commit fff42f2
md-cluster: fix hanging issue while a new disk adding
The commit 1bbe254 ("md-cluster: check for timeout while a
new disk adding") is syntactically correct but does not
suit the real clustered code logic.
When a timeout occurs while adding a new disk, if recv_daemon()
bypasses the unlock for ack_lockres:CR, another node will keep waiting
to grab the EX lock. This causes the cluster to hang indefinitely.
How to fix:
1. In dlm_lock_sync(), change the wait behaviour from forever to a
timeout. This could avoid the hanging issue when another node
fails to handle a cluster msg. Another result of this change is
that if another node receives an unknown msg (e.g. a new msg_type),
the old code will hang, whereas the new code will time out and fail.
This could help cluster_md handle a new msg_type from different
nodes with different kernel/module versions (e.g. the user only
updates one leg's kernel and monitors the stability of the new
kernel).
2. The old code for __sendmsg() always returns 0 (success) by
design, since it must successfully unlock ->message_lockres. This
commit makes the function return an error number when an error occurs.
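The two changes above can be pictured with a small userspace sketch. It is not the kernel code from this commit: struct lock_res, lock_res_wait() and send_msg() are hypothetical names, and a polling loop against a monotonic deadline stands in for the kernel's own wait primitives. It only shows the shape of the fix: the wait is bounded, and the sender passes the failure up instead of returning 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a lock resource; not the kernel structure. */
struct lock_res {
	atomic_bool granted;	/* set once the lock request completes */
};

static void lock_res_init(struct lock_res *res)
{
	atomic_init(&res->granted, false);
}

/* Milliseconds on a monotonic clock, so wall-clock jumps cannot stretch the wait. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until the request is granted or timeout_ms elapses.
 * Returns 0 on success, -ETIMEDOUT if the grant never arrived.
 */
static int lock_res_wait(struct lock_res *res, long timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	long long deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	while (!atomic_load(&res->granted)) {
		if (now_ms() >= deadline)
			return -ETIMEDOUT;	/* old behaviour: wait forever */
		nanosleep(&nap, NULL);
	}
	return 0;
}

/* Hypothetical sender: reports the failure instead of always returning 0. */
static int send_msg(struct lock_res *ack, long timeout_ms)
{
	int err = lock_res_wait(ack, timeout_ms);

	if (err)
		return err;	/* the caller can now fail the operation */
	/* ... the message itself would be sent here ... */
	return 0;
}
```

With a peer that never answers, send_msg() returns -ETIMEDOUT after the timeout rather than blocking, which is the behaviour the description above asks for when a node cannot handle a msg.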
Fixes: 1bbe254 ("md-cluster: check for timeout while a new disk adding")
Signed-off-by: Heming Zhao <[email protected]>
Reviewed-by: Su Yue <[email protected]>
Acked-by: Yu Kuai <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 3c1743a commit fff42f2
1 file changed, +12 -10 lines changed