Commit c0fedb5
mds/MDSDaemon: unlock `mds_lock` while shutting down Beacon and others

This fixes a deadlock bug during MDS shutdown:

- the "signal_handler" thread receives the shutdown signal and invokes
  MDSDaemon::suicide() while holding `mds_lock`

- MDSDaemon::suicide() invokes Beacon::send_and_wait() while still
  holding `mds_lock`

- meanwhile, all "ms_dispatch" threads get stuck waiting for
  `mds_lock`, for example in MDCache::upkeep_main() or
  MDSDaemon::ms_dispatch2()

- Beacon::send_and_wait() waits for a `MSG_MDS_BEACON` packet to be
  dispatched (via `cvar` with a timeout)

At this point, even if a `MSG_MDS_BEACON` packet is received by one of
the worker threads, that thread will put it in the `DispatchQueue`, but
no dispatcher thread will be able to handle it because they are all
stuck waiting for `mds_lock`. The cvar.wait_for() call in
Beacon::send_and_wait() will therefore time out and the
`MSG_MDS_BEACON` will never be processed.

The proper solution is to unlock `mds_lock` so that the dispatchers do
not get stuck. In general, we should hold a lock only while it is
needed and never make blocking calls while holding it.

Fixes: https://tracker.ceph.com/issues/68760
Signed-off-by: Max Kellermann <[email protected]>

1 parent 5b828b6 commit c0fedb5
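The locking pattern described in the commit message can be reproduced in isolation. The following is a minimal sketch, not Ceph code: `ToyDaemon`, `big_lock`, `ack_mutex`, `dispatch_ack()` and `shutdown()` are hypothetical stand-ins for the MDS daemon, `mds_lock`, the beacon's own mutex, a dispatcher thread and MDSDaemon::suicide(). It assumes the dispatcher needs the big lock before it can deliver the acknowledgement, as the message states.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the deadlock; none of these names exist in Ceph.
struct ToyDaemon {
  std::mutex big_lock;            // plays the role of mds_lock
  std::mutex ack_mutex;           // protects `acked`
  std::condition_variable cvar;   // signalled when the ack is processed
  bool acked = false;

  // Dispatcher side: must take big_lock before it can process the ack.
  void dispatch_ack() {
    std::lock_guard<std::mutex> big(big_lock);
    {
      std::lock_guard<std::mutex> l(ack_mutex);
      acked = true;
    }
    cvar.notify_all();
  }

  // Waits until the ack has been processed or the timeout expires.
  bool send_and_wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(ack_mutex);
    return cvar.wait_for(l, timeout, [this] { return acked; });
  }

  // Shutdown path. With unlock_first == false, big_lock stays held during
  // the blocking wait, so the dispatcher can never deliver the ack.
  // Returns true if the ack was processed before the timeout.
  bool shutdown(bool unlock_first, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> big(big_lock);
    std::thread dispatcher([this] { dispatch_ack(); });
    if (unlock_first) {
      big.unlock();               // the idea of the fix: release before blocking
    }
    const bool ok = send_and_wait(timeout);
    if (big.owns_lock()) {
      big.unlock();               // let the stuck dispatcher finish
    }
    assert(!big.owns_lock());     // the dispatcher needs the lock to exit
    dispatcher.join();
    return ok;
  }
};
```

Calling `shutdown(false, ...)` on a fresh `ToyDaemon` makes the wait time out, because the dispatcher is blocked on `big_lock` for the whole wait. Calling `shutdown(true, ...)` on another fresh instance lets the dispatcher acquire the lock and signal `cvar`, so the wait returns early. Whether and where the real patch re-acquires `mds_lock` afterwards is not modelled here.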
1 file changed: +9 -0 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 923 | 923 | |
| 924 | 924 | |
| 925 | 925 | |
| | 926 | + |
| | 927 | + |
| | 928 | + |
| | 929 | + |
| | 930 | + |
| | 931 | + |
| | 932 | + |
| 926 | 933 | |
| 927 | 934 | |
| 928 | 935 | |
| | | |
| 931 | 938 | |
| 932 | 939 | |
| 933 | 940 | |
| | 941 | + |
| | 942 | + |
| 934 | 943 | |
| 935 | 944 | |
| 936 | 945 | |