Commit 11c9180
multirpc/subpub: fix potential goroutine deadlocks
When the connection to a peer is lost,
broadcastHandler errors in its SendMessage call,
and the entire goroutine stops.
No goroutine will continue receiving on the write channel,
and sooner rather than later, sends to the write channel will start blocking.
This starts causing deadlocks further up in IPFSsync.
SubPub.Subscribe and SubPub.PeerStreamWrite can now block forever,
and further up the chain in IPFSsync,
that can mean some goroutines hold onto mutexes forever.
On one hand, this chain of events can hang IPFSsync,
stopping it from doing anything useful until a restart.
On the other hand, it causes goroutine leaks.
When more calls to IPFSsync.Handle come through,
using new goroutines via the router,
those try to grab the deadlocked mutexes and hang forever.
First, fix the root cause: peerSub now has a "closed" channel,
which gets closed by peersManager when the peer is dropped.
Its goroutines, both for reading and writing messages,
keep running until that happens.
Second, make the symptom of the deadlock less severe:
prevent blocking on channel sends forever.
Any send on the "write" channel now stops on "closed".
And the send on BroadcastWriter, which could also block forever,
now has a fallback timeout of five minutes.
Updates #243. Perhaps not a total fix, as there might be other leaks.
1 parent d8c83c6 commit 11c9180
File tree
4 files changed, +52 -12 lines changed
- multirpc
- subpub
- transports/subpubtransport