Commit b6ebcdf
fix(master): avoid rebalancing to disconnected CS (#763)
When looking for a server to replicate a part to during rebalancing,
the current implementation uses the sortedServers_ or
labeledSortedServers_ variables. These containers may hold stale
chunkserver entries that were recently switched to KILL mode, or
entries that have been completely released. Such entries must not be
used as a destination for replicated parts; otherwise the master may
crash. This commit fixes those issues.
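The check described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `ServerMode`, `ServerEntry`, `isValidTarget`, and `pickReplicationTarget` are hypothetical names, and choosing the least-used server is an assumption made only to give the example a selection rule.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a chunkserver entry; not the project's real type.
enum class ServerMode { Connected, Kill, Released };

struct ServerEntry {
    std::string name;
    ServerMode mode;
    double usage;  // fraction of disk space in use, 0.0 .. 1.0
};

// Only a connected server may receive a replicated part.
bool isValidTarget(const ServerEntry &server) {
    return server.mode == ServerMode::Connected;
}

// Returns the index of the least-used valid server, or std::nullopt when the
// (possibly stale) list contains no usable destination.
std::optional<std::size_t> pickReplicationTarget(
        const std::vector<ServerEntry> &sortedServers) {
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < sortedServers.size(); ++i) {
        if (!isValidTarget(sortedServers[i])) {
            continue;  // stale entry: switched to KILL mode or already released
        }
        if (!best || sortedServers[i].usage < sortedServers[*best].usage) {
            best = i;
        }
    }
    // Postcondition: a disconnected server is never returned.
    assert(!best || isValidTarget(sortedServers[*best]));
    return best;
}
```

Returning an empty optional lets the caller skip the rebalancing step for that part instead of dereferencing an entry that no longer refers to a live chunkserver.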
The test test_kill_cs_while_writing_small_files should stop being flaky
after the merge.
The master crash was one of the causes of the failure of the test.
Another possible outcome of replications targeting disconnected CSs is
that the replication read counter of the source CS (the remaining
alive CS) remains incremented indefinitely, thus blocking future
replications that need to retrieve data from those servers. This very
dangerous behavior was also causing failures in the previously
mentioned test.
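The counter problem can be illustrated with a small sketch. It assumes, hypothetically, that the source's read counter is decremented only when the destination reports completion; `ChunkServer`, `startReplication`, and `finishReplication` are illustrative names, not the project's API.

```cpp
#include <cassert>

// Hypothetical stand-in for a chunkserver; not the project's real type.
struct ChunkServer {
    bool connected = true;
    int replicationReadCounter = 0;  // replications currently reading from it
};

// Schedules a replication only when the destination can still report
// completion. Returns true if the replication was scheduled.
bool startReplication(ChunkServer &source, const ChunkServer &destination) {
    if (!destination.connected) {
        // No completion would ever arrive, so the counter is left untouched.
        return false;
    }
    ++source.replicationReadCounter;
    return true;
}

// Called when the destination reports that the replication finished or failed.
void finishReplication(ChunkServer &source) {
    assert(source.replicationReadCounter > 0);
    --source.replicationReadCounter;
}
```

Under this assumption, rejecting a disconnected destination before the increment keeps every increment paired with a later decrement, so the source CS is not left permanently at its read limit.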
Signed-off-by: Dave <dave@leil.io>
1 parent 882ea90 commit b6ebcdf
3 files changed: +44 -2 lines changed