Commit 3ff3a4d

fix(master): avoid rebalancing to disconnected CS
When rebalancing looks for a server to replicate a part to, the current implementation iterates over the sortedServers_ or labeledSortedServers_ containers. These containers may hold stale chunkserver entries that were recently switched to KILL mode. Such entries must not be used as destinations for replicated parts. This commit fixes the issue by skipping any server for which matocsserv_is_killed() returns true. The test test_kill_cs_while_writing_small_files should stop being flaky after the merge.

Signed-off-by: Dave <dave@leil.io>
1 parent 5c742a6 commit 3ff3a4d
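
The idea in the commit message can be illustrated with a minimal, self-contained sketch. It is not the master's actual code: the types ServerEntry and ServerWithUsageSketch, the enum ConnectionMode, and the functions isKilled and pickDestination are hypothetical stand-ins invented for illustration. It only shows the pattern of filtering a cached, possibly stale, sorted list at the moment of use instead of trusting it.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the master's types; all names here are illustrative.
enum class ConnectionMode { Connected, Kill };

struct ServerEntry {
    uint32_t id;
    ConnectionMode mode;
};

// A cached (entry, usage) pair, assumed to be sorted by usage some time ago.
struct ServerWithUsageSketch {
    ServerEntry *server;
    double usage;
};

// Mirrors the shape of the new predicate: true when the entry is in KILL mode.
bool isKilled(const ServerEntry *entry) {
    return entry->mode == ConnectionMode::Kill;
}

// Returns the id of the first server in the cached list that is not killed,
// or std::nullopt when every cached entry is stale.
std::optional<uint32_t> pickDestination(const std::vector<ServerWithUsageSketch> &sorted) {
    for (const auto &candidate : sorted) {
        if (isKilled(candidate.server)) {
            continue;  // stale entry: the cache was built before the disconnect
        }
        return candidate.server->id;
    }
    return std::nullopt;
}
```

Checking the mode inside the loop, rather than rebuilding the sorted containers on every disconnect, keeps the change small. The cost is one extra comparison per candidate.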

3 files changed: 11 additions and 1 deletion


src/master/chunks.cc

Lines changed: 4 additions & 1 deletion
@@ -2712,6 +2712,10 @@ bool ChunkWorker::rebalanceChunkParts(Chunk *c, ChunkCopiesCalculator &calc, boo
                               : labeledSortedServers_[current_copy_label];
 
     for (const auto &empty_server : sorted_servers) {
+        if (matocsserv_is_killed(empty_server.server)) {
+            continue;
+        }
+
         if (!only_todel && gAvoidSameIpChunkservers) {
             auto empty_server_ip = matocsserv_get_servip(empty_server.server);
             auto it = ip_counter.find(empty_server_ip);
@@ -2942,7 +2946,6 @@ void ChunkWorker::doChunkJobs(Chunk *c, uint16_t serverCount) {
     if (rebalanceChunkParts(c, calc, false, ip_occurrence)) {
         return;
     }
-
 }
 
 bool ChunkWorker::deleteUnusedChunks() {

src/master/matocsserv.cc

Lines changed: 4 additions & 0 deletions
@@ -1655,6 +1655,10 @@ void matocsserv_reload() {
     lsock = newlsock;
 }
 
+bool matocsserv_is_killed(matocsserventry* eptr) {
+    return eptr->mode == ChunkserverConnectionMode::KILL;
+}
+
 uint32_t matocsserv_get_version(matocsserventry *eptr) {
     return eptr->version;
 }

src/master/matocsserv.h

Lines changed: 3 additions & 0 deletions
@@ -86,6 +86,9 @@ double matocsserv_get_usage(matocsserventry* eptr);
 /*! \brief Get chunkservers ordered by disk usage. */
 std::vector<ServerWithUsage> matocsserv_getservers_sorted();
 
+/*! \brief Check if chunkserver is killed. */
+bool matocsserv_is_killed(matocsserventry* eptr);
+
 uint32_t matocsserv_get_version(matocsserventry* eptr);
 void matocsserv_usagedifference(double *minusage, double *maxusage, uint16_t *usablescount,
                                 uint16_t *totalscount);
