Commit 29a0cfb

libceph: don't allow bidirectional swap of pg-upmap-items
This reverts most of commit f53b766 ("libceph: upmap semantic changes").

We need to prevent duplicates in the final result. For example, we can currently take

  [1,2,3] and apply [(1,2)] and get [2,2,3]

or

  [1,2,3] and apply [(3,2)] and get [1,2,2]

The rest of the system is not prepared to handle duplicates in the result set like this.

The reverted piece was intended to allow

  [1,2,3] and [(1,2),(2,1)] to get [2,1,3]

to reorder primaries. First, this bidirectional swap is hard to implement in a way that also prevents dups. For example, [1,2,3] and [(1,4),(2,3),(3,4)] would give [4,3,4], but if we just dropped the last step we'd have [4,3,3], which is also invalid, etc. Simpler to just not handle bidirectional swaps. In practice, they are not needed: if you just want to choose a different primary then use primary_affinity, or pg_upmap (not pg_upmap_items).

Cc: [email protected] # 4.13
Link: http://tracker.ceph.com/issues/21410
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
1 parent 2bd6bf0 commit 29a0cfb
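
The duplicate problem described in the commit message can be reproduced outside the kernel with a small user-space sketch of the loop being removed. This is an illustration only, not the kernel code: `old_apply_upmap_items` is a hypothetical name, the raw set is a plain `int` array, and the marked-out check on `to` is omitted on the assumption that every target OSD is in.

```c
#include <assert.h>

/*
 * Sketch of the pre-patch behaviour (hypothetical helper, not kernel code).
 * For each OSD in the raw set, the first pair whose "from" matches rewrites
 * that slot to "to". Nothing checks whether "to" already appears elsewhere
 * in the set, so duplicates are possible.
 * Assumption: all targets are "in", so the osd_weight check is left out.
 */
static void old_apply_upmap_items(int *raw, int size,
				  const int from_to[][2], int len)
{
	int i, j;

	assert(size >= 0 && len >= 0);

	for (i = 0; i < size; i++) {
		for (j = 0; j < len; j++) {
			int from = from_to[j][0];
			int to = from_to[j][1];

			if (from == raw[i]) {
				raw[i] = to;	/* may duplicate an existing entry */
				break;
			}
		}
	}
}
```

Under these assumptions, applying `{{1,2}}` to `{1,2,3}` leaves `{2,2,3}`, and applying `{{1,2},{2,1}}` to `{1,2,3}` leaves `{2,1,3}`, matching the examples in the commit message.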

1 file changed, +25 -10 lines changed

net/ceph/osdmap.c

Lines changed: 25 additions & 10 deletions
@@ -2445,19 +2445,34 @@ static void apply_upmap(struct ceph_osdmap *osdmap,
 
 	pg = lookup_pg_mapping(&osdmap->pg_upmap_items, pgid);
 	if (pg) {
-		for (i = 0; i < raw->size; i++) {
-			for (j = 0; j < pg->pg_upmap_items.len; j++) {
-				int from = pg->pg_upmap_items.from_to[j][0];
-				int to = pg->pg_upmap_items.from_to[j][1];
-
-				if (from == raw->osds[i]) {
-					if (!(to != CRUSH_ITEM_NONE &&
-					      to < osdmap->max_osd &&
-					      osdmap->osd_weight[to] == 0))
-						raw->osds[i] = to;
+		/*
+		 * Note: this approach does not allow a bidirectional swap,
+		 * e.g., [[1,2],[2,1]] applied to [0,1,2] -> [0,2,1].
+		 */
+		for (i = 0; i < pg->pg_upmap_items.len; i++) {
+			int from = pg->pg_upmap_items.from_to[i][0];
+			int to = pg->pg_upmap_items.from_to[i][1];
+			int pos = -1;
+			bool exists = false;
+
+			/* make sure replacement doesn't already appear */
+			for (j = 0; j < raw->size; j++) {
+				int osd = raw->osds[j];
+
+				if (osd == to) {
+					exists = true;
 					break;
 				}
+				/* ignore mapping if target is marked out */
+				if (osd == from && pos < 0 &&
+				    !(to != CRUSH_ITEM_NONE &&
+				      to < osdmap->max_osd &&
+				      osdmap->osd_weight[to] == 0)) {
+					pos = j;
+				}
 			}
+			if (!exists && pos >= 0)
+				raw->osds[pos] = to;
 		}
 	}
 }
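
To try the new loop in isolation, the following user-space sketch mirrors its logic on plain arrays. It is a minimal illustration under stated assumptions, not the kernel implementation: `sketch_apply_upmap_items`, `sketch_target_marked_out`, and `SKETCH_ITEM_NONE` (a stand-in for `CRUSH_ITEM_NONE`) are hypothetical names, the `osd_weight` and `max_osd` parameters stand in for the `osdmap` fields, and the `to >= 0` guard is an extra safety check added for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ITEM_NONE 0x7fffffff	/* stand-in for CRUSH_ITEM_NONE */

/* Hypothetical helper: true if "to" names a known OSD with weight 0. */
static bool sketch_target_marked_out(int to, const unsigned int *osd_weight,
				     int max_osd)
{
	/* "to >= 0" is an extra guard for this sketch, not in the patch. */
	return to != SKETCH_ITEM_NONE && to >= 0 && to < max_osd &&
	       osd_weight[to] == 0;
}

/*
 * Sketch of the post-patch behaviour: iterate over the (from, to) pairs,
 * and apply a pair only if "to" is not already in the raw set and "from"
 * was found at a position whose target is not marked out.
 */
static void sketch_apply_upmap_items(int *raw, int size,
				     const int from_to[][2], int len,
				     const unsigned int *osd_weight,
				     int max_osd)
{
	int i, j;

	assert(size >= 0 && len >= 0);

	for (i = 0; i < len; i++) {
		int from = from_to[i][0];
		int to = from_to[i][1];
		int pos = -1;
		bool exists = false;

		for (j = 0; j < size; j++) {
			int osd = raw[j];

			/* the replacement already appears: skip this pair */
			if (osd == to) {
				exists = true;
				break;
			}
			/* remember the first match unless target is out */
			if (osd == from && pos < 0 &&
			    !sketch_target_marked_out(to, osd_weight, max_osd))
				pos = j;
		}
		if (!exists && pos >= 0)
			raw[pos] = to;
	}
}
```

With all weights nonzero, applying `{{1,2}}` to `{1,2,3}` leaves the set unchanged, `{{1,4},{2,3},{3,4}}` yields `{4,2,3}`, and the swap `{{1,2},{2,1}}` applied to `{0,1,2}` is ignored, which is the limitation the in-code comment notes.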
