Commit 21d0992
mds: skip scrubbing damaged dirfrag
A dirfrag is only damaged in this way when the omap fetch fails or the fnode is
corrupt. The MDS cannot presently repair that damage. Without this change, the
scrub enters an infinite retry loop:
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 MDSContext::complete: 12C_RetryScrub
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs entering with 0 in progress and 1 in the stack
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 ap=1+0 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=1 scrubqueue=1 0x55b1a50fa880]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.den(0x10000000000 dir_xx) scrubbing [dentry #0x1/dir_x/dir_xx [2,head] auth (dversion lock) pv=0 v=8 ino=0x10000000001 state=1073741824 0x55b1a50eaf00] next_seq = 2
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b1a50da240) get_snaps (seq 1 cached_seq 1)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack _enqueue with {[inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680]}, top=0
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) scrub_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) uninline_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack enqueue [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680] to bottom of ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) get_num_head_items() = 1; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) total of child dentries: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) my rstats: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) check_rstats complete on 0x55b1a50fa880
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) scrub_finished
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b1a4f7b600 on [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] count now 0
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs dirfrag, done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack dequeue [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] from ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dir_inode [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
2025-01-28T19:25:46.153+0000 7f9626cc5640 15 mds.0.cache.ino(0x10000000001) maybe_export_pin update=0 [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000001) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode barebones [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) fetch_keys 0 keys on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) auth_pin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180] count now 1
2025-01-28T19:25:46.153+0000 7f9626cc5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] --> [v2:172.21.10.4:6802/3852331191,v1:172.21.10.4:6803/3852331191] -- osd_op(unknown.0.340:50 42.7 42:e2e07930:::10000000001.00000000:head [omap-get-header,omap-get-vals-by-keys in=4b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e564) -- 0x55b1a50d8c00 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.bal hit_dir 3 pop is 1, frag * size 0 [pop IRD:[C 0.00e+00] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 1.00e+00] STR:[C 0.00e+00] *LOAD:2.0]
2025-01-28T19:25:46.153+0000 7f962ecd5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] <== osd.0 v2:172.21.10.4:6802/3852331191 3 ==== osd_op_reply(50 10000000001.00000000 [omap-get-header,omap-get-vals-by-keys,getxattr] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) ==== 248+0+0 (crc 0 0 0) 0x55b1a4444280 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) _fetched header 0 bytes 0 keys for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 0 mds.0.cache.dir(0x10000000001) _fetched missing object for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 -1 log_channel(cluster) log [ERR] : dir 0x10000000001 object missing on disk; some files may be lost (/dir_x/dir_xx)
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) go_bad *
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) auth_unpin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180] count now 0
2025-01-28T19:25:46.153+0000 7f96254c2640 11 mds.0.cache.dir(0x10000000001) finish_waiting mask 2 result -5 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 12C_RetryScrub
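The cycle in the log above (fetch, missing object, go_bad, result -5, C_RetryScrub, fetch again) can be modelled with a small toy program. This is only a sketch of the control flow, not the MDS code: `ToyDirfrag`, `toy_fetch`, `toy_scrub`, `ScrubOutcome` and the `max_rounds` cap are all hypothetical names introduced for illustration, and the cap exists only so that the toy terminates where the real loop would not.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a dirfrag; this is NOT Ceph's CDir.
struct ToyDirfrag {
  bool object_exists;     // is the backing object present in the metadata pool?
  bool complete = false;  // set once a fetch succeeds
  bool damaged = false;   // set once a fetch fails (compare "go_bad" in the log)
  explicit ToyDirfrag(bool exists) : object_exists(exists) {}
};

// Toy fetch: returns 0 on success, -EIO when the object is missing
// (the log shows the waiters being finished with result -5).
int toy_fetch(ToyDirfrag& dir) {
  if (!dir.object_exists) {
    dir.damaged = true;
    return -EIO;
  }
  dir.complete = true;
  return 0;
}

struct ScrubOutcome {
  int fetch_attempts;  // how many times the fetch was issued
  bool scrubbed;       // the dirfrag was fetched and scrubbed
  bool skipped;        // the dirfrag was skipped because it is damaged
};

// Each loop iteration models one retry round of the scrub.
// skip_damaged = false models the looping behaviour described above;
// skip_damaged = true models skipping a dirfrag already known to be damaged.
ScrubOutcome toy_scrub(ToyDirfrag& dir, bool skip_damaged, int max_rounds) {
  ScrubOutcome out{0, false, false};
  for (int round = 0; round < max_rounds; ++round) {
    if (skip_damaged && dir.damaged) {
      out.skipped = true;   // stop retrying: nothing can repair this
      return out;
    }
    if (!dir.complete) {
      ++out.fetch_attempts;
      (void)toy_fetch(dir); // success or failure, the scrub is retried
      continue;
    }
    out.scrubbed = true;
    return out;
  }
  return out;  // hit the artificial cap; the real loop has no such cap
}
```

With a missing object and `skip_damaged` set to false, `toy_scrub` issues one fetch per round until the cap is reached. With `skip_damaged` set to true it issues a single fetch and then reports the dirfrag as skipped. A dirfrag whose object exists is fetched once and scrubbed in either mode.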
Note that this partially reverts 5b56098. That commit incorrectly marked a
dirfrag as repaired when it may not even exist in the metadata pool.
Fixes: 5b56098
Signed-off-by: Patrick Donnelly <[email protected]>

1 parent 9c83f6c commit 21d0992
3 files changed: +22 -7 lines changed

(File names and diff line contents are not preserved; only the hunk line numbers remain.)

| File | Hunk | Original lines | New lines | Removed (original) | Added (new) |
|---|---|---|---|---|---|
| 1 | 1 | 4999-5007 | 4999-5013 | 5002-5004 (3 lines) | 5002-5010 (9 lines) |
| 1 | 2 | 5015-5023 | 5021-5026 | 5018-5020 (3 lines) | none |
| 2 | 1 | 13534-13539 | 13534-13543 | none | 13537-13540 (4 lines) |
| 3 | 1 | 382-388 | 382-396 | 385 (1 line) | 385-393 (9 lines) |