Skip to content

Commit 1d854e4

Browse files
adam900710kdave
authored andcommitted
btrfs: fix false alert on bad tree level check
[BUG] There is a bug report that on a RAID0 NVMe btrfs system, under heavy write load the filesystem can flip RO randomly. With extra debugging, it shows some tree blocks failed to pass their level checks, and if that happens at critical path of a transaction, we abort the transaction: BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1 BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5) BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure BTRFS info (device nvme0n1p3: state EA): forced readonly [CAUSE] The reporter has already bisected to commit 947a629 ("btrfs: move tree block parentness check into validate_extent_buffer()"). And with extra debugging, it shows we can have btrfs_tree_parent_check filled with all zeros in the following call trace: submit_one_bio+0xd4/0xe0 submit_extent_page+0x142/0x550 read_extent_buffer_pages+0x584/0x9c0 ? __pfx_end_bio_extent_readpage+0x10/0x10 ? folio_unlock+0x1d/0x50 btrfs_read_extent_buffer+0x98/0x150 read_tree_block+0x43/0xa0 read_block_for_search+0x266/0x370 btrfs_search_slot+0x351/0xd30 ? lock_is_held_type+0xe8/0x140 btrfs_lookup_csum+0x63/0x150 btrfs_csum_file_blocks+0x197/0x6c0 ? sched_clock_cpu+0x9f/0xc0 ? lock_release+0x14b/0x440 ? _raw_read_unlock+0x29/0x50 btrfs_finish_ordered_io+0x441/0x860 btrfs_work_helper+0xfe/0x400 ? lock_is_held_type+0xe8/0x140 process_one_work+0x294/0x5b0 worker_thread+0x4f/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 Currently we only copy the btrfs_tree_parent_check structure into bbio at read_extent_buffer_pages() after we have assembled the bbio. But as shown above, submit_extent_page() itself can already submit the bbio, leaving the bbio->parent_check uninitialized, and cause the false alert. [FIX] Instead of copying @check into bbio after bbio is assembled, we pass @check in btrfs_bio_ctrl::parent_check, and copy the content of parent_check in submit_one_bio() for metadata read. By this we should be able to pass the needed info for metadata endio verification, and fix the false alert. Reported-by: Mikhail Gavrilov <[email protected]> Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/ Fixes: 947a629 ("btrfs: move tree block parentness check into validate_extent_buffer()") Tested-by: Mikhail Gavrilov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 77177ed commit 1d854e4

File tree

1 file changed

+25
-5
lines changed

1 file changed

+25
-5
lines changed

fs/btrfs/extent_io.c

Lines changed: 25 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,15 @@ struct btrfs_bio_ctrl {
103103
u32 len_to_oe_boundary;
104104
btrfs_bio_end_io_t end_io_func;
105105

106+
/*
107+
* This is for metadata read, to provide the extra needed verification
108+
* info. This has to be provided for submit_one_bio(), as
109+
* submit_one_bio() can submit a bio if it ends at stripe boundary. If
110+
* no such parent_check is provided, the metadata can hit false alert at
111+
* endio time.
112+
*/
113+
struct btrfs_tree_parent_check *parent_check;
114+
106115
/*
107116
* Tell writepage not to lock the state bits for this range, it still
108117
* does the unlocking.
@@ -133,13 +142,24 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
133142

134143
btrfs_bio(bio)->file_offset = page_offset(bv->bv_page) + bv->bv_offset;
135144

136-
if (!is_data_inode(&inode->vfs_inode))
145+
if (!is_data_inode(&inode->vfs_inode)) {
146+
if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
147+
/*
148+
* For metadata read, we should have the parent_check,
149+
* and copy it to bbio for metadata verification.
150+
*/
151+
ASSERT(bio_ctrl->parent_check);
152+
memcpy(&btrfs_bio(bio)->parent_check,
153+
bio_ctrl->parent_check,
154+
sizeof(struct btrfs_tree_parent_check));
155+
}
137156
btrfs_submit_metadata_bio(inode, bio, mirror_num);
138-
else if (btrfs_op(bio) == BTRFS_MAP_WRITE)
157+
} else if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
139158
btrfs_submit_data_write_bio(inode, bio, mirror_num);
140-
else
159+
} else {
141160
btrfs_submit_data_read_bio(inode, bio, mirror_num,
142161
bio_ctrl->compress_type);
162+
}
143163

144164
/* The bio is owned by the end_io handler now */
145165
bio_ctrl->bio = NULL;
@@ -4829,6 +4849,7 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
48294849
struct extent_state *cached_state = NULL;
48304850
struct btrfs_bio_ctrl bio_ctrl = {
48314851
.mirror_num = mirror_num,
4852+
.parent_check = check,
48324853
};
48334854
int ret = 0;
48344855

@@ -4878,7 +4899,6 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
48784899
*/
48794900
atomic_dec(&eb->io_pages);
48804901
}
4881-
memcpy(&btrfs_bio(bio_ctrl.bio)->parent_check, check, sizeof(*check));
48824902
submit_one_bio(&bio_ctrl);
48834903
if (ret || wait != WAIT_COMPLETE) {
48844904
free_extent_state(cached_state);
@@ -4905,6 +4925,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
49054925
unsigned long num_reads = 0;
49064926
struct btrfs_bio_ctrl bio_ctrl = {
49074927
.mirror_num = mirror_num,
4928+
.parent_check = check,
49084929
};
49094930

49104931
if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
@@ -4996,7 +5017,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
49965017
}
49975018
}
49985019

4999-
memcpy(&btrfs_bio(bio_ctrl.bio)->parent_check, check, sizeof(*check));
50005020
submit_one_bio(&bio_ctrl);
50015021

50025022
if (ret || wait != WAIT_COMPLETE)

0 commit comments

Comments
 (0)