
Commit 820cfae

adam900710 authored and gregkh committed
btrfs: fix the incorrect max_bytes value for find_lock_delalloc_range()
[ Upstream commit 7b26da4 ]

[BUG]
With my local branch to enable bs > ps support for btrfs, sometimes I
hit the following ASSERT() inside submit_one_sector():

	ASSERT(block_start != EXTENT_MAP_HOLE);

Please note that it's not yet possible to hit this ASSERT() in the
wild, as it requires btrfs bs > ps support, which is not even in the
development branch.

But on the other hand, there is also a very low chance to hit the above
ASSERT() with bs < ps cases, so this is an existing bug affecting not
only the incoming bs > ps support but also the existing bs < ps
support.

[CAUSE]
Firstly, that ASSERT() means we're trying to submit a dirty block but
without a real extent map nor an ordered extent backing it.

Furthermore, with extra debugging, the folio triggering such an
ASSERT() is always larger than the fs block size in my bs > ps case
(8K block size, 4K page size).

After some more debugging, the ASSERT() is triggered by the following
sequence:

extent_writepage()
| We got a 32K folio (4 fs blocks) at file offset 0, and the fs block
| size is 8K, page size is 4K.
| And there is another 8K folio at file offset 32K, which is also
| dirty.
| So the filemap layout looks like the following:
|
| "||" is the folio boundary in the filemap.
| "//" is the dirty range.
|
| 0        8K        16K        24K         32K       40K
| |////////|         |//////////////////////||////////|
|
|- writepage_delalloc()
|  |- find_lock_delalloc_range() for [0, 8K)
|  |  Now range [0, 8K) is properly locked.
|  |
|  |- find_lock_delalloc_range() for [16K, 40K)
|  |  |- btrfs_find_delalloc_range() returned range [16K, 40K)
|  |  |- lock_delalloc_folios() locked folio 0 successfully
|  |  |
|  |  |  The filemap range [32K, 40K) got dropped from filemap.
|  |  |
|  |  |- lock_delalloc_folios() failed with -EAGAIN on folio 32K
|  |  |  As the folio at 32K is dropped.
|  |  |
|  |  |- loops = 1;
|  |  |- max_bytes = PAGE_SIZE;
|  |  |- goto again;
|  |  |  This will re-do the lookup for dirty delalloc ranges.
|  |  |
|  |  |- btrfs_find_delalloc_range() called with @max_bytes == 4K
|  |  |  This is smaller than block size, so
|  |  |  btrfs_find_delalloc_range() is unable to return any range.
|  |  \- return false;
|  |
|  \- Now only range [0, 8K) has an OE for it, but for dirty range
|     [16K, 32K) it's dirty without an OE.
|     This breaks the assumption that writepage_delalloc() will find
|     and lock all dirty ranges inside the folio.
|
|- extent_writepage_io()
   |- submit_one_sector() for [0, 8K)
   |  Succeeded
   |
   |- submit_one_sector() for [16K, 24K)
      Triggering the ASSERT(), as there is no OE, and the original
      extent map is a hole.

Please note that this also exposes the same problem for the bs < ps
support, e.g. with 64K page size and 4K block size.

If we failed to lock a folio, and fall back into the "loops = 1;"
branch, we will re-do the search using 64K as max_bytes.
Which may fail again to lock the next folio, and exit early without
handling all dirty blocks inside the folio.

[FIX]
Instead of using the fixed size PAGE_SIZE as @max_bytes, use
@sectorsize, so that we are ensured to find and lock any remaining
blocks inside the folio.

And since we're here, add an extra ASSERT() before calling
btrfs_find_delalloc_range() to make sure @max_bytes is at least no
smaller than a block, to avoid false negatives.

Cc: [email protected] # 5.15+
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
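For reference, below is a minimal user-space C sketch (not btrfs code) of the arithmetic behind the bug: a delalloc search that only returns whole blocks can never find anything once the retry window is clamped below one block. The helper toy_find_delalloc_range() and the sizes are illustrative assumptions, not kernel APIs.

/*
 * Minimal user-space sketch (assumed names and sizes, not btrfs code).
 * It models the retry path of find_lock_delalloc_range(): a search that
 * only returns whole blocks finds nothing when the window is clamped to
 * PAGE_SIZE on a bs > ps setup, but still works when clamped to the
 * block size.
 */
#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size (ps) */
#define TOY_SECTORSIZE	8192u	/* assumed fs block size (bs), bs > ps */

/* Pretend one 16K delalloc range is still dirty; return whole blocks only. */
static bool toy_find_delalloc_range(unsigned int max_bytes, unsigned int *len)
{
	const unsigned int dirty_len = 16384;

	/* A block-granular search cannot return less than one block. */
	if (max_bytes < TOY_SECTORSIZE)
		return false;

	*len = dirty_len < max_bytes ? dirty_len : max_bytes;
	return true;
}

int main(void)
{
	unsigned int len = 0;
	bool found;

	/* Old retry clamp: max_bytes = PAGE_SIZE, smaller than one block. */
	found = toy_find_delalloc_range(TOY_PAGE_SIZE, &len);
	printf("clamp to PAGE_SIZE  (%5u): found=%d\n", TOY_PAGE_SIZE, found);

	/* Fixed retry clamp: max_bytes = sectorsize, at least one block. */
	found = toy_find_delalloc_range(TOY_SECTORSIZE, &len);
	printf("clamp to sectorsize (%5u): found=%d len=%u\n",
	       TOY_SECTORSIZE, found, len);
	return 0;
}

With bs > ps only the second call finds the remaining dirty blocks, which is why the patch below clamps the retry to fs_info->sectorsize instead of PAGE_SIZE.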
1 parent 24b760c commit 820cfae

File tree: 1 file changed, +11 −3 lines


fs/btrfs/extent_io.c

Lines changed: 11 additions & 3 deletions
@@ -355,6 +355,13 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	/* step one, find a bunch of delalloc bytes starting at start */
 	delalloc_start = *start;
 	delalloc_end = 0;
+
+	/*
+	 * If @max_bytes is smaller than a block, btrfs_find_delalloc_range() can
+	 * return early without handling any dirty ranges.
+	 */
+	ASSERT(max_bytes >= fs_info->sectorsize);
+
 	found = btrfs_find_delalloc_range(tree, &delalloc_start, &delalloc_end,
 					  max_bytes, &cached_state);
 	if (!found || delalloc_end <= *start || delalloc_start > orig_end) {
@@ -385,13 +392,14 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 				  delalloc_end);
 	ASSERT(!ret || ret == -EAGAIN);
 	if (ret == -EAGAIN) {
-		/* some of the folios are gone, lets avoid looping by
-		 * shortening the size of the delalloc range we're searching
+		/*
+		 * Some of the folios are gone, lets avoid looping by
+		 * shortening the size of the delalloc range we're searching.
 		 */
 		free_extent_state(cached_state);
 		cached_state = NULL;
 		if (!loops) {
-			max_bytes = PAGE_SIZE;
+			max_bytes = fs_info->sectorsize;
 			loops = 1;
 			goto again;
 		} else {
