
Commit 1ce433a

adam900710 authored and kdave committed

btrfs: reloc: unconditionally invalidate the page cache for each cluster
Commit 9d9ea1e ("btrfs: subpage: fix relocation potentially overwriting last page data") fixed a bug when relocating data block groups for subpage cases. However for the incoming large folios for data reloc inode, we can hit the same situation where block size is the same as page size, but the folio we got is still larger than a block. In that case, the old subpage specific check is no longer reliable. Here we have to enhance the handling by: - Unconditionally invalidate the page cache for the current cluster We set the @flush to true so that any dirty folios are properly written back first. And this time instead of dropping the whole page cache, just drop the range covered by the current cluster. This will bring some minor performance drop, as for a large folio, the heading half will be read twice (read by previous cluster, then invalidated, then read again by the current cluster). However that is required to support large folios, and this gets rid of the kinda tricky manual uptodate flag clearing for each block. - Remove the special handling of writing back the whole page cache filemap_invalidate_inode() handles the write back already, and since we're invalidating all pages in the range, we no longer need to manually clear the uptodate flags for involved blocks. Thus there is no need to manually write back the whole page cache. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 130f33d

fs/btrfs/relocation.c

Lines changed: 8 additions & 50 deletions
@@ -2666,66 +2666,24 @@ static noinline_for_stack int prealloc_file_extent_cluster(struct reloc_control
 	u64 num_bytes;
 	int nr;
 	int ret = 0;
-	u64 i_size = i_size_read(&inode->vfs_inode);
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
 	u64 cur_offset = prealloc_start;
 
 	/*
-	 * For subpage case, previous i_size may not be aligned to PAGE_SIZE.
-	 * This means the range [i_size, PAGE_END + 1) is filled with zeros by
-	 * btrfs_do_readpage() call of previously relocated file cluster.
+	 * For blocksize < folio size case (either bs < page size or large folios),
+	 * beyond i_size, all blocks are filled with zero.
 	 *
-	 * If the current cluster starts in the above range, btrfs_do_readpage()
+	 * If the current cluster covers the above range, btrfs_do_readpage()
 	 * will skip the read, and relocate_one_folio() will later writeback
 	 * the padding zeros as new data, causing data corruption.
 	 *
-	 * Here we have to manually invalidate the range (i_size, PAGE_END + 1).
+	 * Here we have to invalidate the cache covering our cluster.
 	 */
-	if (!PAGE_ALIGNED(i_size)) {
-		struct address_space *mapping = inode->vfs_inode.i_mapping;
-		struct btrfs_fs_info *fs_info = inode->root->fs_info;
-		const u32 sectorsize = fs_info->sectorsize;
-		struct folio *folio;
-
-		ASSERT(sectorsize < PAGE_SIZE);
-		ASSERT(IS_ALIGNED(i_size, sectorsize));
-
-		/*
-		 * Subpage can't handle page with DIRTY but without UPTODATE
-		 * bit as it can lead to the following deadlock:
-		 *
-		 * btrfs_read_folio()
-		 * | Page already *locked*
-		 * |- btrfs_lock_and_flush_ordered_range()
-		 *    |- btrfs_start_ordered_extent()
-		 *       |- extent_write_cache_pages()
-		 *          |- lock_page()
-		 *             We try to lock the page we already hold.
-		 *
-		 * Here we just writeback the whole data reloc inode, so that
-		 * we will be ensured to have no dirty range in the page, and
-		 * are safe to clear the uptodate bits.
-		 *
-		 * This shouldn't cause too much overhead, as we need to write
-		 * the data back anyway.
-		 */
-		ret = filemap_write_and_wait(mapping);
-		if (ret < 0)
-			return ret;
-
-		folio = filemap_lock_folio(mapping, i_size >> PAGE_SHIFT);
-		/*
-		 * If page is freed we don't need to do anything then, as we
-		 * will re-read the whole page anyway.
-		 */
-		if (!IS_ERR(folio)) {
-			btrfs_subpage_clear_uptodate(fs_info, folio, i_size,
-					round_up(i_size, PAGE_SIZE) - i_size);
-			folio_unlock(folio);
-			folio_put(folio);
-		}
-	}
+	ret = filemap_invalidate_inode(&inode->vfs_inode, true, prealloc_start,
+				       prealloc_end);
+	if (ret < 0)
+		return ret;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	ret = btrfs_alloc_data_chunk_ondemand(inode,
