Commit 8f2ecee

masoncl authored and gregkh committed
Btrfs: don't clean dirty pages during buffered writes
commit 7703bdd upstream.

During buffered writes, we follow this basic series of steps:

again:
		lock all the pages
		wait for writeback on all the pages
		take the extent range lock
		wait for ordered extents on the whole range
		clean all the pages

		if (copy_from_user_in_atomic() hits a fault) {
			drop our locks
			goto again;
		}

		dirty all the pages
		release all the locks

The extra waiting, cleaning and locking are there to make sure we don't modify pages in flight to the drive, after they've been crc'd.

If some of the pages in the range were already dirty when the write began, and we need to goto again, we create a window where a dirty page has been cleaned and unlocked. It may be reclaimed before we're able to lock it again, which means we'll read the old contents off the drive and lose any modifications that had been pending writeback.

We don't actually need to clean the pages. All of the other locking in place makes sure we don't start IO on the pages, so we can just leave them dirty for the duration of the write.

Fixes: 73d5931 (the original btrfs merge)
CC: [email protected] # v4.4+
Signed-off-by: Chris Mason <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 2974abf commit 8f2ecee

File tree

1 file changed: +23 −6 lines


fs/btrfs/file.c

Lines changed: 23 additions & 6 deletions
@@ -531,6 +531,14 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
 
 	end_of_last_block = start_pos + num_bytes - 1;
 
+	/*
+	 * The pages may have already been dirty, clear out old accounting so
+	 * we can set things up properly
+	 */
+	clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
+			 EXTENT_DIRTY | EXTENT_DELALLOC |
+			 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, cached);
+
 	if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
 		if (start_pos >= isize &&
 		    !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
@@ -1500,18 +1508,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	}
 	if (ordered)
 		btrfs_put_ordered_extent(ordered);
-	clear_extent_bit(&inode->io_tree, start_pos, last_pos,
-			 EXTENT_DIRTY | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
-			 0, 0, cached_state);
+
 		*lockstart = start_pos;
 		*lockend = last_pos;
 		ret = 1;
 	}
 
+	/*
+	 * It's possible the pages are dirty right now, but we don't want
+	 * to clean them yet because copy_from_user may catch a page fault
+	 * and we might have to fall back to one page at a time.  If that
+	 * happens, we'll unlock these pages and we'd have a window where
+	 * reclaim could sneak in and drop the once-dirty page on the floor
+	 * without writing it.
+	 *
+	 * We have the pages locked and the extent range locked, so there's
+	 * no way someone can start IO on any dirty pages in this range.
+	 *
+	 * We'll call btrfs_dirty_pages() later on, and that will flip around
+	 * delalloc bits and dirty the pages as required.
+	 */
 	for (i = 0; i < num_pages; i++) {
-		if (clear_page_dirty_for_io(pages[i]))
-			account_page_redirty(pages[i]);
 		set_page_extent_mapped(pages[i]);
 		WARN_ON(!PageLocked(pages[i]));
 	}
