Skip to content

Commit 90a71da

Browse files
Brian Fostercmaiolino
authored andcommitted
xfs: skip background cowblock trims on inodes open for write
The background blockgc scanner runs on a 5m interval by default and trims preallocation (post-eof and cow fork) from inodes that are otherwise idle. Idle effectively means that iolock can be acquired without blocking and that the inode has no dirty pagecache or I/O in flight. This simple mechanism and heuristic has worked fairly well for post-eof speculative preallocations. Support for reflink and COW fork preallocations came sometime later and plugged into the same mechanism, with similar heuristics. Some recent testing has shown that COW fork preallocation may be notably more sensitive to blockgc processing than post-eof preallocation, however. For example, consider an 8GB reflinked file with a COW extent size hint of 1MB. A worst case fully randomized overwrite of this file results in ~8k extents of an average size of ~1MB. If the same workload is interrupted a couple times for blockgc processing (assuming the file goes idle), the resulting extent count explodes to over 100k extents with an average size <100kB. This is significantly worse than ideal and essentially defeats the COW extent size hint mechanism. While this particular test is instrumented, it reflects a fairly reasonable pattern in practice where random I/Os might spread out over a large period of time with varying periods of (in)activity. For example, consider a cloned disk image file for a VM or container with long uptime and variable and bursty usage. A background blockgc scan that races and processes the image file when it happens to be clean and idle can have a significant effect on the future fragmentation level of the file, even when still in use. To help combat this, update the heuristic to skip cowblocks inodes that are currently opened for write access during non-sync blockgc scans. This allows COW fork preallocations to persist for as long as possible unless otherwise needed for functional purposes (i.e. a sync scan), the file is idle and closed, or the inode is being evicted from cache. While here, update the comments to help distinguish performance oriented heuristics from the logic that exists to maintain functional correctness. Suggested-by: Darrick Wong <[email protected]> Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
1 parent 6aac770 commit 90a71da

File tree

1 file changed

+23
-8
lines changed

1 file changed

+23
-8
lines changed

fs/xfs/xfs_icache.c

Lines changed: 23 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1280,14 +1280,17 @@ xfs_inode_clear_eofblocks_tag(
12801280
}
12811281

12821282
/*
1283-
* Set ourselves up to free CoW blocks from this file. If it's already clean
1284-
* then we can bail out quickly, but otherwise we must back off if the file
1285-
* is undergoing some kind of write.
1283+
* Prepare to free COW fork blocks from an inode.
12861284
*/
12871285
static bool
12881286
xfs_prep_free_cowblocks(
1289-
struct xfs_inode *ip)
1287+
struct xfs_inode *ip,
1288+
struct xfs_icwalk *icw)
12901289
{
1290+
bool sync;
1291+
1292+
sync = icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC);
1293+
12911294
/*
12921295
* Just clear the tag if we have an empty cow fork or none at all. It's
12931296
* possible the inode was fully unshared since it was originally tagged.
@@ -1299,9 +1302,21 @@ xfs_prep_free_cowblocks(
12991302
}
13001303

13011304
/*
1302-
* If the mapping is dirty or under writeback we cannot touch the
1303-
* CoW fork. Leave it alone if we're in the midst of a directio.
1305+
* A cowblocks trim of an inode can have a significant effect on
1306+
* fragmentation even when a reasonable COW extent size hint is set.
1307+
* Therefore, we prefer to not process cowblocks unless they are clean
1308+
* and idle. We can never process a cowblocks inode that is dirty or has
1309+
* in-flight I/O under any circumstances, because outstanding writeback
1310+
* or dio expects targeted COW fork blocks exist through write
1311+
* completion where they can be remapped into the data fork.
1312+
*
1313+
* Therefore, the heuristic used here is to never process inodes
1314+
* currently opened for write from background (i.e. non-sync) scans. For
1315+
* sync scans, use the pagecache/dio state of the inode to ensure we
1316+
* never free COW fork blocks out from under pending I/O.
13041317
*/
1318+
if (!sync && inode_is_open_for_write(VFS_I(ip)))
1319+
return false;
13051320
if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
13061321
mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
13071322
mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
@@ -1337,7 +1352,7 @@ xfs_inode_free_cowblocks(
13371352
if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
13381353
return 0;
13391354

1340-
if (!xfs_prep_free_cowblocks(ip))
1355+
if (!xfs_prep_free_cowblocks(ip, icw))
13411356
return 0;
13421357

13431358
if (!xfs_icwalk_match(ip, icw))
@@ -1366,7 +1381,7 @@ xfs_inode_free_cowblocks(
13661381
* Check again, nobody else should be able to dirty blocks or change
13671382
* the reflink iflag now that we have the first two locks held.
13681383
*/
1369-
if (xfs_prep_free_cowblocks(ip))
1384+
if (xfs_prep_free_cowblocks(ip, icw))
13701385
ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
13711386
return ret;
13721387
}

0 commit comments

Comments
 (0)