Commit 76e4503

Merge tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "There's a bunch of performance improvements, most notably the FIEMAP
  speedup, the new block group tree to speed up mount on large
  filesystems, more io_uring integration, some sysfs exports and the
  usual fixes and core updates.

  Summary:

  Performance:

   - outstanding FIEMAP speed improvement
      - algorithmic change how extents are enumerated leads to orders of
        magnitude speed boost (uncached and cached)
      - extent sharing check speedup (2.2x uncached, 3x cached)
      - add more cancellation points, allowing to interrupt seeking in
        files with large number of extents
      - more efficient hole and data seeking (4x uncached, 1.3x cached)
      - sample results:
          256M, 32K extents:   4s ->  29ms  (~150x)
          512M, 64K extents:  30s ->  59ms  (~550x)
          1G,  128K extents: 225s -> 120ms (~1800x)

   - improved inode logging, especially for directories (on dbench
     workload throughput +25%, max latency -21%)

   - improved buffered IO, remove redundant extent state tracking,
     lowering memory consumption and avoiding rb tree traversal

   - add sysfs tunable to let qgroup temporarily skip exact accounting
     when deleting snapshot, leading to a speedup but requiring a rescan
     after that, will be used by snapper

   - support io_uring and buffered writes, until now it was just for
     direct IO, with the no-wait semantics implemented in the buffered
     write path it now works and leads to speed improvement in IOPS
     (2x), throughput (2.2x), latency (depends, 2x to 150x)

   - small performance improvements when dropping and searching for
     extent maps as well as when flushing delalloc in COW mode
     (throughput +5MB/s)

  User visible changes:

   - new incompatible feature block-group-tree adding a dedicated tree
     for tracking block groups, this allows a much faster load during
     mount and avoids seeking unlike when it's scattered in the extent
     tree items
      - this reduces mount time for many-terabyte sized filesystems
      - conversion tool will be provided so existing filesystem can also
        be updated in place
      - to reduce test matrix and feature combinations requires no-holes
        and free-space-tree (mkfs defaults since 5.15)

   - improved reporting of super block corruption detected by scrub

   - scrub also tries to repair super block and does not wait until next
     commit

   - discard stats and tunables are exported in sysfs
     (/sys/fs/btrfs/FSID/discard)

   - qgroup status is exported in sysfs (/sys/fs/btrfs/FSID/qgroups/)

   - verify that super block was not modified when thawing filesystem

  Fixes:

   - FIEMAP fixes
      - fix extent sharing status, does not depend on the cached status
        where merged
      - flush delalloc so compressed extents are reported correctly

   - fix alignment of VMA for memory mapped files on THP

   - send: fix failures when processing inodes with no links (orphan
     files and directories)

   - fix race between quota enable and quota rescan ioctl

   - handle more corner cases for read-only compat feature verification

   - fix missed extent on fsync after dropping extent maps

  Core:

   - lockdep annotations to validate various transactions states and
     state transitions

   - preliminary support for fs-verity in send

   - more effective memory use in scrub for subpage where sector is
     smaller than page

   - block group caching progress logic has been removed, load is now
     synchronous

   - simplify end IO callbacks and bio handling, use chained bios
     instead of own tracking

   - add no-wait semantics to several functions (tree search, nocow,
     flushing, buffered write)

   - cleanups and refactoring

  MM changes:

   - export balance_dirty_pages_ratelimited_flags"

* tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits)
  btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
  btrfs: drop extent map range more efficiently
  btrfs: avoid pointless extent map tree search when flushing delalloc
  btrfs: remove unnecessary next extent map search
  btrfs: remove unnecessary NULL pointer checks when searching extent maps
  btrfs: assert tree is locked when clearing extent map from logging
  btrfs: remove unnecessary extent map initializations
  btrfs: remove the refcount warning/check at free_extent_map()
  btrfs: add helper to replace extent map range with a new extent map
  btrfs: move open coded extent map tree deletion out of inode eviction
  btrfs: use cond_resched_rwlock_write() during inode eviction
  btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
  btrfs: move btrfs_drop_extent_cache() to extent_map.c
  btrfs: fix missed extent on fsync after dropping extent maps
  btrfs: remove stale prototype of btrfs_write_inode
  btrfs: enable nowait async buffered writes
  btrfs: assert nowait mode is not used for some btree search functions
  btrfs: make btrfs_buffered_write nowait compatible
  btrfs: plumb NOWAIT through the write path
  btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  ...
2 parents 4c0ed7d + cbddcc4 commit 76e4503
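
The FIEMAP work described in the pull message is visible from user space through the `FS_IOC_FIEMAP` ioctl, where each returned extent may carry the `FIEMAP_EXTENT_SHARED` flag. The following is a minimal user-space sketch, not code from this merge: the function name `count_shared_extents` and the batch size `EXTENT_BATCH` are made up for illustration, while the ioctl, structures and flags come from `<linux/fs.h>` and `<linux/fiemap.h>`.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical batch size: how many extents to request per ioctl call. */
#define EXTENT_BATCH 128

/*
 * Count the extents of the file at `path` that the filesystem reports as
 * shared (FIEMAP_EXTENT_SHARED). Returns the count, or -1 on any error
 * (open failure, allocation failure, or FIEMAP not supported).
 */
long count_shared_extents(const char *path)
{
	size_t sz = sizeof(struct fiemap) +
		    EXTENT_BATCH * sizeof(struct fiemap_extent);
	struct fiemap *fm;
	__u64 start = 0;
	long shared = 0;
	int done = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	fm = malloc(sz);
	if (!fm) {
		close(fd);
		return -1;
	}

	while (!done) {
		memset(fm, 0, sz);
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET - start;
		fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush delalloc first */
		fm->fm_extent_count = EXTENT_BATCH;

		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
			shared = -1;
			break;
		}
		if (fm->fm_mapped_extents == 0)
			break;

		for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
			struct fiemap_extent *fe = &fm->fm_extents[i];

			if (fe->fe_flags & FIEMAP_EXTENT_SHARED)
				shared++;
			if (fe->fe_flags & FIEMAP_EXTENT_LAST)
				done = 1;
			/* Resume after this extent on the next batch. */
			start = fe->fe_logical + fe->fe_length;
		}
	}

	free(fm);
	close(fd);
	return shared;
}
```

On a file with many reflinked or snapshotted extents, each reported extent requires a sharedness check in the kernel, which is the path the backref changes below target.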

70 files changed, 7212 insertions(+), 5242 deletions(-)

fs/btrfs/Makefile

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
 	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
-	   subpage.o tree-mod-log.o
+	   subpage.o tree-mod-log.o extent-io-tree.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o

fs/btrfs/backref.c

Lines changed: 145 additions & 10 deletions
@@ -1511,16 +1511,118 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-/**
- * Check if an extent is shared or not
+/*
+ * The caller has joined a transaction or is holding a read lock on the
+ * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
+ * snapshot field changing while updating or checking the cache.
+ */
+static bool lookup_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
+					struct btrfs_root *root,
+					u64 bytenr, int level, bool *is_shared)
+{
+	struct btrfs_backref_shared_cache_entry *entry;
+
+	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
+		return false;
+
+	/*
+	 * Level -1 is used for the data extent, which is not reliable to cache
+	 * because its reference count can increase or decrease without us
+	 * realizing. We cache results only for extent buffers that lead from
+	 * the root node down to the leaf with the file extent item.
+	 */
+	ASSERT(level >= 0);
+
+	entry = &cache->entries[level];
+
+	/* Unused cache entry or being used for some other extent buffer. */
+	if (entry->bytenr != bytenr)
+		return false;
+
+	/*
+	 * We cached a false result, but the last snapshot generation of the
+	 * root changed, so we now have a snapshot. Don't trust the result.
+	 */
+	if (!entry->is_shared &&
+	    entry->gen != btrfs_root_last_snapshot(&root->root_item))
+		return false;
+
+	/*
+	 * If we cached a true result and the last generation used for dropping
+	 * a root changed, we can not trust the result, because the dropped root
+	 * could be a snapshot sharing this extent buffer.
+	 */
+	if (entry->is_shared &&
+	    entry->gen != btrfs_get_last_root_drop_gen(root->fs_info))
+		return false;
+
+	*is_shared = entry->is_shared;
+
+	return true;
+}
+
+/*
+ * The caller has joined a transaction or is holding a read lock on the
+ * fs_info->commit_root_sem semaphore, so no need to worry about the root's last
+ * snapshot field changing while updating or checking the cache.
+ */
+static void store_backref_shared_cache(struct btrfs_backref_shared_cache *cache,
+				       struct btrfs_root *root,
+				       u64 bytenr, int level, bool is_shared)
+{
+	struct btrfs_backref_shared_cache_entry *entry;
+	u64 gen;
+
+	if (WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL))
+		return;
+
+	/*
+	 * Level -1 is used for the data extent, which is not reliable to cache
+	 * because its reference count can increase or decrease without us
+	 * realizing. We cache results only for extent buffers that lead from
+	 * the root node down to the leaf with the file extent item.
+	 */
+	ASSERT(level >= 0);
+
+	if (is_shared)
+		gen = btrfs_get_last_root_drop_gen(root->fs_info);
+	else
+		gen = btrfs_root_last_snapshot(&root->root_item);
+
+	entry = &cache->entries[level];
+	entry->bytenr = bytenr;
+	entry->is_shared = is_shared;
+	entry->gen = gen;
+
+	/*
+	 * If we found an extent buffer is shared, set the cache result for all
+	 * extent buffers below it to true. As nodes in the path are COWed,
+	 * their sharedness is moved to their children, and if a leaf is COWed,
+	 * then the sharedness of a data extent becomes direct, the refcount of
+	 * data extent is increased in the extent item at the extent tree.
+	 */
+	if (is_shared) {
+		for (int i = 0; i < level; i++) {
+			entry = &cache->entries[i];
+			entry->is_shared = is_shared;
+			entry->gen = gen;
+		}
+	}
+}
+
+/*
+ * Check if a data extent is shared or not.
  *
- * @root:   root inode belongs to
- * @inum:   inode number of the inode whose extent we are checking
- * @bytenr: logical bytenr of the extent we are checking
- * @roots:  list of roots this extent is shared among
- * @tmp:    temporary list used for iteration
+ * @root:        The root the inode belongs to.
+ * @inum:        Number of the inode whose extent we are checking.
+ * @bytenr:      Logical bytenr of the extent we are checking.
+ * @extent_gen:  Generation of the extent (file extent item) or 0 if it is
+ *               not known.
+ * @roots:       List of roots this extent is shared among.
+ * @tmp:         Temporary list used for iteration.
+ * @cache:       A backref lookup result cache.
  *
- * btrfs_check_shared uses the backref walking code but will short
+ * btrfs_is_data_extent_shared uses the backref walking code but will short
  * circuit as soon as it finds a root or inode that doesn't match the
  * one passed in. This provides a significant performance benefit for
  * callers (such as fiemap) which want to know whether the extent is
@@ -1531,8 +1633,10 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
  *
  * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error.
  */
-int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
-		struct ulist *roots, struct ulist *tmp)
+int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
+				u64 extent_gen,
+				struct ulist *roots, struct ulist *tmp,
+				struct btrfs_backref_shared_cache *cache)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_trans_handle *trans;
@@ -1545,6 +1649,7 @@ int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 		.inum = inum,
 		.share_count = 0,
 	};
+	int level;
 
 	ulist_init(roots);
 	ulist_init(tmp);
@@ -1561,22 +1666,52 @@ int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
 		btrfs_get_tree_mod_seq(fs_info, &elem);
 	}
 
+	/* -1 means we are in the bytenr of the data extent. */
+	level = -1;
 	ULIST_ITER_INIT(&uiter);
 	while (1) {
+		bool is_shared;
+		bool cached;
+
 		ret = find_parent_nodes(trans, fs_info, bytenr, elem.seq, tmp,
 					roots, NULL, &shared, false);
 		if (ret == BACKREF_FOUND_SHARED) {
 			/* this is the only condition under which we return 1 */
 			ret = 1;
+			if (level >= 0)
+				store_backref_shared_cache(cache, root, bytenr,
+							   level, true);
 			break;
 		}
 		if (ret < 0 && ret != -ENOENT)
 			break;
 		ret = 0;
+		/*
+		 * If our data extent is not shared through reflinks and it was
+		 * created in a generation after the last one used to create a
+		 * snapshot of the inode's root, then it can not be shared
+		 * indirectly through subtrees, as that can only happen with
+		 * snapshots. In this case bail out, no need to check for the
+		 * sharedness of extent buffers.
+		 */
+		if (level == -1 &&
+		    extent_gen > btrfs_root_last_snapshot(&root->root_item))
+			break;
+
+		if (level >= 0)
+			store_backref_shared_cache(cache, root, bytenr,
+						   level, false);
 		node = ulist_next(tmp, &uiter);
 		if (!node)
 			break;
 		bytenr = node->val;
+		level++;
+		cached = lookup_backref_shared_cache(cache, root, bytenr, level,
+						     &is_shared);
+		if (cached) {
+			ret = (is_shared ? 1 : 0);
+			break;
+		}
 		shared.share_count = 0;
 		cond_resched();
 	}
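
The invalidation rules of the per-level cache added above can be exercised outside the kernel with a small model. This is a sketch under stated assumptions rather than the kernel code: the `model_*` names and `struct model_gens` are hypothetical stand-ins for `btrfs_root_last_snapshot()` and `btrfs_get_last_root_drop_gen()`, and `MODEL_MAX_LEVEL` is set to 8 to mirror `BTRFS_MAX_LEVEL`. The logic follows the hunk: a cached "not shared" result is trusted only while the root's last snapshot generation is unchanged, and a cached "shared" result only while the last root drop generation is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror BTRFS_MAX_LEVEL (maximum b+tree height). */
#define MODEL_MAX_LEVEL 8

struct model_entry {
	uint64_t bytenr;
	uint64_t gen;
	bool is_shared;
};

struct model_cache {
	struct model_entry entries[MODEL_MAX_LEVEL];
};

/* Hypothetical stand-in for the two generations read from root/fs_info. */
struct model_gens {
	uint64_t last_snapshot;		/* bumped when the root is snapshotted */
	uint64_t last_root_drop_gen;	/* bumped when any root is dropped */
};

/* Returns true on a trusted cache hit and stores the result in *is_shared. */
bool model_lookup(const struct model_cache *cache, const struct model_gens *g,
		  uint64_t bytenr, int level, bool *is_shared)
{
	const struct model_entry *entry;

	assert(level >= 0);	/* level -1 (the data extent) is never cached */
	if (level >= MODEL_MAX_LEVEL)
		return false;

	entry = &cache->entries[level];
	if (entry->bytenr != bytenr)
		return false;
	/* "Not shared" goes stale once a new snapshot may exist. */
	if (!entry->is_shared && entry->gen != g->last_snapshot)
		return false;
	/* "Shared" goes stale once a root (maybe the sharer) was dropped. */
	if (entry->is_shared && entry->gen != g->last_root_drop_gen)
		return false;

	*is_shared = entry->is_shared;
	return true;
}

void model_store(struct model_cache *cache, const struct model_gens *g,
		 uint64_t bytenr, int level, bool is_shared)
{
	struct model_entry *entry;
	uint64_t gen;

	assert(level >= 0);
	if (level >= MODEL_MAX_LEVEL)
		return;

	gen = is_shared ? g->last_root_drop_gen : g->last_snapshot;

	entry = &cache->entries[level];
	entry->bytenr = bytenr;
	entry->is_shared = is_shared;
	entry->gen = gen;

	/* A shared node implies everything below it on the path is shared. */
	if (is_shared) {
		for (int i = 0; i < level; i++) {
			cache->entries[i].is_shared = true;
			cache->entries[i].gen = gen;
		}
	}
}
```

Keeping one entry per tree level fits how fiemap walks a file: consecutive file extent items tend to live in the same leaf and under the same parent nodes, so the path from leaf to root repeats and a hit at any level ends the walk early.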

fs/btrfs/backref.h

Lines changed: 18 additions & 2 deletions
@@ -17,6 +17,20 @@ struct inode_fs_paths {
 	struct btrfs_data_container *fspath;
 };
 
+struct btrfs_backref_shared_cache_entry {
+	u64 bytenr;
+	u64 gen;
+	bool is_shared;
+};
+
+struct btrfs_backref_shared_cache {
+	/*
+	 * A path from a root to a leaf that has a file extent item pointing to
+	 * a given data extent should never exceed the maximum b+tree height.
+	 */
+	struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
+};
+
 typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
 				      void *ctx);
 
@@ -62,8 +76,10 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
 			  u64 start_off, struct btrfs_path *path,
 			  struct btrfs_inode_extref **ret_extref,
 			  u64 *found_off);
-int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
-		struct ulist *roots, struct ulist *tmp_ulist);
+int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
+				u64 extent_gen,
+				struct ulist *roots, struct ulist *tmp,
+				struct btrfs_backref_shared_cache *cache);
 
 int __init btrfs_prelim_ref_init(void);
 void __cold btrfs_prelim_ref_exit(void);
