Commit 6dec9f4

Merge tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"We don't have any big feature updates this time, there are lots of small enhancements or fixes. A highlight perhaps is the parallel fsync performance improvements, numbers below.

Regarding the dio/iomap that was reverted last time, the required API changes are likely to land in the upcoming cycle, the btrfs part will be updated afterwards.

User visible changes:

- new mount option rescue= to group all recovery-related mount options so we don't have many specific options; currently it introduces only aliases for existing options, future extensions are in development to allow read-only mount with partially damaged structures:
  - usebackuproot is an alias for rescue=usebackuproot
  - nologreplay is an alias for rescue=nologreplay
- start deprecation of mount option inode_cache, removal scheduled for v5.11
- removed deprecated mount options alloc_start and subvolrootid
- device stats corruption counter gets incremented when a checksum mismatch is found
- qgroup information exported in /sys/fs/btrfs/<UUID>/qgroups/<id> using sysfs
- add link /sys/fs/btrfs/<UUID>/bdi pointing to the associated backing dev info
- FS_INFO ioctl enhancements:
  - add flags to request/describe newly added items
  - new item: numeric checksum type and checksum size
  - new item: generation
  - new item: metadata_uuid
- seed device: with one new read-write device added, print the new device information in /proc/mounts
- balance: detect cancellation by Ctrl-C in existing cancellation points

Performance improvements:

- optimized versions of various helpers on little-endian architectures, where we don't have to do LE/BE conversion from the on-disk format (a sketch of the idea follows the shortlog below)
- tree-log/fsync optimizations leading to lower max latency reported by dbench, reduced by about 12%
- all chunk tree leaves are prefetched at mount time, which can improve mount time on large (terabyte-sized) filesystems
- speed up parallel fsync of files with reflinked/deduped extents: with 16 to 1024 jobs, throughput improves roughly by 50% on average and runtime decreases roughly by 30% on average; a notable outlier is 128 jobs with +121.2% throughput and -54.6% runtime
- another speed up of parallel fsync, reducing the number of checksum tree lookups and contention; the improvements start to show up at 2 tasks with +20% throughput and -16% runtime, up to 64 tasks with +200% throughput and -66% runtime

Core:

- umount-time qgroup leak checker
- qgroups:
  - add a way to unreserve a partial range after failure, avoiding some EDQUOT errors
  - improved flushing logic when EDQUOT is hit
- possible EINTR interruption caused by failed reservations after transaction start is better handled and documented
- transaction abort errors are unified to EROFS in case it's not the original reason of the abort or we have no other way to determine the reason

Fixes:

- make truncate succeed on a NOCOW file even if data space is exhausted
- fix cancelling balance on a filesystem with exhausted metadata space
- anon block device:
  - preallocate the anon bdev when a subvolume is created, to report failure early
  - shorten the time the anon bdev id is allocated
  - don't allocate an anon bdev for internal roots
- minor memory leak in ref-verify
- refuse invalid combinations of compression and NOCOW file flags
- lockdep fixes, updating the device locks
- remove obsolete fallback logic for block group profile adjustments when switching from 1 to more devices, which caused allocation of unwanted block groups

Other cleanups, refactoring, simplifications:

- conversions from struct inode to struct btrfs_inode in internal functions
- removal of unused struct members"

* tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (151 commits)
  btrfs: do not set the full sync flag on the inode during page release
  btrfs: release old extent maps during page release
  btrfs: fix race between page release and a fast fsync
  btrfs: open-code remount flag setting in btrfs_remount
  btrfs: if we're restriping, use the target restripe profile
  btrfs: don't adjust bg flags and use default allocation profiles
  btrfs: fix lockdep splat from btrfs_dump_space_info
  btrfs: move the chunk_mutex in btrfs_read_chunk_tree
  btrfs: open device without device_list_mutex
  btrfs: sysfs: use NOFS for device creation
  btrfs: return EROFS for BTRFS_FS_STATE_ERROR cases
  btrfs: document special case error codes for fs errors
  btrfs: don't WARN if we abort a transaction with EROFS
  btrfs: reduce contention on log trees when logging checksums
  btrfs: remove done label in writepage_delalloc
  btrfs: add comments for btrfs_reserve_flush_enum
  btrfs: relocation: review the call sites which can be interrupted by signal
  btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree
  btrfs: relocation: allow signal to cancel balance
  btrfs: raid56: remove out label in __raid56_parity_recover
  ...
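A note on the little-endian helper optimization mentioned above: btrfs stores on-disk integers in little-endian byte order, so on an LE CPU the usual le64_to_cpu() conversion is a no-op and a getter can be a plain load. Below is a minimal userspace sketch of the idea only; the function name is hypothetical and the real kernel accessors are generated differently:

#include <stdint.h>
#include <string.h>

/* Read a little-endian u64 from an on-disk buffer (illustrative only). */
static inline uint64_t get_disk_u64(const void *ptr)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        uint64_t v;

        memcpy(&v, ptr, sizeof(v));     /* LE host: no byte swap needed */
        return v;
#else
        const uint8_t *p = ptr;
        uint64_t v = 0;

        /* BE host: assemble the value from the LE byte sequence. */
        for (int i = 7; i >= 0; i--)
                v = (v << 8) | p[i];
        return v;
#endif
}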
2 parents 92b7e49 + 5e548b3 commit 6dec9f4

47 files changed: 1909 additions & 1221 deletions

fs/btrfs/block-group.c

Lines changed: 73 additions & 138 deletions
@@ -65,11 +65,8 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags)
         spin_lock(&fs_info->balance_lock);
         target = get_restripe_target(fs_info, flags);
         if (target) {
-                /* Pick target profile only if it's already available */
-                if ((flags & target) & BTRFS_EXTENDED_PROFILE_MASK) {
-                        spin_unlock(&fs_info->balance_lock);
-                        return extended_to_chunk(target);
-                }
+                spin_unlock(&fs_info->balance_lock);
+                return extended_to_chunk(target);
         }
         spin_unlock(&fs_info->balance_lock);
 
@@ -118,12 +115,12 @@ u64 btrfs_get_alloc_profile(struct btrfs_fs_info *fs_info, u64 orig_flags)
 
 void btrfs_get_block_group(struct btrfs_block_group *cache)
 {
-        atomic_inc(&cache->count);
+        refcount_inc(&cache->refs);
 }
 
 void btrfs_put_block_group(struct btrfs_block_group *cache)
 {
-        if (atomic_dec_and_test(&cache->count)) {
+        if (refcount_dec_and_test(&cache->refs)) {
                 WARN_ON(cache->pinned > 0);
                 WARN_ON(cache->reserved > 0);
 
@@ -1111,7 +1108,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
         if (ret < 0)
                 goto out;
 
-        mutex_lock(&fs_info->chunk_mutex);
         spin_lock(&block_group->lock);
         block_group->removed = 1;
         /*
@@ -1143,8 +1139,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
         remove_em = (atomic_read(&block_group->frozen) == 0);
         spin_unlock(&block_group->lock);
 
-        mutex_unlock(&fs_info->chunk_mutex);
-
         if (remove_em) {
                 struct extent_map_tree *em_tree;
 
@@ -1532,21 +1526,70 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
         spin_unlock(&fs_info->unused_bgs_lock);
 }
 
+static int read_bg_from_eb(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
+                           struct btrfs_path *path)
+{
+        struct extent_map_tree *em_tree;
+        struct extent_map *em;
+        struct btrfs_block_group_item bg;
+        struct extent_buffer *leaf;
+        int slot;
+        u64 flags;
+        int ret = 0;
+
+        slot = path->slots[0];
+        leaf = path->nodes[0];
+
+        em_tree = &fs_info->mapping_tree;
+        read_lock(&em_tree->lock);
+        em = lookup_extent_mapping(em_tree, key->objectid, key->offset);
+        read_unlock(&em_tree->lock);
+        if (!em) {
+                btrfs_err(fs_info,
+                          "logical %llu len %llu found bg but no related chunk",
+                          key->objectid, key->offset);
+                return -ENOENT;
+        }
+
+        if (em->start != key->objectid || em->len != key->offset) {
+                btrfs_err(fs_info,
+                          "block group %llu len %llu mismatch with chunk %llu len %llu",
+                          key->objectid, key->offset, em->start, em->len);
+                ret = -EUCLEAN;
+                goto out_free_em;
+        }
+
+        read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
+                           sizeof(bg));
+        flags = btrfs_stack_block_group_flags(&bg) &
+                BTRFS_BLOCK_GROUP_TYPE_MASK;
+
+        if (flags != (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+                btrfs_err(fs_info,
+                          "block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
+                          key->objectid, key->offset, flags,
+                          (BTRFS_BLOCK_GROUP_TYPE_MASK & em->map_lookup->type));
+                ret = -EUCLEAN;
+        }
+
+out_free_em:
+        free_extent_map(em);
+        return ret;
+}
+
 static int find_first_block_group(struct btrfs_fs_info *fs_info,
                                   struct btrfs_path *path,
                                   struct btrfs_key *key)
 {
         struct btrfs_root *root = fs_info->extent_root;
-        int ret = 0;
+        int ret;
         struct btrfs_key found_key;
         struct extent_buffer *leaf;
-        struct btrfs_block_group_item bg;
-        u64 flags;
         int slot;
 
         ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
         if (ret < 0)
-                goto out;
+                return ret;
 
         while (1) {
                 slot = path->slots[0];
@@ -1563,49 +1606,10 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,
 
                 if (found_key.objectid >= key->objectid &&
                     found_key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) {
-                        struct extent_map_tree *em_tree;
-                        struct extent_map *em;
-
-                        em_tree = &root->fs_info->mapping_tree;
-                        read_lock(&em_tree->lock);
-                        em = lookup_extent_mapping(em_tree, found_key.objectid,
-                                                   found_key.offset);
-                        read_unlock(&em_tree->lock);
-                        if (!em) {
-                                btrfs_err(fs_info,
-                        "logical %llu len %llu found bg but no related chunk",
-                                          found_key.objectid, found_key.offset);
-                                ret = -ENOENT;
-                        } else if (em->start != found_key.objectid ||
-                                   em->len != found_key.offset) {
-                                btrfs_err(fs_info,
-                "block group %llu len %llu mismatch with chunk %llu len %llu",
-                                          found_key.objectid, found_key.offset,
-                                          em->start, em->len);
-                                ret = -EUCLEAN;
-                        } else {
-                                read_extent_buffer(leaf, &bg,
-                                        btrfs_item_ptr_offset(leaf, slot),
-                                        sizeof(bg));
-                                flags = btrfs_stack_block_group_flags(&bg) &
-                                        BTRFS_BLOCK_GROUP_TYPE_MASK;
-
-                                if (flags != (em->map_lookup->type &
-                                              BTRFS_BLOCK_GROUP_TYPE_MASK)) {
-                                        btrfs_err(fs_info,
-"block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
-                                                found_key.objectid,
-                                                found_key.offset, flags,
-                                                (BTRFS_BLOCK_GROUP_TYPE_MASK &
-                                                 em->map_lookup->type));
-                                        ret = -EUCLEAN;
-                                } else {
-                                        ret = 0;
-                                }
-                        }
-                        free_extent_map(em);
-                        goto out;
+                        ret = read_bg_from_eb(fs_info, &found_key, path);
+                        break;
                 }
+
                 path->slots[0]++;
         }
 out:
@@ -1657,19 +1661,12 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
                 return -EIO;
 
         map = em->map_lookup;
-        data_stripe_length = em->len;
+        data_stripe_length = em->orig_block_len;
         io_stripe_size = map->stripe_len;
 
-        if (map->type & BTRFS_BLOCK_GROUP_RAID10)
-                data_stripe_length = div_u64(data_stripe_length,
-                                             map->num_stripes / map->sub_stripes);
-        else if (map->type & BTRFS_BLOCK_GROUP_RAID0)
-                data_stripe_length = div_u64(data_stripe_length, map->num_stripes);
-        else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-                data_stripe_length = div_u64(data_stripe_length,
-                                             nr_data_stripes(map));
+        /* For RAID5/6 adjust to a full IO stripe length */
+        if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
                 io_stripe_size = map->stripe_len * nr_data_stripes(map);
-        }
 
         buf = kcalloc(map->num_stripes, sizeof(u64), GFP_NOFS);
         if (!buf) {
@@ -1748,25 +1745,12 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
                 return ret;
 
         while (nr--) {
-                u64 start, len;
-
-                if (logical[nr] > cache->start + cache->length)
-                        continue;
-
-                if (logical[nr] + stripe_len <= cache->start)
-                        continue;
-
-                start = logical[nr];
-                if (start < cache->start) {
-                        start = cache->start;
-                        len = (logical[nr] + stripe_len) - start;
-                } else {
-                        len = min_t(u64, stripe_len,
-                                    cache->start + cache->length - start);
-                }
+                u64 len = min_t(u64, stripe_len,
+                                cache->start + cache->length - logical[nr]);
 
                 cache->bytes_super += len;
-                ret = btrfs_add_excluded_extent(fs_info, start, len);
+                ret = btrfs_add_excluded_extent(fs_info, logical[nr],
+                                                len);
                 if (ret) {
                         kfree(logical);
                         return ret;
@@ -1818,7 +1802,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(
 
         cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED;
 
-        atomic_set(&cache->count, 1);
+        refcount_set(&cache->refs, 1);
         spin_lock_init(&cache->lock);
         init_rwsem(&cache->data_rwsem);
         INIT_LIST_HEAD(&cache->list);
@@ -2207,54 +2191,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
         return 0;
 }
 
-static u64 update_block_group_flags(struct btrfs_fs_info *fs_info, u64 flags)
-{
-        u64 num_devices;
-        u64 stripped;
-
-        /*
-         * if restripe for this chunk_type is on pick target profile and
-         * return, otherwise do the usual balance
-         */
-        stripped = get_restripe_target(fs_info, flags);
-        if (stripped)
-                return extended_to_chunk(stripped);
-
-        num_devices = fs_info->fs_devices->rw_devices;
-
-        stripped = BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID56_MASK |
-                BTRFS_BLOCK_GROUP_RAID1_MASK | BTRFS_BLOCK_GROUP_RAID10;
-
-        if (num_devices == 1) {
-                stripped |= BTRFS_BLOCK_GROUP_DUP;
-                stripped = flags & ~stripped;
-
-                /* turn raid0 into single device chunks */
-                if (flags & BTRFS_BLOCK_GROUP_RAID0)
-                        return stripped;
-
-                /* turn mirroring into duplication */
-                if (flags & (BTRFS_BLOCK_GROUP_RAID1_MASK |
-                             BTRFS_BLOCK_GROUP_RAID10))
-                        return stripped | BTRFS_BLOCK_GROUP_DUP;
-        } else {
-                /* they already had raid on here, just return */
-                if (flags & stripped)
-                        return flags;
-
-                stripped |= BTRFS_BLOCK_GROUP_DUP;
-                stripped = flags & ~stripped;
-
-                /* switch duplicated blocks with raid1 */
-                if (flags & BTRFS_BLOCK_GROUP_DUP)
-                        return stripped | BTRFS_BLOCK_GROUP_RAID1;
-
-                /* this is drive concat, leave it alone */
-        }
-
-        return flags;
-}
-
 /*
  * Mark one block group RO, can be called several times for the same block
  * group.
@@ -2300,7 +2236,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
          * If we are changing raid levels, try to allocate a
          * corresponding block group with the new raid level.
          */
-        alloc_flags = update_block_group_flags(fs_info, cache->flags);
+        alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
         if (alloc_flags != cache->flags) {
                 ret = btrfs_chunk_alloc(trans, alloc_flags,
                                         CHUNK_ALLOC_FORCE);
@@ -2327,7 +2263,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
         ret = inc_block_group_ro(cache, 0);
 out:
         if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
-                alloc_flags = update_block_group_flags(fs_info, cache->flags);
+                alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
                 mutex_lock(&fs_info->chunk_mutex);
                 check_system_chunk(trans, alloc_flags);
                 mutex_unlock(&fs_info->chunk_mutex);
@@ -2521,7 +2457,8 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
         num_pages *= 16;
         num_pages *= PAGE_SIZE;
 
-        ret = btrfs_check_data_free_space(inode, &data_reserved, 0, num_pages);
+        ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, 0,
+                                          num_pages);
         if (ret)
                 goto out_put;
 
@@ -3392,7 +3329,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
                 ASSERT(list_empty(&block_group->dirty_list));
                 ASSERT(list_empty(&block_group->io_list));
                 ASSERT(list_empty(&block_group->bg_list));
-                ASSERT(atomic_read(&block_group->count) == 1);
+                ASSERT(refcount_read(&block_group->refs) == 1);
                 btrfs_put_block_group(block_group);
 
                 spin_lock(&info->block_group_cache_lock);
@@ -3447,15 +3384,13 @@ void btrfs_unfreeze_block_group(struct btrfs_block_group *block_group)
         spin_unlock(&block_group->lock);
 
         if (cleanup) {
-                mutex_lock(&fs_info->chunk_mutex);
                 em_tree = &fs_info->mapping_tree;
                 write_lock(&em_tree->lock);
                 em = lookup_extent_mapping(em_tree, block_group->start,
                                            1);
                 BUG_ON(!em); /* logic error, can't happen */
                 remove_extent_mapping(em_tree, em);
                 write_unlock(&em_tree->lock);
-                mutex_unlock(&fs_info->chunk_mutex);
 
                 /* once for us and once for the tree */
                 free_extent_map(em);

fs/btrfs/block-group.h

Lines changed: 1 addition & 2 deletions
@@ -114,8 +114,7 @@ struct btrfs_block_group {
         /* For block groups in the same raid type */
         struct list_head list;
 
-        /* Usage count */
-        atomic_t count;
+        refcount_t refs;
 
         /*
          * List of struct btrfs_free_clusters for this block group.
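
The atomic_t to refcount_t conversion above (paired with the btrfs_get/put_block_group() changes in block-group.c) buys the debugging semantics of refcount_t: it saturates instead of wrapping and WARNs on increment-from-zero or underflow. A minimal sketch of the same pattern on a made-up object, not the btrfs code itself:

#include <linux/refcount.h>
#include <linux/slab.h>

struct demo_obj {
        refcount_t refs;
};

static struct demo_obj *demo_alloc(void)
{
        struct demo_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (obj)
                refcount_set(&obj->refs, 1);    /* one reference for the caller */
        return obj;
}

static void demo_get(struct demo_obj *obj)
{
        refcount_inc(&obj->refs);               /* WARNs if refs was already 0 */
}

static void demo_put(struct demo_obj *obj)
{
        if (refcount_dec_and_test(&obj->refs))  /* true when refs drops to 0 */
                kfree(obj);
}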

fs/btrfs/btrfs_inode.h

Lines changed: 11 additions & 0 deletions
@@ -151,6 +151,17 @@ struct btrfs_inode {
          */
         u64 last_unlink_trans;
 
+        /*
+         * The id/generation of the last transaction where this inode was
+         * either the source or the destination of a clone/dedupe operation.
+         * Used when logging an inode to know if there are shared extents that
+         * need special care when logging checksum items, to avoid duplicate
+         * checksum items in a log (which can lead to a corruption where we end
+         * up with missing checksum ranges after log replay).
+         * Protected by the vfs inode lock.
+         */
+        u64 last_reflink_trans;
+
         /*
          * Number of bytes outstanding that are going to need csums. This is
          * used in ENOSPC accounting.
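
For illustration only, here is a hypothetical check built on the new field; the helper name and the exact condition are invented, and the real fsync/tree-log code in btrfs is more involved. The idea is that a logging path can compare last_reflink_trans against the transaction being logged to decide whether the inode's extents may share checksum ranges with another inode and therefore need the more careful checksum-logging path:

/* Hypothetical sketch; assumes the btrfs_inode definition above. */
static bool demo_may_have_shared_csums(const struct btrfs_inode *inode,
                                       u64 log_transid)
{
        /* A clone/dedupe touched this inode in or after this transaction. */
        return inode->last_reflink_trans >= log_transid;
}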
