Commit c14a8a4

Merge tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "Changes outside of btrfs: add io_uring command flag to track a dying
  task (the rest will go via the block git tree).

  User visible changes:

   - wire encoded read (ioctl) to io_uring commands, this can be used on
     itself, in the future this will allow 'send' to be asynchronous. As a
     consequence, the encoded read ioctl can also work in non-blocking mode

   - new ioctl to wait for cleaned subvolumes, no need to use the generic
     and root-only SEARCH_TREE ioctl, will be used by "btrfs subvol sync"

   - recognize different paths/symlinks for the same devices and don't
     report them during rescanning, this can be observed with LVM or DM

   - seeding device use case change, the sprout device (the one capturing
     new writes) will not clear the read-only status of the super block;
     this prevents accumulating space from deleted snapshots

  Performance improvements:

   - reduce lock contention when traversing extent buffers

   - reduce extent tree lock contention when searching for inline backref

   - switch from rb-trees to xarray for delayed ref tracking, improvements
     due to better cache locality, branching factors and more compact data
     structures

   - enable extent map shrinker again (prevent memory exhaustion under some
     types of IO load), reworked to run in a single worker thread (there
     used to be problems causing long stalls under memory pressure)

  Core changes:

   - raid-stripe-tree feature updates:
       - make device replace and scrub work
       - implement partial deletion of stripe extents
       - new selftests

   - split the config option BTRFS_DEBUG and add EXPERIMENTAL for features
     that are experimental or with known problems so we don't misuse
     debugging config for that

   - subpage mode updates (sector < page):
       - update compression implementations
       - update writepage, writeback

   - continued folio API conversions:
       - buffered writes

   - make buffered write copy one page at a time, preparatory work for
     future integration with large folios, may cause performance drop

   - proper locking of root item regarding starting send

   - error handling improvements

   - code cleanups and refactoring:
       - dead code removal
       - unused parameter reduction
       - lockdep assertions"

* tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits)
  btrfs: send: check for read-only send root under critical section
  btrfs: send: check for dead send root under critical section
  btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap()
  btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl()
  btrfs: fix a typo in btrfs_use_zone_append
  btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read()
  btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot()
  btrfs: remove hole from struct btrfs_delayed_node
  btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list
  btrfs: add new ioctl to wait for cleaned subvolumes
  btrfs: simplify range tracking in cow_file_range()
  btrfs: remove conditional path allocation in btrfs_read_locked_inode()
  btrfs: push cleanup into btrfs_read_locked_inode()
  io_uring/cmd: let cmds to know about dying task
  btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu()
  btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)
  btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()
  btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set
  btrfs: change btrfs_encoded_read() so that reading of extent is done by caller
  btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()
  ...
2 parents 3e7447a + e82c936 commit c14a8a4
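As a rough illustration of the encoded-read plumbing mentioned in the message above, the sketch below drives the pre-existing BTRFS_IOC_ENCODED_READ ioctl from user space; the io_uring command added in this merge carries the same struct btrfs_ioctl_encoded_io_args, only the submission path differs. This is a minimal sketch, not code from this pull request: it assumes a privileged caller (the encoded I/O ioctls require CAP_SYS_ADMIN) and the uapi definitions from linux/btrfs.h.

/*
 * Minimal user-space sketch (not from this merge): read one encoded extent
 * with BTRFS_IOC_ENCODED_READ.  Requires CAP_SYS_ADMIN and a file on btrfs.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/uio.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	struct btrfs_ioctl_encoded_io_args args;
	struct iovec iov;
	static char buf[128 * 1024];	/* large enough for one compressed extent */
	int fd, ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file on btrfs>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	iov.iov_base = buf;
	iov.iov_len = sizeof(buf);
	memset(&args, 0, sizeof(args));
	args.iov = &iov;
	args.iovcnt = 1;
	args.offset = 0;	/* logical file offset to read from */

	/* On success the kernel fills len, unencoded_* and compression. */
	ret = ioctl(fd, BTRFS_IOC_ENCODED_READ, &args);
	if (ret < 0) {
		perror("BTRFS_IOC_ENCODED_READ");
		close(fd);
		return 1;
	}
	printf("encoded bytes: %d, unencoded len: %llu, compression: %u\n",
	       ret, (unsigned long long)args.unencoded_len, args.compression);
	close(fd);
	return 0;
}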


67 files changed, +2460 -1422 lines changed

fs/btrfs/Kconfig

Lines changed: 26 additions & 0 deletions
@@ -78,6 +78,32 @@ config BTRFS_ASSERT

 	  If unsure, say N.

+config BTRFS_EXPERIMENTAL
+	bool "Btrfs experimental features"
+	depends on BTRFS_FS
+	default n
+	help
+	  Enable experimental features. These features may not be stable enough
+	  for end users. This is meant for btrfs developers or users who wish
+	  to test the functionality and report problems.
+
+	  Current list:
+
+	  - extent map shrinker - performance problems with too frequent shrinks
+
+	  - send stream protocol v3 - fs-verity support
+
+	  - checksum offload mode - sysfs knob to affect when checksums are
+				    calculated (at IO time, or in a thread)
+
+	  - raid-stripe-tree - additional mapping of extents to devices to
+			       support RAID1* profiles on zoned devices,
+			       RAID56 not yet supported
+
+	  - extent tree v2 - complex rework of extent tracking
+
+	  If unsure, say N.
+
 config BTRFS_FS_REF_VERIFY
 	bool "Btrfs with the ref verify tool compiled in"
 	depends on BTRFS_FS
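The new symbol is consumed like any other Kconfig option: experimental-only code is wrapped in #ifdef CONFIG_BTRFS_EXPERIMENTAL, which is exactly the switch the fs/btrfs/bio.c hunk below makes for the checksum-offload knob (previously guarded by CONFIG_BTRFS_DEBUG). Below is a schematic, self-contained illustration of that gating pattern in ordinary user-space C, not kernel code; the -D macro stands in for the Kconfig-generated define.

/*
 * Schematic sketch only: compile with -DCONFIG_BTRFS_EXPERIMENTAL to take
 * the experimental branch, mirroring the #ifdef switch in fs/btrfs/bio.c.
 */
#include <stdio.h>

static const char *csum_mode(void)
{
#ifdef CONFIG_BTRFS_EXPERIMENTAL
	/* Experimental builds honour the sysfs checksum-offload knob. */
	return "tunable";
#else
	/* Production builds keep the automatic heuristic only. */
	return "automatic";
#endif
}

int main(void)
{
	printf("checksum offload: %s\n", csum_mode());
	return 0;
}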

fs/btrfs/Makefile

Lines changed: 2 additions & 1 deletion
@@ -43,4 +43,5 @@ btrfs-$(CONFIG_FS_VERITY) += verity.o
 btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
 	tests/extent-buffer-tests.o tests/btrfs-tests.o \
 	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
-	tests/free-space-tree-tests.o tests/extent-map-tests.o
+	tests/free-space-tree-tests.o tests/extent-map-tests.o \
+	tests/raid-stripe-tree-tests.o

fs/btrfs/backref.c

Lines changed: 2 additions & 1 deletion
@@ -1442,7 +1442,8 @@ static int find_parent_nodes(struct btrfs_backref_walk_ctx *ctx,
 		 */
 		delayed_refs = &ctx->trans->transaction->delayed_refs;
 		spin_lock(&delayed_refs->lock);
-		head = btrfs_find_delayed_ref_head(delayed_refs, ctx->bytenr);
+		head = btrfs_find_delayed_ref_head(ctx->fs_info, delayed_refs,
+						   ctx->bytenr);
 		if (head) {
 			if (!mutex_trylock(&head->mutex)) {
 				refcount_inc(&head->refs);

fs/btrfs/bio.c

Lines changed: 1 addition & 1 deletion
@@ -587,7 +587,7 @@ static bool should_async_write(struct btrfs_bio *bbio)
 {
 	bool auto_csum_mode = true;

-#ifdef CONFIG_BTRFS_DEBUG
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
 	struct btrfs_fs_devices *fs_devices = bbio->fs_info->fs_devices;
 	enum btrfs_offload_csum_mode csum_mode = READ_ONCE(fs_devices->offload_csum_mode);

fs/btrfs/block-group.c

Lines changed: 1 addition & 1 deletion
@@ -2797,7 +2797,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		 * uncompressed data size, because the compression is only done
 		 * when writeback triggered and we don't know how much space we
 		 * are actually going to need, so we reserve the uncompressed
-		 * size because the data may be uncompressible in the worst case.
+		 * size because the data may be incompressible in the worst case.
 		 */
 		if (ret == 0) {
 			bool used;

fs/btrfs/btrfs_inode.h

Lines changed: 10 additions & 5 deletions
@@ -577,7 +577,6 @@ void btrfs_merge_delalloc_extent(struct btrfs_inode *inode, struct extent_state
 				 struct extent_state *other);
 void btrfs_split_delalloc_extent(struct btrfs_inode *inode,
 				 struct extent_state *orig, u64 split);
-void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end);
 void btrfs_evict_inode(struct inode *inode);
 struct inode *btrfs_alloc_inode(struct super_block *sb);
 void btrfs_destroy_inode(struct inode *inode);
@@ -613,11 +612,17 @@ int btrfs_writepage_cow_fixup(struct folio *folio);
 int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info,
 					     int compress_type);
 int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
-					  u64 file_offset, u64 disk_bytenr,
-					  u64 disk_io_size,
-					  struct page **pages);
+					  u64 disk_bytenr, u64 disk_io_size,
+					  struct page **pages, void *uring_ctx);
 ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
-			   struct btrfs_ioctl_encoded_io_args *encoded);
+			   struct btrfs_ioctl_encoded_io_args *encoded,
+			   struct extent_state **cached_state,
+			   u64 *disk_bytenr, u64 *disk_io_size);
+ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter,
+				   u64 start, u64 lockend,
+				   struct extent_state **cached_state,
+				   u64 disk_bytenr, u64 disk_io_size,
+				   size_t count, bool compressed, bool *unlocked);
 ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 			       const struct btrfs_ioctl_encoded_io_args *encoded);
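These prototype changes implement the split described at the end of the shortlog ("change btrfs_encoded_read() so that reading of extent is done by caller"): btrfs_encoded_read() resolves and locks the extent and, for regular extents, hands the disk location back to the caller, which then issues the read itself via btrfs_encoded_read_regular(). The following is a schematic caller flow inferred from the prototypes above, with setup of the kiocb, iov_iter and locked range omitted; the real code lives in the ioctl and io_uring paths, and the sentinel return value used to signal "read it yourself" is an assumption here (-EIOCBQUEUED in the patch series).

/* Schematic only, not the actual fs/btrfs/ioctl.c code. */
ret = btrfs_encoded_read(&kiocb, &iter, &args, &cached_state,
			 &disk_bytenr, &disk_io_size);
if (ret == -EIOCBQUEUED) {
	/*
	 * Regular extent: the range is still locked and disk_bytenr/
	 * disk_io_size are filled in, so the caller performs the read.
	 * The new io_uring command can do this step asynchronously.
	 */
	ret = btrfs_encoded_read_regular(&kiocb, &iter, start, lockend,
					 &cached_state, disk_bytenr,
					 disk_io_size, count,
					 compressed, &unlocked);
}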

fs/btrfs/compression.c

Lines changed: 8 additions & 6 deletions
@@ -453,7 +453,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		if (pg_index > end_index)
 			break;

-		folio = __filemap_get_folio(mapping, pg_index, 0, 0);
+		folio = filemap_get_folio(mapping, pg_index);
 		if (!IS_ERR(folio)) {
 			u64 folio_sz = folio_size(folio);
 			u64 offset = offset_in_folio(folio, cur);
@@ -545,8 +545,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		 * subpage::readers and to unlock the page.
 		 */
 		if (fs_info->sectorsize < PAGE_SIZE)
-			btrfs_subpage_start_reader(fs_info, folio, cur,
-						   add_size);
+			btrfs_folio_set_lock(fs_info, folio, cur, add_size);
 		folio_put(folio);
 		cur += add_size;
 	}
@@ -702,7 +701,7 @@ static void free_heuristic_ws(struct list_head *ws)
 	kfree(workspace);
 }

-static struct list_head *alloc_heuristic_ws(unsigned int level)
+static struct list_head *alloc_heuristic_ws(void)
 {
 	struct heuristic_ws *ws;

@@ -744,9 +743,9 @@ static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 static struct list_head *alloc_workspace(int type, unsigned int level)
 {
 	switch (type) {
-	case BTRFS_COMPRESS_NONE: return alloc_heuristic_ws(level);
+	case BTRFS_COMPRESS_NONE: return alloc_heuristic_ws();
 	case BTRFS_COMPRESS_ZLIB: return zlib_alloc_workspace(level);
-	case BTRFS_COMPRESS_LZO: return lzo_alloc_workspace(level);
+	case BTRFS_COMPRESS_LZO: return lzo_alloc_workspace();
 	case BTRFS_COMPRESS_ZSTD: return zstd_alloc_workspace(level);
 	default:
 		/*
@@ -1030,13 +1029,16 @@ int btrfs_compress_folios(unsigned int type_level, struct address_space *mapping
 {
 	int type = btrfs_compress_type(type_level);
 	int level = btrfs_compress_level(type_level);
+	const unsigned long orig_len = *total_out;
 	struct list_head *workspace;
 	int ret;

 	level = btrfs_compress_set_level(type, level);
 	workspace = get_workspace(type, level);
 	ret = compression_compress_pages(type, workspace, mapping, start, folios,
 					 out_folios, total_in, total_out);
+	/* The total read-in bytes should be no larger than the input. */
+	ASSERT(*total_in <= orig_len);
 	put_workspace(type, workspace);
 	return ret;
 }

fs/btrfs/compression.h

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
 int lzo_decompress(struct list_head *ws, const u8 *data_in,
 		struct folio *dest_folio, unsigned long dest_pgoff, size_t srclen,
 		size_t destlen);
-struct list_head *lzo_alloc_workspace(unsigned int level);
+struct list_head *lzo_alloc_workspace(void);
 void lzo_free_workspace(struct list_head *ws);

 int zstd_compress_folios(struct list_head *ws, struct address_space *mapping,

fs/btrfs/ctree.c

Lines changed: 83 additions & 49 deletions
@@ -1508,26 +1508,26 @@ static noinline void unlock_up(struct btrfs_path *path, int level,
  */
 static int
 read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
-		      struct extent_buffer **eb_ret, int level, int slot,
+		      struct extent_buffer **eb_ret, int slot,
 		      const struct btrfs_key *key)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_tree_parent_check check = { 0 };
 	u64 blocknr;
-	u64 gen;
-	struct extent_buffer *tmp;
-	int ret;
+	struct extent_buffer *tmp = NULL;
+	int ret = 0;
 	int parent_level;
-	bool unlock_up;
+	int err;
+	bool read_tmp = false;
+	bool tmp_locked = false;
+	bool path_released = false;

-	unlock_up = ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]);
 	blocknr = btrfs_node_blockptr(*eb_ret, slot);
-	gen = btrfs_node_ptr_generation(*eb_ret, slot);
 	parent_level = btrfs_header_level(*eb_ret);
 	btrfs_node_key_to_cpu(*eb_ret, &check.first_key, slot);
 	check.has_first_key = true;
 	check.level = parent_level - 1;
-	check.transid = gen;
+	check.transid = btrfs_node_ptr_generation(*eb_ret, slot);
 	check.owner_root = btrfs_root_id(root);

 	/*
@@ -1540,79 +1540,115 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 	tmp = find_extent_buffer(fs_info, blocknr);
 	if (tmp) {
 		if (p->reada == READA_FORWARD_ALWAYS)
-			reada_for_search(fs_info, p, level, slot, key->objectid);
+			reada_for_search(fs_info, p, parent_level, slot, key->objectid);

 		/* first we do an atomic uptodate check */
-		if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) {
+		if (btrfs_buffer_uptodate(tmp, check.transid, 1) > 0) {
 			/*
 			 * Do extra check for first_key, eb can be stale due to
 			 * being cached, read from scrub, or have multiple
 			 * parents (shared tree blocks).
 			 */
-			if (btrfs_verify_level_key(tmp,
-					parent_level - 1, &check.first_key, gen)) {
-				free_extent_buffer(tmp);
-				return -EUCLEAN;
+			if (btrfs_verify_level_key(tmp, &check)) {
+				ret = -EUCLEAN;
+				goto out;
 			}
 			*eb_ret = tmp;
-			return 0;
+			tmp = NULL;
+			ret = 0;
+			goto out;
 		}

 		if (p->nowait) {
-			free_extent_buffer(tmp);
-			return -EAGAIN;
+			ret = -EAGAIN;
+			goto out;
 		}

-		if (unlock_up)
-			btrfs_unlock_up_safe(p, level + 1);
-
-		/* now we're allowed to do a blocking uptodate check */
-		ret = btrfs_read_extent_buffer(tmp, &check);
-		if (ret) {
-			free_extent_buffer(tmp);
+		if (!p->skip_locking) {
+			btrfs_unlock_up_safe(p, parent_level + 1);
+			tmp_locked = true;
+			btrfs_tree_read_lock(tmp);
 			btrfs_release_path(p);
-			return ret;
+			ret = -EAGAIN;
+			path_released = true;
 		}

-		if (unlock_up)
-			ret = -EAGAIN;
+		/* Now we're allowed to do a blocking uptodate check. */
+		err = btrfs_read_extent_buffer(tmp, &check);
+		if (err) {
+			ret = err;
+			goto out;
+		}

+		if (ret == 0) {
+			ASSERT(!tmp_locked);
+			*eb_ret = tmp;
+			tmp = NULL;
+		}
 		goto out;
 	} else if (p->nowait) {
-		return -EAGAIN;
+		ret = -EAGAIN;
+		goto out;
 	}

-	if (unlock_up) {
-		btrfs_unlock_up_safe(p, level + 1);
+	if (!p->skip_locking) {
+		btrfs_unlock_up_safe(p, parent_level + 1);
 		ret = -EAGAIN;
-	} else {
-		ret = 0;
 	}

 	if (p->reada != READA_NONE)
-		reada_for_search(fs_info, p, level, slot, key->objectid);
+		reada_for_search(fs_info, p, parent_level, slot, key->objectid);

-	tmp = read_tree_block(fs_info, blocknr, &check);
+	tmp = btrfs_find_create_tree_block(fs_info, blocknr, check.owner_root, check.level);
 	if (IS_ERR(tmp)) {
+		ret = PTR_ERR(tmp);
+		tmp = NULL;
+		goto out;
+	}
+	read_tmp = true;
+
+	if (!p->skip_locking) {
+		ASSERT(ret == -EAGAIN);
+		tmp_locked = true;
+		btrfs_tree_read_lock(tmp);
 		btrfs_release_path(p);
-		return PTR_ERR(tmp);
+		path_released = true;
+	}
+
+	/* Now we're allowed to do a blocking uptodate check. */
+	err = btrfs_read_extent_buffer(tmp, &check);
+	if (err) {
+		ret = err;
+		goto out;
 	}
+
 	/*
 	 * If the read above didn't mark this buffer up to date,
 	 * it will never end up being up to date. Set ret to EIO now
 	 * and give up so that our caller doesn't loop forever
 	 * on our EAGAINs.
 	 */
-	if (!extent_buffer_uptodate(tmp))
+	if (!extent_buffer_uptodate(tmp)) {
 		ret = -EIO;
+		goto out;
+	}

-out:
 	if (ret == 0) {
+		ASSERT(!tmp_locked);
 		*eb_ret = tmp;
-	} else {
-		free_extent_buffer(tmp);
-		btrfs_release_path(p);
+		tmp = NULL;
+	}
+out:
+	if (tmp) {
+		if (tmp_locked)
+			btrfs_tree_read_unlock(tmp);
+		if (read_tmp && ret && ret != -EAGAIN)
+			free_extent_buffer_stale(tmp);
+		else
+			free_extent_buffer(tmp);
 	}
+	if (ret && !path_released)
+		btrfs_release_path(p);

 	return ret;
 }
@@ -2197,8 +2233,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 				goto done;
 			}

-			err = read_block_for_search(root, p, &b, level, slot, key);
-			if (err == -EAGAIN)
+			err = read_block_for_search(root, p, &b, slot, key);
+			if (err == -EAGAIN && !p->nowait)
 				goto again;
 			if (err) {
 				ret = err;
@@ -2324,8 +2360,8 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 			goto done;
 		}

-		err = read_block_for_search(root, p, &b, level, slot, key);
-		if (err == -EAGAIN)
+		err = read_block_for_search(root, p, &b, slot, key);
+		if (err == -EAGAIN && !p->nowait)
 			goto again;
 		if (err) {
 			ret = err;
@@ -2334,7 +2370,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,

 		level = btrfs_header_level(b);
 		btrfs_tree_read_lock(b);
-		b = btrfs_tree_mod_log_rewind(fs_info, p, b, time_seq);
+		b = btrfs_tree_mod_log_rewind(fs_info, b, time_seq);
 		if (!b) {
 			ret = -ENOMEM;
 			goto done;
@@ -4930,8 +4966,7 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
 		}

 		next = c;
-		ret = read_block_for_search(root, path, &next, level,
-					    slot, &key);
+		ret = read_block_for_search(root, path, &next, slot, &key);
 		if (ret == -EAGAIN && !path->nowait)
 			goto again;

@@ -4974,8 +5009,7 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
 		if (!level)
 			break;

-		ret = read_block_for_search(root, path, &next, level,
-					    0, &key);
+		ret = read_block_for_search(root, path, &next, 0, &key);
 		if (ret == -EAGAIN && !path->nowait)
 			goto again;
