
Commit b3f391f

Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:

 - rcu_pending, btree key cache rework: this solves lock contention in
   the key cache, eliminating the biggest source of the srcu lock hold
   time warnings, and drastically improving performance on some metadata
   heavy workloads - on multithreaded creates we're now 3-4x faster than
   xfs.

 - We're now using an rhashtable instead of the system inode hash table;
   this is another significant performance improvement on multithreaded
   metadata workloads, eliminating more lock contention.

 - for_each_btree_key_in_subvolume_upto(): new helper for iterating over
   keys within a specific subvolume, eliminating a lot of open coded
   "subvolume_get_snapshot()" and also fixing another source of srcu
   lock time warnings, by running each loop iteration in its own
   transaction (as the existing for_each_btree_key() does).

 - More work on btree_trans locking asserts; we now assert that we don't
   hold btree node locks when trans->locked is false, which is important
   because we don't use lockdep for tracking individual btree node
   locks.

 - Some cleanups and improvements in the bset.c btree node lookup code,
   from Alan.

 - Rework of btree node pinning, which we use in backpointers fsck. The
   old hacky implementation, where the shrinker just skipped over nodes
   in the pinned range, was causing OOMs; instead we now use another
   shrinker with a much higher seeks number for pinned nodes.

 - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
   where rebalance would sometimes fall back to allocating from the full
   filesystem, which is not what we want when it's trying to move data
   to a specific target.

 - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
   allocations.

 - Idmap mounts are now supported (Hongbo Li)

 - Rename whiteouts are now supported (Hongbo Li)

 - Erasure coding can now handle devices being marked as failed, or
   forcibly removed.

   We still need the evacuate path for erasure coding, but it's getting
   very close to ready for people to start using.

* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
  bcachefs: return err ptr instead of null in read sb clean
  bcachefs: Remove duplicated include in backpointers.c
  bcachefs: Don't drop devices with stripe pointers
  bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
  bcachefs: bch_fs.rw_devs_change_count
  bcachefs: bch2_dev_remove_stripes()
  bcachefs: bch2_trigger_ptr() calculates sectors even when no device
  bcachefs: improve error messages in bch2_ec_read_extent()
  bcachefs: improve error message on too few devices for ec
  bcachefs: improve bch2_new_stripe_to_text()
  bcachefs: ec_stripe_head.nr_created
  bcachefs: bch_stripe.disk_label
  bcachefs: stripe_to_mem()
  bcachefs: EIO errcode cleanup
  bcachefs: Rework btree node pinning
  bcachefs: split up btree cache counters for live, freeable
  bcachefs: btree cache counters should be size_t
  bcachefs: Don't count "skipped access bit" as touched in btree cache scan
  bcachefs: Failed devices no longer require mounting in degraded mode
  bcachefs: bch2_dev_rcu_noerror()
  ...

2 parents f8ffbc3 + 025c55a commit b3f391f

89 files changed, 3155 additions and 1690 deletions

Documentation/filesystems/bcachefs/CodingStyle.rst

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ errors in our thinking by running our code and seeing what happens. If your
 time is being wasted because your tools are bad or too slow - don't accept it,
 fix it.
 
-Put effort into your documentation, commmit messages, and code comments - but
+Put effort into your documentation, commit messages, and code comments - but
 don't go overboard. A good commit message is wonderful - but if the information
 was important enough to go in a commit message, ask yourself if it would be
 even better as a code comment.

fs/bcachefs/Kconfig

Lines changed: 7 additions & 0 deletions
@@ -87,6 +87,13 @@ config BCACHEFS_SIX_OPTIMISTIC_SPIN
 	is held by another thread, spin for a short while, as long as the
 	thread owning the lock is running.
 
+config BCACHEFS_PATH_TRACEPOINTS
+	bool "Extra btree_path tracepoints"
+	depends on BCACHEFS_FS
+	help
+	Enable extra tracepoints for debugging btree_path operations; we don't
+	normally want these enabled because they happen at very high rates.
+
 config MEAN_AND_VARIANCE_UNIT_TEST
 	tristate "mean_and_variance unit tests" if !KUNIT_ALL_TESTS
 	depends on KUNIT

fs/bcachefs/Makefile

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ bcachefs-y := \
 	printbuf.o \
 	quota.o \
 	rebalance.o \
+	rcu_pending.o \
 	recovery.o \
 	recovery_passes.o \
 	reflink.o \

fs/bcachefs/acl.c

Lines changed: 1 addition & 1 deletion
@@ -361,7 +361,7 @@ int bch2_set_acl(struct mnt_idmap *idmap,
 	bch2_trans_begin(trans);
 	acl = _acl;
 
-	ret = bch2_subvol_is_ro_trans(trans, inode->ei_subvol) ?:
+	ret = bch2_subvol_is_ro_trans(trans, inode->ei_inum.subvol) ?:
 		bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
 				BTREE_ITER_intent);
 	if (ret)

fs/bcachefs/alloc_background.c

Lines changed: 40 additions & 5 deletions
@@ -30,6 +30,7 @@
 #include <linux/rcupdate.h>
 #include <linux/sched/task.h>
 #include <linux/sort.h>
+#include <linux/jiffies.h>
 
 static void bch2_discard_one_bucket_fast(struct bch_dev *, u64);
 
@@ -2183,7 +2184,7 @@ int bch2_dev_freespace_init(struct bch_fs *c, struct bch_dev *ca,
 	 * freespace/need_discard/need_gc_gens btrees as needed:
 	 */
 	while (1) {
-		if (last_updated + HZ * 10 < jiffies) {
+		if (time_after(jiffies, last_updated + HZ * 10)) {
 			bch_info(ca, "%s: currently at %llu/%llu",
 				 __func__, iter.pos.offset, ca->mi.nbuckets);
 			last_updated = jiffies;
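
Note on the hunk above: the old test `last_updated + HZ * 10 < jiffies` compares two unsigned tick counts directly, which gives the wrong answer when the deadline wraps past the maximum value of `unsigned long`. The kernel's `time_after()` avoids this by looking at the sign of the difference. The following is a minimal userspace sketch of that idea, not the kernel's code: the names `tick_after()` and `naive_after()` are hypothetical, and it assumes a two's-complement conversion from `unsigned long` to `long`, as GCC and Clang provide.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a wraparound-safe "a is after b" test
 * on an unsigned tick counter. It mirrors the idea behind the kernel's
 * time_after(); it is not the kernel macro itself.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	/* Interpret the unsigned difference as signed: negative means a is past b. */
	return (long)(b - a) < 0;
}

/* The direct comparison that the hunk replaces: "deadline < now". */
static bool naive_after(unsigned long now, unsigned long deadline)
{
	return deadline < now;
}

/* Self-check covering a deadline that wraps around ULONG_MAX. */
static void tick_after_selfcheck(void)
{
	unsigned long last_updated = ULONG_MAX - 5;
	unsigned long deadline = last_updated + 10;	/* wraps to 4 */

	/* Two ticks elapsed: the deadline has not passed yet. */
	assert(!tick_after(last_updated + 2, deadline));
	/* Twelve ticks elapsed: the deadline has passed. */
	assert(tick_after(last_updated + 12, deadline));
}
```

With a wrapped deadline, `naive_after()` reports the deadline as already passed after only two ticks, while `tick_after()` waits until the counter has actually moved beyond it.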
@@ -2297,6 +2298,36 @@ int bch2_fs_freespace_init(struct bch_fs *c)
 	return 0;
 }
 
+/* device removal */
+
+int bch2_dev_remove_alloc(struct bch_fs *c, struct bch_dev *ca)
+{
+	struct bpos start = POS(ca->dev_idx, 0);
+	struct bpos end = POS(ca->dev_idx, U64_MAX);
+	int ret;
+
+	/*
+	 * We clear the LRU and need_discard btrees first so that we don't race
+	 * with bch2_do_invalidates() and bch2_do_discards()
+	 */
+	ret = bch2_dev_remove_stripes(c, ca->dev_idx) ?:
+		bch2_btree_delete_range(c, BTREE_ID_lru, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_btree_delete_range(c, BTREE_ID_need_discard, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_btree_delete_range(c, BTREE_ID_freespace, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_btree_delete_range(c, BTREE_ID_backpointers, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_btree_delete_range(c, BTREE_ID_bucket_gens, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_btree_delete_range(c, BTREE_ID_alloc, start, end,
+					BTREE_TRIGGER_norun, NULL) ?:
+		bch2_dev_usage_remove(c, ca->dev_idx);
+	bch_err_msg(ca, ret, "removing dev alloc info");
+	return ret;
+}
+
 /* Bucket IO clocks: */
 
 int bch2_bucket_io_time_reset(struct btree_trans *trans, unsigned dev,
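
Note on `bch2_dev_remove_alloc()` above: the chain of `?:` operators relies on the GNU C conditional with an omitted middle operand, where `a ?: b` yields `a` if it is nonzero and otherwise evaluates `b`. Chained, it runs each step in order and stops at the first nonzero (error) return. A small userspace sketch of that behaviour follows; `step()`, `run_chain()` and the `calls` counter are hypothetical names for illustration, and the code needs GCC or Clang because `?:` is an extension.

```c
#include <assert.h>

/* Counts how many steps actually ran during the last chain. */
static int calls;

/* Hypothetical step: records that it ran and returns the given code. */
static int step(int ret)
{
	calls++;
	return ret;
}

/*
 * Run three steps in order, stopping at the first nonzero return.
 * "a ?: b" is the GNU C shorthand for "a ? a : b" with a evaluated once.
 */
static int run_chain(int r1, int r2, int r3)
{
	calls = 0;
	return step(r1) ?: step(r2) ?: step(r3);
}

/* Self-check: a failing second step short-circuits the third. */
static void run_chain_selfcheck(void)
{
	assert(run_chain(0, -5, -7) == -5);
	assert(calls == 2);
}
```

In the kernel function this means that if any delete fails, the remaining btrees are left untouched and the first error is what gets reported and returned.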
@@ -2432,13 +2463,15 @@ static bool bch2_dev_has_open_write_point(struct bch_fs *c, struct bch_dev *ca)
 /* device goes ro: */
 void bch2_dev_allocator_remove(struct bch_fs *c, struct bch_dev *ca)
 {
-	unsigned i;
+	lockdep_assert_held(&c->state_lock);
 
 	/* First, remove device from allocation groups: */
 
-	for (i = 0; i < ARRAY_SIZE(c->rw_devs); i++)
+	for (unsigned i = 0; i < ARRAY_SIZE(c->rw_devs); i++)
 		clear_bit(ca->dev_idx, c->rw_devs[i].d);
 
+	c->rw_devs_change_count++;
+
 	/*
 	 * Capacity is calculated based off of devices in allocation groups:
 	 */
@@ -2467,11 +2500,13 @@ void bch2_dev_allocator_remove(struct bch_fs *c, struct bch_dev *ca)
 /* device goes rw: */
 void bch2_dev_allocator_add(struct bch_fs *c, struct bch_dev *ca)
 {
-	unsigned i;
+	lockdep_assert_held(&c->state_lock);
 
-	for (i = 0; i < ARRAY_SIZE(c->rw_devs); i++)
+	for (unsigned i = 0; i < ARRAY_SIZE(c->rw_devs); i++)
 		if (ca->mi.data_allowed & (1 << i))
 			set_bit(ca->dev_idx, c->rw_devs[i].d);
+
+	c->rw_devs_change_count++;
 }
 
 void bch2_dev_allocator_background_exit(struct bch_dev *ca)

fs/bcachefs/alloc_background.h

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,7 @@ enum bch_validate_flags;
 static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
 {
 	rcu_read_lock();
-	struct bch_dev *ca = bch2_dev_rcu(c, pos.inode);
+	struct bch_dev *ca = bch2_dev_rcu_noerror(c, pos.inode);
 	bool ret = ca && bucket_valid(ca, pos.offset);
 	rcu_read_unlock();
 	return ret;
@@ -338,6 +338,7 @@ static inline const struct bch_backpointer *alloc_v4_backpointers_c(const struct
 
 int bch2_dev_freespace_init(struct bch_fs *, struct bch_dev *, u64, u64);
 int bch2_fs_freespace_init(struct bch_fs *);
+int bch2_dev_remove_alloc(struct bch_fs *, struct bch_dev *);
 
 void bch2_recalc_capacity(struct bch_fs *);
 u64 bch2_min_rw_member_capacity(struct bch_fs *);

fs/bcachefs/alloc_foreground.c

Lines changed: 24 additions & 35 deletions
@@ -600,6 +600,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 				      enum bch_watermark watermark,
 				      enum bch_data_type data_type,
 				      struct closure *cl,
+				      bool nowait,
 				      struct bch_dev_usage *usage)
 {
 	struct bch_fs *c = trans->c;
@@ -609,7 +610,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 	struct bucket_alloc_state s = {
 		.btree_bitmap = data_type == BCH_DATA_btree,
 	};
-	bool waiting = false;
+	bool waiting = nowait;
 again:
 	bch2_dev_usage_read_fast(ca, usage);
 	avail = dev_buckets_free(ca, *usage, watermark);
@@ -685,7 +686,7 @@ struct open_bucket *bch2_bucket_alloc(struct bch_fs *c, struct bch_dev *ca,
 
 	bch2_trans_do(c, NULL, NULL, 0,
 		      PTR_ERR_OR_ZERO(ob = bch2_bucket_alloc_trans(trans, ca, watermark,
-							data_type, cl, &usage)));
+							data_type, cl, false, &usage)));
 	return ob;
 }
 
@@ -748,7 +749,6 @@ static int add_new_bucket(struct bch_fs *c,
 			   unsigned nr_replicas,
 			   unsigned *nr_effective,
 			   bool *have_cache,
-			   unsigned flags,
 			   struct open_bucket *ob)
 {
 	unsigned durability = ob_dev(c, ob)->mi.durability;
@@ -775,7 +775,7 @@ int bch2_bucket_alloc_set_trans(struct btree_trans *trans,
 		      unsigned nr_replicas,
 		      unsigned *nr_effective,
 		      bool *have_cache,
-		      unsigned flags,
+		      enum bch_write_flags flags,
 		      enum bch_data_type data_type,
 		      enum bch_watermark watermark,
 		      struct closure *cl)
@@ -801,7 +801,8 @@ int bch2_bucket_alloc_set_trans(struct btree_trans *trans,
 			continue;
 		}
 
-		ob = bch2_bucket_alloc_trans(trans, ca, watermark, data_type, cl, &usage);
+		ob = bch2_bucket_alloc_trans(trans, ca, watermark, data_type,
+					     cl, flags & BCH_WRITE_ALLOC_NOWAIT, &usage);
 		if (!IS_ERR(ob))
 			bch2_dev_stripe_increment_inlined(ca, stripe, &usage);
 		bch2_dev_put(ca);
@@ -815,7 +816,7 @@ int bch2_bucket_alloc_set_trans(struct btree_trans *trans,
 
 		if (add_new_bucket(c, ptrs, devs_may_alloc,
 				   nr_replicas, nr_effective,
-				   have_cache, flags, ob)) {
+				   have_cache, ob)) {
 			ret = 0;
 			break;
 		}
@@ -841,7 +842,7 @@ static int bucket_alloc_from_stripe(struct btree_trans *trans,
 			 unsigned *nr_effective,
 			 bool *have_cache,
 			 enum bch_watermark watermark,
-			 unsigned flags,
+			 enum bch_write_flags flags,
 			 struct closure *cl)
 {
 	struct bch_fs *c = trans->c;
@@ -883,7 +884,7 @@ static int bucket_alloc_from_stripe(struct btree_trans *trans,
 
 	ret = add_new_bucket(c, ptrs, devs_may_alloc,
 			     nr_replicas, nr_effective,
-			     have_cache, flags, ob);
+			     have_cache, ob);
 out_put_head:
 	bch2_ec_stripe_head_put(c, h);
 	return ret;
@@ -922,7 +923,7 @@ static int bucket_alloc_set_writepoint(struct bch_fs *c,
 				       unsigned nr_replicas,
 				       unsigned *nr_effective,
 				       bool *have_cache,
-				       bool ec, unsigned flags)
+				       bool ec)
 {
 	struct open_buckets ptrs_skip = { .nr = 0 };
 	struct open_bucket *ob;
@@ -934,7 +935,7 @@ static int bucket_alloc_set_writepoint(struct bch_fs *c,
 						 have_cache, ec, ob))
 			ret = add_new_bucket(c, ptrs, devs_may_alloc,
 					     nr_replicas, nr_effective,
-					     have_cache, flags, ob);
+					     have_cache, ob);
 		else
 			ob_push(c, &ptrs_skip, ob);
 	}
@@ -950,8 +951,7 @@ static int bucket_alloc_set_partial(struct bch_fs *c,
 				    unsigned nr_replicas,
 				    unsigned *nr_effective,
 				    bool *have_cache, bool ec,
-				    enum bch_watermark watermark,
-				    unsigned flags)
+				    enum bch_watermark watermark)
 {
 	int i, ret = 0;
 
@@ -983,7 +983,7 @@ static int bucket_alloc_set_partial(struct bch_fs *c,
 
 		ret = add_new_bucket(c, ptrs, devs_may_alloc,
 				     nr_replicas, nr_effective,
-				     have_cache, flags, ob);
+				     have_cache, ob);
 		if (ret)
 			break;
 	}
@@ -1003,7 +1003,7 @@ static int __open_bucket_add_buckets(struct btree_trans *trans,
 			unsigned *nr_effective,
 			bool *have_cache,
 			enum bch_watermark watermark,
-			unsigned flags,
+			enum bch_write_flags flags,
 			struct closure *_cl)
 {
 	struct bch_fs *c = trans->c;
@@ -1022,18 +1022,15 @@ static int __open_bucket_add_buckets(struct btree_trans *trans,
 	open_bucket_for_each(c, ptrs, ob, i)
 		__clear_bit(ob->dev, devs.d);
 
-	if (erasure_code && ec_open_bucket(c, ptrs))
-		return 0;
-
 	ret = bucket_alloc_set_writepoint(c, ptrs, wp, &devs,
 				 nr_replicas, nr_effective,
-				 have_cache, erasure_code, flags);
+				 have_cache, erasure_code);
 	if (ret)
 		return ret;
 
 	ret = bucket_alloc_set_partial(c, ptrs, wp, &devs,
 				 nr_replicas, nr_effective,
-				 have_cache, erasure_code, watermark, flags);
+				 have_cache, erasure_code, watermark);
 	if (ret)
 		return ret;
 
@@ -1074,12 +1071,12 @@ static int open_bucket_add_buckets(struct btree_trans *trans,
 			unsigned *nr_effective,
 			bool *have_cache,
 			enum bch_watermark watermark,
-			unsigned flags,
+			enum bch_write_flags flags,
 			struct closure *cl)
 {
 	int ret;
 
-	if (erasure_code) {
+	if (erasure_code && !ec_open_bucket(trans->c, ptrs)) {
 		ret = __open_bucket_add_buckets(trans, ptrs, wp,
 				devs_have, target, erasure_code,
 				nr_replicas, nr_effective, have_cache,
@@ -1376,7 +1373,7 @@ int bch2_alloc_sectors_start_trans(struct btree_trans *trans,
 			     unsigned nr_replicas,
 			     unsigned nr_replicas_required,
 			     enum bch_watermark watermark,
-			     unsigned flags,
+			     enum bch_write_flags flags,
 			     struct closure *cl,
 			     struct write_point **wp_ret)
 {
@@ -1392,8 +1389,6 @@ int bch2_alloc_sectors_start_trans(struct btree_trans *trans,
 	if (!IS_ENABLED(CONFIG_BCACHEFS_ERASURE_CODING))
 		erasure_code = false;
 
-	BUG_ON(flags & BCH_WRITE_ONLY_SPECIFIED_DEVS);
-
 	BUG_ON(!nr_replicas || !nr_replicas_required);
 retry:
 	ptrs.nr = 0;
@@ -1498,11 +1493,12 @@ int bch2_alloc_sectors_start_trans(struct btree_trans *trans,
 	    try_decrease_writepoints(trans, write_points_nr))
 		goto retry;
 
-	if (bch2_err_matches(ret, BCH_ERR_open_buckets_empty) ||
+	if (cl && bch2_err_matches(ret, BCH_ERR_open_buckets_empty))
+		ret = -BCH_ERR_bucket_alloc_blocked;
+
+	if (cl && !(flags & BCH_WRITE_ALLOC_NOWAIT) &&
 	    bch2_err_matches(ret, BCH_ERR_freelist_empty))
-		return cl
-			? -BCH_ERR_bucket_alloc_blocked
-			: -BCH_ERR_ENOSPC_bucket_alloc;
+		ret = -BCH_ERR_bucket_alloc_blocked;
 
 	return ret;
 }
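
Note on the last hunk above: the new tail of `bch2_alloc_sectors_start_trans()` turns an allocation failure into "blocked" only when the caller supplied a closure to wait on, and for an empty freelist additionally only when `BCH_WRITE_ALLOC_NOWAIT` is not set. The sketch below models just that decision in userspace. It is a simplification under stated assumptions: the enum values and `map_alloc_err()` are hypothetical stand-ins, plain equality replaces `bch2_err_matches()`, and the kernel's negative error codes are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the bcachefs error codes named in the diff. */
enum alloc_err {
	ALLOC_OK = 0,
	ERR_OPEN_BUCKETS_EMPTY,
	ERR_FREELIST_EMPTY,
	ERR_BUCKET_ALLOC_BLOCKED,
};

/*
 * Model of the post-patch error mapping:
 *  - have_closure corresponds to "cl != NULL" in the kernel code
 *  - nowait corresponds to "flags & BCH_WRITE_ALLOC_NOWAIT"
 */
static enum alloc_err map_alloc_err(enum alloc_err ret, bool have_closure,
				    bool nowait)
{
	/* Out of open buckets: the caller can wait if it gave us a closure. */
	if (have_closure && ret == ERR_OPEN_BUCKETS_EMPTY)
		ret = ERR_BUCKET_ALLOC_BLOCKED;

	/* Empty freelist: only report "blocked" if waiting is permitted. */
	if (have_closure && !nowait && ret == ERR_FREELIST_EMPTY)
		ret = ERR_BUCKET_ALLOC_BLOCKED;

	return ret;
}

/* Self-check: nowait leaves an empty-freelist error unchanged. */
static void map_alloc_err_selfcheck(void)
{
	assert(map_alloc_err(ERR_FREELIST_EMPTY, true, true) == ERR_FREELIST_EMPTY);
	assert(map_alloc_err(ERR_FREELIST_EMPTY, true, false) == ERR_BUCKET_ALLOC_BLOCKED);
}
```

In this model, a caller without a closure always gets the original error back, which matches the `cl &&` guard on both branches of the patched code.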
@@ -1733,13 +1729,6 @@ void bch2_dev_alloc_debug_to_text(struct printbuf *out, struct bch_dev *ca)
 	for (unsigned i = 0; i < ARRAY_SIZE(c->open_buckets); i++)
 		nr[c->open_buckets[i].data_type]++;
 
-	printbuf_tabstops_reset(out);
-	printbuf_tabstop_push(out, 12);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-
 	bch2_dev_usage_to_text(out, ca, &stats);
 
 	prt_newline(out);

fs/bcachefs/alloc_foreground.h

Lines changed: 3 additions & 2 deletions
@@ -155,9 +155,10 @@ static inline bool bch2_bucket_is_open_safe(struct bch_fs *c, unsigned dev, u64
 	return ret;
 }
 
+enum bch_write_flags;
 int bch2_bucket_alloc_set_trans(struct btree_trans *, struct open_buckets *,
 		      struct dev_stripe_state *, struct bch_devs_mask *,
-		      unsigned, unsigned *, bool *, unsigned,
+		      unsigned, unsigned *, bool *, enum bch_write_flags,
 		      enum bch_data_type, enum bch_watermark,
 		      struct closure *);
 
@@ -167,7 +168,7 @@ int bch2_alloc_sectors_start_trans(struct btree_trans *,
 			     struct bch_devs_list *,
 			     unsigned, unsigned,
 			     enum bch_watermark,
-			     unsigned,
+			     enum bch_write_flags,
 			     struct closure *,
 			     struct write_point **);
 