
Commit 511fb5b

Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock updates from Christian Brauner:

"This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes.

This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here.

The second part contains filesystem freezing updates.

Overview:

The generic superblock changes are roughly organized as follows (ignoring additional minor cleanups):

(1) Removal of the bd_super member from struct block_device.

    This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this.

(2) Simplify rules for superblock cleanup.

    Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully.

    After sget_fc() returned, fs_context->s_fs_info has been transferred to sb->s_fs_info, at which point sb->kill_sb() is fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent.

    Cleanup shouldn't be duplicated in sb->put_super() either, as sb->put_super() is only called if sb->s_root has been set, aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided.

    This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section.

(3) Make it possible to look up or create a superblock before opening block devices.

    There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed.

(4) Switch most filesystems to follow the same logic as the generic mount code now does as outlined in (3).

(5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock.

(6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops.

(7) Call from the block layer up into the filesystem layer when the block device is removed, allowing the filesystem to be shut down without risk of deadlocks.

(8) Get rid of get_super().

    We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblocks to find the owning superblock anymore"

Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/

* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
  super: use higher-level helper for {freeze,thaw}
  super: wait until we passed kill super
  super: wait for nascent superblocks
  super: make locking naming consistent
  super: use locking helpers
  fs: simplify invalidate_inodes
  fs: remove get_super
  block: call into the file system for ioctl BLKFLSBUF
  block: call into the file system for bdev_mark_dead
  block: consolidate __invalidate_device and fsync_bdev
  block: drop the "busy inodes on changed media" log message
  dasd: also call __invalidate_device when setting the device offline
  amiflop: don't call fsync_bdev in FDFMTBEG
  floppy: call disk_force_media_change when changing the format
  block: simplify the disk_force_media_change interface
  nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
  xfs use fs_holder_ops for the log and RT devices
  xfs: drop s_umount over opening the log and RT devices
  ext4: use fs_holder_ops for the log device
  ext4: drop s_umount over opening the log device
  ...
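The cleanup rule in item (2) of the pull message can be pictured with a small userspace model. This is only a sketch: every `model_*` name is hypothetical, the structs are stand-ins rather than the kernel's `struct fs_context` and `struct super_block`, and the hand-over in `model_sget_fc()` is an assumption about how the transfer of `s_fs_info` behaves, based on the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct model_sb {
	void *s_fs_info;	/* owned by kill_sb() once transferred */
};

struct model_fc {
	void *s_fs_info;	/* owned by the context until handed over */
};

static int model_frees;	/* counts how often the private data is freed */

static void model_free_info(void *info)
{
	if (info) {
		free(info);
		model_frees++;
	}
}

/* Models the hand-over in sget_fc(): the context gives up ownership. */
static void model_sget_fc(struct model_fc *fc, struct model_sb *sb)
{
	sb->s_fs_info = fc->s_fs_info;
	fc->s_fs_info = NULL;
}

/* Models fs_context->free(): only cleans up what was never handed over. */
static void model_fc_free(struct model_fc *fc)
{
	model_free_info(fc->s_fs_info);
	fc->s_fs_info = NULL;
}

/* Models sb->kill_sb(): the sole owner after the hand-over. */
static void model_kill_sb(struct model_sb *sb)
{
	model_free_info(sb->s_fs_info);
	sb->s_fs_info = NULL;
}

/* Runs one mount attempt and returns the number of frees observed. */
static int model_mount(bool reach_sget_fc)
{
	struct model_fc fc = { .s_fs_info = malloc(16) };
	struct model_sb sb = { .s_fs_info = NULL };

	model_frees = 0;
	if (reach_sget_fc) {
		model_sget_fc(&fc, &sb);
		/* fill_super() may fail here; it must not free s_fs_info. */
		model_kill_sb(&sb);
	}
	model_fc_free(&fc);
	return model_frees;
}
```

In this model the private data is freed exactly once whether or not the superblock was ever allocated, which is the property the rule is meant to guarantee.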
2 parents de16588 + cd4284c commit 511fb5b

40 files changed, with 1043 additions and 629 deletions

Documentation/filesystems/vfs.rst

Lines changed: 4 additions & 2 deletions
@@ -260,9 +260,11 @@ filesystem. The following members are defined:
 		void (*evict_inode) (struct inode *);
 		void (*put_super) (struct super_block *);
 		int (*sync_fs)(struct super_block *sb, int wait);
-		int (*freeze_super) (struct super_block *);
+		int (*freeze_super) (struct super_block *sb,
+					enum freeze_holder who);
 		int (*freeze_fs) (struct super_block *);
-		int (*thaw_super) (struct super_block *);
+		int (*thaw_super) (struct super_block *sb,
+					enum freeze_holder who);
 		int (*unfreeze_fs) (struct super_block *);
 		int (*statfs) (struct dentry *, struct kstatfs *);
 		int (*remount_fs) (struct super_block *, int *, char *);
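The new `who` argument identifies which party requests a freeze or thaw. The following userspace sketch models one plausible reading of that design, in which each holder can hold the freeze once and the filesystem stays frozen until every holder has thawed. All `model_*` and `MODEL_*` names are hypothetical, the second (kernel) holder and the specific error codes are assumptions not shown in this diff, and the real implementation does considerably more work.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical holder identifiers; only a userspace holder appears in the diff. */
enum model_freeze_holder {
	MODEL_HOLDER_KERNEL    = 1u << 0,
	MODEL_HOLDER_USERSPACE = 1u << 1,
};

/* Hypothetical stand-in for the freeze state kept in the superblock. */
struct model_sb {
	unsigned int freeze_holders;	/* bitmask of current holders */
};

static bool model_is_frozen(const struct model_sb *sb)
{
	return sb->freeze_holders != 0;
}

/* A holder may freeze once; a second request by the same holder fails. */
static int model_freeze_super(struct model_sb *sb, enum model_freeze_holder who)
{
	if (sb->freeze_holders & (unsigned int)who)
		return -EBUSY;	/* assumed error code */
	sb->freeze_holders |= (unsigned int)who;
	return 0;
}

/* A holder can only drop its own hold; the last one out thaws. */
static int model_thaw_super(struct model_sb *sb, enum model_freeze_holder who)
{
	if (!(sb->freeze_holders & (unsigned int)who))
		return -EINVAL;	/* assumed error code */
	sb->freeze_holders &= ~(unsigned int)who;
	return 0;
}
```

Under this model a userspace thaw cannot undo a freeze that the kernel still holds, which is the kind of conflict a holder argument makes it possible to detect.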

block/bdev.c

Lines changed: 32 additions & 37 deletions
@@ -206,23 +206,6 @@ int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend)
 }
 EXPORT_SYMBOL(sync_blockdev_range);
 
-/*
- * Write out and wait upon all dirty data associated with this
- * device. Filesystem data as well as the underlying block
- * device. Takes the superblock lock.
- */
-int fsync_bdev(struct block_device *bdev)
-{
-	struct super_block *sb = get_super(bdev);
-	if (sb) {
-		int res = sync_filesystem(sb);
-		drop_super(sb);
-		return res;
-	}
-	return sync_blockdev(bdev);
-}
-EXPORT_SYMBOL(fsync_bdev);
-
 /**
  * freeze_bdev - lock a filesystem and force it into a consistent state
  * @bdev: blockdevice to lock
@@ -248,9 +231,9 @@ int freeze_bdev(struct block_device *bdev)
 	if (!sb)
 		goto sync;
 	if (sb->s_op->freeze_super)
-		error = sb->s_op->freeze_super(sb);
+		error = sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE);
 	else
-		error = freeze_super(sb);
+		error = freeze_super(sb, FREEZE_HOLDER_USERSPACE);
 	deactivate_super(sb);
 
 	if (error) {
@@ -291,9 +274,9 @@ int thaw_bdev(struct block_device *bdev)
 		goto out;
 
 	if (sb->s_op->thaw_super)
-		error = sb->s_op->thaw_super(sb);
+		error = sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE);
 	else
-		error = thaw_super(sb);
+		error = thaw_super(sb, FREEZE_HOLDER_USERSPACE);
 	if (error)
 		bdev->bd_fsfreeze_count++;
 	else
@@ -960,26 +943,38 @@ int lookup_bdev(const char *pathname, dev_t *dev)
 }
 EXPORT_SYMBOL(lookup_bdev);
 
-int __invalidate_device(struct block_device *bdev, bool kill_dirty)
+/**
+ * bdev_mark_dead - mark a block device as dead
+ * @bdev: block device to operate on
+ * @surprise: indicate a surprise removal
+ *
+ * Tell the file system that this device or media is dead. If @surprise is set
+ * to %true the device or media is already gone, if not we are preparing for an
+ * orderly removal.
+ *
+ * This calls into the file system, which then typically syncs out all dirty data
+ * and writes back inodes and then invalidates any cached data in the inodes on
+ * the file system. In addition we also invalidate the block device mapping.
+ */
+void bdev_mark_dead(struct block_device *bdev, bool surprise)
 {
-	struct super_block *sb = get_super(bdev);
-	int res = 0;
+	mutex_lock(&bdev->bd_holder_lock);
+	if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
+		bdev->bd_holder_ops->mark_dead(bdev, surprise);
+	else
+		sync_blockdev(bdev);
+	mutex_unlock(&bdev->bd_holder_lock);
 
-	if (sb) {
-		/*
-		 * no need to lock the super, get_super holds the
-		 * read mutex so the filesystem cannot go away
-		 * under us (->put_super runs with the write lock
-		 * hold).
-		 */
-		shrink_dcache_sb(sb);
-		res = invalidate_inodes(sb, kill_dirty);
-		drop_super(sb);
-	}
 	invalidate_bdev(bdev);
-	return res;
 }
-EXPORT_SYMBOL(__invalidate_device);
+#ifdef CONFIG_DASD_MODULE
+/*
+ * Drivers should not use this directly, but the DASD driver has historically
+ * had a shutdown to offline mode that doesn't actually remove the gendisk
+ * that otherwise looks a lot like a safe device removal.
+ */
+EXPORT_SYMBOL_GPL(bdev_mark_dead);
+#endif
 
 void sync_bdevs(bool wait)
 {
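The dispatch at the heart of `bdev_mark_dead()` (call the holder's callback if one is registered, otherwise fall back to a plain sync, then invalidate in either case) can be sketched in userspace as follows. Every `model_*` name is hypothetical, counters stand in for the real side effects, and the locking the kernel performs around the dispatch is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bdev;

/* Hypothetical stand-in for the holder operations table. */
struct model_holder_ops {
	void (*mark_dead)(struct model_bdev *bdev, bool surprise);
};

/* Hypothetical stand-in for a block device; counters replace side effects. */
struct model_bdev {
	const struct model_holder_ops *holder_ops;
	int notified;		/* times the holder callback ran */
	bool last_surprise;	/* argument seen by the last callback */
	int synced;		/* times the plain sync fallback ran */
	int invalidated;	/* times the device mapping was invalidated */
};

/* What a filesystem acting as holder might register. */
static void model_fs_mark_dead(struct model_bdev *bdev, bool surprise)
{
	bdev->notified++;
	bdev->last_surprise = surprise;
}

static const struct model_holder_ops model_fs_holder_ops = {
	.mark_dead = model_fs_mark_dead,
};

static void model_bdev_mark_dead(struct model_bdev *bdev, bool surprise)
{
	/* The kernel version holds bdev->bd_holder_lock around this dispatch. */
	if (bdev->holder_ops && bdev->holder_ops->mark_dead)
		bdev->holder_ops->mark_dead(bdev, surprise);
	else
		bdev->synced++;		/* stands in for sync_blockdev() */

	bdev->invalidated++;		/* stands in for invalidate_bdev() */
}
```

The BLKFLSBUF ioctl path in this commit uses the same shape with a `sync` callback in place of `mark_dead`.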

block/disk-events.c

Lines changed: 6 additions & 17 deletions
@@ -281,9 +281,7 @@ bool disk_check_media_change(struct gendisk *disk)
 	if (!(events & DISK_EVENT_MEDIA_CHANGE))
 		return false;
 
-	if (__invalidate_device(disk->part0, true))
-		pr_warn("VFS: busy inodes on changed media %s\n",
-			disk->disk_name);
+	bdev_mark_dead(disk->part0, true);
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
 	return true;
 }
@@ -294,25 +292,16 @@ EXPORT_SYMBOL(disk_check_media_change);
  * @disk: the disk which will raise the event
  * @events: the events to raise
  *
- * Generate uevents for the disk. If DISK_EVENT_MEDIA_CHANGE is present,
- * attempt to free all dentries and inodes and invalidates all block
+ * Should be called when the media changes for @disk. Generates a uevent
+ * and attempts to free all dentries and inodes and invalidates all block
  * device page cache entries in that case.
- *
- * Returns %true if DISK_EVENT_MEDIA_CHANGE was raised, or %false if not.
  */
-bool disk_force_media_change(struct gendisk *disk, unsigned int events)
+void disk_force_media_change(struct gendisk *disk)
 {
-	disk_event_uevent(disk, events);
-
-	if (!(events & DISK_EVENT_MEDIA_CHANGE))
-		return false;
-
+	disk_event_uevent(disk, DISK_EVENT_MEDIA_CHANGE);
 	inc_diskseq(disk);
-	if (__invalidate_device(disk->part0, true))
-		pr_warn("VFS: busy inodes on changed media %s\n",
-			disk->disk_name);
+	bdev_mark_dead(disk->part0, true);
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
-	return true;
 }
 EXPORT_SYMBOL_GPL(disk_force_media_change);
 

block/genhd.c

Lines changed: 24 additions & 21 deletions
@@ -554,7 +554,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 }
 EXPORT_SYMBOL(device_add_disk);
 
-static void blk_report_disk_dead(struct gendisk *disk)
+static void blk_report_disk_dead(struct gendisk *disk, bool surprise)
 {
 	struct block_device *bdev;
 	unsigned long idx;
@@ -565,25 +565,15 @@ static void blk_report_disk_dead(struct gendisk *disk)
 			continue;
 		rcu_read_unlock();
 
-		mutex_lock(&bdev->bd_holder_lock);
-		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
-			bdev->bd_holder_ops->mark_dead(bdev);
-		mutex_unlock(&bdev->bd_holder_lock);
+		bdev_mark_dead(bdev, surprise);
 
 		put_device(&bdev->bd_device);
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
 }
 
-/**
- * blk_mark_disk_dead - mark a disk as dead
- * @disk: disk to mark as dead
- *
- * Mark as disk as dead (e.g. surprise removed) and don't accept any new I/O
- * to this disk.
- */
-void blk_mark_disk_dead(struct gendisk *disk)
+static void __blk_mark_disk_dead(struct gendisk *disk)
 {
 	/*
 	 * Fail any new I/O.
@@ -603,8 +593,19 @@ void blk_mark_disk_dead(struct gendisk *disk)
 	 * Prevent new I/O from crossing bio_queue_enter().
 	 */
 	blk_queue_start_drain(disk->queue);
+}
 
-	blk_report_disk_dead(disk);
+/**
+ * blk_mark_disk_dead - mark a disk as dead
+ * @disk: disk to mark as dead
+ *
+ * Mark as disk as dead (e.g. surprise removed) and don't accept any new I/O
+ * to this disk.
+ */
+void blk_mark_disk_dead(struct gendisk *disk)
+{
+	__blk_mark_disk_dead(disk);
+	blk_report_disk_dead(disk, true);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
@@ -641,18 +642,20 @@ void del_gendisk(struct gendisk *disk)
 	disk_del_events(disk);
 
 	/*
-	 * Prevent new openers by unlinked the bdev inode, and write out
-	 * dirty data before marking the disk dead and stopping all I/O.
+	 * Prevent new openers by unlinked the bdev inode.
 	 */
 	mutex_lock(&disk->open_mutex);
-	xa_for_each(&disk->part_tbl, idx, part) {
+	xa_for_each(&disk->part_tbl, idx, part)
 		remove_inode_hash(part->bd_inode);
-		fsync_bdev(part);
-		__invalidate_device(part, true);
-	}
 	mutex_unlock(&disk->open_mutex);
 
-	blk_mark_disk_dead(disk);
+	/*
+	 * Tell the file system to write back all dirty data and shut down if
+	 * it hasn't been notified earlier.
+	 */
+	if (!test_bit(GD_DEAD, &disk->state))
+		blk_report_disk_dead(disk, false);
+	__blk_mark_disk_dead(disk);
 
 	/*
 	 * Drop all partitions now that the disk is marked dead.
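The split between `__blk_mark_disk_dead()` and `blk_report_disk_dead()` means the holder is told about the removal once, either as a surprise or as an orderly shutdown. A minimal userspace sketch of that ordering follows. The `model_*` names are hypothetical, and the assumption that marking the disk dead sets the flag later tested in `del_gendisk()` comes from the `GD_DEAD` check in the hunk, since the body of `__blk_mark_disk_dead()` is mostly elided here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a gendisk; counters replace real notifications. */
struct model_disk {
	bool dead;		/* models the GD_DEAD state bit */
	int reports;		/* total notifications sent to the holder */
	int surprise_reports;	/* notifications flagged as surprise removal */
};

/* Models blk_report_disk_dead(): notify the holder of every partition. */
static void model_report_dead(struct model_disk *disk, bool surprise)
{
	disk->reports++;
	if (surprise)
		disk->surprise_reports++;
}

/* Models __blk_mark_disk_dead(): assumed to set the dead flag, fail new I/O. */
static void model_mark_dead_flag(struct model_disk *disk)
{
	disk->dead = true;
}

/* Models blk_mark_disk_dead(): a driver reports a surprise removal. */
static void model_blk_mark_disk_dead(struct model_disk *disk)
{
	model_mark_dead_flag(disk);
	model_report_dead(disk, true);
}

/* Models the relevant part of del_gendisk(): orderly removal. */
static void model_del_gendisk(struct model_disk *disk)
{
	/* Notify first so the file system can still write back dirty data. */
	if (!disk->dead)
		model_report_dead(disk, false);
	model_mark_dead_flag(disk);
}
```

In the orderly path the report precedes the flag, so writeback can still reach the device. In the surprise path the device is already gone, and the later `model_del_gendisk()` call skips the second report.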

block/ioctl.c

Lines changed: 8 additions & 1 deletion
@@ -364,7 +364,14 @@ static int blkdev_flushbuf(struct block_device *bdev, unsigned cmd,
 {
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
-	fsync_bdev(bdev);
+
+	mutex_lock(&bdev->bd_holder_lock);
+	if (bdev->bd_holder_ops && bdev->bd_holder_ops->sync)
+		bdev->bd_holder_ops->sync(bdev);
+	else
+		sync_blockdev(bdev);
+	mutex_unlock(&bdev->bd_holder_lock);
+
 	invalidate_bdev(bdev);
 	return 0;
 }

block/partitions/core.c

Lines changed: 1 addition & 4 deletions
@@ -281,10 +281,7 @@ static void delete_partition(struct block_device *part)
 	 * looked up any more even when openers still hold references.
 	 */
 	remove_inode_hash(part->bd_inode);
-
-	fsync_bdev(part);
-	__invalidate_device(part, true);
-
+	bdev_mark_dead(part, false);
 	drop_partition(part);
 }
 
290287

drivers/block/amiflop.c

Lines changed: 0 additions & 1 deletion
@@ -1547,7 +1547,6 @@ static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
 			rel_fdc();
 			return -EBUSY;
 		}
-		fsync_bdev(bdev);
 		if (fd_motor_on(drive) == 0) {
 			rel_fdc();
 			return -ENODEV;

drivers/block/floppy.c

Lines changed: 1 addition & 1 deletion
@@ -3255,7 +3255,7 @@ static int set_geometry(unsigned int cmd, struct floppy_struct *g,
 
 			if (!disk || ITYPE(drive_state[cnt].fd_device) != type)
 				continue;
-			__invalidate_device(disk->part0, true);
+			disk_force_media_change(disk);
 		}
 		mutex_unlock(&open_lock);
 	} else {

drivers/block/loop.c

Lines changed: 3 additions & 3 deletions
@@ -603,7 +603,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 		goto out_err;
 
 	/* and ... switch */
-	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
+	disk_force_media_change(lo->lo_disk);
 	blk_mq_freeze_queue(lo->lo_queue);
 	mapping_set_gfp_mask(old_file->f_mapping, lo->old_gfp_mask);
 	lo->lo_backing_file = file;
@@ -1067,7 +1067,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	/* suppress uevents while reconfiguring the device */
 	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
 
-	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
+	disk_force_media_change(lo->lo_disk);
 	set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
 	lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
@@ -1171,7 +1171,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	if (!release)
 		blk_mq_unfreeze_queue(lo->lo_queue);
 
-	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
+	disk_force_media_change(lo->lo_disk);
 
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
 		int err;

drivers/block/nbd.c

Lines changed: 3 additions & 5 deletions
@@ -1434,12 +1434,10 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd)
 	return ret;
 }
 
-static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
-				 struct block_device *bdev)
+static void nbd_clear_sock_ioctl(struct nbd_device *nbd)
 {
+	blk_mark_disk_dead(nbd->disk);
 	nbd_clear_sock(nbd);
-	__invalidate_device(bdev, true);
-	nbd_bdev_reset(nbd);
 	if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
 			       &nbd->config->runtime_flags))
 		nbd_config_put(nbd);
@@ -1465,7 +1463,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 	case NBD_DISCONNECT:
 		return nbd_disconnect(nbd);
 	case NBD_CLEAR_SOCK:
-		nbd_clear_sock_ioctl(nbd, bdev);
+		nbd_clear_sock_ioctl(nbd);
 		return 0;
 	case NBD_SET_SOCK:
 		return nbd_add_socket(nbd, arg, false);
