Commit 22ed7ec

fs: add FSCONFIG_CMD_CREATE_EXCL
Summary
=======

This introduces FSCONFIG_CMD_CREATE_EXCL which allows userspace to
implement something like

    mount -t ext4 --exclusive /dev/sda /B

which fails if a superblock for the requested filesystem already exists:

Before this patch
-----------------

$ sudo ./move-mount -f xfs -o source=/dev/sda4 /A
Requesting filesystem type xfs
Mount options requested: source=/dev/sda4
Attaching mount at /A
Moving single attached mount
Setting key(source) with val(/dev/sda4)

$ sudo ./move-mount -f xfs -o source=/dev/sda4 /B
Requesting filesystem type xfs
Mount options requested: source=/dev/sda4
Attaching mount at /B
Moving single attached mount
Setting key(source) with val(/dev/sda4)

After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL
--------------------------------------------------------------------------

$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A
Requesting filesystem type xfs
Request exclusive superblock creation
Mount options requested: source=/dev/sda4
Attaching mount at /A
Moving single attached mount
Setting key(source) with val(/dev/sda4)

$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B
Requesting filesystem type xfs
Request exclusive superblock creation
Mount options requested: source=/dev/sda4
Attaching mount at /B
Moving single attached mount
Setting key(source) with val(/dev/sda4)
Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed

Details
=======

As mentioned on the list (cf. [1]-[3]) mount requests like

    mount -t ext4 /dev/sda /A

are ambiguous for userspace. Either a new superblock has been created and
mounted or an existing superblock has been reused and a bind-mount has
been created.
This becomes clear in the following example where two processes create
the same mount for the same block device:

P1                                                            P2
fd_fs = fsopen("ext4");                                       fd_fs = fsopen("ext4");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");   fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda");
fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always");        fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000");

// wins and creates superblock
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
                                                              // finds compatible superblock of P1
                                                              // spins until P1 sets SB_BORN and grabs a reference
                                                              fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)

fd_mnt1 = fsmount(fd_fs);                                     fd_mnt2 = fsmount(fd_fs);
move_mount(fd_mnt1, "/A")                                     move_mount(fd_mnt2, "/B")

Not only does P2 get a bind-mount, but the mount options that P2
requests are silently ignored. The VFS itself doesn't, can't and
shouldn't enforce filesystem specific mount option compatibility. It
only enforces incompatibility for read-only <-> read-write transitions:

    mount -t ext4 /dev/sda /A
    mount -t ext4 -o ro /dev/sda /B

The read-only request will fail with EBUSY as the VFS can't just
silently transition a superblock from read-write to read-only or vice
versa without risking security issues.

To userspace this silent superblock reuse can become a security issue
because there is currently no straightforward way for userspace to know
that they did indeed manage to create a new superblock and didn't just
reuse an existing one.

This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that
returns EBUSY if an existing superblock would be reused. Userspace that
needs to be sure that it did create a new superblock with the requested
mount options can request superblock creation using this command. If the
command succeeds they can be sure that they did create a new superblock
with the requested mount options.

This requires the new mount api. With the old mount api it would be
necessary to plumb this through every legacy filesystem's
file_system_type->mount() method.
If they want this feature they are most welcome to switch to the new
mount api.

The following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL
on each high-level superblock creation helper:

(1) get_tree_nodev()

    Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and
    FSCONFIG_CMD_CREATE_EXCL are equivalent.

    The binderfs or overlayfs filesystems are examples.

(2) get_tree_keyed()

    Finds an existing superblock based on sb->s_fs_info. Hence,
    FSCONFIG_CMD_CREATE would reuse an existing superblock whereas
    FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.

    The mqueue or nfsd filesystems are examples.

(3) get_tree_bdev()

    This effectively works like get_tree_keyed().

    The ext4 or xfs filesystems are examples.

(4) get_tree_single()

    Only one superblock of this filesystem type can ever exist.
    Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock
    whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.

    The securityfs or configfs filesystems are examples.

    Note that some single-instance filesystems never destroy the
    superblock once it has been created during the first mount. For
    example, if securityfs has been mounted at least once then the
    created superblock will never be destroyed again as long as there
    is still an LSM making use of it. Consequently, even if securityfs
    is unmounted and the superblock seemingly destroyed it really
    isn't, which means that FSCONFIG_CMD_CREATE_EXCL will continue
    rejecting reuse of the existing superblock.

    This is acceptable though since special purpose filesystems such
    as this shouldn't have a need to use FSCONFIG_CMD_CREATE_EXCL
    anyway and if they do it's probably to make sure that mount
    options aren't ignored.

The following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL
on filesystems that make use of the low-level sget_fc() helper directly.
They're all effectively variants on get_tree_keyed(), get_tree_bdev(),
or get_tree_nodev():

(5) mtd_get_sb()

    Similar logic to get_tree_keyed().
(6) afs_get_tree()

    Similar logic to get_tree_keyed().

(7) ceph_get_tree()

    Similar logic to get_tree_keyed().

    Already explicitly allows forcing the allocation of a new
    superblock via CEPH_OPT_NOSHARE. This turns it into
    get_tree_nodev().

(8) fuse_get_tree_submount()

    Similar logic to get_tree_nodev().

(9) fuse_get_tree()

    Forces reuse of existing FUSE superblock.

    Forces reuse of existing superblock if the passed in file refers
    to an existing FUSE connection.
    If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd
    referring to an existing FUSE connection this would cause the
    superblock reuse to fail. If reusing is the intent then
    FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.

(10) fuse_get_tree() -> get_tree_nodev()

    Same logic as in get_tree_nodev().

(11) fuse_get_tree() -> get_tree_bdev()

    Same logic as in get_tree_bdev().

(12) virtio_fs_get_tree()

    Same logic as get_tree_keyed().

(13) gfs2_meta_get_tree()

    Forces reuse of existing gfs2 superblock.

    Mounting gfs2meta enforces that a gfs2 superblock must already
    exist. If not, it will error out. Consequently, mounting gfs2meta
    with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is
    the intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.

(14) kernfs_get_tree()

    Similar logic to get_tree_keyed().

(15) nfs_get_tree_common()

    Similar logic to get_tree_keyed().

    Already explicitly allows forcing the allocation of a new
    superblock via NFS_MOUNT_UNSHARED. This effectively turns it into
    get_tree_nodev().

Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Aleksa Sarai <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
1 parent 11a51d8 commit 22ed7ec

5 files changed: +41 -12 lines changed


fs/fs_context.c

Lines changed: 1 addition & 0 deletions
@@ -692,6 +692,7 @@ void vfs_clean_context(struct fs_context *fc)
 	security_free_mnt_opts(&fc->security);
 	kfree(fc->source);
 	fc->source = NULL;
+	fc->exclusive = false;
 
 	fc->purpose = FS_CONTEXT_FOR_RECONFIGURE;
 	fc->phase = FS_CONTEXT_AWAITING_RECONF;

fs/fsopen.c

Lines changed: 10 additions & 2 deletions
@@ -209,7 +209,7 @@ SYSCALL_DEFINE3(fspick, int, dfd, const char __user *, path, unsigned int, flags
 	return ret;
 }
 
-static int vfs_cmd_create(struct fs_context *fc)
+static int vfs_cmd_create(struct fs_context *fc, bool exclusive)
 {
 	struct super_block *sb;
 	int ret;
@@ -220,7 +220,12 @@ static int vfs_cmd_create(struct fs_context *fc)
 	if (!mount_capable(fc))
 		return -EPERM;
 
+	/* require the new mount api */
+	if (exclusive && fc->ops == &legacy_fs_context_ops)
+		return -EOPNOTSUPP;
+
 	fc->phase = FS_CONTEXT_CREATING;
+	fc->exclusive = exclusive;
 
 	ret = vfs_get_tree(fc);
 	if (ret) {
@@ -284,7 +289,9 @@ static int vfs_fsconfig_locked(struct fs_context *fc, int cmd,
 		return ret;
 	switch (cmd) {
 	case FSCONFIG_CMD_CREATE:
-		return vfs_cmd_create(fc);
+		return vfs_cmd_create(fc, false);
+	case FSCONFIG_CMD_CREATE_EXCL:
+		return vfs_cmd_create(fc, true);
 	case FSCONFIG_CMD_RECONFIGURE:
 		return vfs_cmd_reconfigure(fc);
 	default:
@@ -381,6 +388,7 @@ SYSCALL_DEFINE5(fsconfig,
 			return -EINVAL;
 		break;
 	case FSCONFIG_CMD_CREATE:
+	case FSCONFIG_CMD_CREATE_EXCL:
 	case FSCONFIG_CMD_RECONFIGURE:
 		if (_key || _value || aux)
 			return -EINVAL;
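
The dispatch in fs/fsopen.c can be modeled in plain userspace C to see which result a caller should expect before vfs_get_tree() runs. This is a simplified sketch, not kernel code: `struct model_ctx`, `enum model_cmd` and `model_cmd_create()` are hypothetical names introduced for illustration, the mount_capable() permission check is left out, and only the command values 6 and 8 and the -EOPNOTSUPP result are taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Values mirror enum fsconfig_command in include/uapi/linux/mount.h. */
enum model_cmd {
	MODEL_CMD_CREATE      = 6,
	MODEL_CMD_CREATE_EXCL = 8,
};

/* Hypothetical stand-in for the fs_context fields this patch touches. */
struct model_ctx {
	bool legacy_ops; /* models fc->ops == &legacy_fs_context_ops */
	bool exclusive;  /* models fc->exclusive */
};

/*
 * Model of the vfs_fsconfig_locked() -> vfs_cmd_create() path up to the
 * point where vfs_get_tree() would be called.
 * Returns 0 if creation may proceed, or a negative errno.
 */
int model_cmd_create(struct model_ctx *ctx, enum model_cmd cmd)
{
	bool exclusive;

	switch (cmd) {
	case MODEL_CMD_CREATE:
		exclusive = false;
		break;
	case MODEL_CMD_CREATE_EXCL:
		exclusive = true;
		break;
	default:
		/* Other commands are outside the scope of this model. */
		return -EINVAL;
	}

	/* Exclusive creation requires the new mount api. */
	if (exclusive && ctx->legacy_ops)
		return -EOPNOTSUPP;

	ctx->exclusive = exclusive;
	return 0;
}
```

In this model a legacy filesystem still accepts `MODEL_CMD_CREATE` but rejects `MODEL_CMD_CREATE_EXCL`, and the rejection happens before the `exclusive` flag is stored, matching the order of the checks in the hunk above.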

fs/super.c

Lines changed: 27 additions & 9 deletions
@@ -546,17 +546,31 @@ bool mount_capable(struct fs_context *fc)
  * @test: Comparison callback
  * @set: Setup callback
  *
- * Find or create a superblock using the parameters stored in the filesystem
- * context and the two callback functions.
+ * Create a new superblock or find an existing one.
  *
- * If an extant superblock is matched, then that will be returned with an
- * elevated reference count that the caller must transfer or discard.
+ * The @test callback is used to find a matching existing superblock.
+ * Whether or not the requested parameters in @fc are taken into account
+ * is specific to the @test callback that is used. They may even be
+ * completely ignored.
+ *
+ * If an extant superblock is matched, it will be returned unless:
+ *
+ * (1) the namespace the filesystem context @fc and the extant
+ *     superblock's namespace differ
+ *
+ * (2) the filesystem context @fc has requested that reusing an extant
+ *     superblock is not allowed
+ *
+ * In both cases EBUSY will be returned.
  *
  * If no match is made, a new superblock will be allocated and basic
- * initialisation will be performed (s_type, s_fs_info and s_id will be set and
- * the set() callback will be invoked), the superblock will be published and it
- * will be returned in a partially constructed state with SB_BORN and SB_ACTIVE
- * as yet unset.
+ * initialisation will be performed (s_type, s_fs_info and s_id will be
+ * set and the @set callback will be invoked), the superblock will be
+ * published and it will be returned in a partially constructed state
+ * with SB_BORN and SB_ACTIVE as yet unset.
+ *
+ * Return: On success, an extant or newly created superblock is
+ *         returned. On failure an error pointer is returned.
  */
 struct super_block *sget_fc(struct fs_context *fc,
 			    int (*test)(struct super_block *, struct fs_context *),
@@ -603,9 +617,13 @@ struct super_block *sget_fc(struct fs_context *fc,
 	return s;
 
 share_extant_sb:
-	if (user_ns != old->s_user_ns) {
+	if (user_ns != old->s_user_ns || fc->exclusive) {
 		spin_unlock(&sb_lock);
 		destroy_unused_super(s);
+		if (fc->exclusive)
+			warnfc(fc, "reusing existing filesystem not allowed");
+		else
+			warnfc(fc, "reusing existing filesystem in another namespace not allowed");
 		return ERR_PTR(-EBUSY);
 	}
 	if (!grab_super(old))
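
The decision taken at share_extant_sb can be summarized with a small userspace model. The sketch below is an illustration under stated assumptions, not the kernel implementation: `struct model_sb`, `struct model_fc` and `model_sget()` are hypothetical, user namespaces are reduced to integer ids, and locking, the grab_super() retry and the fc->global handling are omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for kernel structures. */
struct model_sb {
	int user_ns;    /* id standing in for sb->s_user_ns */
};

struct model_fc {
	int user_ns;    /* id standing in for the context's user namespace */
	bool exclusive; /* models fc->exclusive */
};

/*
 * Model of what sget_fc() does once the @test callback has run.
 * Pass existing == NULL when no superblock matched.
 * Returns 0 if the caller ends up with a superblock (new or shared) and
 * -EBUSY if sharing the matched superblock is refused.
 * *reused is set to true only when the matched superblock is shared.
 */
int model_sget(const struct model_fc *fc, const struct model_sb *existing,
	       bool *reused)
{
	*reused = false;

	/* No match: a new superblock is allocated. */
	if (!existing)
		return 0;

	/* Rejection path added at share_extant_sb. */
	if (fc->user_ns != existing->user_ns || fc->exclusive)
		return -EBUSY;

	*reused = true;
	return 0;
}
```

Both rejection reasons yield the same -EBUSY in the patch, so a caller that needs to tell them apart has to rely on the message logged through warnfc() rather than on the error code.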

include/linux/fs_context.h

Lines changed: 1 addition & 0 deletions
@@ -109,6 +109,7 @@ struct fs_context {
 	bool need_free:1; /* Need to call ops->free() */
 	bool global:1; /* Goes into &init_user_ns */
 	bool oldapi:1; /* Coming from mount(2) */
+	bool exclusive:1; /* create new superblock, reject existing one */
 };
 
 struct fs_context_operations {

include/uapi/linux/mount.h

Lines changed: 2 additions & 1 deletion
@@ -100,8 +100,9 @@ enum fsconfig_command {
 	FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */
 	FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */
 	FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */
-	FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */
+	FSCONFIG_CMD_CREATE = 6, /* Create new or reuse existing superblock */
 	FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */
+	FSCONFIG_CMD_CREATE_EXCL = 8, /* Create new superblock, fail if reusing existing superblock */
 };
 
 /*
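
With the new enum value in place, a userspace caller could request exclusive creation roughly as sketched below. This is a minimal illustration under assumptions, not part of the patch: `create_superblock_excl()`, `classify_create_excl()` and the `EX_*` constants are hypothetical names, the constants are local copies of the uapi values so the code builds against older headers, and the fallback syscall numbers 430 and 431 are assumed to apply to the target architecture.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback syscall numbers (assumption: generic numbering applies). */
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif

/* Local copies of uapi values from include/uapi/linux/mount.h. */
#define EX_FSOPEN_CLOEXEC           0x1
#define EX_FSCONFIG_SET_STRING      1
#define EX_FSCONFIG_CMD_CREATE_EXCL 8

/*
 * Open a filesystem context, set its source and request exclusive
 * superblock creation. Returns the context fd on success (ready to be
 * passed to fsmount()), or a negative errno on failure.
 */
int create_superblock_excl(const char *fstype, const char *source)
{
	int err;
	int fd = (int)syscall(SYS_fsopen, fstype, EX_FSOPEN_CLOEXEC);

	if (fd < 0)
		return -errno;

	if (syscall(SYS_fsconfig, fd, EX_FSCONFIG_SET_STRING,
		    "source", source, 0) < 0 ||
	    syscall(SYS_fsconfig, fd, EX_FSCONFIG_CMD_CREATE_EXCL,
		    NULL, NULL, 0) < 0) {
		err = errno;
		close(fd);
		return -err;
	}
	return fd;
}

enum excl_outcome {
	EXCL_CREATED,     /* a new superblock was created */
	EXCL_WOULD_REUSE, /* -EBUSY: an existing superblock would be reused */
	EXCL_UNSUPPORTED, /* -EOPNOTSUPP: e.g. legacy mount api filesystem */
	EXCL_OTHER_ERROR,
};

/*
 * Interpret the return value of create_superblock_excl(). Note that
 * -EBUSY can have other causes too (for example a namespace mismatch).
 */
enum excl_outcome classify_create_excl(int ret)
{
	if (ret >= 0)
		return EXCL_CREATED;
	if (ret == -EBUSY)
		return EXCL_WOULD_REUSE;
	if (ret == -EOPNOTSUPP)
		return EXCL_UNSUPPORTED;
	return EXCL_OTHER_ERROR;
}
```

A caller would typically invoke `create_superblock_excl("xfs", "/dev/sda4")`, pass the result to `classify_create_excl()`, and only continue with fsmount() and move_mount() when the outcome is `EXCL_CREATED`.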
