Skip to content

Commit 426b4ca

Browse files
committed
Merge tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull setgid updates from Christian Brauner: "This contains the work to move setgid stripping out of individual filesystems and into the VFS itself. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: - S_ISGID stripping logic is entangled with umask stripping. For example, if the umask removes the S_IXGRP bit from the file about to be created then the S_ISGID bit will be kept. The inode_init_owner() helper is responsible for S_ISGID stripping and is called before posix_acl_create(). So we can end up with two different orderings: 1. FS without POSIX ACL support First strip umask then strip S_ISGID in inode_init_owner(). In other words, if a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem: 2. FS with POSIX ACL support First strip S_ISGID in inode_init_owner() then strip umask in posix_acl_create(). In other words, if the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_ISGID inheritance. (Note that the commit message of commit 1639a49 ("fs: move S_ISGID stripping into the vfs_*() helpers") gets the ordering between inode_init_owner() and posix_acl_create() the wrong way around. I realized this too late.) - Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. Note that mandating the use of inode_init_owner() was proposed as an alternative solution but that wouldn't fix the ordering issues and there are examples such as afs where the use of inode_init_owner() isn't possible. In any case, we should also try the cleaner and generalized solution first before resorting to this approach. - We still have S_ISGID inheritance bugs years after the initial round of S_ISGID inheritance fixes: e014f37 ("xfs: use setattr_copy to set vfs inode attributes") 01ea173 ("xfs: fix up non-directory creation in SGID directories") fd84bfd ("ceph: fix up non-directory creation in SGID directories") All of this led us to conclude that the current state is too messy. While we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the respective vfs creation operations. The obvious advantage is that we don't need to rely on individual filesystems getting S_ISGID stripping right and instead can standardize the ordering between S_ISGID and umask stripping directly in the VFS. A few short implementation notes: - The stripping logic needs to happen in vfs_*() helpers for the sake of stacking filesystems such as overlayfs that rely on these helpers taking care of S_ISGID stripping. - Security hooks have never seen the mode as it is ultimately seen by the filesystem because of the ordering issue we mentioned. Nothing is changed for them. We simply continue to strip the umask before passing the mode down to the security hooks. - The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. We've audited all callchains as best as we could. More details can be found in the commit message to 1639a49 ("fs: move S_ISGID stripping into the vfs_*() helpers")" * tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: ceph: rely on vfs for setgid stripping fs: move S_ISGID stripping into the vfs_*() helpers fs: Add missing umask strip in vfs_tmpfile fs: add mode_strip_sgid() helper
2 parents b8dcef8 + 5fadbd9 commit 426b4ca

File tree

5 files changed

+102
-19
lines changed

5 files changed

+102
-19
lines changed

fs/ceph/file.c

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -656,10 +656,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
656656
/* Directories always inherit the setgid bit. */
657657
if (S_ISDIR(mode))
658658
mode |= S_ISGID;
659-
else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
660-
!in_group_p(dir->i_gid) &&
661-
!capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
662-
mode &= ~S_ISGID;
663659
} else {
664660
in.gid = cpu_to_le32(from_kgid(&init_user_ns, current_fsgid()));
665661
}

fs/inode.c

Lines changed: 30 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2326,10 +2326,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
23262326
/* Directories are special, and always inherit S_ISGID */
23272327
if (S_ISDIR(mode))
23282328
mode |= S_ISGID;
2329-
else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
2330-
!in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
2331-
!capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
2332-
mode &= ~S_ISGID;
23332329
} else
23342330
inode_fsgid_set(inode, mnt_userns);
23352331
inode->i_mode = mode;
@@ -2485,3 +2481,33 @@ struct timespec64 current_time(struct inode *inode)
24852481
return timestamp_truncate(now, inode);
24862482
}
24872483
EXPORT_SYMBOL(current_time);
2484+
2485+
/**
2486+
* mode_strip_sgid - handle the sgid bit for non-directories
2487+
* @mnt_userns: User namespace of the mount the inode was created from
2488+
* @dir: parent directory inode
2489+
* @mode: mode of the file to be created in @dir
2490+
*
2491+
* If the @mode of the new file has both the S_ISGID and S_IXGRP bit
2492+
* raised and @dir has the S_ISGID bit raised ensure that the caller is
2493+
* either in the group of the parent directory or they have CAP_FSETID
2494+
* in their user namespace and are privileged over the parent directory.
2495+
* In all other cases, strip the S_ISGID bit from @mode.
2496+
*
2497+
* Return: the new mode to use for the file
2498+
*/
2499+
umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
2500+
const struct inode *dir, umode_t mode)
2501+
{
2502+
if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
2503+
return mode;
2504+
if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
2505+
return mode;
2506+
if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
2507+
return mode;
2508+
if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
2509+
return mode;
2510+
2511+
return mode & ~S_ISGID;
2512+
}
2513+
EXPORT_SYMBOL(mode_strip_sgid);

fs/namei.c

Lines changed: 69 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -3023,6 +3023,65 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
30233023
}
30243024
EXPORT_SYMBOL(unlock_rename);
30253025

3026+
/**
3027+
* mode_strip_umask - handle vfs umask stripping
3028+
* @dir: parent directory of the new inode
3029+
* @mode: mode of the new inode to be created in @dir
3030+
*
3031+
* Umask stripping depends on whether or not the filesystem supports POSIX
3032+
* ACLs. If the filesystem doesn't support it umask stripping is done directly
3033+
* in here. If the filesystem does support POSIX ACLs umask stripping is
3034+
* deferred until the filesystem calls posix_acl_create().
3035+
*
3036+
* Returns: mode
3037+
*/
3038+
static inline umode_t mode_strip_umask(const struct inode *dir, umode_t mode)
3039+
{
3040+
if (!IS_POSIXACL(dir))
3041+
mode &= ~current_umask();
3042+
return mode;
3043+
}
3044+
3045+
/**
3046+
* vfs_prepare_mode - prepare the mode to be used for a new inode
3047+
* @mnt_userns: user namespace of the mount the inode was found from
3048+
* @dir: parent directory of the new inode
3049+
* @mode: mode of the new inode
3050+
* @mask_perms: allowed permission by the vfs
3051+
* @type: type of file to be created
3052+
*
3053+
* This helper consolidates and enforces vfs restrictions on the @mode of a new
3054+
* object to be created.
3055+
*
3056+
* Umask stripping depends on whether the filesystem supports POSIX ACLs (see
3057+
* the kernel documentation for mode_strip_umask()). Moving umask stripping
3058+
* after setgid stripping allows the same ordering for both non-POSIX ACL and
3059+
* POSIX ACL supporting filesystems.
3060+
*
3061+
* Note that it's currently valid for @type to be 0 if a directory is created.
3062+
* Filesystems raise that flag individually and we need to check whether each
3063+
* filesystem can deal with receiving S_IFDIR from the vfs before we enforce a
3064+
* non-zero type.
3065+
*
3066+
* Returns: mode to be passed to the filesystem
3067+
*/
3068+
static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns,
3069+
const struct inode *dir, umode_t mode,
3070+
umode_t mask_perms, umode_t type)
3071+
{
3072+
mode = mode_strip_sgid(mnt_userns, dir, mode);
3073+
mode = mode_strip_umask(dir, mode);
3074+
3075+
/*
3076+
* Apply the vfs mandated allowed permission mask and set the type of
3077+
* file to be created before we call into the filesystem.
3078+
*/
3079+
mode &= (mask_perms & ~S_IFMT);
3080+
mode |= (type & S_IFMT);
3081+
3082+
return mode;
3083+
}
3084+
30263085
/**
30273086
* vfs_create - create new file
30283087
* @mnt_userns: user namespace of the mount the inode was found from
@@ -3048,8 +3107,8 @@ int vfs_create(struct user_namespace *mnt_userns, struct inode *dir,
30483107

30493108
if (!dir->i_op->create)
30503109
return -EACCES; /* shouldn't it be ENOSYS? */
3051-
mode &= S_IALLUGO;
3052-
mode |= S_IFREG;
3110+
3111+
mode = vfs_prepare_mode(mnt_userns, dir, mode, S_IALLUGO, S_IFREG);
30533112
error = security_inode_create(dir, dentry, mode);
30543113
if (error)
30553114
return error;
@@ -3312,8 +3371,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
33123371
if (open_flag & O_CREAT) {
33133372
if (open_flag & O_EXCL)
33143373
open_flag &= ~O_TRUNC;
3315-
if (!IS_POSIXACL(dir->d_inode))
3316-
mode &= ~current_umask();
3374+
mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode, mode, mode);
33173375
if (likely(got_write))
33183376
create_error = may_o_create(mnt_userns, &nd->path,
33193377
dentry, mode);
@@ -3544,6 +3602,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
35443602
child = d_alloc(dentry, &slash_name);
35453603
if (unlikely(!child))
35463604
goto out_err;
3605+
mode = vfs_prepare_mode(mnt_userns, dir, mode, mode, mode);
35473606
error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
35483607
if (error)
35493608
goto out_err;
@@ -3821,6 +3880,7 @@ int vfs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
38213880
if (!dir->i_op->mknod)
38223881
return -EPERM;
38233882

3883+
mode = vfs_prepare_mode(mnt_userns, dir, mode, mode, mode);
38243884
error = devcgroup_inode_mknod(mode, dev);
38253885
if (error)
38263886
return error;
@@ -3871,9 +3931,8 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
38713931
if (IS_ERR(dentry))
38723932
goto out1;
38733933

3874-
if (!IS_POSIXACL(path.dentry->d_inode))
3875-
mode &= ~current_umask();
3876-
error = security_path_mknod(&path, dentry, mode, dev);
3934+
error = security_path_mknod(&path, dentry,
3935+
mode_strip_umask(path.dentry->d_inode, mode), dev);
38773936
if (error)
38783937
goto out2;
38793938

@@ -3943,7 +4002,7 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
39434002
if (!dir->i_op->mkdir)
39444003
return -EPERM;
39454004

3946-
mode &= (S_IRWXUGO|S_ISVTX);
4005+
mode = vfs_prepare_mode(mnt_userns, dir, mode, S_IRWXUGO | S_ISVTX, 0);
39474006
error = security_inode_mkdir(dir, dentry, mode);
39484007
if (error)
39494008
return error;
@@ -3971,9 +4030,8 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
39714030
if (IS_ERR(dentry))
39724031
goto out_putname;
39734032

3974-
if (!IS_POSIXACL(path.dentry->d_inode))
3975-
mode &= ~current_umask();
3976-
error = security_path_mkdir(&path, dentry, mode);
4033+
error = security_path_mkdir(&path, dentry,
4034+
mode_strip_umask(path.dentry->d_inode, mode));
39774035
if (!error) {
39784036
struct user_namespace *mnt_userns;
39794037
mnt_userns = mnt_user_ns(path.mnt);

fs/ocfs2/namei.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
197197
* callers. */
198198
if (S_ISDIR(mode))
199199
set_nlink(inode, 2);
200+
mode = mode_strip_sgid(&init_user_ns, dir, mode);
200201
inode_init_owner(&init_user_ns, inode, dir, mode);
201202
status = dquot_initialize(inode);
202203
if (status)

include/linux/fs.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2035,6 +2035,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
20352035
void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
20362036
const struct inode *dir, umode_t mode);
20372037
extern bool may_open_dev(const struct path *path);
2038+
umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
2039+
const struct inode *dir, umode_t mode);
20382040

20392041
/*
20402042
* This is the "filldir" function type, used by readdir() to let

0 commit comments

Comments
 (0)