
Commit ecd7db2

Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull libfs and tmpfs updates from Christian Brauner:

"This cycle saw a lot of work for tmpfs that required changes to the vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this cycle. Things will go back to mm next cycle.

Features
========

- By far the biggest work is the quota support for tmpfs. New tmpfs quota infrastructure is added to support it and a new QFMT_SHMEM uapi option is exposed. This offers user and group quotas to tmpfs (project quotas will be added later). Similar to other filesystems, tmpfs quotas are not supported within user namespaces yet.

- Add support for user xattrs. While tmpfs has long supported security xattrs (security.*) and POSIX ACLs, it lacked support for user xattrs (user.*). With this pull request tmpfs will be able to support a limited number of user xattrs. This is accompanied by a fix (see below) to limit persistent simple xattr allocations.

- Add support for stable directory offsets. Currently tmpfs relies on the libfs-provided cursor-based mechanism for readdir. This causes issues when a tmpfs filesystem is exported via NFS.

  NFS clients do not open directories. Instead, each server-side readdir operation opens the directory, reads it, and then closes it. Since the cursor state for that directory is associated with the opened file, it is discarded after each readdir operation. Such directory offsets are cached not just by NFS clients but also by various userspace libraries based on these clients.

  As it stands there is no way to invalidate the caches when directory offsets have changed, and the whole application depends on unchanging directory offsets.

  At LSFMM we discussed how to solve this problem and decided to support stable directory offsets. libfs now allows filesystems like tmpfs to use an xarray to map a directory offset to a dentry. This mechanism is currently only used by tmpfs but can be supported by others as well.
Fixes
=====

- Change persistent simple xattr allocations in libfs from GFP_KERNEL to GFP_KERNEL_ACCOUNT so they're subject to memory cgroup limits. Since this is a change to libfs it affects both tmpfs and kernfs.

- Correctly verify {g,u}id mount options.

  A new filesystem context is created via fsopen() which records the namespace that becomes the owning namespace of the superblock when fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are mountable in namespaces. However, fsconfig() calls can occur in a namespace different from the namespace where fsopen() has been called.

  Currently, when fsconfig() is called to set {g,u}id mount options, the requested {g,u}id is mapped into a k{g,u}id according to the namespace where fsconfig() was called from. The resulting k{g,u}id is not guaranteed to be resolvable in the namespace of the filesystem (the one that fsopen() was called in).

  This means it's possible for an unprivileged user to create files owned by any group in a tmpfs mount since it's possible to set the setid bits on the tmpfs directory.

  The contract for {g,u}id mount options and {g,u}id values set from userspace in general has always been that they are translated according to the caller's idmapping. In that respect, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting k{g,u}id is representable in the namespace of the superblock to avoid such bugs.

  The new mount api's cross-namespace delegation abilities are already widely used.
  Having talked to a bunch of userspace projects, this is the most faithful solution with minimal regression risks."

* tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
  mm: invalidation check mapping before folio_contains
  tmpfs: trivial support for direct IO
  tmpfs,xattr: enable limited user extended attributes
  tmpfs: track free_ispace instead of free_inodes
  xattr: simple_xattr_set() return old_xattr to be freed
  tmpfs: verify {g,u}id mount options correctly
  shmem: move spinlock into shmem_recalc_inode() to fix quota support
  libfs: Remove parent dentry locking in offset_iterate_dir()
  libfs: Add a lock class for the offset map's xa_lock
  shmem: stable directory offsets
  shmem: Refactor shmem_symlink()
  libfs: Add directory operations for stable offsets
  shmem: fix quota lock nesting in huge hole handling
  shmem: Add default quota limit mount options
  shmem: quota support
  shmem: prepare shmem quota infrastructure
  quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
  shmem: make shmem_get_inode() return ERR_PTR instead of NULL
  shmem: make shmem_inode_acct_block() return error
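The {g,u}id fix described in the pull message comes down to a two-step check. The sketch below is a hypothetical userspace model, not the tmpfs code: `struct extent` stands in for one line of a `uid_map`, `map_down()` and `map_up()` play roughly the roles of the kernel's `make_kuid()` and `kuid_has_mapping()` helpers, and `verify_uid_option()` is an illustrative name. It translates the requested id through the caller's namespace, then rejects the result unless it is also representable in the namespace that owns the superblock.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* One uid_map line: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) in the initial one. */
struct extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct userns {
	const struct extent *map;
	size_t nr_extents;
};

/* Namespace id -> global kernel id (the role of make_kuid()). */
static uint32_t map_down(const struct userns *ns, uint32_t id)
{
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct extent *e = &ns->map[i];

		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID;
}

/* Global kernel id -> namespace id, or INVALID_ID if it has no mapping. */
static uint32_t map_up(const struct userns *ns, uint32_t kid)
{
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct extent *e = &ns->map[i];

		if (kid >= e->lower_first && kid - e->lower_first < e->count)
			return e->first + (kid - e->lower_first);
	}
	return INVALID_ID;
}

/* Translate a uid= option in the caller's namespace, then require the
 * result to be representable in the superblock's namespace. */
static int verify_uid_option(const struct userns *caller_ns,
			     const struct userns *sb_ns,
			     uint32_t uid, uint32_t *kuid_out)
{
	uint32_t kuid = map_down(caller_ns, uid);

	if (kuid == INVALID_ID)
		return -EINVAL;
	if (map_up(sb_ns, kuid) == INVALID_ID)
		return -EINVAL;
	*kuid_out = kuid;
	return 0;
}
```

For example, if a container namespace maps ids 0-65535 to 100000-165535, a caller in the initial namespace asking for `uid=0` on a tmpfs owned by that container is rejected, because kuid 0 has no mapping inside it, while `uid=100005` is accepted.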
2 parents 615e958 + 572a3d1 commit ecd7db2


19 files changed: +1412, -297 lines


Documentation/filesystems/locking.rst

Lines changed: 5 additions & 3 deletions
```diff
@@ -85,13 +85,14 @@ prototypes::
 struct dentry *dentry, struct fileattr *fa);
 int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
 struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int);
+struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
 
 locking rules:
 all may block
 
-============== =============================================
+============== ==================================================
 ops            i_rwsem(inode)
-============== =============================================
+============== ==================================================
 lookup:        shared
 create:        exclusive
 link:          exclusive (both)
@@ -115,7 +116,8 @@ atomic_open: shared (exclusive if O_CREAT is set in open flags)
 tmpfile:       no
 fileattr_get:  no or exclusive
 fileattr_set:  exclusive
-============== =============================================
+get_offset_ctx no
+============== ==================================================
 
 
 Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
```

Documentation/filesystems/tmpfs.rst

Lines changed: 36 additions & 2 deletions
```diff
@@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the
 fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
 filesystem can be resized but it cannot be resized to a size below its current
 usage. tmpfs also supports POSIX ACLs, and extended attributes for the
-trusted.* and security.* namespaces. ramfs does not use swap and you cannot
-modify any parameter for a ramfs filesystem. The size limit of a ramfs
+trusted.*, security.* and user.* namespaces. ramfs does not use swap and you
+cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs
 filesystem is how much memory you have available, and so care must be taken if
 used so to not run out of memory.
 
@@ -97,6 +97,9 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+If nr_inodes is not 0, that limited space for inodes is also used up by
+extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases.
+
 tmpfs blocks may be swapped out, when there is a shortage of memory.
 tmpfs has a mount option to disable its use of swap:
```
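The new tmpfs.rst paragraph says that, when nr_inodes is limited, extended attributes draw on the same budget as inodes. One way to picture this, loosely following the shortlog entry "tmpfs: track free_ispace instead of free_inodes", is a single byte budget charged by both inodes and xattrs, with the "df -i" numbers derived from it. This is a hypothetical model: `INODE_COST` and all function names are assumptions, and the real constant and rounding in mm/shmem.c may differ.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical charge for one inode, in bytes. */
#define INODE_COST 1024u

struct ispace {
	uint64_t max_ispace;   /* nr_inodes * INODE_COST, fixed at mount */
	uint64_t free_ispace;  /* shared by inodes and xattrs */
};

static void ispace_init(struct ispace *s, uint64_t nr_inodes)
{
	s->max_ispace = nr_inodes * INODE_COST;
	s->free_ispace = s->max_ispace;
}

/* Charge an inode (INODE_COST) or an xattr (its size) against the budget. */
static int ispace_charge(struct ispace *s, uint64_t bytes)
{
	if (bytes > s->free_ispace)
		return -ENOSPC;
	s->free_ispace -= bytes;
	return 0;
}

static void ispace_uncharge(struct ispace *s, uint64_t bytes)
{
	s->free_ispace += bytes;
}

/* IFree as "df -i" would report it: whole inodes still affordable. */
static uint64_t ispace_ifree(const struct ispace *s)
{
	return s->free_ispace / INODE_COST;
}

/* IUsed: total inode slots minus IFree, so partial xattr charges count. */
static uint64_t ispace_iused(const struct ispace *s)
{
	return s->max_ispace / INODE_COST - ispace_ifree(s);
}
```

With `nr_inodes=4`, creating one inode and then adding a 100-byte xattr moves IFree from 4 to 3 and then to 2, which is the "IUsed increases, IFree decreases" effect the documentation describes.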

```diff
@@ -123,6 +126,37 @@ sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled: which can
 be used to deny huge pages on all tmpfs mounts in an emergency, or to
 force huge pages on all tmpfs mounts for testing.
 
+tmpfs also supports quota with the following mount options
+
+======================== =================================================
+quota                    User and group quota accounting and enforcement
+                         is enabled on the mount. Tmpfs is using hidden
+                         system quota files that are initialized on mount.
+usrquota                 User quota accounting and enforcement is enabled
+                         on the mount.
+grpquota                 Group quota accounting and enforcement is enabled
+                         on the mount.
+usrquota_block_hardlimit Set global user quota block hard limit.
+usrquota_inode_hardlimit Set global user quota inode hard limit.
+grpquota_block_hardlimit Set global group quota block hard limit.
+grpquota_inode_hardlimit Set global group quota inode hard limit.
+======================== =================================================
+
+None of the quota related mount options can be set or changed on remount.
+
+Quota limit parameters accept a suffix k, m or g for kilo, mega and giga
+and can't be changed on remount. Default global quota limits are taking
+effect for any and all user/group/project except root the first time the
+quota entry for user/group/project id is being accessed - typically the
+first time an inode with a particular id ownership is being created after
+the mount. In other words, instead of the limits being initialized to zero,
+they are initialized with the particular value provided with these mount
+options. The limits can be changed for any user/group id at any time as they
+normally can be.
+
+Note that tmpfs quotas do not support user namespaces so no uid/gid
+translation is done if quotas are enabled inside user namespaces.
+
 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
 adjusted on the fly via 'mount -o remount ...'
```
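The quota documentation above combines two behaviours: limit values take an optional k, m or g suffix, and the default limits are applied lazily, the first time a non-root id's quota entry is accessed. The sketch below models both in plain userspace C under stated assumptions: `parse_limit()` treats the suffixes as 1024-based multiples and is not the kernel's parser, and `struct quota_table` with a fixed `MAX_IDS` slots is a hypothetical stand-in for the real quota structures.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<digits>[k|m|g]" into a byte count, assuming 1024-based suffixes. */
static int parse_limit(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;
	unsigned int shift = 0;

	if (!isdigit((unsigned char)*s))   /* rejects "", "-1", " 5" */
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno)
		return -ERANGE;
	switch (*end) {
	case 'k': shift = 10; end++; break;
	case 'm': shift = 20; end++; break;
	case 'g': shift = 30; end++; break;
	default: break;
	}
	if (*end != '\0')
		return -EINVAL;
	if (v > (UINT64_MAX >> shift))
		return -ERANGE;
	*out = (uint64_t)v << shift;
	return 0;
}

#define MAX_IDS 16   /* hypothetical fixed capacity */

struct quota_limits {
	uint64_t block_hardlimit;   /* 0 means no limit */
	uint64_t inode_hardlimit;
};

struct quota_entry {
	int in_use;
	uint32_t id;
	struct quota_limits lim;
};

struct quota_table {
	struct quota_limits defaults;   /* from the *_hardlimit mount options */
	struct quota_entry entries[MAX_IDS];
};

/* Look up an id; the first access creates its entry. Non-root ids start
 * with the mount-time defaults, root (id 0) starts unlimited. */
static struct quota_entry *quota_get(struct quota_table *t, uint32_t id)
{
	struct quota_entry *slot = NULL;

	for (size_t i = 0; i < MAX_IDS; i++) {
		struct quota_entry *e = &t->entries[i];

		if (e->in_use && e->id == id)
			return e;
		if (!e->in_use && !slot)
			slot = e;
	}
	if (!slot)
		return NULL;
	slot->in_use = 1;
	slot->id = id;
	slot->lim.block_hardlimit = id ? t->defaults.block_hardlimit : 0;
	slot->lim.inode_hardlimit = id ? t->defaults.inode_hardlimit : 0;
	return slot;
}
```

Once an entry exists, later changes to its limits stick, which matches the documented rule that limits "can be changed for any user/group id at any time"; only the initial values come from the mount options.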

Documentation/filesystems/vfs.rst

Lines changed: 5 additions & 1 deletion
```diff
@@ -515,6 +515,7 @@ As of kernel 2.6.22, the following members are defined:
 int (*fileattr_set)(struct mnt_idmap *idmap,
 struct dentry *dentry, struct fileattr *fa);
 int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
+struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -675,7 +676,10 @@ otherwise noted.
 called on ioctl(FS_IOC_SETFLAGS) and ioctl(FS_IOC_FSSETXATTR) to
 change miscellaneous file flags and attributes. Callers hold
 i_rwsem exclusive. If unset, then fall back to f_op->ioctl().
-
+``get_offset_ctx``
+called to get the offset context for a directory inode. A
+filesystem must define this operation to use
+simple_offset_dir_operations.
 
 The Address Space Object
 ========================
```
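`get_offset_ctx` lets libfs keep a per-directory map from readdir offsets to entries, which is what keeps offsets stable across the open/readdir/close cycle an NFS server performs for every request. The sketch below is a simplified model, not the libfs code: a fixed array replaces the xarray, `FIRST_OFFSET = 2` (leaving 0 and 1 for "." and "..") is an assumption, and all names are hypothetical. Each new entry takes the next value of a counter that never goes backwards, and a removal only clears its slot, so a resume offset saved by a client stays valid.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OFFSETS 64   /* hypothetical capacity; libfs uses an xarray */
#define FIRST_OFFSET 2   /* assumption: 0 and 1 are left for "." and ".." */

struct offset_dir {
	const char *slot[MAX_OFFSETS];   /* offset -> entry name, NULL if free */
	size_t next_offset;              /* only ever increases */
};

static void offset_dir_init(struct offset_dir *d)
{
	memset(d->slot, 0, sizeof(d->slot));
	d->next_offset = FIRST_OFFSET;
}

/* Give a new entry the next offset; returns 0 if the map is full. */
static size_t offset_add(struct offset_dir *d, const char *name)
{
	if (d->next_offset >= MAX_OFFSETS)
		return 0;
	d->slot[d->next_offset] = name;
	return d->next_offset++;
}

/* Removing an entry frees its slot but never renumbers the others. */
static void offset_remove(struct offset_dir *d, size_t off)
{
	if (off < MAX_OFFSETS)
		d->slot[off] = NULL;
}

/* Return up to max names at offsets >= pos, plus the offset to resume at.
 * No per-open cursor is kept: the saved offset is all a caller needs. */
static size_t offset_readdir(const struct offset_dir *d, size_t pos,
			     const char **out, size_t max, size_t *n_out)
{
	size_t n = 0;

	if (pos < FIRST_OFFSET)
		pos = FIRST_OFFSET;
	for (; pos < d->next_offset && n < max; pos++)
		if (d->slot[pos])
			out[n++] = d->slot[pos];
	*n_out = n;
	return pos;
}
```

If a client reads two entries, saves the returned offset, and the server deletes those two entries before the next request, resuming at the saved offset still returns the remaining entries; a cursor that merely counted live entries would skip them.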

fs/Kconfig

Lines changed: 14 additions & 2 deletions
```diff
@@ -205,8 +205,8 @@ config TMPFS_XATTR
 	  Extended attributes are name:value pairs associated with inodes by
 	  the kernel or by users (see the attr(5) manual page for details).
 
-	  Currently this enables support for the trusted.* and
-	  security.* namespaces.
+	  This enables support for the trusted.*, security.* and user.*
+	  namespaces.
 
 	  You need this for POSIX ACL support on tmpfs.
 
@@ -233,6 +233,18 @@ config TMPFS_INODE64
 
 	  If unsure, say N.
 
+config TMPFS_QUOTA
+	bool "Tmpfs quota support"
+	depends on TMPFS
+	select QUOTA
+	help
+	  Quota support allows to set per user and group limits for tmpfs
+	  usage. Say Y to enable quota support. Once enabled you can control
+	  user and group quota enforcement with quota, usrquota and grpquota
+	  mount options.
+
+	  If unsure, say N.
+
 config ARCH_SUPPORTS_HUGETLBFS
 	def_bool n
 
```

fs/kernfs/dir.c

Lines changed: 1 addition & 1 deletion
```diff
@@ -556,7 +556,7 @@ void kernfs_put(struct kernfs_node *kn)
 	kfree_const(kn->name);
 
 	if (kn->iattr) {
-		simple_xattrs_free(&kn->iattr->xattrs);
+		simple_xattrs_free(&kn->iattr->xattrs, NULL);
 		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
 	}
 	spin_lock(&kernfs_idr_lock);
```

fs/kernfs/inode.c

Lines changed: 29 additions & 17 deletions
```diff
@@ -305,11 +305,17 @@ int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
 int kernfs_xattr_set(struct kernfs_node *kn, const char *name,
 		const void *value, size_t size, int flags)
 {
+	struct simple_xattr *old_xattr;
 	struct kernfs_iattrs *attrs = kernfs_iattrs(kn);
 	if (!attrs)
 		return -ENOMEM;
 
-	return simple_xattr_set(&attrs->xattrs, name, value, size, flags, NULL);
+	old_xattr = simple_xattr_set(&attrs->xattrs, name, value, size, flags);
+	if (IS_ERR(old_xattr))
+		return PTR_ERR(old_xattr);
+
+	simple_xattr_free(old_xattr);
+	return 0;
 }
 
 static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
@@ -341,7 +347,7 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
 {
 	atomic_t *sz = &kn->iattr->user_xattr_size;
 	atomic_t *nr = &kn->iattr->nr_user_xattrs;
-	ssize_t removed_size;
+	struct simple_xattr *old_xattr;
 	int ret;
 
 	if (atomic_inc_return(nr) > KERNFS_MAX_USER_XATTRS) {
@@ -354,13 +360,18 @@ static int kernfs_vfs_user_xattr_add(struct kernfs_node *kn,
 		goto dec_size_out;
 	}
 
-	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
-			&removed_size);
-
-	if (!ret && removed_size >= 0)
-		size = removed_size;
-	else if (!ret)
+	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
+	if (!old_xattr)
 		return 0;
+
+	if (IS_ERR(old_xattr)) {
+		ret = PTR_ERR(old_xattr);
+		goto dec_size_out;
+	}
+
+	ret = 0;
+	size = old_xattr->size;
+	simple_xattr_free(old_xattr);
 dec_size_out:
 	atomic_sub(size, sz);
 dec_count_out:
@@ -375,18 +386,19 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
 {
 	atomic_t *sz = &kn->iattr->user_xattr_size;
 	atomic_t *nr = &kn->iattr->nr_user_xattrs;
-	ssize_t removed_size;
-	int ret;
+	struct simple_xattr *old_xattr;
 
-	ret = simple_xattr_set(xattrs, full_name, value, size, flags,
-			&removed_size);
+	old_xattr = simple_xattr_set(xattrs, full_name, value, size, flags);
+	if (!old_xattr)
+		return 0;
 
-	if (removed_size >= 0) {
-		atomic_sub(removed_size, sz);
-		atomic_dec(nr);
-	}
+	if (IS_ERR(old_xattr))
+		return PTR_ERR(old_xattr);
 
-	return ret;
+	atomic_sub(old_xattr->size, sz);
+	atomic_dec(nr);
+	simple_xattr_free(old_xattr);
+	return 0;
 }
 
 static int kernfs_vfs_user_xattr_set(const struct xattr_handler *handler,
```
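In the kernfs hunks above, `simple_xattr_set()` no longer reports a removed size through an out-parameter. Instead it returns the xattr it replaced or removed, `NULL` if there was none, or an error encoded as a pointer, and the caller reads the old size and frees the entry. The sketch below reproduces that calling convention in userspace. The minimal `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` definitions, the linked-list store, and the `acct_set()` bookkeeping helper are all hypothetical, and the count and size limits that kernfs checks before the call are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's error-pointer helpers. */
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

struct xattr {
	struct xattr *next;
	char *name;
	void *value;
	size_t size;
};

struct xattr_list {
	struct xattr *head;
};

static void xattr_free(struct xattr *x)
{
	if (!x)
		return;
	free(x->name);
	free(x->value);
	free(x);
}

/* Set name to value, or remove it when value is NULL. Returns the entry
 * that was replaced or removed (the caller frees it), NULL if there was
 * none, or ERR_PTR(-ENOMEM). */
static struct xattr *xattr_set(struct xattr_list *l, const char *name,
			       const void *value, size_t size)
{
	struct xattr **pp, *old, *nx = NULL;

	if (value) {
		size_t len = strlen(name) + 1;

		nx = calloc(1, sizeof(*nx));
		if (!nx)
			return ERR_PTR(-ENOMEM);
		nx->name = malloc(len);
		nx->value = malloc(size ? size : 1);
		if (!nx->name || !nx->value) {
			xattr_free(nx);
			return ERR_PTR(-ENOMEM);
		}
		memcpy(nx->name, name, len);
		memcpy(nx->value, value, size);
		nx->size = size;
	}

	for (pp = &l->head; *pp; pp = &(*pp)->next)
		if (strcmp((*pp)->name, name) == 0)
			break;

	old = *pp;   /* NULL if the name was not present */
	if (old) {
		if (nx)
			nx->next = old->next;
		*pp = nx ? nx : old->next;
		old->next = NULL;
	} else if (nx) {
		*pp = nx;   /* append at the tail */
	}
	return old;
}

/* kernfs-style bookkeeping: number of xattrs and their total size. */
struct xattr_acct {
	size_t nr;
	size_t bytes;
};

static int acct_set(struct xattr_list *l, struct xattr_acct *a,
		    const char *name, const void *value, size_t size)
{
	struct xattr *old = xattr_set(l, name, value, size);

	if (IS_ERR(old))
		return (int)PTR_ERR(old);
	if (value) {
		a->nr++;
		a->bytes += size;
	}
	if (old) {   /* the caller owns the old entry: account, then free */
		a->nr--;
		a->bytes -= old->size;
		xattr_free(old);
	}
	return 0;
}
```

Returning the old entry lets the caller decide what to do with its size (kernfs subtracts it from `user_xattr_size`) and lets it free the memory outside whatever lock protected the list, instead of threading a `removed_size` out-parameter through every call.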
