Commit de16588

Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:
   - Block mode changes on symlinks and rectify our broken semantics
   - Report file modifications via fsnotify() for splice
   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up
   - Use synchronous fput for the close system call

  Cleanups:
   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper
   - Simplify epoll allocation helper
   - Convert simple_write_begin and simple_write_end to use a folio
   - Convert page_cache_pipe_buf_confirm() to use a folio
   - Simplify __range_close to avoid pointless locking
   - Disable per-cpu buffer head cache for isolated cpus
   - Port ecryptfs to kmap_local_page() api
   - Remove redundant initialization of pointer buf in pipe code
   - Unexport the d_genocide() function which is only used within core
     vfs
   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:
   - Fix various kernel-doc issues
   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
   - Fix a mainly theoretical issue in devpts
   - Check the return value of __getblk() in reiserfs
   - Fix a racy assert in i_readcount_dec
   - Fix integer conversion issues in various functions
   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
2 parents ecd7db2 + e6fa4c7 commit de16588

44 files changed, 458 insertions(+), 243 deletions(-)

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 4 additions & 0 deletions
@@ -5522,6 +5522,10 @@
 			Useful for devices that are detected asynchronously
 			(e.g. USB and MMC devices).
 
+	rootwait=	[KNL] Maximum time (in seconds) to wait for root device
+			to show up before attempting to mount the root
+			filesystem.
+
 	rproc_mem=nn[KMG][@address]
 			[KNL,ARM,CMA] Remoteproc physical memory block.
 			Memory area to be used by remote processor image,
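
The new `rootwait=` entry gives the unit (seconds) but not how the value is consumed. The following is a minimal userspace sketch, not the kernel's code. It assumes the value is converted to milliseconds and that an invalid, negative, or overflowing value falls back to waiting indefinitely. `rootwait_to_ms` and `ROOTWAIT_INDEFINITE` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical sentinel: wait for the root device without a deadline. */
#define ROOTWAIT_INDEFINITE (-1)

/*
 * Hypothetical helper: turn the text after "rootwait=" (seconds) into a
 * timeout in milliseconds. Anything that is not a clean non-negative
 * decimal number, or that would overflow an int once scaled, is treated
 * as "wait indefinitely" (an assumption of this sketch).
 */
int rootwait_to_ms(const char *str)
{
	char *end;
	long sec;

	assert(str != NULL);

	errno = 0;
	sec = strtol(str, &end, 10);
	if (errno != 0 || end == str || *end != '\0' || sec < 0)
		return ROOTWAIT_INDEFINITE;
	if (sec > INT_MAX / 1000)
		return ROOTWAIT_INDEFINITE;

	return (int)(sec * 1000);
}
```

With this model, a command line such as `root=/dev/mmcblk0p2 rootwait=10` (the device name is only an example) would yield a 10000 ms deadline, while plain `rootwait` keeps the old wait-forever behaviour.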

Documentation/filesystems/idmappings.rst

Lines changed: 11 additions & 3 deletions
@@ -146,9 +146,10 @@ For the rest of this document we will prefix all userspace ids with ``u`` and
 all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
 an idmapping will be written as ``u0:k10000:r10000``.
 
-For example, the id ``u1000`` is an id in the upper idmapset or "userspace
-idmapset" starting with ``u1000``. And it is mapped to ``k11000`` which is a
-kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``.
+For example, within this idmapping, the id ``u1000`` is an id in the upper
+idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
+``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
+starting with ``k10000``.
 
 A kernel id is always created by an idmapping. Such idmappings are associated
 with user namespaces. Since we mainly care about how idmappings work we're not
@@ -373,6 +374,13 @@ kernel maps the caller's userspace id down into a kernel id according to the
 caller's idmapping and then maps that kernel id up according to the
 filesystem's idmapping.
 
+From the implementation point it's worth mentioning how idmappings are represented.
+All idmappings are taken from the corresponding user namespace.
+
+- caller's idmapping (usually taken from ``current_user_ns()``)
+- filesystem's idmapping (``sb->s_user_ns``)
+- mount's idmapping (``mnt_idmap(vfsmnt)``)
+
 Let's see some examples with caller/filesystem idmapping but without mount
 idmappings. This will exhibit some problems we can hit. After that we will
 revisit/reconsider these examples, this time using mount idmappings, to see how

fs/aio.c

Lines changed: 3 additions & 17 deletions
@@ -1447,13 +1447,8 @@ static void aio_complete_rw(struct kiocb *kiocb, long res)
 	if (kiocb->ki_flags & IOCB_WRITE) {
 		struct inode *inode = file_inode(kiocb->ki_filp);
 
-		/*
-		 * Tell lockdep we inherited freeze protection from submission
-		 * thread.
-		 */
 		if (S_ISREG(inode->i_mode))
-			__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
-		file_end_write(kiocb->ki_filp);
+			kiocb_end_write(kiocb);
 	}
 
 	iocb->ki_res.res = res;
@@ -1581,17 +1576,8 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
-		/*
-		 * Open-code file_start_write here to grab freeze protection,
-		 * which will be released by another thread in
-		 * aio_complete_rw().  Fool lockdep by telling it the lock got
-		 * released so that it doesn't complain about the held lock when
-		 * we return to userspace.
-		 */
-		if (S_ISREG(file_inode(file)->i_mode)) {
-			sb_start_write(file_inode(file)->i_sb);
-			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
-		}
+		if (S_ISREG(file_inode(file)->i_mode))
+			kiocb_start_write(req);
 		req->ki_flags |= IOCB_WRITE;
 		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}

fs/attr.c

Lines changed: 18 additions & 2 deletions
@@ -394,9 +394,25 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
 		return error;
 
 	if ((ia_valid & ATTR_MODE)) {
-		umode_t amode = attr->ia_mode;
+		/*
+		 * Don't allow changing the mode of symlinks:
+		 *
+		 * (1) The vfs doesn't take the mode of symlinks into account
+		 *     during permission checking.
+		 * (2) This has never worked correctly. Most major filesystems
+		 *     did return EOPNOTSUPP due to interactions with POSIX ACLs
+		 *     but did still updated the mode of the symlink.
+		 *     This inconsistency led system call wrapper providers such
+		 *     as libc to block changing the mode of symlinks with
+		 *     EOPNOTSUPP already.
+		 * (3) To even do this in the first place one would have to use
+		 *     specific file descriptors and quite some effort.
+		 */
+		if (S_ISLNK(inode->i_mode))
+			return -EOPNOTSUPP;
+
 		/* Flag setting protected by i_mutex */
-		if (is_sxid(amode))
+		if (is_sxid(attr->ia_mode))
 			inode->i_flags &= ~S_NOSEC;
 	}

fs/buffer.c

Lines changed: 6 additions & 1 deletion
@@ -49,6 +49,7 @@
 #include <trace/events/block.h>
 #include <linux/fscrypt.h>
 #include <linux/fsverity.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -1352,7 +1353,7 @@ static void bh_lru_install(struct buffer_head *bh)
 	 * failing page migration.
 	 * Skip putting upcoming bh into bh_lru until migration is done.
 	 */
-	if (lru_cache_disabled()) {
+	if (lru_cache_disabled() || cpu_is_isolated(smp_processor_id())) {
 		bh_lru_unlock();
 		return;
 	}
@@ -1382,6 +1383,10 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 
 	check_irqs_on();
 	bh_lru_lock();
+	if (cpu_is_isolated(smp_processor_id())) {
+		bh_lru_unlock();
+		return NULL;
+	}
 	for (i = 0; i < BH_LRU_SIZE; i++) {
 		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);

fs/cachefiles/io.c

Lines changed: 3 additions & 13 deletions
@@ -259,9 +259,7 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 
 	_enter("%ld", ret);
 
-	/* Tell lockdep we inherited freeze protection from submission thread */
-	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
-	__sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+	kiocb_end_write(iocb);
 
 	if (ret < 0)
 		trace_cachefiles_io_error(object, inode, ret,
@@ -286,7 +284,6 @@ int __cachefiles_write(struct cachefiles_object *object,
 {
 	struct cachefiles_cache *cache;
 	struct cachefiles_kiocb *ki;
-	struct inode *inode;
 	unsigned int old_nofs;
 	ssize_t ret;
 	size_t len = iov_iter_count(iter);
@@ -322,19 +319,12 @@ int __cachefiles_write(struct cachefiles_object *object,
 	ki->iocb.ki_complete = cachefiles_write_complete;
 	atomic_long_add(ki->b_writing, &cache->b_writing);
 
-	/* Open-code file_start_write here to grab freeze protection, which
-	 * will be released by another thread in aio_complete_rw().  Fool
-	 * lockdep by telling it the lock got released so that it doesn't
-	 * complain about the held lock when we return to userspace.
-	 */
-	inode = file_inode(file);
-	__sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
-	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+	kiocb_start_write(&ki->iocb);
 
 	get_file(ki->iocb.ki_filp);
 	cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
 
-	trace_cachefiles_write(object, inode, ki->iocb.ki_pos, len);
+	trace_cachefiles_write(object, file_inode(file), ki->iocb.ki_pos, len);
 	old_nofs = memalloc_nofs_save();
 	ret = cachefiles_inject_write_error();
 	if (ret == 0)

fs/dcache.c

Lines changed: 1 addition & 4 deletions
@@ -1664,7 +1664,7 @@ static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
 	if (dentry == _data && dentry->d_lockref.count == 1)
 		return D_WALK_CONTINUE;
 
-	printk(KERN_ERR "BUG: Dentry %p{i=%lx,n=%pd} "
+	WARN(1, "BUG: Dentry %p{i=%lx,n=%pd} "
 			" still in use (%d) [unmount of %s %s]\n",
 		       dentry,
 		       dentry->d_inode ?
@@ -1673,7 +1673,6 @@ static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
 		       dentry->d_lockref.count,
 		       dentry->d_sb->s_type->name,
 		       dentry->d_sb->s_id);
-	WARN_ON(1);
 	return D_WALK_CONTINUE;
 }
 
@@ -3247,8 +3246,6 @@ void d_genocide(struct dentry *parent)
 	d_walk(parent, parent, d_genocide_kill);
 }
 
-EXPORT_SYMBOL(d_genocide);
-
 void d_tmpfile(struct file *file, struct inode *inode)
 {
 	struct dentry *dentry = file->f_path.dentry;

fs/devpts/inode.c

Lines changed: 5 additions & 5 deletions
@@ -534,12 +534,12 @@ void devpts_kill_index(struct pts_fs_info *fsi, int idx)
 
 /**
  * devpts_pty_new -- create a new inode in /dev/pts/
- * @ptmx_inode: inode of the master
- * @device: major+minor of the node to be created
+ * @fsi: Filesystem info for this instance.
  * @index: used as a name of the node
  * @priv: what's given back by devpts_get_priv
  *
- * The created inode is returned. Remove it from /dev/pts/ by devpts_pty_kill.
+ * The dentry for the created inode is returned.
+ * Remove it from /dev/pts/ with devpts_pty_kill().
  */
 struct dentry *devpts_pty_new(struct pts_fs_info *fsi, int index, void *priv)
 {
@@ -580,7 +580,7 @@ struct dentry *devpts_pty_new(struct pts_fs_info *fsi, int index, void *priv)
 
 /**
  * devpts_get_priv -- get private data for a slave
- * @pts_inode: inode of the slave
+ * @dentry: dentry of the slave
  *
  * Returns whatever was passed as priv in devpts_pty_new for a given inode.
  */
@@ -593,7 +593,7 @@ void *devpts_get_priv(struct dentry *dentry)
 
 /**
  * devpts_pty_kill -- remove inode form /dev/pts/
- * @inode: inode of the slave to be removed
+ * @dentry: dentry of the slave to be removed
  *
  * This is an inverse operation of devpts_pty_new.
  */

fs/ecryptfs/crypto.c

Lines changed: 4 additions & 4 deletions
@@ -441,10 +441,10 @@ int ecryptfs_encrypt_page(struct page *page)
 	}
 
 	lower_offset = lower_offset_for_page(crypt_stat, page);
-	enc_extent_virt = kmap(enc_extent_page);
+	enc_extent_virt = kmap_local_page(enc_extent_page);
 	rc = ecryptfs_write_lower(ecryptfs_inode, enc_extent_virt, lower_offset,
 				  PAGE_SIZE);
-	kunmap(enc_extent_page);
+	kunmap_local(enc_extent_virt);
 	if (rc < 0) {
 		ecryptfs_printk(KERN_ERR,
 			"Error attempting to write lower page; rc = [%d]\n",
@@ -490,10 +490,10 @@ int ecryptfs_decrypt_page(struct page *page)
 	BUG_ON(!(crypt_stat->flags & ECRYPTFS_ENCRYPTED));
 
 	lower_offset = lower_offset_for_page(crypt_stat, page);
-	page_virt = kmap(page);
+	page_virt = kmap_local_page(page);
 	rc = ecryptfs_read_lower(page_virt, lower_offset, PAGE_SIZE,
 				 ecryptfs_inode);
-	kunmap(page);
+	kunmap_local(page_virt);
 	if (rc < 0) {
 		ecryptfs_printk(KERN_ERR,
 			"Error attempting to read lower page; rc = [%d]\n",

fs/ecryptfs/mmap.c

Lines changed: 2 additions & 3 deletions
@@ -125,7 +125,7 @@ ecryptfs_copy_up_encrypted_with_header(struct page *page,
 			/* This is a header extent */
 			char *page_virt;
 
-			page_virt = kmap_atomic(page);
+			page_virt = kmap_local_page(page);
 			memset(page_virt, 0, PAGE_SIZE);
 			/* TODO: Support more than one header extent */
 			if (view_extent_num == 0) {
@@ -138,7 +138,7 @@ ecryptfs_copy_up_encrypted_with_header(struct page *page,
 						crypt_stat,
 						&written);
 			}
-			kunmap_atomic(page_virt);
+			kunmap_local(page_virt);
 			flush_dcache_page(page);
 			if (rc) {
 				printk(KERN_ERR "%s: Error reading xattr "
@@ -255,7 +255,6 @@ static int fill_zeros_to_end_of_page(struct page *page, unsigned int to)
  * @mapping: The eCryptfs object
  * @pos: The file offset at which to start writing
  * @len: Length of the write
- * @flags: Various flags
  * @pagep: Pointer to return the page
  * @fsdata: Pointer to return fs data (unused)
  *
