
Commit 181d8e3

Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:

"This contains the usual selections of misc updates for this cycle.

Features:

- Use folios for symlinks in the page cache

  FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page().

- Try and make a few filesystem operations killable on the VFS inode->i_mutex level

- Add sysctl vfs_cache_pressure_denom for bulk file operations

  Some workloads need to preserve more dentries than we currently allow through our sysctl interface.

  On an HDFS server with 12 HDDs, datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization.

  To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes.

  To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such workloads.

  This patch introduces vfs_cache_pressure_denom for more granular cache pressure control. The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation.

- Avoid some jumps in inode_permission() using likely()/unlikely()

- Avoid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission()

- Add fastpath predicts for stat() and fdput()

- Anonymous inodes currently don't come with a proper mode, causing issues in the kernel when we want to add useful VFS debug asserts. Fix that by giving them a proper mode and masking it off when we report it to userspace, which relies on them not having any mode.

- Anonymous inodes currently allow changing inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock.

- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock

- Add proper tests for anonymous inode behavior

- Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead()

Cleanups:

- Port pidfs to the new anon_inode_{g,s}etattr() helpers
- Try to remove the uselib() system call
- Add unlikely branch hint on return path for poll
- Add unlikely branch hint on return path for core_sys_select
- Don't allow signals to interrupt getdents copying for fuse
- Provide a size hint to dir_context during readdir()
- Use writeback_iter directly in mpage_writepages
- Update compression and mtime descriptions in initramfs documentation
- Update main netfs API document
- Remove useless plus one in super_cache_scan()
- Remove unnecessary NULL-check guards during setns()
- Add separate {get,put}_cgroup_ns no-op cases

Fixes:

- Fix typo in root= kernel parameter description
- Use KERN_INFO for infof()|info_plog()|infofc()
- Correct comments of fs_validate_description()
- Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep()
- Delete macro fsparam_u32hex()
- Remove unused and problematic validate_constant_table()
- Fix potential unsigned integer underflow in fs_name()
- Make file-nr output the total allocated file handles"

* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
  fs: Pass a folio to page_put_link()
  nfs: Use a folio in nfs_get_link()
  fs: Convert __page_get_link() to use a folio
  fs/read_write: make default_llseek() killable
  fs/open: make do_truncate() killable
  fs/open: make chmod_common() and chown_common() killable
  include/linux/fs.h: add inode_lock_killable()
  readdir: supply dir_context.count as readdir buffer size hint
  vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
  fuse: don't allow signals to interrupt getdents copying
  Documentation: fix typo in root= kernel parameter description
  include/cgroup: separate {get,put}_cgroup_ns no-op case
  kernel/nsproxy: remove unnecessary guards
  fs: use writeback_iter directly in mpage_writepages
  fs: remove useless plus one in super_cache_scan()
  fs: add S_ANON_INODE
  fs: remove uselib() system call
  device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
  fs/fs_parse: Remove unused and problematic validate_constant_table()
  fs: touch up predicts in inode_permission()
  ...
2 parents a1ae8ce + 76145cb commit 181d8e3
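Several items in the tag message are branch-prediction hints: inode_permission(), stat(), fdput(), poll and core_sys_select. The kernel's likely()/unlikely() macros are built on the GCC/Clang builtin __builtin_expect(). The user-space sketch below defines equivalent macros and applies them to a hypothetical permission check. demo_check_access() only illustrates the pattern; it is not inode_permission().

```c
#include <errno.h>

/* Branch-hint macros in the style of the kernel's likely()/unlikely(),
 * built on the GCC/Clang builtin __builtin_expect(). The !! collapses any
 * non-zero value to 1 so it can match the expected value exactly. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical permission check (illustrative only):
 * returns 0 if every bit in `want` is present in `granted`, else -EACCES. */
static int demo_check_access(unsigned int granted, unsigned int want)
{
    /* Rare case: nothing requested, trivially allowed. */
    if (unlikely(want == 0))
        return 0;
    /* Common case: access granted. It is hinted as the expected outcome so
     * the compiler can lay it out as the straight-line path. */
    if (likely((granted & want) == want))
        return 0;
    /* Slow path: at least one requested bit is missing. */
    return -EACCES;
}
```

The hints never change the result. They only inform code layout, so a mispredicted hint costs performance, not correctness.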

47 files changed: 1164 lines added, 694 lines removed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 1 addition & 1 deletion
@@ -6268,7 +6268,7 @@
 port and the regular usb controller gets disabled.

 root= [KNL] Root filesystem
-Usually this a a block device specifier of some kind,
+Usually this is a block device specifier of some kind,
 see the early_lookup_bdev comment in
 block/early-lookup.c for details.
 Alternatively this can be "ram" for the legacy initial

Documentation/admin-guide/sysctl/vm.rst

Lines changed: 21 additions & 11 deletions
@@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
 - unprivileged_userfaultfd
 - user_reserve_kbytes
 - vfs_cache_pressure
+- vfs_cache_pressure_denom
 - watermark_boost_factor
 - watermark_scale_factor
 - zone_reclaim_mode
@@ -1017,19 +1018,28 @@ vfs_cache_pressure
 This percentage value controls the tendency of the kernel to reclaim
 the memory which is used for caching of directory and inode objects.

-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.

-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.

+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.

 watermark_boost_factor
 ======================
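The updated vm.rst text treats reclaim pressure as the ratio vfs_cache_pressure / vfs_cache_pressure_denom instead of a fixed percentage. The user-space sketch below works through that arithmetic. It assumes the freeable object count is scaled by pressure / denom using an overflow-safe multiply-then-divide in the style of the kernel's mult_frac() helper. The function names are hypothetical, and this is not the kernel's implementation.

```c
#include <assert.h>

/* Overflow-safe x * n / d, in the spirit of the kernel's mult_frac():
 * split x into quotient and remainder by d so the multiply by n does not
 * overflow for large x. User-space re-implementation, not the kernel macro. */
static unsigned long long scale_frac(unsigned long long x,
                                     unsigned long long n,
                                     unsigned long long d)
{
    assert(d != 0);
    unsigned long long q = x / d;
    unsigned long long r = x % d;
    return q * n + (r * n) / d;
}

/* Hypothetical helper: the freeable dentry/inode count reclaim would see,
 * assuming it is scaled by vfs_cache_pressure / vfs_cache_pressure_denom. */
static unsigned long long scaled_freeable(unsigned long long freeable,
                                          unsigned int pressure,
                                          unsigned int denom)
{
    return scale_frac(freeable, pressure, denom);
}
```

The results below plug in the ~20 million dentries from the HDFS example:

- The default 100/100 reports all 20,000,000 objects.
- vfs_cache_pressure=1 with the default denominator still reports 200,000. This is the 1/100 floor that the commit message calls too aggressive.
- 1/10000 reports only 2,000.
- 1000/100 reports 200,000,000, which is the "ten times more freeable objects than there are" case.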

Documentation/driver-api/early-userspace/buffer-format.rst

Lines changed: 21 additions & 13 deletions
@@ -4,33 +4,39 @@ initramfs buffer format

 Al Viro, H. Peter Anvin

-Last revision: 2002-01-13
-
-Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
-getting {replaced/complemented} with the new "initial ramfs"
-(initramfs) protocol. The initramfs contents is passed using the same
-memory buffer protocol used by the initrd protocol, but the contents
+With kernel 2.5.x, the old "initial ramdisk" protocol was complemented
+with an "initial ramfs" protocol. The initramfs content is passed
+using the same memory buffer protocol used by initrd, but the content
 is different. The initramfs buffer contains an archive which is
-expanded into a ramfs filesystem; this document details the format of
-the initramfs buffer format.
+expanded into a ramfs filesystem; this document details the initramfs
+buffer format.

 The initramfs buffer format is based around the "newc" or "crc" CPIO
 formats, and can be created with the cpio(1) utility. The cpio
-archive can be compressed using gzip(1). One valid version of an
-initramfs buffer is thus a single .cpio.gz file.
+archive can be compressed using gzip(1), or any other algorithm provided
+via CONFIG_DECOMPRESS_*. One valid version of an initramfs buffer is
+thus a single .cpio.gz file.

 The full format of the initramfs buffer is defined by the following
 grammar, where::

 * is used to indicate "0 or more occurrences of"
 (|) indicates alternatives
 + indicates concatenation
-GZIP() indicates the gzip(1) of the operand
+GZIP() indicates gzip compression of the operand
+BZIP2() indicates bzip2 compression of the operand
+LZMA() indicates lzma compression of the operand
+XZ() indicates xz compression of the operand
+LZO() indicates lzo compression of the operand
+LZ4() indicates lz4 compression of the operand
+ZSTD() indicates zstd compression of the operand
 ALGN(n) means padding with null bytes to an n-byte boundary

-initramfs := ("\0" | cpio_archive | cpio_gzip_archive)*
+initramfs := ("\0" | cpio_archive | cpio_compressed_archive)*

-cpio_gzip_archive := GZIP(cpio_archive)
+cpio_compressed_archive := (GZIP(cpio_archive) | BZIP2(cpio_archive)
+  | LZMA(cpio_archive) | XZ(cpio_archive) | LZO(cpio_archive)
+  | LZ4(cpio_archive) | ZSTD(cpio_archive))

 cpio_archive := cpio_file* + (<nothing> | cpio_trailer)

@@ -75,6 +81,8 @@ c_chksum 8 bytes Checksum of data field if c_magic is 070702;
 The c_mode field matches the contents of st_mode returned by stat(2)
 on Linux, and encodes the file type and file permissions.

+c_mtime is ignored unless CONFIG_INITRAMFS_PRESERVE_MTIME=y is set.
+
 The c_filesize should be zero for any file which is not a regular file
 or symlink.
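The revised grammar lets an initramfs buffer interleave "\0" padding, uncompressed cpio archives and archives compressed with any of seven algorithms. One way to tell those segments apart is to inspect their leading bytes.

The sketch below classifies the start of a segment using the cpio magics 070701 ("newc") and 070702 ("crc"), plus the first two bytes of each compressor's well-known signature. It makes two assumptions: LZ4 data uses the legacy frame format, and LZMA data uses the common default properties. The names are hypothetical, and this is not the kernel's decompressor-selection code.

```c
#include <stddef.h>
#include <string.h>

enum segment_kind {
    SEG_PADDING, SEG_CPIO_NEWC, SEG_CPIO_CRC,
    SEG_GZIP, SEG_BZIP2, SEG_LZMA, SEG_XZ, SEG_LZO, SEG_LZ4, SEG_ZSTD,
    SEG_UNKNOWN
};

struct seg_magic {
    unsigned char bytes[2];
    enum segment_kind kind;
};

/* First two bytes of each format's signature. */
static const struct seg_magic compressed_magics[] = {
    { {0x1f, 0x8b}, SEG_GZIP },  /* gzip */
    { {0x42, 0x5a}, SEG_BZIP2 }, /* "BZ" of "BZh" */
    { {0x5d, 0x00}, SEG_LZMA },  /* typical .lzma header, not guaranteed */
    { {0xfd, 0x37}, SEG_XZ },    /* FD '7' 'z' 'X' 'Z' 00 */
    { {0x89, 0x4c}, SEG_LZO },   /* lzop: 89 'L' 'Z' 'O' */
    { {0x02, 0x21}, SEG_LZ4 },   /* legacy frame magic 0x184C2102, LE */
    { {0x28, 0xb5}, SEG_ZSTD },  /* frame magic 0xFD2FB528, LE */
};

/* Classify the segment starting at buf (len bytes available). */
static enum segment_kind classify_segment(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return SEG_UNKNOWN;
    if (buf[0] == '\0')
        return SEG_PADDING;
    if (len >= 6 && memcmp(buf, "070701", 6) == 0)
        return SEG_CPIO_NEWC;
    if (len >= 6 && memcmp(buf, "070702", 6) == 0)
        return SEG_CPIO_CRC;
    if (len < 2)
        return SEG_UNKNOWN;
    for (size_t i = 0; i < sizeof(compressed_magics) / sizeof(compressed_magics[0]); i++)
        if (buf[0] == compressed_magics[i].bytes[0] &&
            buf[1] == compressed_magics[i].bytes[1])
            return compressed_magics[i].kind;
    return SEG_UNKNOWN;
}
```

Which compressed formats a given kernel can actually unpack still depends on the CONFIG_DECOMPRESS_* options mentioned above.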

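Inside each cpio_archive, every file starts with a header whose fields (including the c_mode, c_mtime and c_filesize discussed above) are 8-digit ASCII hex numbers. The sketch below assumes the standard 110-byte newc/crc layout: the 6-byte magic followed by 13 hex fields. It also assumes the 4-byte ALGN(4) padding that newc applies after the header plus the NUL-terminated filename. The names are hypothetical. Whether the kernel applies the parsed c_mtime still depends on CONFIG_INITRAMFS_PRESERVE_MTIME.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEWC_HDR_LEN 110 /* 6-byte magic + 13 fields * 8 hex digits */
#define NEWC_NFIELDS 13

/* Field order in the newc/crc header. */
enum { F_INO, F_MODE, F_UID, F_GID, F_NLINK, F_MTIME, F_FILESIZE,
       F_MAJ, F_MIN, F_RMAJ, F_RMIN, F_NAMESIZE, F_CHKSUM };

/* Parse exactly 8 hex digits; returns -1 on a non-hex character. */
static int parse_hex8(const char *s, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char c = s[i];
        uint32_t d;
        if (c >= '0' && c <= '9')      d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (uint32_t)(c - 'A' + 10);
        else return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* ALGN(n): round off up to a multiple of n (n must be a power of two). */
static size_t algn(size_t off, size_t n)
{
    return (off + n - 1) & ~(n - 1);
}

/* Fill fields[] from a header; returns -1 if the buffer is too short, the
 * magic is neither 070701 nor 070702, or a field is not valid hex. */
static int parse_newc_header(const char *buf, size_t len,
                             uint32_t fields[NEWC_NFIELDS])
{
    if (len < NEWC_HDR_LEN)
        return -1;
    if (memcmp(buf, "070701", 6) != 0 && memcmp(buf, "070702", 6) != 0)
        return -1;
    for (int i = 0; i < NEWC_NFIELDS; i++)
        if (parse_hex8(buf + 6 + 8 * i, &fields[i]) != 0)
            return -1;
    return 0;
}
```

For a header at an aligned offset, the file data would begin at `algn(NEWC_HDR_LEN + fields[F_NAMESIZE], 4)` bytes from the header start, where c_namesize counts the filename's trailing NUL.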
Documentation/filesystems/mount_api.rst

Lines changed: 0 additions & 16 deletions
@@ -671,7 +671,6 @@ The members are as follows:
 fsparam_bool() fs_param_is_bool
 fsparam_u32() fs_param_is_u32
 fsparam_u32oct() fs_param_is_u32_octal
-fsparam_u32hex() fs_param_is_u32_hex
 fsparam_s32() fs_param_is_s32
 fsparam_u64() fs_param_is_u64
 fsparam_enum() fs_param_is_enum
@@ -753,21 +752,6 @@ process the parameters it is given.
 If a match is found, the corresponding value is returned. If a match
 isn't found, the not_found value is returned instead.

-* ::
-
-bool validate_constant_table(const struct constant_table *tbl,
-size_t tbl_size,
-int low, int high, int special);
-
-Validate a constant table. Checks that all the elements are appropriately
-ordered, that there are no duplicates and that the values are between low
-and high inclusive, though provision is made for one allowable special
-value outside of that range. If no special value is required, special
-should just be set to lie inside the low-to-high range.
-
-If all is good, true is returned. If the table is invalid, errors are
-logged to the kernel log buffer and false is returned.
-
 * ::

 bool fs_validate_description(const char *name,
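The context lines of the second mount_api.rst hunk describe a name-to-value lookup. It returns the matching value, or a caller-supplied not_found default when no entry matches. Below is a minimal user-space sketch of that behavior. struct demo_constant and demo_lookup_constant() are hypothetical stand-ins, not the kernel's constant_table API. The table is assumed to end with a NULL-name sentinel entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name -> value table entry; a NULL name ends the table. */
struct demo_constant {
    const char *name;
    int value;
};

/* Return the value paired with `name`, or `not_found` if no entry matches. */
static int demo_lookup_constant(const struct demo_constant *tbl,
                                const char *name, int not_found)
{
    assert(tbl != NULL && name != NULL);
    for (; tbl->name != NULL; tbl++)
        if (strcmp(tbl->name, name) == 0)
            return tbl->value;
    return not_found;
}
```

A linear scan like this places no ordering requirement on the table. The cost is an O(n) lookup, which is usually acceptable for the handful of names a mount option accepts.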
