
Commit e7b9cea

laoar authored and brauner committed
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
On our HDFS servers with 12 HDDs per server, an HDFS datanode[0] startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization. To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite this configuration, memory pressure can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart incurs substantial latency until the cache has fully recovered.

To maintain service stability, we need to preserve more dentries during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for our workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control. The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation.

Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]
Signed-off-by: Yafang Shao <[email protected]>
Link: https://lore.kernel.org/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
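A rough worked example of the numbers in the message. This is a sketch that assumes, to our understanding, that the superblock shrinker passes its combined dentry and inode LRU count through vfs_pressure_ratio(), which computes floor(N * p / d) with p = vfs_cache_pressure and d = vfs_cache_pressure_denom. The result is the freeable count handed to the shrinker core, not the number of objects actually freed in one pass. N uses the ~20 million dentries quoted above.

```latex
\[
\text{reported}(N) = \left\lfloor \frac{N \cdot p}{d} \right\rfloor,
\qquad N = 2 \times 10^{7},\; p = 1
\]
\[
d = 100:\; \left\lfloor \frac{2 \times 10^{7}}{100} \right\rfloor = 2 \times 10^{5},
\qquad
d = 10000:\; \left\lfloor \frac{2 \times 10^{7}}{10000} \right\rfloor = 2 \times 10^{3}
\]
```

Raising the denominator from the previously hard-coded 100 to 10000 therefore shrinks the reported count by a further factor of 100 at the same vfs_cache_pressure=1.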
1 parent 8d91170 commit e7b9cea


2 files changed, +31 -12 lines changed


Documentation/admin-guide/sysctl/vm.rst

Lines changed: 21 additions & 11 deletions
```diff
@@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
 - unprivileged_userfaultfd
 - user_reserve_kbytes
 - vfs_cache_pressure
+- vfs_cache_pressure_denom
 - watermark_boost_factor
 - watermark_scale_factor
 - zone_reclaim_mode
```
```diff
@@ -1017,19 +1018,28 @@ vfs_cache_pressure
 This percentage value controls the tendency of the kernel to reclaim
 the memory which is used for caching of directory and inode objects.
 
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.
 
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.
 
+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.
 
 watermark_boost_factor
 ======================
```
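The documentation asks for vfs_cache_pressure and vfs_cache_pressure_denom to be set together. A minimal userspace sketch of doing that, assuming the caller runs as root on a kernel that includes this commit. The helper names write_vm_sysctl() and set_cache_pressure() are hypothetical, and the directory is taken as a parameter (normally "/proc/sys/vm") only so the helper can also be exercised on ordinary files.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write one integer to <dir>/<name>.
 * With stdio buffering, an error the kernel reports on write (for example
 * EINVAL for an out-of-range value) shows up when the buffer is flushed,
 * i.e. as a failing fclose().
 */
int write_vm_sysctl(const char *dir, const char *name, int value)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;		/* encoding error or truncated path */

	f = fopen(path, "w");
	if (!f)
		return -1;		/* missing file (e.g. older kernel) or no permission */

	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: apply a pressure/denominator pair together.
 * Rejects values the kernel would refuse anyway (denom below the
 * documented minimum of 100, negative pressure) before touching anything.
 */
int set_cache_pressure(const char *dir, int pressure, int denom)
{
	if (denom < 100 || pressure < 0)
		return -1;
	if (write_vm_sysctl(dir, "vfs_cache_pressure_denom", denom))
		return -1;
	return write_vm_sysctl(dir, "vfs_cache_pressure", pressure);
}
```

On a real host, set_cache_pressure("/proc/sys/vm", 1, 10000) would apply the configuration from the commit message. The shell equivalent is "sysctl -w vm.vfs_cache_pressure_denom=10000" followed by "sysctl -w vm.vfs_cache_pressure=1". On a kernel without this commit the denominator file does not exist, so the helper fails before changing vfs_cache_pressure.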

fs/dcache.c

Lines changed: 10 additions & 1 deletion
```diff
@@ -74,10 +74,11 @@
  * arbitrary, since it's serialized on rename_lock
  */
 static int sysctl_vfs_cache_pressure __read_mostly = 100;
+static int sysctl_vfs_cache_pressure_denom __read_mostly = 100;
 
 unsigned long vfs_pressure_ratio(unsigned long val)
 {
-	return mult_frac(val, sysctl_vfs_cache_pressure, 100);
+	return mult_frac(val, sysctl_vfs_cache_pressure, sysctl_vfs_cache_pressure_denom);
 }
 EXPORT_SYMBOL_GPL(vfs_pressure_ratio);
 
```
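To see what the one-line change in vfs_pressure_ratio() does numerically, here is a userspace sketch. mult_frac_ul() is modeled on the kernel's mult_frac() macro (in include/linux/math.h in recent kernels), which divides before multiplying so the product val * pressure is never formed directly. pressure_ratio() is a hypothetical stand-in that takes the two sysctl values as arguments instead of reading kernel globals.

```c
/*
 * Modeled on the kernel's mult_frac(x, n, d): split x into q * d + r first,
 * then scale each part. The result equals floor(x * n / d) without
 * computing x * n.
 */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long q = x / d;	/* whole multiples of d */
	unsigned long r = x % d;	/* remainder, always < d */

	return q * n + r * n / d;
}

/*
 * Hypothetical stand-in for vfs_pressure_ratio() after this commit:
 * scale val by pressure / denom instead of pressure / 100.
 * denom is assumed to be >= 100, as enforced by the sysctl.
 */
unsigned long pressure_ratio(unsigned long val, int pressure, int denom)
{
	return mult_frac_ul(val, (unsigned long)pressure, (unsigned long)denom);
}
```

For example, pressure_ratio(20000000, 1, 100) gives 200000, while pressure_ratio(20000000, 1, 10000) gives 2000. With the default pair (100, 100) the count passes through unchanged, and (1000, 100) reports ten times the real count, matching the "ten times more freeable objects" wording in the documentation.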

```diff
@@ -225,6 +226,14 @@ static const struct ctl_table vm_dcache_sysctls[] = {
 		.proc_handler = proc_dointvec_minmax,
 		.extra1 = SYSCTL_ZERO,
 	},
+	{
+		.procname = "vfs_cache_pressure_denom",
+		.data = &sysctl_vfs_cache_pressure_denom,
+		.maxlen = sizeof(sysctl_vfs_cache_pressure_denom),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ONE_HUNDRED,
+	},
 };
 
 static int __init init_fs_dcache_sysctls(void)
```
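The new table entry reuses proc_dointvec_minmax with only .extra1 set. A small userspace model of the resulting range check, written for illustration only: minmax_check(), check_pressure() and check_denom() are hypothetical names, and the logic reflects our reading that the handler treats a NULL .extra2 as "no upper bound" and rejects out-of-range writes with -EINVAL.

```c
#include <errno.h>
#include <stddef.h>

/* Lower bounds used by the two table entries (SYSCTL_ZERO, SYSCTL_ONE_HUNDRED). */
static const int sysctl_zero = 0;
static const int sysctl_one_hundred = 100;

/*
 * Hypothetical model of the min/max check applied on write:
 * a NULL bound is simply not checked.
 */
int minmax_check(int val, const int *min, const int *max)
{
	if (min && val < *min)
		return -EINVAL;
	if (max && val > *max)
		return -EINVAL;
	return 0;
}

/* vfs_cache_pressure: .extra1 = SYSCTL_ZERO, no .extra2 */
int check_pressure(int val)
{
	return minmax_check(val, &sysctl_zero, NULL);
}

/* vfs_cache_pressure_denom: .extra1 = SYSCTL_ONE_HUNDRED, no .extra2 */
int check_denom(int val)
{
	return minmax_check(val, &sysctl_one_hundred, NULL);
}
```

Because the floor is 100, vfs_pressure_ratio() can never be handed a zero denominator. Leaving .extra2 unset means large denominators such as the 10000 used in the commit message are accepted.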
