You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
set_initial_priority() tries to jump-start global reclaim by estimating
the priority based on cold/hot LRU pages. The estimation does not account
for shrinker objects, and it cannot do so because their sizes can be in
different units other than page.
If shrinker objects are the majority, e.g., on TrueNAS SCALE 24.04.0 where
ZFS ARC can use almost all system memory, set_initial_priority() can
vastly underestimate how much memory ARC shrinker can evict and assign
extreme low values to scan_control->priority, resulting in overshoots of
shrinker objects.
To reproduce the problem, using TrueNAS SCALE 24.04.0 with 32GB DRAM, a
test ZFS pool and the following commands:
fio --name=mglru.file --numjobs=36 --ioengine=io_uring \
--directory=/root/test-zfs-pool/ --size=1024m --buffered=1 \
--rw=randread --random_distribution=random \
--time_based --runtime=1h &
for ((i = 0; i < 20; i++))
do
sleep 120
fio --name=mglru.anon --numjobs=16 --ioengine=mmap \
--filename=/dev/zero --size=1024m --fadvise_hint=0 \
--rw=randrw --random_distribution=random \
--time_based --runtime=1m
done
To fix the problem:
1. Cap scan_control->priority at or above DEF_PRIORITY/2, to prevent
the jump-start from being overly aggressive.
2. Account for the progress from mm_account_reclaimed_pages(), to
prevent kswapd_shrink_node() from raising the priority
unnecessarily.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e4dde56 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: Alexander Motin <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
0 commit comments