Commit fb56fdf

ryncsn authored and akpm00 committed
mm/list_lru: split the lock to per-cgroup scope

Currently, every list_lru has a per-node lock that protects adding, deletion, isolation, and reparenting of all list_lru_one instances belonging to this list_lru on this node. Contention on this lock is heavy when multiple cgroups modify the same list_lru.

This lock can be split into per-cgroup scope to reduce contention.

To achieve this, we need a stable list_lru_one for every cgroup. This commit adds a lock to each list_lru_one and introduces a helper function, lock_list_lru_of_memcg, making it possible to pin the list_lru of a memcg. It then reworks the reparenting process.

Reparenting will switch the list_lru_one instances one by one. By locking each instance and marking it dead using the nr_items counter, reparenting ensures that all items in the corresponding cgroup (on-list or not, because items have a stable cgroup, see below) will see the list_lru_one switch synchronously.

Objcg reparent is also moved after list_lru reparent so items will have a stable mem cgroup until all list_lru_one instances are drained.

The only callers that do not go through the *_obj interfaces are direct calls to list_lru_{add,del}. These are only used by zswap, which is also based on objcg, so this is fine.

This also changes the behaviour of the isolation function when LRU_RETRY or LRU_REMOVED_RETRY is returned: because releasing the lock can now unblock reparenting and free the list_lru_one, the isolation function has to return without re-locking the lru.
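The locking and reparenting scheme described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: the `model_*` names, the `atomic_flag` spinlock, the explicit `parent` pointer, and the choice of `LONG_MIN` as the "dead" value for `nr_items` are all illustrative. It only shows the idea that a locker checks the dead marker under the per-list lock and falls back to the parent list when the list has been reparented.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock stand-in for the kernel's spinlock_t (illustrative only). */
typedef struct { atomic_flag held; } model_lock;
#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

static void model_lock_acquire(model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_lock_release(model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical model of one per-cgroup list; not the kernel's struct. */
struct model_lru_one {
	model_lock lock;
	long nr_items;                /* LONG_MIN is this sketch's "dead" marker */
	struct model_lru_one *parent; /* list that inherits items on reparenting */
};

/*
 * Lock the list of a cgroup. If the list was marked dead by reparenting,
 * drop its lock and retry on the parent. Returns the locked live list,
 * or NULL if no live list exists.
 */
static struct model_lru_one *model_lock_lru(struct model_lru_one *l)
{
	while (l) {
		model_lock_acquire(&l->lock);
		if (l->nr_items != LONG_MIN)
			return l; /* live list, returned with its lock held */
		model_lock_release(&l->lock);
		l = l->parent;
	}
	return NULL;
}

/* Hand all items of a child list to its parent and mark the child dead. */
static void model_reparent(struct model_lru_one *child)
{
	struct model_lru_one *parent = child->parent;

	model_lock_acquire(&child->lock);  /* child first, then parent */
	model_lock_acquire(&parent->lock);
	parent->nr_items += child->nr_items;
	child->nr_items = LONG_MIN;        /* later lockers will see "dead" */
	model_lock_release(&parent->lock);
	model_lock_release(&child->lock);
}
```

Because the dead marker is written and read under the child's own lock, a locker either gets the child before reparenting starts or sees the marker and moves on to the parent; it never operates on a half-switched list.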

    prepare() {
        mkdir /tmp/test-fs
        modprobe brd rd_nr=1 rd_size=33554432
        mkfs.xfs -f /dev/ram0
        mount -t xfs /dev/ram0 /tmp/test-fs
        for i in $(seq 1 512); do
            mkdir "/tmp/test-fs/$i"
            for j in $(seq 1 10240); do
                echo TEST-CONTENT > "/tmp/test-fs/$i/$j"
            done &
        done; wait
    }

    do_test() {
        read_worker() {
            sleep 1
            tar -cv "$1" &>/dev/null
        }
        read_in_all() {
            cd "/tmp/test-fs" && ls
            for i in $(seq 1 512); do
                (exec sh -c 'echo "$PPID"') > "/sys/fs/cgroup/benchmark/$i/cgroup.procs"
                read_worker "$i" &
            done; wait
        }
        for i in $(seq 1 512); do
            mkdir -p "/sys/fs/cgroup/benchmark/$i"
        done
        echo +memory > /sys/fs/cgroup/benchmark/cgroup.subtree_control
        echo 512M > /sys/fs/cgroup/benchmark/memory.max
        echo 3 > /proc/sys/vm/drop_caches
        time read_in_all
    }

The above script simulates compression of small files in multiple cgroups under memory pressure. Run prepare() once, then do_test 6 times:

Before:
    real 0m7.762s  user 0m11.340s  sys 3m11.224s
    real 0m8.123s  user 0m11.548s  sys 3m2.549s
    real 0m7.736s  user 0m11.515s  sys 3m11.171s
    real 0m8.539s  user 0m11.508s  sys 3m7.618s
    real 0m7.928s  user 0m11.349s  sys 3m13.063s
    real 0m8.105s  user 0m11.128s  sys 3m14.313s

After this commit (about 15% faster):
    real 0m6.953s  user 0m11.327s  sys 2m42.912s
    real 0m7.453s  user 0m11.343s  sys 2m51.942s
    real 0m6.916s  user 0m11.269s  sys 2m43.957s
    real 0m6.894s  user 0m11.528s  sys 2m45.346s
    real 0m6.911s  user 0m11.095s  sys 2m43.168s
    real 0m6.773s  user 0m11.518s  sys 2m40.774s

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Cc: Chengming Zhou <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
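The "about 15% faster" figure can be checked directly from the six runs listed in the commit message. The sketch below copies the real and sys times (sys converted to seconds) and compares the means; the helper names `mean` and `speedup` are made up for this example, and the ratio of mean-before to mean-after is one reasonable reading of "faster", not necessarily the one the author used.

```c
#include <stddef.h>

/* Timings copied from the commit message, in seconds. */
static const double real_before[] = { 7.762, 8.123, 7.736, 8.539, 7.928, 8.105 };
static const double real_after[]  = { 6.953, 7.453, 6.916, 6.894, 6.911, 6.773 };
static const double sys_before[]  = { 191.224, 182.549, 191.171, 187.618, 193.063, 194.313 };
static const double sys_after[]   = { 162.912, 171.942, 163.957, 165.346, 163.168, 160.774 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Ratio of mean time before to mean time after; 1.15 means 15% faster. */
static double speedup(const double *before, const double *after, size_t n)
{
	return mean(before, n) / mean(after, n);
}
```

Calling `speedup(real_before, real_after, 6)` and `speedup(sys_before, sys_after, 6)` both give a ratio of roughly 1.15, consistent with the stated improvement. User time is essentially unchanged, as expected for a change that only affects kernel-side lock contention.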
1 parent 28e9802 commit fb56fdf

8 files changed, +135 -103 lines changed

drivers/android/binder_alloc.c

Lines changed: 0 additions & 1 deletion
@@ -1106,7 +1106,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	mmput_async(mm);
 	__free_page(page_to_free);
 
-	spin_lock(lock);
 	return LRU_REMOVED_RETRY;
 
 err_invalid_vma:

fs/inode.c

Lines changed: 0 additions & 1 deletion
@@ -934,7 +934,6 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 			mm_account_reclaimed_pages(reap);
 		}
 		inode_unpin_lru_isolating(inode);
-		spin_lock(lru_lock);
 		return LRU_RETRY;
 	}
 

fs/xfs/xfs_qm.c

Lines changed: 0 additions & 1 deletion
@@ -496,7 +496,6 @@ xfs_qm_dquot_isolate(
 	trace_xfs_dqreclaim_busy(dqp);
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
 	xfs_dqunlock(dqp);
-	spin_lock(lru_lock);
 	return LRU_RETRY;
 }
 
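The three callback hunks above all drop the re-lock before returning LRU_RETRY or LRU_REMOVED_RETRY. A rough userspace model of that contract is sketched below. Everything in it is hypothetical (the `model_*` names, the reduced status enum, the `busy_once` flag standing in for an item that needs slow work), and the walker re-acquiring the lock itself on retry is this sketch's assumption about how the caller side compensates; the commit message only states that the callback must return without re-locking.

```c
#include <stdatomic.h>

/* Minimal spinlock stand-in (illustrative only). */
typedef struct { atomic_flag held; } model_lock;
#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

static void model_lock_acquire(model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin */
}

static void model_lock_release(model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Reduced status set for the model; only the names come from the diff. */
enum model_status { MODEL_LRU_REMOVED, MODEL_LRU_RETRY };

struct model_lru {
	model_lock lock;
	int nr_items;
	int busy_once; /* non-zero: the next item needs slow, unlocked work */
};

/* Isolate callback: entered with lru->lock held. */
static enum model_status model_isolate(struct model_lru *lru)
{
	if (lru->busy_once) {
		model_lock_release(&lru->lock); /* drop the lock for slow work */
		lru->busy_once = 0;             /* stand-in for the slow work */
		return MODEL_LRU_RETRY;         /* return WITHOUT re-locking */
	}
	lru->nr_items--;
	return MODEL_LRU_REMOVED;               /* lock is still held here */
}

/* Walker: re-acquires the lock itself whenever the callback asks to retry. */
static int model_walk(struct model_lru *lru)
{
	int isolated = 0;

restart:
	model_lock_acquire(&lru->lock);
	while (lru->nr_items > 0) {
		if (model_isolate(lru) == MODEL_LRU_RETRY)
			goto restart; /* callback dropped the lock; take it again */
		isolated++;
	}
	model_lock_release(&lru->lock);
	return isolated;
}
```

Letting the walker own the re-lock means it can look the list up again before locking, which matters once the list a callback was working on may have been reparented and freed in the meantime.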

include/linux/list_lru.h

Lines changed: 3 additions & 3 deletions
@@ -32,6 +32,8 @@ struct list_lru_one {
 	struct list_head list;
 	/* may become negative during memcg reparenting */
 	long nr_items;
+	/* protects all fields above */
+	spinlock_t lock;
 };
 
 struct list_lru_memcg {
@@ -41,11 +43,9 @@ struct list_lru_memcg {
 };
 
 struct list_lru_node {
-	/* protects all lists on the node, including per cgroup */
-	spinlock_t lock;
 	/* global list, used for the root cgroup in cgroup aware lrus */
 	struct list_lru_one lru;
-	long nr_items;
+	atomic_long_t nr_items;
 } ____cacheline_aligned_in_smp;
 
 struct list_lru {
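The header change moves the lock from the node into each list and turns the node-wide counter into an atomic. A small userspace sketch of why the two go together is given below; the `model_*` types and `model_add` are illustrative and only mirror the shape of the structs in the hunk. Once different lists on the same node are protected by different locks, no single lock covers the shared per-node count any more, so it has to be updated atomically.

```c
#include <stdatomic.h>

/* Minimal spinlock stand-in (illustrative only). */
typedef struct { atomic_flag held; } model_lock;
#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

static void model_lock_acquire(model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin */
}

static void model_lock_release(model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Per-cgroup list: owns its own lock, which guards its own counter. */
struct model_lru_one {
	long nr_items;
	model_lock lock; /* protects nr_items above */
};

/* Per-node state: no lock of its own, so the total is atomic. */
struct model_lru_node {
	struct model_lru_one lru; /* list used for the root cgroup */
	atomic_long nr_items;     /* items across all lists on this node */
};

/* Add one item to list l, which lives on node. */
static void model_add(struct model_lru_node *node, struct model_lru_one *l)
{
	model_lock_acquire(&l->lock);
	l->nr_items++;                        /* guarded by the per-list lock */
	model_lock_release(&l->lock);
	atomic_fetch_add(&node->nr_items, 1); /* shared by lists with different locks */
}
```

Two threads adding to two different cgroup lists on the same node now contend only on the atomic increment, not on a shared spinlock.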
