Commit 203c06e
gormanm authored and torvalds committed
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
Dave Hansen reported the following about Feng Tang's tests on a machine
with persistent memory onlined as a DRAM-like device.

  Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
  ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile
  use" mode and being managed via the buddy just like the normal RAM.

  The PMEM zones are big ones:

      present  65011712 = 248 G
      high       134595 = 525 M

  The PMEM nodes, of course, don't have any CPUs in them.

  With your series, the pcp->high value per-cpu is 69584 pages or about
  270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of
  worst-case memory in the pcps per zone, or roughly 10% of the size of
  the zone.

This should not cause a problem as such although it could trigger reclaim
due to pages being stored on per-cpu lists for CPUs remote to a node. It
is not possible to treat cpuless nodes exactly the same as normal nodes
but the worst-case scenario can be mitigated by splitting pcp->high across
all online CPUs for cpuless memory nodes.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Tang, Feng" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
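
The figures quoted in the report can be checked with a little arithmetic. The helpers below are only an illustrative sketch, not kernel code: the function names are made up for this note, and the 4 KiB page size is an assumption (the report does not state it, but it matches "65011712 pages = 248 G").

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages; the report does not state the page size. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Size in MiB of a run of pages, rounded down. */
static uint64_t pages_to_mib(uint64_t pages)
{
	return pages * ASSUMED_PAGE_SIZE / (1024ULL * 1024ULL);
}

/* Pages held on pcp lists for one zone if every CPU fills its list to pcp->high. */
static uint64_t worst_case_pcp_pages(uint64_t pcp_high, uint64_t nr_cpus)
{
	return pcp_high * nr_cpus;
}

/* Share of the zone in parts per thousand (integer math, rounded down). */
static uint64_t permille_of_zone(uint64_t pages, uint64_t zone_pages)
{
	return pages * 1000ULL / zone_pages;
}
```

With the reported values, `pages_to_mib(69584)` gives 271 MiB per CPU, `worst_case_pcp_pages(69584, 96)` gives 6680064 pages (about 25.5 GiB), and `permille_of_zone(6680064, 65011712)` gives 102, i.e. roughly 10% of the zone.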
1 parent 44042b4 commit 203c06e

1 file changed, +9 -5 lines changed

mm/page_alloc.c

Lines changed: 9 additions & 5 deletions
@@ -6790,7 +6790,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 {
 #ifdef CONFIG_MMU
 	int high;
-	int nr_local_cpus;
+	int nr_split_cpus;
 	unsigned long total_pages;
 
 	if (!percpu_pagelist_high_fraction) {
@@ -6813,10 +6813,14 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 	 * Split the high value across all online CPUs local to the zone. Note
 	 * that early in boot that CPUs may not be online yet and that during
 	 * CPU hotplug that the cpumask is not yet updated when a CPU is being
-	 * onlined.
-	 */
-	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
-	high = total_pages / nr_local_cpus;
+	 * onlined. For memory nodes that have no CPUs, split pcp->high across
+	 * all online CPUs to mitigate the risk that reclaim is triggered
+	 * prematurely due to pages stored on pcp lists.
+	 */
+	nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online;
+	if (!nr_split_cpus)
+		nr_split_cpus = num_online_cpus();
+	high = total_pages / nr_split_cpus;
 
 	/*
 	 * Ensure high is at least batch*4. The multiple is based on the
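
The effect of the changed divisor can be modelled outside the kernel. The following is a userspace sketch only: `split_cpus_old`, `split_cpus_new` and `split_high` are hypothetical names, `nr_node_cpus` stands in for `cpumask_weight(cpumask_of_node(zone_to_nid(zone)))`, and `nr_online_cpus` stands in for `num_online_cpus()`. It leaves out the `batch*4` lower bound that follows in the real function.

```c
/* Divisor before the patch: never below 1 local CPU, plus the CPU being onlined. */
static unsigned int split_cpus_old(unsigned int nr_node_cpus, int cpu_online)
{
	unsigned int local = nr_node_cpus > 1U ? nr_node_cpus : 1U; /* max(1U, ...) */

	return local + cpu_online;
}

/*
 * Divisor after the patch: CPUs local to the node plus the CPU being
 * onlined, falling back to all online CPUs when that sum is zero.
 */
static unsigned int split_cpus_new(unsigned int nr_node_cpus, int cpu_online,
				   unsigned int nr_online_cpus)
{
	unsigned int nr_split_cpus = nr_node_cpus + cpu_online;

	if (!nr_split_cpus)
		nr_split_cpus = nr_online_cpus;
	return nr_split_cpus;
}

/* Per-CPU high value before the batch*4 lower bound is applied. */
static unsigned long split_high(unsigned long total_pages, unsigned int nr_split_cpus)
{
	return total_pages / nr_split_cpus;
}
```

For a cpuless node on the 96-thread machine from the report, `split_cpus_old(0, 0)` is 1 while `split_cpus_new(0, 0, 96)` is 96. Assuming `total_pages` was the 69584 pages reported as the per-CPU value (which is what a divisor of 1 implies), `split_high(69584, 96)` yields 724 pages per CPU instead of 69584.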
