Commit c744dc4

bea-arm (Beata Michalska) authored and Peter Zijlstra committed
sched/topology: Rework CPU capacity asymmetry detection

Currently the CPU capacity asymmetry detection, performed through
asym_cpu_capacity_level, tries to identify the lowest topology level
at which the highest CPU capacity is being observed, not necessarily
finding the level at which all possible capacity values are visible
to all CPUs, which might be a bit problematic for some possible/valid
asymmetric topologies, i.e.:

	DIE      [                                ]
	MC       [                       ][       ]

	CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
	Capacity  |.....| |.....| |.....|  |.....|
		     L       M       B        B

	Where:
	 arch_scale_cpu_capacity(L) = 512
	 arch_scale_cpu_capacity(M) = 871
	 arch_scale_cpu_capacity(B) = 1024

In this particular case, the asymmetric topology level will point
at MC, as all possible CPU masks for that level do cover the CPU with
the highest capacity. It will work just fine for the first cluster,
not so much for the second one though (consider
find_energy_efficient_cpu, which might end up attempting the energy
aware wake-up for a domain that does not see any asymmetry at all).

Rework the way the capacity asymmetry levels are being detected,
allowing them to point to the lowest topology level (for a given CPU)
where the full set of available CPU capacities is visible to all CPUs
within the given domain. As a result, the per-cpu sd_asym_cpucapacity
might differ across the domains. This will have an impact on EAS
wake-up placement in a way that it might see a different range of CPUs
to be considered, depending on the given current and target CPUs.

Additionally, those levels where any range of asymmetry (not
necessarily full) is being detected will get identified as well. The
selected asymmetric topology level will be denoted by the
SD_ASYM_CPUCAPACITY_FULL sched domain flag, whereas the 'sub-levels'
would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
maintaining the current behaviour for asymmetric topologies, with
misfit migration operating correctly on lower levels, if applicable,
as any asymmetry is enough to trigger the misfit migration. The logic
there relies on the SD_ASYM_CPUCAPACITY flag and does not relate to
the full asymmetry level denoted by the sd_asym_cpucapacity pointer.

Detecting the CPU capacity asymmetry is based on a set of available
CPU capacities for all possible CPUs. This data is generated upon init
and updated once CPU topology changes are detected (through
arch_update_cpu_topology). As such, any changes to identified CPU
capacities (like initializing cpufreq) need to be explicitly
advertised by the corresponding archs to trigger rebuilding the data.

The additional dflags parameter, used when building sched domains, has
been removed as well, as the asymmetry flags are now being set directly
in sd_init.

Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Valentin Schneider <[email protected]>
Signed-off-by: Beata Michalska <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Tested-by: Valentin Schneider <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
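
To make the new classification concrete for the topology sketched above, the following stand-alone user-space C mock walks the same count/miss rule as the new asym_cpu_capacity_classify() added below. The flag values, helper names and hard-coded capacities are illustrative only, and the mock assumes every CPU is part of cpu_map.

	#include <stdio.h>

	#define ASYM		0x1	/* stand-in for SD_ASYM_CPUCAPACITY */
	#define ASYM_FULL	0x2	/* stand-in for SD_ASYM_CPUCAPACITY_FULL */

	/* Example system: CPUs 0-1 -> L(512), 2-3 -> M(871), 4-7 -> B(1024) */
	static const unsigned long capacity[8] = {
		512, 512, 871, 871, 1024, 1024, 1024, 1024
	};
	/* Stands in for asym_cap_list: every capacity value present in the system */
	static const unsigned long all_caps[] = { 512, 871, 1024 };

	/* Is any CPU of capacity @cap present in the domain span @span (a bitmask)? */
	static int span_has_cap(unsigned int span, unsigned long cap)
	{
		for (int cpu = 0; cpu < 8; cpu++)
			if ((span & (1u << cpu)) && capacity[cpu] == cap)
				return 1;
		return 0;
	}

	/* Mirrors the classify rule: count capacity values seen vs. missed by @span */
	static int classify(unsigned int span)
	{
		int count = 0, miss = 0;

		for (unsigned int i = 0; i < sizeof(all_caps) / sizeof(all_caps[0]); i++) {
			if (span_has_cap(span, all_caps[i]))
				count++;
			else
				miss++;
		}

		if (count < 2)
			return 0;		/* no asymmetry visible */
		if (miss)
			return ASYM;		/* partial asymmetry */
		return ASYM | ASYM_FULL;	/* full asymmetry */
	}

	int main(void)
	{
		printf("MC  {0-5}: 0x%x\n", classify(0x3f));	/* 0x3: both flags */
		printf("MC  {6-7}: 0x%x\n", classify(0xc0));	/* 0x0: no asymmetry */
		printf("DIE {0-7}: 0x%x\n", classify(0xff));	/* 0x3: both flags */
		return 0;
	}

Under the new scheme this means sd_asym_cpucapacity points at MC for CPUs 0-5 (their MC domain already sees {512, 871, 1024}), but at DIE for CPUs 6-7, whose MC domain sees a single capacity and therefore carries no asymmetry flags.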
1 parent 2309a05 commit c744dc4


kernel/sched/topology.c

Lines changed: 131 additions & 78 deletions
@@ -675,7 +675,7 @@ static void update_top_cache_domain(int cpu)
 	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
 
-	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
+	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
 	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
 }
 
@@ -1266,6 +1266,116 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	update_group_capacity(sd, cpu);
 }
 
+/*
+ * Asymmetric CPU capacity bits
+ */
+struct asym_cap_data {
+	struct list_head link;
+	unsigned long capacity;
+	unsigned long cpus[];
+};
+
+/*
+ * Set of available CPUs grouped by their corresponding capacities
+ * Each list entry contains a CPU mask reflecting CPUs that share the same
+ * capacity.
+ * The lifespan of data is unlimited.
+ */
+static LIST_HEAD(asym_cap_list);
+
+#define cpu_capacity_span(asym_data) to_cpumask((asym_data)->cpus)
+
+/*
+ * Verify whether there is any CPU capacity asymmetry in a given sched domain.
+ * Provides sd_flags reflecting the asymmetry scope.
+ */
+static inline int
+asym_cpu_capacity_classify(const struct cpumask *sd_span,
+			   const struct cpumask *cpu_map)
+{
+	struct asym_cap_data *entry;
+	int count = 0, miss = 0;
+
+	/*
+	 * Count how many unique CPU capacities this domain spans across
+	 * (compare sched_domain CPUs mask with ones representing available
+	 * CPUs capacities). Take into account CPUs that might be offline:
+	 * skip those.
+	 */
+	list_for_each_entry(entry, &asym_cap_list, link) {
+		if (cpumask_intersects(sd_span, cpu_capacity_span(entry)))
+			++count;
+		else if (cpumask_intersects(cpu_map, cpu_capacity_span(entry)))
+			++miss;
+	}
+
+	WARN_ON_ONCE(!count && !list_empty(&asym_cap_list));
+
+	/* No asymmetry detected */
+	if (count < 2)
+		return 0;
+	/* Some of the available CPU capacity values have not been detected */
+	if (miss)
+		return SD_ASYM_CPUCAPACITY;
+
+	/* Full asymmetry */
+	return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
+
+}
+
+static inline void asym_cpu_capacity_update_data(int cpu)
+{
+	unsigned long capacity = arch_scale_cpu_capacity(cpu);
+	struct asym_cap_data *entry = NULL;
+
+	list_for_each_entry(entry, &asym_cap_list, link) {
+		if (capacity == entry->capacity)
+			goto done;
+	}
+
+	entry = kzalloc(sizeof(*entry) + cpumask_size(), GFP_KERNEL);
+	if (WARN_ONCE(!entry, "Failed to allocate memory for asymmetry data\n"))
+		return;
+	entry->capacity = capacity;
+	list_add(&entry->link, &asym_cap_list);
+done:
+	__cpumask_set_cpu(cpu, cpu_capacity_span(entry));
+}
+
+/*
+ * Build-up/update list of CPUs grouped by their capacities
+ * An update requires explicit request to rebuild sched domains
+ * with state indicating CPU topology changes.
+ */
+static void asym_cpu_capacity_scan(void)
+{
+	struct asym_cap_data *entry, *next;
+	int cpu;
+
+	list_for_each_entry(entry, &asym_cap_list, link)
+		cpumask_clear(cpu_capacity_span(entry));
+
+	for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_FLAG_DOMAIN))
+		asym_cpu_capacity_update_data(cpu);
+
+	list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
+		if (cpumask_empty(cpu_capacity_span(entry))) {
+			list_del(&entry->link);
+			kfree(entry);
+		}
+	}
+
+	/*
+	 * Only one capacity value has been detected i.e. this system is symmetric.
+	 * No need to keep this data around.
+	 */
+	if (list_is_singular(&asym_cap_list)) {
+		entry = list_first_entry(&asym_cap_list, typeof(*entry), link);
+		list_del(&entry->link);
+		kfree(entry);
+	}
+}
+
 /*
  * Initializers for schedule domains
  * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
@@ -1399,11 +1509,12 @@ int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
 static struct sched_domain *
 sd_init(struct sched_domain_topology_level *tl,
 	const struct cpumask *cpu_map,
-	struct sched_domain *child, int dflags, int cpu)
+	struct sched_domain *child, int cpu)
 {
 	struct sd_data *sdd = &tl->data;
 	struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
 	int sd_id, sd_weight, sd_flags = 0;
+	struct cpumask *sd_span;
 
 #ifdef CONFIG_NUMA
 	/*
@@ -1420,9 +1531,6 @@ sd_init(struct sched_domain_topology_level *tl,
 			"wrong sd_flags in topology description\n"))
 		sd_flags &= TOPOLOGY_SD_FLAGS;
 
-	/* Apply detected topology flags */
-	sd_flags |= dflags;
-
 	*sd = (struct sched_domain){
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
@@ -1454,13 +1562,19 @@ sd_init(struct sched_domain_topology_level *tl,
 #endif
 	};
 
-	cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
-	sd_id = cpumask_first(sched_domain_span(sd));
+	sd_span = sched_domain_span(sd);
+	cpumask_and(sd_span, cpu_map, tl->mask(cpu));
+	sd_id = cpumask_first(sd_span);
+
+	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
+
+	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
+		  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
+		  "CPU capacity asymmetry not supported on SMT\n");
 
 	/*
 	 * Convert topological properties into behaviour.
 	 */
-
 	/* Don't attempt to spread across CPUs of different capacities. */
 	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
 		sd->child->flags &= ~SD_PREFER_SIBLING;
@@ -1926,9 +2040,9 @@ static void __sdt_free(const struct cpumask *cpu_map)
 
 static struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
-		struct sched_domain *child, int dflags, int cpu)
+		struct sched_domain *child, int cpu)
 {
-	struct sched_domain *sd = sd_init(tl, cpu_map, child, dflags, cpu);
+	struct sched_domain *sd = sd_init(tl, cpu_map, child, cpu);
 
 	if (child) {
 		sd->level = child->level + 1;
@@ -1990,65 +2104,6 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
 	return true;
 }
 
-/*
- * Find the sched_domain_topology_level where all CPU capacities are visible
- * for all CPUs.
- */
-static struct sched_domain_topology_level
-*asym_cpu_capacity_level(const struct cpumask *cpu_map)
-{
-	int i, j, asym_level = 0;
-	bool asym = false;
-	struct sched_domain_topology_level *tl, *asym_tl = NULL;
-	unsigned long cap;
-
-	/* Is there any asymmetry? */
-	cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
-
-	for_each_cpu(i, cpu_map) {
-		if (arch_scale_cpu_capacity(i) != cap) {
-			asym = true;
-			break;
-		}
-	}
-
-	if (!asym)
-		return NULL;
-
-	/*
-	 * Examine topology from all CPU's point of views to detect the lowest
-	 * sched_domain_topology_level where a highest capacity CPU is visible
-	 * to everyone.
-	 */
-	for_each_cpu(i, cpu_map) {
-		unsigned long max_capacity = arch_scale_cpu_capacity(i);
-		int tl_id = 0;
-
-		for_each_sd_topology(tl) {
-			if (tl_id < asym_level)
-				goto next_level;
-
-			for_each_cpu_and(j, tl->mask(i), cpu_map) {
-				unsigned long capacity;
-
-				capacity = arch_scale_cpu_capacity(j);
-
-				if (capacity <= max_capacity)
-					continue;
-
-				max_capacity = capacity;
-				asym_level = tl_id;
-				asym_tl = tl;
-			}
-next_level:
-			tl_id++;
-		}
-	}
-
-	return asym_tl;
-}
-
-
 /*
  * Build sched domains for a given set of CPUs and attach the sched domains
  * to the individual CPUs
@@ -2061,7 +2116,6 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	struct s_data d;
 	struct rq *rq = NULL;
 	int i, ret = -ENOMEM;
-	struct sched_domain_topology_level *tl_asym;
 	bool has_asym = false;
 
 	if (WARN_ON(cpumask_empty(cpu_map)))
@@ -2071,24 +2125,19 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	if (alloc_state != sa_rootdomain)
 		goto error;
 
-	tl_asym = asym_cpu_capacity_level(cpu_map);
-
 	/* Set up domains for CPUs specified by the cpu_map: */
 	for_each_cpu(i, cpu_map) {
 		struct sched_domain_topology_level *tl;
-		int dflags = 0;
 
 		sd = NULL;
 		for_each_sd_topology(tl) {
-			if (tl == tl_asym) {
-				dflags |= SD_ASYM_CPUCAPACITY;
-				has_asym = true;
-			}
 
 			if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
 				goto error;
 
-			sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
+			sd = build_sched_domain(tl, cpu_map, attr, sd, i);
+
+			has_asym |= sd->flags & SD_ASYM_CPUCAPACITY;
 
 			if (tl == sched_domain_topology)
 				*per_cpu_ptr(d.sd, i) = sd;
@@ -2217,6 +2266,7 @@ int sched_init_domains(const struct cpumask *cpu_map)
 	zalloc_cpumask_var(&fallback_doms, GFP_KERNEL);
 
 	arch_update_cpu_topology();
+	asym_cpu_capacity_scan();
 	ndoms_cur = 1;
 	doms_cur = alloc_sched_domains(ndoms_cur);
 	if (!doms_cur)
@@ -2299,6 +2349,9 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
 
 	/* Let the architecture update CPU core mappings: */
 	new_topology = arch_update_cpu_topology();
+	/* Trigger rebuilding CPU capacity asymmetry data */
+	if (new_topology)
+		asym_cpu_capacity_scan();
 
 	if (!doms_new) {
 		WARN_ON_ONCE(dattr_new);
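
As the commit message notes, the capacity list is rebuilt only when a sched domain rebuild is requested while arch_update_cpu_topology() reports a topology change. Below is a rough sketch of how an architecture could advertise a capacity change (for instance once cpufreq has settled the CPU capacities), loosely modeled on the pattern used by drivers/base/arch_topology.c. The flag, the work item and the arch_advertise_capacity_change() hook are illustrative; only arch_update_cpu_topology() and rebuild_sched_domains() are existing kernel interfaces.

	#include <linux/cpuset.h>
	#include <linux/topology.h>
	#include <linux/workqueue.h>

	/* Illustrative arch-private flag consumed by arch_update_cpu_topology() */
	static bool capacities_changed;

	/* Overrides the __weak stub; reported as 'new_topology' to the rebuild */
	int arch_update_cpu_topology(void)
	{
		return capacities_changed;
	}

	static void capacity_update_workfn(struct work_struct *work)
	{
		capacities_changed = true;
		/* Rebuild the sched domains; this re-runs asym_cpu_capacity_scan() */
		rebuild_sched_domains();
		capacities_changed = false;
	}
	static DECLARE_WORK(capacity_update_work, capacity_update_workfn);

	/* Hypothetical hook, called once the final CPU capacities are known */
	void arch_advertise_capacity_change(void)
	{
		schedule_work(&capacity_update_work);
	}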
