Commit f6ce6b9

arighi authored and htejun committed
sched_ext: Do not enable LLC/NUMA optimizations when domains overlap

When the LLC and NUMA domains fully overlap, enabling both optimizations
in the built-in idle CPU selection policy is redundant, as it leads to
searching for an idle CPU within the same domain twice.

Likewise, if all online CPUs are within a single LLC domain, LLC
optimization is unnecessary.

Therefore, detect overlapping domains and enable topology optimizations
only when necessary.

Moreover, rely on the online CPUs for this detection logic, instead of
using the possible CPUs.

Fixes: 860a452 ("sched_ext: Introduce NUMA awareness to the default idle selection policy")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
1 parent 860a452 commit f6ce6b9

File tree: 1 file changed (+72 −13 lines)


kernel/sched/ext.c

Lines changed: 72 additions & 13 deletions
@@ -3129,12 +3129,63 @@ static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
 		goto retry;
 }
 
+/*
+ * Return true if the LLC domains do not perfectly overlap with the NUMA
+ * domains, false otherwise.
+ */
+static bool llc_numa_mismatch(void)
+{
+	int cpu;
+
+	/*
+	 * We need to scan all online CPUs to verify whether their scheduling
+	 * domains overlap.
+	 *
+	 * While it is rare to encounter architectures with asymmetric NUMA
+	 * topologies, CPU hotplugging or virtualized environments can result
+	 * in asymmetric configurations.
+	 *
+	 * For example:
+	 *
+	 *  NUMA 0:
+	 *    - LLC 0: cpu0..cpu7
+	 *    - LLC 1: cpu8..cpu15 [offline]
+	 *
+	 *  NUMA 1:
+	 *    - LLC 0: cpu16..cpu23
+	 *    - LLC 1: cpu24..cpu31
+	 *
+	 * In this case, if we only check the first online CPU (cpu0), we might
+	 * incorrectly assume that the LLC and NUMA domains are fully
+	 * overlapping, which is incorrect (as NUMA 1 has two distinct LLC
+	 * domains).
+	 */
+	for_each_online_cpu(cpu) {
+		const struct cpumask *numa_cpus;
+		struct sched_domain *sd;
+
+		sd = rcu_dereference(per_cpu(sd_llc, cpu));
+		if (!sd)
+			return true;
+
+		numa_cpus = cpumask_of_node(cpu_to_node(cpu));
+		if (sd->span_weight != cpumask_weight(numa_cpus))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Initialize topology-aware scheduling.
  *
  * Detect if the system has multiple LLC or multiple NUMA domains and enable
  * cache-aware / NUMA-aware scheduling optimizations in the default CPU idle
  * selection policy.
+ *
+ * Assumption: the kernel's internal topology representation assumes that each
+ * CPU belongs to a single LLC domain, and that each LLC domain is entirely
+ * contained within a single NUMA node.
  */
 static void update_selcpu_topology(void)
 {
@@ -3144,26 +3195,34 @@ static void update_selcpu_topology(void)
 	s32 cpu = cpumask_first(cpu_online_mask);
 
 	/*
-	 * We only need to check the NUMA node and LLC domain of the first
-	 * available CPU to determine if they cover all CPUs.
+	 * Enable LLC domain optimization only when there are multiple LLC
+	 * domains among the online CPUs. If all online CPUs are part of a
+	 * single LLC domain, the idle CPU selection logic can choose any
+	 * online CPU without bias.
 	 *
-	 * If all CPUs belong to the same NUMA node or share the same LLC
-	 * domain, enabling NUMA or LLC optimizations is unnecessary.
-	 * Otherwise, these optimizations can be enabled.
+	 * Note that it is sufficient to check the LLC domain of the first
+	 * online CPU to determine whether a single LLC domain includes all
+	 * CPUs.
 	 */
 	rcu_read_lock();
 	sd = rcu_dereference(per_cpu(sd_llc, cpu));
 	if (sd) {
-		cpus = sched_domain_span(sd);
-		if (cpumask_weight(cpus) < num_possible_cpus())
+		if (sd->span_weight < num_online_cpus())
 			enable_llc = true;
 	}
-	sd = highest_flag_domain(cpu, SD_NUMA);
-	if (sd) {
-		cpus = sched_group_span(sd->groups);
-		if (cpumask_weight(cpus) < num_possible_cpus())
-			enable_numa = true;
-	}
+
+	/*
+	 * Enable NUMA optimization only when there are multiple NUMA domains
+	 * among the online CPUs and the NUMA domains don't perfectly overlap
+	 * with the LLC domains.
+	 *
+	 * If all CPUs belong to the same NUMA node and the same LLC domain,
+	 * enabling both NUMA and LLC optimizations is unnecessary, as checking
+	 * for an idle CPU in the same domain twice is redundant.
+	 */
+	cpus = cpumask_of_node(cpu_to_node(cpu));
+	if ((cpumask_weight(cpus) < num_online_cpus()) && llc_numa_mismatch())
+		enable_numa = true;
 	rcu_read_unlock();
 
 	pr_debug("sched_ext: LLC idle selection %s\n",
