Commit ee9a4e9

arighi authored and htejun committed
sched_ext: idle: Properly handle invalid prev_cpu during idle selection
The default idle selection policy doesn't properly handle the case where
@prev_cpu is not part of the task's allowed CPUs. In this situation, it may
return an idle CPU that is not usable by the task, breaking the assumption
that the returned CPU must always be within the allowed cpumask and causing
inefficiencies or even stalls in certain cases.

This issue can arise in the following cases:

 - The task's affinity may have changed by the time the function is
   invoked, especially now that the idle selection logic can be used from
   multiple contexts (i.e., BPF test_run call).

 - The BPF scheduler may provide a @prev_cpu that is not part of the
   allowed mask, either unintentionally or as a placement hint. In fact,
   @prev_cpu may not necessarily refer to the CPU the task last ran on; it
   can also be considered a target CPU that the scheduler wishes to use for
   the task.

Therefore, enforce the correct behavior by always checking whether
@prev_cpu is in the allowed mask (when using scx_bpf_select_cpu_and()) and
usable by the task (@p->cpus_ptr). If it is not, try to find a valid CPU
nearby @prev_cpu, following the usual locality-aware fallback path (SMT,
LLC, node, allowed CPUs).

This ensures the returned CPU is always allowed, improving robustness to
affinity changes and invalid scheduler hints, while preserving locality as
much as possible.

Fixes: a730e3f ("sched_ext: idle: Consolidate default idle CPU selection kfuncs")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
1 parent 0f70f5b commit ee9a4e9

File tree

1 file changed, 11 insertions(+), 18 deletions(-)


kernel/sched/ext_idle.c

Lines changed: 11 additions & 18 deletions
@@ -447,10 +447,17 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	const struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
 	const struct cpumask *allowed = cpus_allowed ?: p->cpus_ptr;
 	int node = scx_cpu_node_if_enabled(prev_cpu);
+	bool is_prev_allowed;
 	s32 cpu;
 
 	preempt_disable();
 
+	/*
+	 * Check whether @prev_cpu is still within the allowed set. If not,
+	 * we can still try selecting a nearby CPU.
+	 */
+	is_prev_allowed = cpumask_test_cpu(prev_cpu, allowed);
+
 	/*
 	 * Determine the subset of CPUs usable by @p within @cpus_allowed.
 	 */
@@ -465,21 +472,6 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 			cpu = -EBUSY;
 			goto out_enable;
 		}
-
-		/*
-		 * If @prev_cpu is not in the allowed CPUs, skip topology
-		 * optimizations and try to pick any idle CPU usable by the
-		 * task.
-		 *
-		 * If %SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled, prioritize
-		 * the current node, as it may optimize some waker->wakee
-		 * workloads.
-		 */
-		if (!cpumask_test_cpu(prev_cpu, allowed)) {
-			node = scx_cpu_node_if_enabled(smp_processor_id());
-			cpu = scx_pick_idle_cpu(allowed, node, flags);
-			goto out_enable;
-		}
 	}
 
 	/*
@@ -525,7 +517,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	 * then avoid a migration.
 	 */
 	cpu = smp_processor_id();
-	if (cpus_share_cache(cpu, prev_cpu) &&
+	if (is_prev_allowed && cpus_share_cache(cpu, prev_cpu) &&
 	    scx_idle_test_and_clear_cpu(prev_cpu)) {
 		cpu = prev_cpu;
 		goto out_unlock;
@@ -562,7 +554,8 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	/*
 	 * Keep using @prev_cpu if it's part of a fully idle core.
 	 */
-	if (cpumask_test_cpu(prev_cpu, idle_cpumask(node)->smt) &&
+	if (is_prev_allowed &&
+	    cpumask_test_cpu(prev_cpu, idle_cpumask(node)->smt) &&
 	    scx_idle_test_and_clear_cpu(prev_cpu)) {
 		cpu = prev_cpu;
 		goto out_unlock;
@@ -611,7 +604,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	/*
 	 * Use @prev_cpu if it's idle.
 	 */
-	if (scx_idle_test_and_clear_cpu(prev_cpu)) {
+	if (is_prev_allowed && scx_idle_test_and_clear_cpu(prev_cpu)) {
 		cpu = prev_cpu;
 		goto out_unlock;
 	}

0 commit comments
