Commit 9709eb0

Libo Chen authored and akpm00 committed
sched/numa: fix task swap by skipping kernel threads
Patch series "sched/numa: add statistics of numa balance task migration", v6.

Introduce task migration and swap statistics in the following places:

/sys/fs/cgroup/{GROUP}/memory.stat
/proc/{PID}/sched
/proc/vmstat

These statistics facilitate a rapid evaluation of the performance and resource utilization of the target workload.

This patch (of 2):

Task swapping is triggered when there are no idle CPUs in task A's preferred node. In this case, the NUMA load balancer chooses a task B on A's preferred node and swaps B with A. This helps improve NUMA locality without introducing load imbalance between nodes. In the current implementation, B's NUMA node preference is not mandatory. That is to say, a kernel thread might be incorrectly chosen as B. However, kernel thread and user space thread that does not have mm are not supposed to be covered by NUMA balancing because NUMA balancing only considers user pages via VMAs.

According to Peter's suggestion for fixing this issue, we use PF_KTHREAD to skip the kernel thread. curr->mm is also checked because it is possible that user_mode_thread() might create a user thread without an mm. As per Prateek's analysis, after adding the PF_KTHREAD check, there is no need to further check the PF_IDLE flag:

: - play_idle_precise() already ensures PF_KTHREAD is set before adding
:   PF_IDLE
:
: - cpu_startup_entry() is only called from the startup thread which
:   should be marked with PF_KTHREAD (based on my understanding looking at
:   commit cff9b23 ("kernel/sched: Modify initial boot task idle
:   setup"))

In summary, the check in task_numa_compare() now aligns with task_tick_numa().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/43d68b356b25d124f0d222ebedf3859e86eefb9f.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/eaacc9c9bd37bac92d43a671867d85b2fdad3b06.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu <[email protected]>
Signed-off-by: Libo Chen <[email protected]>
Suggested-by: Michal Koutný <[email protected]>
Tested-by: Ayush Jain <[email protected]>
Tested-by: Venkat Rao Bagalkote <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Aubrey Li <[email protected]>
Cc: "Chen, Tim C" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Madadi Vineeth Reddy <[email protected]>
Cc: Mel Gorman <mgorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 83da212 commit 9709eb0

1 file changed: +2 -1 lines changed

kernel/sched/fair.c

Lines changed: 2 additions & 1 deletion
@@ -2273,7 +2273,8 @@ static bool task_numa_compare(struct task_numa_env *env,
 
 	rcu_read_lock();
 	cur = rcu_dereference(dst_rq->curr);
-	if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+	if (cur && ((cur->flags & (PF_EXITING | PF_KTHREAD)) ||
+		    !cur->mm))
 		cur = NULL;
 
 	/*
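
For readers who want to experiment with the new condition outside the kernel, here is a minimal user-space sketch of the check described in the commit message. It is not kernel code: struct sk_task, struct sk_mm, the SK_PF_* values and sk_swap_candidate() are hypothetical stand-ins invented for illustration, and only the shape of the condition mirrors the patched task_numa_compare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's PF_EXITING and PF_KTHREAD.
 * The values here are arbitrary; the real ones live in the kernel headers.
 */
#define SK_PF_EXITING 0x1u
#define SK_PF_KTHREAD 0x2u

/* Hypothetical placeholder for an address space (mm_struct in the kernel). */
struct sk_mm {
	int unused;
};

/* Hypothetical, heavily reduced model of a task. */
struct sk_task {
	unsigned int flags;
	struct sk_mm *mm;	/* NULL when the task has no user address space */
};

/*
 * Mirror of the patched condition: a task that is exiting, is a kernel
 * thread, or has no mm is not a valid swap partner, so the candidate is
 * dropped (NULL). A NULL input stays NULL.
 */
const struct sk_task *sk_swap_candidate(const struct sk_task *cur)
{
	if (cur && ((cur->flags & (SK_PF_EXITING | SK_PF_KTHREAD)) ||
		    !cur->mm))
		cur = NULL;
	return cur;
}

/* Small self-check covering the cases named in the commit message. */
bool sk_self_check(void)
{
	struct sk_mm mm = { 0 };
	struct sk_task user = { 0, &mm };
	struct sk_task kthread = { SK_PF_KTHREAD, NULL };
	struct sk_task exiting = { SK_PF_EXITING, &mm };
	struct sk_task no_mm = { 0, NULL };

	assert(sk_swap_candidate(&user) == &user);
	assert(sk_swap_candidate(&kthread) == NULL);
	assert(sk_swap_candidate(&exiting) == NULL);
	assert(sk_swap_candidate(&no_mm) == NULL);
	return true;
}
```

In this sketch there is no separate idle-task test. That follows the reasoning quoted in the commit message, which argues that idle tasks already carry PF_KTHREAD, so in the model an idle task is simply a task with SK_PF_KTHREAD set.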