
Commit 6e5a0c3

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level scheduler
   balancing function names around the sched_balance_*() prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include <asm/vtime.h> header
  s390/irq,nmi: Include <asm/vtime.h> header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...
2 parents 17ca7fc + 97450eb commit 6e5a0c3

42 files changed, +550 -441 lines changed


Documentation/admin-guide/kernel-parameters.txt

Lines changed: 1 addition & 0 deletions
@@ -5826,6 +5826,7 @@
 			but is useful for debugging and performance tuning.
 
 	sched_thermal_decay_shift=
+			[Deprecated]
 			[KNL, SMP] Set a decay shift for scheduler thermal
 			pressure signal. Thermal pressure signal follows the
 			default decay period of other scheduler pelt
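For reference (not part of the commit itself): entries in kernel-parameters.txt are passed verbatim on the boot command line, so before its deprecation this knob would have been supplied as a plain integer shift, for example (illustrative value only):

    sched_thermal_decay_shift=10

The [Deprecated] tag added here goes hand in hand with the "sched/pelt: Remove shift of thermal clock" commit in this series, which drops the extra decay shift this parameter used to control.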

Documentation/scheduler/sched-domains.rst

Lines changed: 6 additions & 6 deletions
@@ -31,21 +31,21 @@ is treated as one entity. The load of a group is defined as the sum of the
 load of each of its member CPUs, and only when the load of a group becomes
 out of balance are tasks moved between groups.
 
-In kernel/sched/core.c, trigger_load_balance() is run periodically on each CPU
-through scheduler_tick(). It raises a softirq after the next regularly scheduled
+In kernel/sched/core.c, sched_balance_trigger() is run periodically on each CPU
+through sched_tick(). It raises a softirq after the next regularly scheduled
 rebalancing event for the current runqueue has arrived. The actual load
-balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
+balancing workhorse, sched_balance_softirq()->sched_balance_domains(), is then run
 in softirq context (SCHED_SOFTIRQ).
 
 The latter function takes two arguments: the runqueue of current CPU and whether
-the CPU was idle at the time the scheduler_tick() happened and iterates over all
+the CPU was idle at the time the sched_tick() happened and iterates over all
 sched domains our CPU is on, starting from its base domain and going up the ->parent
 chain. While doing that, it checks to see if the current domain has exhausted its
-rebalance interval. If so, it runs load_balance() on that domain. It then checks
+rebalance interval. If so, it runs sched_balance_rq() on that domain. It then checks
 the parent sched_domain (if it exists), and the parent of the parent and so
 forth.
 
-Initially, load_balance() finds the busiest group in the current sched domain.
+Initially, sched_balance_rq() finds the busiest group in the current sched domain.
 If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
 that group. If it manages to find such a runqueue, it locks both our initial
 CPU's runqueue and the newly found busiest one and starts moving tasks from it
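The renamed call chain described in this document (sched_tick() raising SCHED_SOFTIRQ via sched_balance_trigger(), handled by sched_balance_softirq() and sched_balance_domains(), which calls sched_balance_rq() per domain) ultimately reduces to a walk up the ->parent chain in which each domain is balanced only when its rebalance interval has expired. The following self-contained C sketch uses toy types and stand-in functions (toy_domain, toy_balance_rq(); none of this is kernel code) to illustrate that interval-gated walk:

/*
 * Toy model of the domain walk done by sched_balance_domains():
 * start at the CPU's base domain, follow ->parent upward, and balance
 * any domain whose rebalance interval has elapsed. Types and names are
 * illustrative stand-ins, not the kernel's.
 */
#include <stdio.h>

struct toy_domain {
    const char *name;
    unsigned long last_balance;   /* "jiffies" of the last balance */
    unsigned long interval;       /* rebalance interval */
    struct toy_domain *parent;
};

/* Stand-in for sched_balance_rq(): find the busiest group/runqueue and pull tasks. */
static void toy_balance_rq(struct toy_domain *sd)
{
    printf("balancing domain %s\n", sd->name);
}

/* Stand-in for the per-domain walk performed in softirq context. */
static void toy_balance_domains(struct toy_domain *base, unsigned long now)
{
    for (struct toy_domain *sd = base; sd; sd = sd->parent) {
        if (now - sd->last_balance >= sd->interval) {
            toy_balance_rq(sd);
            sd->last_balance = now;
        }
    }
}

int main(void)
{
    struct toy_domain pkg = { "PKG", 0, 64, NULL };   /* parent: package level */
    struct toy_domain mc  = { "MC",  0, 16, &pkg };   /* base: multi-core level */

    toy_balance_domains(&mc, 100);  /* both intervals expired: MC and PKG balance */
    toy_balance_domains(&mc, 120);  /* only MC's shorter interval has expired again */
    return 0;
}

Running it balances both domains on the first call and only the lower MC domain on the second, mirroring how smaller, lower-level domains are rebalanced more often than their parents.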

Documentation/scheduler/sched-stats.rst

Lines changed: 21 additions & 16 deletions
@@ -2,6 +2,11 @@
 Scheduler Statistics
 ====================
 
+Version 16 of schedstats changed the order of definitions within
+'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
+columns in show_schedstat(). In particular the position of CPU_IDLE
+and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+
 Version 15 of schedstats dropped counters for some sched_yield:
 yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
 identical to version 14.
@@ -72,53 +77,53 @@ domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 
 The first field is a bit mask indicating what cpus this domain operates over.
 
-The next 24 are a variety of load_balance() statistics in grouped into types
+The next 24 are a variety of sched_balance_rq() statistics in grouped into types
 of idleness (idle, busy, and newly idle):
 
-    1) # of times in this domain load_balance() was called when the
+    1) # of times in this domain sched_balance_rq() was called when the
        cpu was idle
-    2) # of times in this domain load_balance() checked but found
+    2) # of times in this domain sched_balance_rq() checked but found
       the load did not require balancing when the cpu was idle
-    3) # of times in this domain load_balance() tried to move one or
+    3) # of times in this domain sched_balance_rq() tried to move one or
       more tasks and failed, when the cpu was idle
    4) sum of imbalances discovered (if any) with each call to
-      load_balance() in this domain when the cpu was idle
+      sched_balance_rq() in this domain when the cpu was idle
    5) # of times in this domain pull_task() was called when the cpu
      was idle
    6) # of times in this domain pull_task() was called even though
      the target task was cache-hot when idle
-    7) # of times in this domain load_balance() was called but did
+    7) # of times in this domain sched_balance_rq() was called but did
      not find a busier queue while the cpu was idle
    8) # of times in this domain a busier queue was found while the
      cpu was idle but no busier group was found
-    9) # of times in this domain load_balance() was called when the
+    9) # of times in this domain sched_balance_rq() was called when the
      cpu was busy
-    10) # of times in this domain load_balance() checked but found the
+    10) # of times in this domain sched_balance_rq() checked but found the
      load did not require balancing when busy
-    11) # of times in this domain load_balance() tried to move one or
+    11) # of times in this domain sched_balance_rq() tried to move one or
      more tasks and failed, when the cpu was busy
    12) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was busy
+        sched_balance_rq() in this domain when the cpu was busy
    13) # of times in this domain pull_task() was called when busy
    14) # of times in this domain pull_task() was called even though the
      target task was cache-hot when busy
-    15) # of times in this domain load_balance() was called but did not
+    15) # of times in this domain sched_balance_rq() was called but did not
      find a busier queue while the cpu was busy
    16) # of times in this domain a busier queue was found while the cpu
      was busy but no busier group was found
 
-    17) # of times in this domain load_balance() was called when the
+    17) # of times in this domain sched_balance_rq() was called when the
      cpu was just becoming idle
-    18) # of times in this domain load_balance() checked but found the
+    18) # of times in this domain sched_balance_rq() checked but found the
      load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain load_balance() tried to move one or more
+    19) # of times in this domain sched_balance_rq() tried to move one or more
      tasks and failed, when the cpu was just becoming idle
    20) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was just becoming idle
+        sched_balance_rq() in this domain when the cpu was just becoming idle
    21) # of times in this domain pull_task() was called when newly idle
    22) # of times in this domain pull_task() was called even though the
      target task was cache-hot when just becoming idle
-    23) # of times in this domain load_balance() was called but did not
+    23) # of times in this domain sched_balance_rq() was called but did not
      find a busier queue while the cpu was just becoming idle
    24) # of times in this domain a busier queue was found while the cpu
      was just becoming idle but no busier group was found
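The 24 per-domain counters above appear as whitespace-separated fields after the cpumask on each "domain<N>" line of /proc/schedstat. A minimal reader along these lines can be used to eyeball them; it is only a sketch, it trusts the field numbering documented above, and it does not validate the schedstat version header:

/*
 * Sketch: print the first 24 per-domain fields of /proc/schedstat,
 * numbered as in the documentation above. Illustrative only.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/schedstat", "r");
    char line[8192];

    if (!f) {
        perror("fopen /proc/schedstat");
        return 1;
    }

    while (fgets(line, sizeof(line), f)) {
        char name[32], mask[128];
        unsigned long long v[24];
        int n;

        /* Domain lines look like: "domain0 <cpumask> f1 f2 ... " */
        n = sscanf(line,
                   "%31s %127s %llu %llu %llu %llu %llu %llu %llu %llu"
                   " %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu"
                   " %llu %llu %llu %llu %llu %llu",
                   name, mask,
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
                   &v[6], &v[7], &v[8], &v[9], &v[10], &v[11],
                   &v[12], &v[13], &v[14], &v[15], &v[16], &v[17],
                   &v[18], &v[19], &v[20], &v[21], &v[22], &v[23]);
        if (n == 26 && !strncmp(name, "domain", 6)) {
            printf("%s %s:\n", name, mask);
            for (int i = 0; i < 24; i++)
                printf("  field %2d = %llu\n", i + 1, v[i]);
        }
    }
    fclose(f);
    return 0;
}

Since version 16 reorders the CPU_MAX_IDLE_TYPES columns, a real consumer should first check the version line at the top of /proc/schedstat before interpreting the idle/busy/newly-idle groups.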

Documentation/translations/zh_CN/scheduler/sched-domains.rst

Lines changed: 5 additions & 5 deletions
@@ -34,17 +34,17 @@ CPU共享。任意两个组的CPU掩码的交集不一定为空,如果是这
 调度域中的负载均衡发生在调度组中。也就是说,每个组被视为一个实体。组的负载被定义为它
 管辖的每个CPU的负载之和。仅当组的负载不均衡后,任务才在组之间发生迁移。
 
-在kernel/sched/core.c中,trigger_load_balance()在每个CPU上通过scheduler_tick()
+在kernel/sched/core.c中,sched_balance_trigger()在每个CPU上通过sched_tick()
 周期执行。在当前运行队列下一个定期调度再平衡事件到达后,它引发一个软中断。负载均衡真正
-的工作由run_rebalance_domains()->rebalance_domains()完成,在软中断上下文中执行
+的工作由sched_balance_softirq()->sched_balance_domains()完成,在软中断上下文中执行
 (SCHED_SOFTIRQ)。
 
-后一个函数有两个入参:当前CPU的运行队列、它在scheduler_tick()调用时是否空闲。函数会从
+后一个函数有两个入参:当前CPU的运行队列、它在sched_tick()调用时是否空闲。函数会从
 当前CPU所在的基调度域开始迭代执行,并沿着parent指针链向上进入更高层级的调度域。在迭代
 过程中,函数会检查当前调度域是否已经耗尽了再平衡的时间间隔,如果是,它在该调度域运行
-load_balance()。接下来它检查父调度域(如果存在),再后来父调度域的父调度域,以此类推。
+sched_balance_rq()。接下来它检查父调度域(如果存在),再后来父调度域的父调度域,以此类推。
 
-起初,load_balance()查找当前调度域中最繁忙的调度组。如果成功,在该调度组管辖的全部CPU
+起初,sched_balance_rq()查找当前调度域中最繁忙的调度组。如果成功,在该调度组管辖的全部CPU
 的运行队列中找出最繁忙的运行队列。如能找到,对当前的CPU运行队列和新找到的最繁忙运行
 队列均加锁,并把任务从最繁忙队列中迁移到当前CPU上。被迁移的任务数量等于在先前迭代执行
 中计算出的该调度域的调度组的不均衡值。

Documentation/translations/zh_CN/scheduler/sched-stats.rst

Lines changed: 15 additions & 15 deletions
@@ -75,42 +75,42 @@ domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 繁忙,新空闲):
 
 
-    1) 当CPU空闲时,load_balance()在这个调度域中被调用了#次
-    2) 当CPU空闲时,load_balance()在这个调度域中被调用,但是发现负载无需
+    1) 当CPU空闲时,sched_balance_rq()在这个调度域中被调用了#次
+    2) 当CPU空闲时,sched_balance_rq()在这个调度域中被调用,但是发现负载无需
       均衡#次
-    3) 当CPU空闲时,load_balance()在这个调度域中被调用,试图迁移1个或更多
+    3) 当CPU空闲时,sched_balance_rq()在这个调度域中被调用,试图迁移1个或更多
      任务且失败了#次
-    4) 当CPU空闲时,load_balance()在这个调度域中被调用,发现不均衡(如果有)
+    4) 当CPU空闲时,sched_balance_rq()在这个调度域中被调用,发现不均衡(如果有)
      #次
    5) 当CPU空闲时,pull_task()在这个调度域中被调用#次
    6) 当CPU空闲时,尽管目标任务是热缓存状态,pull_task()依然被调用#次
-    7) 当CPU空闲时,load_balance()在这个调度域中被调用,未能找到更繁忙的
+    7) 当CPU空闲时,sched_balance_rq()在这个调度域中被调用,未能找到更繁忙的
      队列#次
    8) 当CPU空闲时,在调度域中找到了更繁忙的队列,但未找到更繁忙的调度组
      #次
-    9) 当CPU繁忙时,load_balance()在这个调度域中被调用了#次
-    10) 当CPU繁忙时,load_balance()在这个调度域中被调用,但是发现负载无需
+    9) 当CPU繁忙时,sched_balance_rq()在这个调度域中被调用了#次
+    10) 当CPU繁忙时,sched_balance_rq()在这个调度域中被调用,但是发现负载无需
      均衡#次
-    11) 当CPU繁忙时,load_balance()在这个调度域中被调用,试图迁移1个或更多
+    11) 当CPU繁忙时,sched_balance_rq()在这个调度域中被调用,试图迁移1个或更多
      任务且失败了#次
-    12) 当CPU繁忙时,load_balance()在这个调度域中被调用,发现不均衡(如果有)
+    12) 当CPU繁忙时,sched_balance_rq()在这个调度域中被调用,发现不均衡(如果有)
      #次
    13) 当CPU繁忙时,pull_task()在这个调度域中被调用#次
    14) 当CPU繁忙时,尽管目标任务是热缓存状态,pull_task()依然被调用#次
-    15) 当CPU繁忙时,load_balance()在这个调度域中被调用,未能找到更繁忙的
+    15) 当CPU繁忙时,sched_balance_rq()在这个调度域中被调用,未能找到更繁忙的
      队列#次
    16) 当CPU繁忙时,在调度域中找到了更繁忙的队列,但未找到更繁忙的调度组
      #次
-    17) 当CPU新空闲时,load_balance()在这个调度域中被调用了#次
-    18) 当CPU新空闲时,load_balance()在这个调度域中被调用,但是发现负载无需
+    17) 当CPU新空闲时,sched_balance_rq()在这个调度域中被调用了#次
+    18) 当CPU新空闲时,sched_balance_rq()在这个调度域中被调用,但是发现负载无需
      均衡#次
-    19) 当CPU新空闲时,load_balance()在这个调度域中被调用,试图迁移1个或更多
+    19) 当CPU新空闲时,sched_balance_rq()在这个调度域中被调用,试图迁移1个或更多
      任务且失败了#次
-    20) 当CPU新空闲时,load_balance()在这个调度域中被调用,发现不均衡(如果有)
+    20) 当CPU新空闲时,sched_balance_rq()在这个调度域中被调用,发现不均衡(如果有)
      #次
    21) 当CPU新空闲时,pull_task()在这个调度域中被调用#次
    22) 当CPU新空闲时,尽管目标任务是热缓存状态,pull_task()依然被调用#次
-    23) 当CPU新空闲时,load_balance()在这个调度域中被调用,未能找到更繁忙的
+    23) 当CPU新空闲时,sched_balance_rq()在这个调度域中被调用,未能找到更繁忙的
      队列#次
    24) 当CPU新空闲时,在调度域中找到了更繁忙的队列,但未找到更繁忙的调度组
      #次

arch/arm/include/asm/topology.h

Lines changed: 3 additions & 3 deletions
@@ -22,9 +22,9 @@
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
-/* Replace task scheduler's default thermal pressure API */
-#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_update_thermal_pressure topology_update_thermal_pressure
+/* Replace task scheduler's default HW pressure API */
+#define arch_scale_hw_pressure topology_get_hw_pressure
+#define arch_update_hw_pressure topology_update_hw_pressure
 
 #else
 
arch/arm/kernel/topology.c

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@
  * can take this difference into account during load balance. A per cpu
  * structure is preferred because each CPU updates its own cpu_capacity field
  * during the load balance except for idle cores. One idle core is selected
- * to run the rebalance_domains for all idle cores and the cpu_capacity can be
+ * to run the sched_balance_domains for all idle cores and the cpu_capacity can be
  * updated during this sequence.
  */
 

arch/arm64/include/asm/topology.h

Lines changed: 3 additions & 3 deletions
@@ -35,9 +35,9 @@ void update_freq_counters_refs(void);
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
-/* Replace task scheduler's default thermal pressure API */
-#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_update_thermal_pressure topology_update_thermal_pressure
+/* Replace task scheduler's default HW pressure API */
+#define arch_scale_hw_pressure topology_get_hw_pressure
+#define arch_update_hw_pressure topology_update_hw_pressure
 
 #include <asm-generic/topology.h>
 
arch/powerpc/include/asm/Kbuild

Lines changed: 0 additions & 1 deletion
@@ -6,5 +6,4 @@ generic-y += agp.h
 generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
-generic-y += vtime.h
 generic-y += early_ioremap.h

arch/powerpc/include/asm/cputime.h

Lines changed: 0 additions & 13 deletions
@@ -32,23 +32,10 @@
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk) (&get_paca()->accounting)
 #define raw_get_accounting(tsk) (&local_paca->accounting)
-static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 
 #else
 #define get_accounting(tsk) (&task_thread_info(tsk)->accounting)
 #define raw_get_accounting(tsk) get_accounting(tsk)
-/*
- * Called from the context switch with interrupts disabled, to charge all
- * accumulated times to the current process, and to prepare accounting on
- * the next process.
- */
-static inline void arch_vtime_task_switch(struct task_struct *prev)
-{
-	struct cpu_accounting_data *acct = get_accounting(current);
-	struct cpu_accounting_data *acct0 = get_accounting(prev);
-
-	acct->starttime = acct0->starttime;
-}
 #endif
 
 /*
